CUDA learning(1)–start from hello-world

learn from

First, some concepts:

  • Heterogeneous Computing
  • Blocks
  • Threads
  • Indexing
  • Shared memory
  • __syncthreads()
  • Asynchronous operation
  • Handling errors
  • Managing devices

Now we have to understand what “Heterogeneous Computing” is. From the 8th page, we can see that a program contains device code, host code, parallel code and serial code. The content that follows is more intuitive: data moves between the CPU and the GPU, and the parallel code is executed on the GPU.
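To make that data flow concrete, here is a minimal sketch of the usual pattern: copy the inputs to the GPU, run the device code, and copy the result back. The kernel name add and the values 2 and 7 are only illustrative, and the sketch assumes a machine with a working CUDA device and driver.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Device code: runs on the GPU and adds two integers stored in device memory.
__global__ void add(const int *a, const int *b, int *c) {
    *c = *a + *b;
}

int main(void) {
    int a = 2, b = 7, c = 0;                    // host copies
    int *d_a = NULL, *d_b = NULL, *d_c = NULL;  // device copies

    // 1. Allocate device memory and copy the inputs from CPU to GPU.
    cudaMalloc((void **)&d_a, sizeof(int));
    cudaMalloc((void **)&d_b, sizeof(int));
    cudaMalloc((void **)&d_c, sizeof(int));
    cudaMemcpy(d_a, &a, sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, &b, sizeof(int), cudaMemcpyHostToDevice);

    // 2. Execute the device code on the GPU (one block, one thread).
    add<<<1, 1>>>(d_a, d_b, d_c);

    // 3. Copy the result back from GPU to CPU; this copy waits for the kernel.
    cudaError_t err = cudaMemcpy(&c, d_c, sizeof(int), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("%d + %d = %d\n", a, b, c);

    cudaFree(d_a);
    cudaFree(d_b);
    cudaFree(d_c);
    return 0;
}
```

Everything outside add() is serial host code; only the kernel body runs on the device.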


(PS: the GigaThread™ engine provides up to 10x faster context switching compared to previous-generation architectures, concurrent kernel execution, and improved thread block scheduling.)

So we add the device code, and the “hello world” program looks like this:

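#include <stdio.h> // do not forget this!

// Device code: an empty kernel that runs on the GPU.
__global__ void mykernel(void) {
}

int main(void) {
    // Launch the kernel from host code.
    mykernel<<<1,1>>>();
    printf("Hello World!\n");
    return 0;
}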

__global__ is a CUDA C/C++ keyword. It indicates a function that:

  • Runs on the device
  • Is called from host code

nvcc separates the source code into host and device components (where is the nvcc command?).

Device functions (e.g. mykernel()) are processed by the NVIDIA compiler, and host functions (e.g. main()) are processed by the standard host compiler.
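As a rough illustration of that split, the sketch below marks which part of a single source file goes to which compiler, and shows one way to build it. The file name split_demo.cu and the function names device_part and host_part are hypothetical; the build lines assume nvcc is already on your PATH.

```cuda
// split_demo.cu (hypothetical file name)
// Build and run, assuming nvcc is on your PATH:
//   nvcc split_demo.cu -o split_demo
//   ./split_demo
#include <stdio.h>
#include <cuda_runtime.h>

// Device function: handled by the NVIDIA device compiler inside nvcc.
__global__ void device_part(void) {
}

// Host function: nvcc passes it on to the standard host compiler.
static int host_part(void) {
    device_part<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();   // did the launch itself succeed?
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();      // wait for the kernel to finish
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("host and device parts both ran\n");
    return 0;
}

int main(void) {
    return host_part();
}
```

If the build step fails with "command not found", the toolkit is either not installed or its bin directory is not on PATH yet.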

(PS for Arch Linux users: you can install the NVIDIA CUDA toolkit with “pacman -S cuda”, and you may have to restart before you can use nvcc.)

The function mykernel does nothing here, and we’ll explain what “<<<1,1>>>” does in a moment.
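As a small preview, here is a sketch of a kernel that actually does something: it prints from the GPU. The name hello_from_gpu is made up for this example. The two launch parameters are the number of blocks and the number of threads per block, so <<<1, 1>>> means one block with one thread. The sketch assumes a GPU that supports printf in device code.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Device code: each launched thread prints its own block and thread index.
__global__ void hello_from_gpu(void) {
    printf("Hello World from the GPU! block %u, thread %u\n",
           blockIdx.x, threadIdx.x);
}

int main(void) {
    // One block containing one thread, so the kernel body runs once.
    hello_from_gpu<<<1, 1>>>();

    // Kernel launches are asynchronous; wait so the device output is flushed.
    cudaDeviceSynchronize();

    printf("Hello World from the CPU!\n");
    return 0;
}
```

Changing the launch to <<<2, 4>>> would run the kernel body eight times, once per thread.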

Now I have to finish the parallelism study that I left off earlier:

Reference book: CSAPP

(PS: I have found nothing worth writing about yet; maybe I’ll write something in later sections.)


