Calling a device function from a global function

Updated: 2024-10-18 14:16:14

How should I access the 'do_sth' function inside the 'print' function (see the code)? Why is the 'N' variable/constant visible to the GPU without using cudaMemcpy (see the code)?

__device__ void do_sth(char *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        a[idx] = a[idx];
    }
}

__global__ void print(char *a, int N)
{
    // question_1: why is there access to N? It is now in GPU memory -- how?
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    //do_sth<<<nblock2, blocksize2>>>(a, N);  // error_1: a host function call cannot be configured
    //do_sth(&&a, N);                         // error_2: expected an expression
    if (idx < N) {
        a[idx] = a[idx];
    }
}

Accepted answer

A __global__ function (aka "kernel") already resides on the GPU. All of its parameters (the variables a and N) are passed through shared or constant memory (depending on your device type) at launch time, so you can access them directly inside the kernel. There is a limit on the total parameter size: 256 B on pre-Fermi cards and 4 KB on Fermi, so if you have big chunks of data to transfer, you cannot avoid the cudaMemcpy functions.
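To illustrate the distinction, here is a minimal host-side sketch (the host code and launch configuration are assumptions, not from the question): the array's contents need cudaMalloc/cudaMemcpy, while the pointer and N themselves are kernel parameters that the runtime copies to the device at launch.

```cuda
#include <cuda_runtime.h>

__global__ void print(char *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        a[idx] = a[idx];
    }
}

int main()
{
    const int N = 256;
    char h_a[N] = {0};
    char *d_a;

    cudaMalloc(&d_a, N);                              // bulk data needs device memory...
    cudaMemcpy(d_a, h_a, N, cudaMemcpyHostToDevice);  // ...and an explicit copy

    // d_a and N are kernel *parameters*: the runtime transfers them to the
    // device at launch time, so no cudaMemcpy is needed for N itself.
    print<<<(N + 127) / 128, 128>>>(d_a, N);

    cudaMemcpy(h_a, d_a, N, cudaMemcpyDeviceToHost);
    cudaFree(d_a);
    return 0;
}
```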

__global__ function parameters should not be modified.
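If a kernel does need to work with a changed value, one sketch of the usual workaround (the kernel name and logic here are hypothetical, not from the original answer) is to copy the parameter into a local variable and modify that instead:

```cuda
// Hypothetical example: modify a local copy, never the parameter itself.
__global__ void shift(char *a, int N)
{
    int n = N - 1;                 // derived, modifiable local value
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {
        a[idx] = a[idx + 1];       // shift elements left by one
    }
}
```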

When calling a __device__ function from a __global__ function, you do not specify configuration parameters in triple angle brackets. The __device__ function is simply executed by every kernel thread that reaches the call. Note that you can place the call inside an if statement to prevent some threads from executing it.
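Applied to the code from the question, the fix is a plain function call, with no <<<...>>> configuration and no && on the pointer:

```cuda
__device__ void do_sth(char *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        a[idx] = a[idx];
    }
}

__global__ void print(char *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N) {
        do_sth(a, N);   // plain call: every thread that reaches this line executes it
    }
}
```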

In the version of CUDA current at the time of this answer, it was impossible to spawn more threads during kernel execution. (Later CUDA releases added dynamic parallelism, which lets a kernel launch child kernels on devices of compute capability 3.5 and above.)

There is no unary && operator in CUDA C++ (there is none in standard C++ either; C++11 uses && in the syntax of rvalue references, but not as an operator that takes an "address of an address"). Passing the pointer a itself, as in do_sth(a, N), is all that is needed.


Published: 2023-07-23 18:38:00