Exception 'cudaError_enum' thrown in cudaGetExportTable (CUDA runtime library)

Updated: 2024-10-24 02:39:15
Problem description

I am debugging an MPI-based CUDA program with DDT. My code aborts when the CUDA runtime library (libcudart) throws an exception in the (undocumented) function cudaGetExportTable, which is called from cudaMalloc and cudaThreadSynchronize in my code. (UPDATE: using cudaDeviceSynchronize gives the same error.)

Why is libcudart throwing an exception (I am using the C API, not the C++ API) before I can detect it in my code with its cudaError_t return value or with CHECKCUDAERROR?

(I'm using CUDA 4.2 SDK for Linux.)

Output:

Process 9: terminate called after throwing an instance of 'cudaError_enum'
Process 9: terminate called recursively
Process 20: terminate called after throwing an instance of 'cudaError'
Process 20: terminate called recursively

My code:

cudaThreadSynchronize();
CHECKCUDAERROR("cudaThreadSynchronize()");

Other code fragment:

const size_t t; // from argument to function
void* p=NULL;
const cudaError_t r=cudaMalloc(&p, t);
if (r!=cudaSuccess) {
    ERROR("cudaMalloc failed.");
}
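CHECKCUDAERROR and ERROR are the poster's own macros and their definitions are not shown. As a rough illustration only, a return-value check that records the call site might look like the sketch below. It assumes stand-in names (`fake_error`, `fake_malloc`, `check_status`, `CHECK_STATUS`, `format_failure`) in place of `cudaError_t` and `cudaMalloc`, so that it compiles without the CUDA toolkit; none of these names come from the original post or from libcudart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical stand-in for cudaError_t (not the real CUDA enum).
enum fake_error { FAKE_OK = 0, FAKE_ALLOC_FAILED = 2 };

// Builds a "file:line: ERROR: <what> failed (code N)" message.
std::string format_failure(const char* file, int line,
                           const char* what, fake_error e) {
    char buf[256];
    std::snprintf(buf, sizeof buf, "%s:%d: ERROR: %s failed (code %d)",
                  file, line, what, static_cast<int>(e));
    return std::string(buf);
}

// Returns true on success; on failure reports the call site on stderr.
bool check_status(fake_error e, const char* what,
                  const char* file, int line) {
    if (e == FAKE_OK) return true;
    std::fprintf(stderr, "%s\n", format_failure(file, line, what, e).c_str());
    return false;
}

// Wraps a call so the failing expression, file, and line are captured.
#define CHECK_STATUS(call) check_status((call), #call, __FILE__, __LINE__)

// Hypothetical stand-in for cudaMalloc: fails for a zero-byte request.
fake_error fake_malloc(void** p, std::size_t n) {
    if (n == 0) return FAKE_ALLOC_FAILED;
    *p = std::malloc(n);
    return *p ? FAKE_OK : FAKE_ALLOC_FAILED;
}
```

A check of this kind only runs after the wrapped call has returned, so it cannot report anything if the process is terminated inside the call, which is the situation described in the question.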

Partial Backtrace:

Process 9: cudaDeviceSynchronize() -> cudaGetExportTable() -> __cxa_throw
Process 20: cudaMalloc() -> cudaGetExportTable() -> cudaGetExportTable() -> __cxa_throw

Memory debugging errors:

Processes 0,2,4,6-9,15-17,20-21: Memory error detected in Malloc_cuda_gx (cudamalloc.cu:35): dmalloc bad admin structure list.

This line is the cudaMalloc code fragment shown above. Also:

Processes 1,3,5,10-11,13-14,18-19,23: Memory error detected in vfprintf from /lib64/libc.so.6: dmalloc bad admin structure list.

Also, when running with 3 cores/GPUs per node instead of 4 GPUs per node, dmalloc detects similar memory errors. Outside debug mode, however, the code runs perfectly fine with 3 GPUs per node (as far as I can tell).

Solution

Recompile with gcc. (I was using icc to compile my code.)

When you do this, the exception appears when debugging, but after continuing past it, I get the real CUDA errors:

Process 9: gadget_cuda_gx.cu:116: ERROR in gadget_cuda_gx.cu:919: CUDA ERROR: cudaThreadSynchronize(): unspecified launch failure
Process 20: cudamalloc.cu:38: ERROR all CUDA-capable devices are busy or unavailable, cudaMalloc failed to allocate 856792 bytes = 0.817101 Mb
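One possible reading of this behaviour, offered as an assumption rather than a documented fact about libcudart: a library with a C-style API can still use C++ exceptions internally and translate them into return codes at its API boundary. A debugger that stops on every throw would then halt inside the library, while the caller only ever sees an error code after continuing. The sketch below illustrates that pattern with hypothetical stand-in names (`fake_status`, `internal_step`, `fake_synchronize`); it does not call CUDA and does not claim to reproduce what the runtime actually does.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in status codes; not the real cudaError_t values.
enum fake_status { FAKE_SUCCESS = 0, FAKE_LAUNCH_FAILURE = 4 };

// Internal helper that signals failure by throwing the status value,
// similar in spirit to the __cxa_throw frames in the backtrace.
void internal_step(bool fail) {
    if (fail) throw FAKE_LAUNCH_FAILURE;
}

// C-style entry point: the exception never crosses this boundary.
// It is caught here and converted into an ordinary return code.
fake_status fake_synchronize(int fail) {
    try {
        internal_step(fail != 0);
        return FAKE_SUCCESS;
    } catch (fake_status s) {
        return s;
    }
}

// Caller-side check in the style of the question's return-value test.
const char* describe(fake_status s) {
    return s == FAKE_SUCCESS ? "ok" : "launch failure (stand-in)";
}
```

If no matching handler is found for a thrown exception, the C++ runtime calls std::terminate instead, which is where messages of the form "terminate called after throwing an instance of ..." come from.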

Valgrind reveals no memory corruption or leaks in my code (either compiling with gcc or icc), but does find a few leaks in libcudart.

UPDATE: Still not fixed. Appears to be the same problem reported in answer #2 to this thread: cudaMemset fails on __device__ variable. The runtime isn't working like it should, it seems...

Published: 2023-06-03 12:16:33
Article link: https://www.elefans.com/category/jswz/34/473750.html