CUDA - Check failed: error == cudaSuccess (77 vs. 0) an illegal memory access was encountered
I'm debugging lengthy code that involves CUDA operations. I'm getting the above-mentioned error during a call to cudaMemcpy(..., ..., cudaMemcpyHostToDevice), but I'm not sure it is specifically related to that call.
Here is the code snippet:
    int num_elements = 8294400; // --> tried "1" here, didn't work either!
    float *checkarray = new float[num_elements];
    float *checkarray_gpu;
    CUDA_CHECK(cudaMalloc(&checkarray_gpu, num_elements * sizeof(float)));
    CUDA_CHECK(cudaMemcpy(checkarray_gpu, checkarray, num_elements * sizeof(float), cudaMemcpyHostToDevice));
    CUDA_CHECK(cudaMemcpy(checkarray, checkarray_gpu, num_elements * sizeof(float), cudaMemcpyDeviceToHost));

Here the CUDA_CHECK macro just prints the CUDA error. It is part of the existing code and works fine for the other cudaMemcpy and cudaMalloc calls, so it is not part of the problem. Strangely, the code snippet works fine when executed on its own in a toy *.cu example.
So my assumption is that earlier CUDA operations in the program produced errors that were never reported, and that these cause the failure in the code snippet above. Could that be? Is there a way to check whether there is an unreported error involving CUDA?
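Kernel launches are asynchronous, so an error inside a kernel is typically only reported by a later runtime call such as cudaMemcpy. A common debugging pattern is to check both cudaGetLastError() (launch errors) and cudaDeviceSynchronize() (errors while the kernel runs) right after every launch. This is a minimal sketch; check_cuda and check_kernel are hypothetical helper names, not part of the original code or of the existing error-checking macro:

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a failed CUDA runtime call and reports it,
// instead of aborting like a glog-style CHECK macro would.
inline bool check_cuda(cudaError_t err, const char* what) {
  if (err != cudaSuccess) {
    std::fprintf(stderr, "%s failed: %s (%d)\n", what,
                 cudaGetErrorString(err), static_cast<int>(err));
    return false;
  }
  return true;
}

// Hypothetical helper to call right after a kernel launch while debugging.
// cudaGetLastError() reports errors from the launch itself (e.g. invalid
// device function); cudaDeviceSynchronize() waits for the kernel to finish
// and reports errors that happened while it ran (e.g. illegal memory access).
inline bool check_kernel(const char* name) {
  bool ok = check_cuda(cudaGetLastError(), name);
  ok = check_cuda(cudaDeviceSynchronize(), name) && ok;
  return ok;
}
```

With this, a launch would be followed by a call like check_kernel("kernel_set_value"), so the failing kernel is named instead of a later cudaMemcpy. Setting the environment variable CUDA_LAUNCH_BLOCKING=1 has a similar effect without code changes, because it makes kernel launches synchronous. Both slow the program down, so they are meant for debugging only.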
My other guess is that it might be caused by the specific graphics card I'm using. I have an NVIDIA Titan X (Pascal), CUDA 8.0 and cuDNN v5.1. I tried to compile the code using some special compiler flags:
    -arch=sm_30 \
    -gencode=arch=compute_20,code=sm_20 \
    -gencode=arch=compute_30,code=sm_30 \
    -gencode=arch=compute_50,code=sm_50 \
    -gencode=arch=compute_52,code=sm_52 \
    -gencode=arch=compute_52,code=compute_52 \
    -gencode=arch=compute_60,code=sm_60 \
    -gencode=arch=compute_61,code=sm_61 \
    -gencode=arch=compute_62,code=sm_62 \

but that didn't get me very far. Here is my current simplified Makefile for completeness:
    NVCC = nvcc
    CUDA_INC = -I/usr/local/cuda/include
    CUDA_LIB = -L/usr/local/cuda/lib64
    TARGET = myprogramm
    OPTS = -std=c++11

    $(TARGET).so: $(TARGET).o
    	$(NVCC) $(OPTS) -shared $(TARGET).o $(CUDA_LIB) -o $(TARGET).so

    $(TARGET).o: $(TARGET).cu headers/some_header.hpp
    	$(NVCC) $(OPTS) $(CUDA_INC) -Xcompiler -fPIC -c $(TARGET).cu

Does anyone have an idea how to get to the bottom of this?
Edit:
Following the cuda-memcheck idea, the error apparently happens earlier, during a call to kernel_set_value:
    ========= Invalid __global__ write of size 4
    =========     at 0x00000298 in void kernel_set_value<float>(unsigned long, unsigned long, float*, float)
    =========     by thread (480,0,0) in block (30,0,0)
    =========     Address 0x0005cd00 is out of bounds
    =========     Saved host backtrace up to driver entry point at kernel launch time
    =========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 (cuLaunchKernel + 0x2c5) [0x209035]
    [...]
    =========     Host Frame:/media/.../myprogramm.so (_zn5boost6python6detail6invokeiipfvrkssens0_15arg_from_pythonis4_eeeep7_objectns1_11invoke_tag_ilb1elb0eeerkt_rt0_rt1_ + 0x2d) [0x3e5eb]
    [...]
    =========
    ========= Program hit cudaErrorLaunchFailure (error 4) due to "unspecified launch failure" on CUDA API call to cudaMemcpy.
    =========     Saved host backtrace up to driver entry point at error
    =========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x2f4e33]
    =========     Host Frame:/media/.../myprogramm.so [0x7489f]
    F0703 16:23:54.840698 26207 myprogramm.cu:411] Check failed: error == cudaSuccess (4 vs. 0)  unspecified launch failure
    [...]
    =========     Host Frame:python (Py_Main + 0xb5e) [0x66d92]
    =========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xf5) [0x21f45]
    =========     Host Frame:python [0x177c2e]
    ========= *** Check failure stack trace: ***
    ========= Error: process didn't terminate
    ========= Internal error (20)
    ========= No CUDA-MEMCHECK results found

But the function kernel_set_value works fine in a toy example. Is there anything special to consider when using kernel_set_value? Here is its source code and its respective helper functions:
    #define cuda_num_threads 512
    #define max_num_blocks 2880

    inline int cuda_get_blocks(const size_t n) {
      return min(max_num_blocks, int((n + size_t(cuda_num_threads) - 1) / cuda_num_threads));
    }

    inline size_t cuda_get_loops(const size_t n) {
      size_t total_threads = cuda_get_blocks(n) * cuda_num_threads;
      return (n + total_threads - 1) / total_threads;
    }

    template <typename dtype>
    __global__ void kernel_set_value(size_t cuda_num_loops, size_t n, dtype* gpudst, dtype value) {
      const size_t idxbase = size_t(cuda_num_loops) * (size_t(cuda_num_threads) * size_t(blockIdx.x) + size_t(threadIdx.x));
      if (idxbase >= n) return;
      for (size_t idx = idxbase; idx < min(n, idxbase + cuda_num_loops); ++idx) {
        gpudst[idx] = value;
      }
    }
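To rule out the index arithmetic itself, the launch configuration can be replayed on the host. This sketch is a host-only copy of cuda_get_blocks, cuda_get_loops and the indexing in kernel_set_value (get_blocks, get_loops, first_index and covers_exactly_once are hypothetical names) that checks every element of [0, n) is written exactly once. It also computes the first element written by thread (480,0,0) in block (30,0,0). Assuming the failing launch used n = 8294400 as in the first snippet, that is element 95040, and 95040 × 4 bytes = 0x5CD00, the address cuda-memcheck reports. That would be consistent with gpudst being a null (or otherwise invalid) pointer in that launch, although this is an inference from the log, not something the post confirms.

```cuda
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Host-only copies of the question's constants (cuda_num_threads, max_num_blocks).
constexpr int kNumThreads = 512;
constexpr int kMaxBlocks = 2880;

// Same arithmetic as cuda_get_blocks.
int get_blocks(std::size_t n) {
  return std::min(kMaxBlocks, int((n + std::size_t(kNumThreads) - 1) / kNumThreads));
}

// Same arithmetic as cuda_get_loops; like the original, it divides by zero for n == 0.
std::size_t get_loops(std::size_t n) {
  const std::size_t total_threads = std::size_t(get_blocks(n)) * kNumThreads;
  return (n + total_threads - 1) / total_threads;
}

// idxbase in kernel_set_value: the first element written by (block, thread).
std::size_t first_index(std::size_t loops, std::size_t block, std::size_t thread) {
  return loops * (std::size_t(kNumThreads) * block + thread);
}

// Replays every thread of the launch on the CPU and checks that each element
// of [0, n) is written exactly once (no gaps, no double writes).
bool covers_exactly_once(std::size_t n) {
  if (n == 0) return true;  // nothing to write; avoids get_loops' division by zero
  const int blocks = get_blocks(n);
  const std::size_t loops = get_loops(n);
  assert(loops >= 1);
  std::vector<int> hits(n, 0);
  for (int b = 0; b < blocks; ++b) {
    for (int t = 0; t < kNumThreads; ++t) {
      const std::size_t base = first_index(loops, b, t);
      if (base >= n) continue;  // the kernel returns early here
      // Same bound as the kernel loop, so idx never reaches n.
      for (std::size_t idx = base; idx < std::min(n, base + loops); ++idx) {
        ++hits[idx];
      }
    }
  }
  return std::all_of(hits.begin(), hits.end(), [](int h) { return h == 1; });
}
```

For n = 8294400 this gives 2880 blocks with 6 elements per thread. If covers_exactly_once holds for the sizes actually used, the kernel's indexing can only write inside [0, n), which points away from the kernel code and toward an invalid pointer or build problem.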
So the final solution was to compile the code without the -gencode=arch=compute_XX,code=sm_XX-style flags. It took me forever to find this out. The actual error codes were quite misleading: (77 vs. 0) an illegal memory access was encountered, (4 vs. 0) unspecified launch failure, or (8 vs. 0) invalid device function.
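A runtime check can expose a mismatch between the compiled kernels and the GPU before the error codes get confusing. This sketch (probe_kernel, cc_code and report_kernel_arch are hypothetical names) prints the device's compute capability, 6.1 for a Titan X (Pascal), next to the architecture of the kernel binary the runtime selected, using cudaGetDeviceProperties and cudaFuncGetAttributes. If the binary contains no code usable on the device, cudaFuncGetAttributes returns an error instead of cudaSuccess:

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical empty kernel; it should be compiled with the same nvcc flags
// (and in the same library) as the real kernels so it reflects their build.
__global__ void probe_kernel() {}

// Encodes a compute capability the way cudaFuncAttributes::binaryVersion does:
// major * 10 + minor, e.g. 6.1 -> 61.
constexpr int cc_code(int major, int minor) { return major * 10 + minor; }

// Prints the device's compute capability and the architectures the probe
// kernel was compiled for. Returns false if any of the queries fails.
bool report_kernel_arch(int device) {
  if (cudaSetDevice(device) != cudaSuccess) return false;
  cudaDeviceProp prop;
  if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) return false;

  cudaFuncAttributes attr;
  const cudaError_t err = cudaFuncGetAttributes(&attr, probe_kernel);
  if (err != cudaSuccess) {
    std::fprintf(stderr, "no usable kernel image: %s\n", cudaGetErrorString(err));
    return false;
  }
  std::printf("device %s: sm_%d, kernel binary: sm_%d, PTX: compute_%d\n",
              prop.name, cc_code(prop.major, prop.minor),
              attr.binaryVersion, attr.ptxVersion);
  return true;
}
```

Calling report_kernel_arch(0) once at startup shows which sm_XX image is actually in use, so a build that lacks code for the GPU fails with a clear message instead of a later "invalid device function" or launch failure.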