I ran into a bug when using the Intel OpenCL SDK (version 7.0.0.2567) with Visual Studio 2015. I defined a struct "Obj" containing an array of 5 ints, and I pass five variables of type "Obj" to my OpenCL kernel as "__private" arguments. If I build the kernel in debug mode (with the options "-g -s filepath"), some of the arguments are not passed correctly. The full code example is at https://github.com/flm8620/intel_opencl_bug/blob/master/main.cpp
The kernel program is:
    struct Obj {
        int a[5];
    };

    __kernel void test(
        __global double* output,
        __private struct Obj param1,
        __private struct Obj param2,
        __private struct Obj param3,
        __private struct Obj param4,
        __private struct Obj param5
        //__private struct Obj param6
    ) {
        int gl = get_global_id(0);
        const int N = 5;
        if (gl == 0) {
            for (int i = 0; i < N; i++) output[i] = param1.a[i];
            for (int i = 0; i < N; i++) output[i + N * 1] = param2.a[i];
            for (int i = 0; i < N; i++) output[i + N * 2] = param3.a[i];
            for (int i = 0; i < N; i++) output[i + N * 3] = param4.a[i];
            for (int i = 0; i < N; i++) output[i + N * 4] = param5.a[i];
        }
    }
To verify this, the kernel copies the arguments it receives into an output buffer, which I then print on the host side:
    int main() {
        bool debug = true;
        find_cl(debug);

        const int N = 5;
        struct Obj {
            cl_int a[N];
        };
        Obj param1{ {1,12,123,1234,12345} };
        Obj param2{ {1,12,123,1234,12345} };
        Obj param3{ {1,12,123,1234,12345} };
        Obj param4{ {1,12,123,1234,12345} };
        Obj param5{ {1,12,123,1234,12345} };

        double output[N * 5];
        cl::Buffer output_b(context, CL_MEM_WRITE_ONLY, N * 5 * sizeof(double));

        kernel.setArg(0, output_b);
        kernel.setArg(1, param1);
        kernel.setArg(2, param2);
        kernel.setArg(3, param3);
        kernel.setArg(4, param4);
        kernel.setArg(5, param5);

        cl::CommandQueue queue(context, device);
        queue.enqueueNDRangeKernel(kernel, cl::NullRange, { size_t(1) }, { size_t(1) });
        queue.enqueueReadBuffer(output_b, false, 0, N * 5 * sizeof(double), output);
        queue.finish();

        for (int i = 0; i < N * 5; i++)
            std::cout << output[i] << std::endl;
        return 0;
    }
The output is:

    Detected 3 platforms :
    NVIDIA CUDA
    Intel(R) OpenCL
    Experimental OpenCL 2.1 CPU Only Platform
    Found CPU platform: Intel(R) OpenCL, has devices:
    1: Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz
    Use device: Intel(R) Xeon(R) CPU E3-1231 v3 @ 3.40GHz
    1
    12
    123
    1234
    12345
    1
    12
    123
    1234
    12345
    1
    12
    123
    1234
    12345
    1
    12
    123
    1234
    0
    123
    1234
    0
    0
    3.90955e+07
I don't think it is related to struct alignment, because the first four arguments are passed correctly. If I pass six variables instead of five, the program crashes.
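The alignment argument can also be checked directly on the host. The sketch below is only an illustration: it uses std::int32_t as a stand-in for cl_int (which is a 32-bit signed integer) so that it compiles without the OpenCL headers, and the helper names layout_is_packed and packed_args_size are made up for this example. It checks that the struct holds exactly five 4-byte ints with no padding, which is the same layout OpenCL C gives a struct of five ints on the device.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the host-side struct from main(). Assumption: cl_int is
// replaced by std::int32_t so the OpenCL headers are not needed here.
struct Obj {
    std::int32_t a[5];
};

// Compile-time checks: five 4-byte ints, no padding, 4-byte alignment.
static_assert(sizeof(Obj) == 5 * sizeof(std::int32_t),
              "Obj must not contain padding");
static_assert(alignof(Obj) == alignof(std::int32_t),
              "Obj must have the alignment of a 32-bit int");

// Hypothetical helper: the same checks as a runtime predicate.
inline bool layout_is_packed() {
    return sizeof(Obj) == 20 && alignof(Obj) == 4;
}

// Hypothetical helper: bytes occupied by n Obj values stored back to back.
constexpr std::size_t packed_args_size(std::size_t n) {
    return n * sizeof(Obj);
}
```

If these checks hold, each by-value argument is a plain 20-byte block, so a host/device layout mismatch looks unlikely to explain the corrupted values.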
If I change the first line in main() to

    bool debug = false;

then everything works.
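For anyone trying to reproduce this, here is a minimal sketch of how the debug flag could map to the build options mentioned at the top. It assumes the flag only toggles "-g -s filepath"; the helper name make_build_options and the quoting of the path are my own assumptions, not code from the linked repository.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not taken from the linked repository): returns the
// option string passed to the OpenCL compiler for a debug or release build.
std::string make_build_options(bool debug, const std::string& kernel_path) {
    if (!debug) {
        return "";  // release build: no extra options
    }
    // The -s option needs the path of the kernel source file.
    assert(!kernel_path.empty());
    // Assumption: the path is wrapped in quotes so paths with spaces work.
    return "-g -s \"" + kernel_path + "\"";
}

// Possible use with the C++ wrapper (program and device are placeholders):
//   program.build({ device }, make_build_options(debug, "kernel.cl").c_str());
```

With debug set to false the options string is empty, which corresponds to the configuration where all five arguments arrive intact.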