Hello, I'm new to Intel GPUs and I'm trying to do some OpenCL programming on Intel Processor Graphics.
1. Is there a wavefront concept on Intel GPUs? What is the proper work-group size?
On AMD GPUs, code is actually executed in groups of 64 threads (a wavefront). On Nvidia GPUs, this
number is 32 (a warp). On Intel GPUs, is this number the number of EUs in one subslice multiplied by 7?
I use the OpenCL template in Visual Studio, and it passes NULL as the local work size. I don't know
whether this influences performance.
clEnqueueNDRangeKernel(ocl->commandQueue, ocl->kernel, 2, NULL, globalWorkSize, NULL, 0, NULL,
NULL);
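For context, if I replace that NULL with an explicit local size, my understanding is that OpenCL 1.x requires each global dimension to be a multiple of the matching local dimension, so I would have to round the global size up. Below is a minimal sketch of that rounding in plain C. The helper name round_up_to_multiple and the 16x16 local size in the comment are my own assumptions for illustration, not values recommended for Intel GPUs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round `global` up to the next multiple of `local`.
 * Needed (as far as I understand) because OpenCL 1.x rejects an NDRange
 * whose global size is not evenly divisible by the local size. */
static size_t round_up_to_multiple(size_t global, size_t local)
{
    assert(local > 0);
    return ((global + local - 1) / local) * local;
}

/* Usage sketch (sizes are made up):
 *   size_t localWorkSize[2]  = {16, 16};
 *   size_t globalWorkSize[2] = {round_up_to_multiple(width, 16),
 *                               round_up_to_multiple(height, 16)};
 * and then pass localWorkSize instead of NULL to clEnqueueNDRangeKernel. */
```

If the global size is padded this way, I assume the kernel also needs a bounds check so the extra work-items do not write past the end of the buffers.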
2. If I run a kernel many times, will the cache still contain the data, just like in C programming?
For example, if I run the following code, the first clEnqueueNDRangeKernel call brings the data
from memory into the cache. If I then run the kernel a second time on the same data, can it reuse
the data already in the cache, so that it doesn't need to fetch it from memory again, as in ordinary
C/C++ programming? Or does clEnqueueNDRangeKernel flush the cache, so that the data has to be
reloaded on every launch?
for(int i=0; i<100; i++){
clEnqueueNDRangeKernel(..Add...);
}
__kernel void Add(__global uint* pA, __global uint* pB, __global uint* pC){
    .....
    pC[id] = pA[id] + pB[id];
}
3. When using clCreateBuffer, what's the difference between the flags "CL_MEM_READ_ONLY |
CL_MEM_USE_HOST_PTR" and "CL_MEM_READ_ONLY"? I think both end up using the same memory, so should
they have the same speed?
So far I have only found "The Compute Architecture of Intel® Processor Graphics Gen7.5 and Gen8.0".
If there is a "GPU OpenCL programming guide", please let me know.
Thanks!