Hi,
I have tried to use both the Intel CPU cores and the HD Graphics cores simultaneously with the Intel OpenCL SDK. As a first test I wrote a simple memory-copy kernel to see whether transfers from global to private memory (and vice versa) can proceed on the CPU cores and the HD Graphics cores at the same time. Here are the relevant parts of my source code.
#define FLOAT float
__kernel void assign(__global FLOAT *x, __global FLOAT *y)
{
    // One work-item copies one element from x to y
    size_t idx = get_global_id(0);
    y[idx] = x[idx];
}
...
-------------- Time measurement START --------------
// Enqueue NDRange to CPU
err = clEnqueueNDRangeKernel(cqCommandQueue_cpu, ckKernel[1], 1, NULL, &GWS2, &LWS2, 0, NULL, &ev_list[0]);
CheckErr(err);
clFlush(cqCommandQueue_cpu);
// Enqueue NDRange to GPU
err = clEnqueueNDRangeKernel(cqCommandQueue_gpu, ckKernel[0], 1, NULL, &GWS, &LWS, 0, NULL, &ev_list[1]);
CheckErr(err);
clFlush(cqCommandQueue_gpu);
// Block until both kernels have finished
err = clWaitForEvents(2, ev_list);
CheckErr(err);
-------------- Time measurement STOP --------------
...
The program above copies vector x into vector y in parallel. Assume x and y both have length n. Since each element assignment is independent of the others, I thought simple load balancing between the Intel CPU cores and the HD Graphics cores would be possible. Unfortunately, the two kernels appear to run almost serially when I compare the combined time with the times I measured for the CPU cores and the HD Graphics cores separately.
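To make the load balancing I have in mind concrete, here is a minimal C sketch of a static split. The function name split_for_cpu and the parameters rate_cpu and rate_gpu (standalone throughput in elements per second) are my own hypothetical names, not part of the OpenCL API. The sketch assumes each device keeps its standalone rate while the other one is running, which is exactly the assumption in doubt here.

```c
#include <stddef.h>

/* Hypothetical helper: how many of the n elements to give the CPU so that
 * both devices would finish at the same time, ASSUMING each device sustains
 * its standalone rate (elements per second) while the other is busy. */
size_t split_for_cpu(size_t n, double rate_cpu, double rate_gpu)
{
    if (rate_cpu <= 0.0) return 0;   /* CPU unusable: everything goes to the GPU */
    if (rate_gpu <= 0.0) return n;   /* GPU unusable: everything goes to the CPU */

    /* The CPU's share of the work is proportional to its share of the rate. */
    double share = rate_cpu / (rate_cpu + rate_gpu);
    size_t k = (size_t)((double)n * share + 0.5);   /* round to nearest */
    return k > n ? n : k;
}
```

With this split, the CPU would get k elements and HD Graphics the remaining n-k.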
Copying k elements on the CPU alone takes T_c seconds, and copying the remaining n-k elements on HD Graphics alone takes T_g seconds. If the two kernels ran concurrently, I would expect the program above to take about max(T_c, T_g) seconds. Instead it takes about T_c + T_g seconds, which means the kernels execute almost serially.
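The comparison between max(T_c, T_g) and T_c + T_g can be turned into a single number. The following C sketch is only an illustration; overlap_fraction is a hypothetical helper name, and t_both stands for the time measured with both kernels enqueued together.

```c
/* Hypothetical metric: returns 1.0 when the combined run takes
 * max(t_c, t_g) (full overlap) and 0.0 when it takes t_c + t_g
 * (fully serial). Values in between indicate partial overlap. */
double overlap_fraction(double t_c, double t_g, double t_both)
{
    double shorter = t_c < t_g ? t_c : t_g;
    if (shorter <= 0.0) return 0.0;   /* nothing to overlap */

    /* Time saved relative to serial execution, as a share of the shorter run. */
    double f = (t_c + t_g - t_both) / shorter;
    if (f < 0.0) f = 0.0;   /* slower than serial: clamp */
    if (f > 1.0) f = 1.0;   /* faster than the ideal: clamp */
    return f;
}
```

In my measurements the result would be close to 0.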
My conclusion is that the Intel CPU cores and the HD Graphics cores share global memory bandwidth, so the two copies cannot proceed in parallel. But I am not sure that parallel execution is really impossible. Could anyone comment on whether it is?
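The shared-bandwidth explanation can be written down as a simple model. This C sketch is only a rough model under my own assumptions: copy_time_lower_bound is a hypothetical name, the bandwidth value is whatever the memory system can sustain, and the copy is assumed to be purely bandwidth-bound, with every element read once and written once.

```c
#include <stddef.h>

/* Hypothetical model: minimum time in seconds to copy n_elems elements of
 * elem_bytes bytes each through a memory system that sustains
 * bandwidth_bytes_per_s in total. Returns -1.0 if the bandwidth is not
 * positive. */
double copy_time_lower_bound(size_t n_elems, size_t elem_bytes,
                             double bandwidth_bytes_per_s)
{
    if (bandwidth_bytes_per_s <= 0.0) return -1.0;

    /* Each element is read once and written once: 2 * elem_bytes of traffic. */
    double traffic = 2.0 * (double)n_elems * (double)elem_bytes;
    return traffic / bandwidth_bytes_per_s;
}
```

Under this model the bound for n elements equals the bound for k elements plus the bound for n-k elements, however the work is split. So if each device alone already saturates the shared bandwidth, a combined time near T_c + T_g would be expected. It does not prove that this is what happens on my machine.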
Thanks in advance.
* Test machine
CPU : i7-3770K (HD graphics 4000)
OS : Windows 7 SP1 64bit, VS2012
SDK : Intel OpenCL SDK 2013