Channel: Intel® Software - OpenCL*

Incorrect kernel execution with barrier(CLK_LOCAL_MEM_FENCE)


Consider the following kernel:

__kernel void test(__global float2 *output, __global float2 *input)
{
    __local float lmem[8];
    float2 a;
    const size_t tid = get_global_id(0);
    if(tid / 8 == 0)
    {
        a = input[tid];
    }
    else
    {
        return;
    }
    lmem[tid] = -a.x;
    barrier(CLK_LOCAL_MEM_FENCE);
    a.x = lmem[tid];
    barrier(CLK_LOCAL_MEM_FENCE);
    output[tid] = a;
}

If I execute it with global_size == local_size == 16 and pass an array of 16 float2 elements as input:

input = [  0.+0.j   1.+0.j   2.+0.j   3.+0.j   4.+0.j   5.+0.j   6.+0.j   7.+0.j
   8.+0.j   9.+0.j  10.+0.j  11.+0.j  12.+0.j  13.+0.j  14.+0.j  15.+0.j]

and a zero-filled buffer as output, I expect the first 8 elements of the output array to hold the input values with their real parts negated, while the rest of the array remains untouched:

output = [-0.+0.j -1.+0.j -2.+0.j -3.+0.j -4.+0.j -5.+0.j -6.+0.j -7.+0.j 0.+0.j
  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j]
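
For checking device results on the host, the expected output described above can be computed with a small reference function. This is only a minimal sketch in plain Python: `expected_output` and `ACTIVE_ITEMS` are hypothetical names that do not appear in the kernel, and the function models only the intended result, not how any OpenCL implementation schedules work-items or handles barriers.

```python
ACTIVE_ITEMS = 8  # work-items for which tid / 8 == 0 in the kernel


def expected_output(values):
    """Return the result the kernel is intended to produce for `values`."""
    # The output buffer starts zero-filled.
    result = [complex(0.0, 0.0) for _ in values]
    for tid, a in enumerate(values):
        if tid // ACTIVE_ITEMS == 0:
            # Mirrors: lmem[tid] = -a.x; a.x = lmem[tid]; output[tid] = a;
            result[tid] = complex(-a.real, a.imag)
    return result


if __name__ == "__main__":
    data = [complex(i, 0.0) for i in range(16)]
    for tid, value in enumerate(expected_output(data)):
        print(tid, value)
```

Comparing this list element by element with the buffer read back from each platform shows which work-items actually wrote their results.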

This is what happens on Ubuntu 12.04 x64 with the NVIDIA CUDA 5 platform and a Tesla C2050 device. But on the same operating system with Intel OpenCL XE SDK 2013 3.0.67279 and an Intel Xeon E5620, the whole output buffer remains untouched:

output = [ 0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j
  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j  0.+0.j]

The output matches the reference output from the CUDA platform if I do any of the following:

  1. Comment out the barriers;
  2. Use float arrays instead of float2;
  3. Initialize "a" inside the kernel instead of reading it from input (i.e. as "a = (float2)(tid, 0)").

Has anyone encountered such behavior? Is it a bug, or am I making incorrect assumptions about how barriers work?

