Channel: Intel® Software - OpenCL*

Using a lookup table in a reduction kernel produces incorrect results on GPU


I have a kernel that takes in an array of integers and returns the index of the smallest element.

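```c
#ifndef LOCAL_SIZE
#define LOCAL_SIZE 8   // work-group size; can be overridden with -D LOCAL_SIZE=...
#endif // LOCAL_SIZE

// Writes to out[0] the index of the smallest of the 1024 ints in `in`.
// Intended to run as a single work-group of LOCAL_SIZE work items
// (LOCAL_SIZE must be a power of two for the tree reduction below).
kernel void test( global int* in, global int* out )
{
  int id = get_local_id(0);

  local int indx[LOCAL_SIZE];   // one candidate index per work item

  // Phase 1: each work item scans a strided slice of the input
  // (elements id, id + LOCAL_SIZE, id + 2*LOCAL_SIZE, ...).
  int temp = id;
  for (int i = id; i < 1024; i += LOCAL_SIZE)
  {
    temp = in[i] < in[temp] ? i : temp;
  }
  indx[id] = temp;
  barrier(CLK_LOCAL_MEM_FENCE);

  // Phase 2: tree reduction over the local indices. Each step compares the
  // values the two indices point to in global memory (the "lookup").
  for(int i = LOCAL_SIZE / 2; i != 0; i >>= 1)
  {
    if(id < i)
    {
      printf( "%4d: %3d, %4d: %3d\n", indx[id], in[indx[id]], indx[id + i], in[indx[id + i]] );
      indx[id] = in[indx[id]] < in[indx[id + i]] ? indx[id] : indx[id + i];
    }
    barrier(CLK_LOCAL_MEM_FENCE);
    if(id == 0) printf("\n");
  }
  out[0] = indx[0];   // every work item writes the same final index
}
```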

Before the first barrier, each work item finds the index of the smallest value in its strided slice of the input and stores it in a local buffer. This part works correctly.
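
To make the first phase concrete, here is a host-side C emulation of what each work item does before the first barrier. It is only a sketch: it assumes the kernel's hard-coded input length of 1024 and the default LOCAL_SIZE of 8, and the names phase1_work_item and phase1 are hypothetical helpers, not part of the attached code.

```c
#define N 1024          /* input length hard-coded in the kernel */
#define LOCAL_SIZE 8    /* the kernel's default work-group size */

/* Emulates one work item in phase 1: scan elements id, id + LOCAL_SIZE,
   id + 2*LOCAL_SIZE, ... and return the index of the smallest value seen.
   The comparison is strict, so the earliest index wins on ties. */
int phase1_work_item(const int in[N], int id)
{
    int temp = id;
    for (int i = id; i < N; i += LOCAL_SIZE)
        temp = in[i] < in[temp] ? i : temp;
    return temp;
}

/* Fills indx[] the way the kernel does before its first barrier. */
void phase1(const int in[N], int indx[LOCAL_SIZE])
{
    for (int id = 0; id < LOCAL_SIZE; ++id)
        indx[id] = phase1_work_item(in, id);
}
```

Running it on the same input as the kernel gives the indx[] contents that the kernel's first printf line should start from.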


In the for loop, the per-work-item results are reduced further to find the overall minimum. However, the second-to-last iteration (i = 2 with the default LOCAL_SIZE of 8) fails on the GPU every time: in[indx[id]] and in[indx[id + i]] both return the same value.
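
The reduction phase can be emulated on the host in the same way. In each step, the work items with id < i write only indx[id] and read indx[id] and indx[id + i]. Because id + i >= i, no slot is written by one work item and read by another within the same step, so a lockstep emulation that reads from a snapshot should give the same answer as the kernel's in-place update. This is a sketch assuming LOCAL_SIZE is a power of two; the helper name phase2 is hypothetical, and the emulation says nothing about what the GPU driver actually does.

```c
#include <string.h>

#define LOCAL_SIZE 8    /* the kernel's default work-group size (power of two) */

/* Emulates phase 2: a tree reduction over candidate indices. Each step
   reads from a snapshot of indx[], modelling all work items with id < i
   reading before anyone writes, with the barrier at the end of the step.
   With LOCAL_SIZE = 8 the steps are i = 4, 2, 1. */
int phase2(const int *in, int indx[LOCAL_SIZE])
{
    for (int i = LOCAL_SIZE / 2; i != 0; i >>= 1) {
        int snap[LOCAL_SIZE];
        memcpy(snap, indx, sizeof snap);
        for (int id = 0; id < i; ++id)          /* the work items with id < i */
            indx[id] = in[snap[id]] < in[snap[id + i]] ? snap[id] : snap[id + i];
        /* barrier(CLK_LOCAL_MEM_FENCE) would sit here in the kernel */
    }
    return indx[0];
}
```

Feeding it the indx[] values printed by the kernel's first printf line gives the index the GPU should report in out[0].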

Operating system: Windows 7 Enterprise

Device driver version: 10.18.14.4280

Device: Intel HD Graphics 4600 (processor: Intel Core i5-4590)

The same kernel works correctly on the CPU and on an NVIDIA GeForce GTX 970.

I've attached the kernel and host code needed to reproduce the problem.

Attachment: ReductionLookUpTest.7z (application/x-7z-compressed, 1.28 KB)
