
Why do small changes in kernel code produce wrong results?


The following kernel (my_kernel()), which I wrote based on my_function(), computes incorrect results on an Intel GPU.
The same code works correctly on the Intel CPU and AMD GPU platforms.
If the type of the array index (idx) is changed from unsigned long to unsigned int, the kernel computes correct results, but I think both variants should be correct.
What causes this problem: my code or the Intel OpenCL SDK?

const char* kernel_str =
  "__kernel \n"
  "void my_kernel(__global const unsigned char* src, \n"
  "               __global unsigned char*       dst, \n"
  "               const unsigned long           elements) \n"
  "{ \n"
  "  const unsigned long gid = get_global_id(0); \n"
  "  const unsigned long idx = 3 * gid; // NG\n"
  "  //const unsigned int idx = 3 * gid; // OK\n"
  "  if (! (gid < elements)) { \n"
  "    return; \n"
  "  } \n"
  "  float r = ((float)src[idx] + (1.5f * (float)src[idx + 2])) - 18.0f;\n"
  "  float g = (((float)src[idx] - (0.4f * (float)src[idx + 1])) - (0.7f * (float)src[idx + 2])) + 14.0f;\n"
  "  float b = ((float)src[idx] + (1.8f * (float)src[idx + 1])) - 23.0f;\n"
  "  r = clamp(r, 0.0f, 255.0f); \n"
  "  g = clamp(g, 0.0f, 255.0f); \n"
  "  b = clamp(b, 0.0f, 255.0f); \n"
  "  dst[idx + 0] = convert_uchar(r); \n"
  "  dst[idx + 1] = convert_uchar(g); \n"
  "  dst[idx + 2] = convert_uchar(b); \n"
  "}";
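To narrow the problem down, here is a minimal diagnostic sketch (my own addition, not part of the attached project): a kernel that only writes the computed 64-bit index back to a buffer, so the index arithmetic can be checked on its own, independent of the color math. The kernel name idx_kernel is hypothetical.

// Hypothetical diagnostic kernel (not in the attachment): each work-item
// stores its computed index so host code can verify the 64-bit math.
const char* idx_kernel_str =
  "__kernel \n"
  "void idx_kernel(__global unsigned long* out, \n"
  "                const unsigned long     elements) \n"
  "{ \n"
  "  const unsigned long gid = get_global_id(0); \n"
  "  if (! (gid < elements)) { \n"
  "    return; \n"
  "  } \n"
  "  // Same expression as idx in my_kernel; compare out[gid] with 3 * gid on the host. \n"
  "  out[gid] = 3 * gid; \n"
  "}";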


#include <algorithm> // for std::min / std::max
#include <CL/cl.h>   // for cl_uchar

// Host reference implementation; the kernel should produce identical output.
void my_function(const cl_uchar*     src,
                 cl_uchar*           dst,
                 const unsigned long elements)
{
  for (unsigned long gid = 0; gid < elements; ++gid) {
    const unsigned long idx = 3 * gid;
    float r = ((float)src[idx] + (1.5f * (float)src[idx + 2])) - 18.0f;
    float g = (((float)src[idx] - (0.4f * (float)src[idx + 1])) - (0.7f * (float)src[idx + 2])) + 14.0f;
    float b = ((float)src[idx] + (1.8f * (float)src[idx + 1])) - 23.0f;
    r = std::min(std::max(0.0f, r), 255.0f); // clamp
    g = std::min(std::max(0.0f, g), 255.0f);
    b = std::min(std::max(0.0f, b), 255.0f);
    dst[idx + 0] = (cl_uchar)r;
    dst[idx + 1] = (cl_uchar)g;
    dst[idx + 2] = (cl_uchar)b;
  }
}
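For reference, here is a minimal sketch of the comparison harness, simplified from the attached project: it assumes kernel_str and my_function from above are in scope, picks the first platform and first GPU device (an assumption; the real project may select differently), and omits error checking and resource cleanup for brevity.

#include <CL/cl.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main()
{
  const cl_ulong elements = 1024; // number of RGB triples
  std::vector<cl_uchar> src(3 * elements), gpu(3 * elements), ref(3 * elements);
  for (size_t i = 0; i < src.size(); ++i) {
    src[i] = (cl_uchar)(i % 256);
  }

  // First platform / first GPU device (assumption).
  cl_platform_id platform;
  cl_device_id device;
  clGetPlatformIDs(1, &platform, NULL);
  clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
  cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
  cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

  // Build the kernel string defined above and create the kernel.
  cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_str, NULL, NULL);
  clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
  cl_kernel kernel = clCreateKernel(prog, "my_kernel", NULL);

  cl_mem d_src = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                src.size(), src.data(), NULL);
  cl_mem d_dst = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, gpu.size(), NULL, NULL);

  clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_src);
  clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_dst);
  clSetKernelArg(kernel, 2, sizeof(cl_ulong), &elements); // matches "unsigned long" in the kernel

  size_t global = (size_t)elements;
  clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
  clEnqueueReadBuffer(queue, d_dst, CL_TRUE, 0, gpu.size(), gpu.data(), 0, NULL, NULL);

  // Host reference result and byte-for-byte comparison with the GPU output.
  my_function(src.data(), ref.data(), (unsigned long)elements);
  printf(memcmp(gpu.data(), ref.data(), ref.size()) == 0 ? "match\n" : "MISMATCH\n");
  return 0;
}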

I am using the following environments:
OS : Windows 10 Pro 64bit
Device Name : Intel(R) Iris(TM) Pro Graphics 580
Device Driver Version : 21.20.16.4542
Intel OpenCL SDK : 2016 R3

OS : Windows 10 Pro 64bit
Device Name : Intel(R) HD Graphics 530
Device Driver Version : 20.19.15.4501
Intel OpenCL SDK : 2016 R2

I have attached the source code and a VC project.

Best regards,

Attachment: IntelGPUCalcError.zip (5.04 KB)
