
Why do small changes in kernel code produce wrong results?


The following kernel (my_kernel()), which I wrote based on my_function(), computes incorrect results on an Intel GPU.
The same code works correctly on the Intel CPU and AMD GPU platforms.
If the type of the array index (idx) is changed from unsigned long to unsigned int, the kernel computes correct results, but I think both variants should be correct.
What causes this problem: my code or the Intel OpenCL SDK?

const char* kernel_str =
  "__kernel \n"
  "void my_kernel(__global const unsigned char* src, \n"
  "               __global unsigned char*       dst, \n"
  "               const unsigned long           elements) \n"
  "{ \n"
  "  const unsigned long gid = get_global_id(0); \n"
  "  const unsigned long idx = 3 * gid; // NG\n"
  "  //const unsigned int idx = 3 * gid; // OK\n"
  "  if (! (gid < elements)) { \n"
  "    return; \n"
  "  } \n"
  "  float r = ((float)src[idx] + (1.5f * (float)src[idx + 2])) - 18.0f;\n"
  "  float g = (((float)src[idx] - (0.4f * (float)src[idx + 1])) - (0.7f * (float)src[idx + 2])) + 14.0f;\n"
  "  float b = ((float)src[idx] + (1.8f * (float)src[idx + 1])) - 23.0f;\n"
  "  r = clamp(r, 0.0f, 255.0f); \n"
  "  g = clamp(g, 0.0f, 255.0f); \n"
  "  b = clamp(b, 0.0f, 255.0f); \n"
  "  dst[idx + 0] = convert_uchar(r); \n"
  "  dst[idx + 1] = convert_uchar(g); \n"
  "  dst[idx + 2] = convert_uchar(b); \n"
  "}";
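To narrow the problem down, here is a minimal diagnostic sketch (my own addition, not part of the attached project): a kernel that only writes the computed 64-bit index back to a buffer, so the index arithmetic can be checked on its own, independent of the color math. The kernel name idx_kernel is hypothetical.

// Hypothetical diagnostic kernel (not in the attachment): each work-item
// stores its computed index so host code can verify the 64-bit math.
const char* idx_kernel_str =
  "__kernel \n"
  "void idx_kernel(__global unsigned long* out, \n"
  "                const unsigned long     elements) \n"
  "{ \n"
  "  const unsigned long gid = get_global_id(0); \n"
  "  if (! (gid < elements)) { \n"
  "    return; \n"
  "  } \n"
  "  // Same expression as idx in my_kernel; compare out[gid] with 3 * gid on the host. \n"
  "  out[gid] = 3 * gid; \n"
  "}";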


#include <algorithm> // for std::min / std::max
#include <CL/cl.h>   // for cl_uchar

// Host reference implementation; the kernel should produce identical output.
void my_function(const cl_uchar*     src,
                 cl_uchar*           dst,
                 const unsigned long elements)
{
  for (unsigned long gid = 0; gid < elements; ++gid) {
    const unsigned long idx = 3 * gid;
    float r = ((float)src[idx] + (1.5f * (float)src[idx + 2])) - 18.0f;
    float g = (((float)src[idx] - (0.4f * (float)src[idx + 1])) - (0.7f * (float)src[idx + 2])) + 14.0f;
    float b = ((float)src[idx] + (1.8f * (float)src[idx + 1])) - 23.0f;
    r = std::min(std::max(0.0f, r), 255.0f); // clamp
    g = std::min(std::max(0.0f, g), 255.0f);
    b = std::min(std::max(0.0f, b), 255.0f);
    dst[idx + 0] = (cl_uchar)r;
    dst[idx + 1] = (cl_uchar)g;
    dst[idx + 2] = (cl_uchar)b;
  }
}
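For reference, here is a minimal sketch of the comparison harness, simplified from the attached project: it assumes kernel_str and my_function from above are in scope, picks the first platform and first GPU device (an assumption; the real project may select differently), and omits error checking and resource cleanup for brevity.

#include <CL/cl.h>
#include <cstdio>
#include <cstring>
#include <vector>

int main()
{
  const cl_ulong elements = 1024; // number of RGB triples
  std::vector<cl_uchar> src(3 * elements), gpu(3 * elements), ref(3 * elements);
  for (size_t i = 0; i < src.size(); ++i) {
    src[i] = (cl_uchar)(i % 256);
  }

  // First platform / first GPU device (assumption).
  cl_platform_id platform;
  cl_device_id device;
  clGetPlatformIDs(1, &platform, NULL);
  clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 1, &device, NULL);
  cl_context ctx = clCreateContext(NULL, 1, &device, NULL, NULL, NULL);
  cl_command_queue queue = clCreateCommandQueue(ctx, device, 0, NULL);

  // Build the kernel string defined above and create the kernel.
  cl_program prog = clCreateProgramWithSource(ctx, 1, &kernel_str, NULL, NULL);
  clBuildProgram(prog, 1, &device, NULL, NULL, NULL);
  cl_kernel kernel = clCreateKernel(prog, "my_kernel", NULL);

  cl_mem d_src = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                src.size(), src.data(), NULL);
  cl_mem d_dst = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, gpu.size(), NULL, NULL);

  clSetKernelArg(kernel, 0, sizeof(cl_mem), &d_src);
  clSetKernelArg(kernel, 1, sizeof(cl_mem), &d_dst);
  clSetKernelArg(kernel, 2, sizeof(cl_ulong), &elements); // matches "unsigned long" in the kernel

  size_t global = (size_t)elements;
  clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &global, NULL, 0, NULL, NULL);
  clEnqueueReadBuffer(queue, d_dst, CL_TRUE, 0, gpu.size(), gpu.data(), 0, NULL, NULL);

  // Host reference result and byte-for-byte comparison with the GPU output.
  my_function(src.data(), ref.data(), (unsigned long)elements);
  printf(memcmp(gpu.data(), ref.data(), ref.size()) == 0 ? "match\n" : "MISMATCH\n");
  return 0;
}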

I am using the following environments:
OS : Windows 10 Pro 64bit
Device Name : Intel(R) Iris(TM) Pro Graphics 580
Device Driver Version : 21.20.16.4542
Intel OpenCL SDK : 2016 R3

OS : Windows 10 Pro 64bit
Device Name : Intel(R) HD Graphics 530
Device Driver Version : 20.19.15.4501
Intel OpenCL SDK : 2016 R2

I have attached the source code and a VC project.

Best regards,

Attachment: IntelGPUCalcError.zip (5.04 KB)
