Hi all,
I am seeing strange behavior when accessing a memory location in the __constant address space that holds an array of structs.
I reduced the case to a minimal C++ host program and an OpenCL kernel, both attached to this post.
Here is a summary:
I have an OpenCL kernel with the following struct:
typedef struct __attribute__((packed)) buffer_1_struct {
    uint  s2d1;
    uint  s2d2;
    ulong s2d3;
    char  s2d4[2];
    char  s2d5[2];
    ulong s2d6;
    ulong s2d7;
} struct2_t;
On the host side, I create an array of this struct, where each element is 36 bytes (packed), and pass it to the kernel as a buffer. In the attached files, the array has two elements.
When I read the second array element on the GPU and access its member s2d3 (at byte offset 8), I get zero. This is how I normally access it:
((__constant struct2_t*)buffer02)[get_global_offset(2)].s2d3.
The problem is observed when get_global_offset(2) = 1.
However, when I access the same field using byte offsets, I retrieve the correct data on the GPU. This is how I access it:
*((__constant ulong*)(((__constant char*)buffer02)+36+8))
Surprisingly, both expressions point to the same address, and I cast both to the same pointer type, yet the values read through them differ.
Here is an OpenCL snippet that shows what happens:
#define STRUCT_2_SIZE (sizeof(struct2_t))
#define STRUCT_2_s2d3_idx (2*sizeof(uint))
......
printf("z-offset=%d\n", get_global_offset(2));
printf("struct-2-size=%d\n", STRUCT_2_SIZE);

// address of element 1's s2d3, computed from raw byte offsets (36 + 8)
__constant ulong* adr1 = ((__constant ulong*)(((__constant char*)buffer02) + STRUCT_2_SIZE + STRUCT_2_s2d3_idx));
// address of the same member, computed by indexing the struct array
__constant ulong* adr2 = &((__constant struct2_t*)buffer02)[get_global_offset(2)].s2d3;

printf("adr1=%d\n", adr1);
printf("adr2=%d\n", adr2);
if (adr1 == adr2)
    printf("The two addresses are equal !\n");
else
    printf("The two addresses are different !\n");

printf("val1=%d\n", *adr1);
printf("val2=%d\n", *adr2);
if (*adr1 == *adr2)
    printf("The two values are equal !\n");
else
    printf("The two values are different !\n");
The full code is attached; here is the output:
z-offset=1
struct-2-size=36
adr1=1770913836
adr2=1770913836
The two addresses are equal !
val1=5632
val2=0
The two values are different !
Some notes:
1- It happens only on the GPU. The attached code works fine on the CPU.
2- This code is in a kernel function (func2), and the problem appears only when certain other functions are called, in a certain order, before it. See the attached code.
3- The attached code is the minimal case. Removing some of its lines makes the problem disappear.
4- I use SDK version 7.0.0.2511 on Windows 10, building with the x64 OpenCL library.
5- My machine has an Intel Core i5-6200U CPU with integrated Intel® HD Graphics 520 GPU.
I hope someone from Intel can advise on this case, or confirm whether it is a bug that will be fixed.
Regards,