Channel: Intel® Software - OpenCL*

"intel_sub_group_block_read8" fails to get correct values if the image is created from a buffer

intel_sub_group_block_read8(src, coord) returns incorrect values when the src 2D image is created from a buffer (method 1), as in the following steps:
cl_mem buf_from_hostptr = clCreateBuffer( context, CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR, N * M * sizeof(float), src, &err );
cl_image_desc desc;
...
desc.buffer = buf_from_hostptr;
clCreateImage( context, 0, &mbr_imageFormat, &desc, NULL, &err );

If I instead create the src 2D image directly from the src array (method 2):
mi_src0 = clCreateImage( context, CL_MEM_USE_HOST_PTR, &mbr_imageFormat, &desc, src, &err );
the read returns the correct values.

I have a test app. You can get the code with "git clone https://github.com/kangshan0910/buffer2image.git", then run "make" to build the test binary.

The test creates an 8x8 matrix. In the OpenCL kernel, each work item in the subgroup calls intel_sub_group_block_read8 to read one column of data.
The 8x8 matrix is:
0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07
1.00 1.01 1.02 1.03 1.04 1.05 1.06 1.07
2.00 2.01 2.02 2.03 2.04 2.05 2.06 2.07
3.00 3.01 3.02 3.03 3.04 3.05 3.06 3.07
4.00 4.01 4.02 4.03 4.04 4.05 4.06 4.07
5.00 5.01 5.02 5.03 5.04 5.05 5.06 5.07
6.00 6.01 6.02 6.03 6.04 6.05 6.06 6.07
7.00 7.01 7.02 7.03 7.04 7.05 7.06 7.07
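
The matrix follows a simple pattern: the element at row r, column c is r + c/100. As a minimal host-side reference in C, assuming that pattern (the names fill_matrix and expected_column are hypothetical and not taken from the test app), the column each work item is expected to read can be computed like this:

```c
#include <assert.h>
#include <stddef.h>

#define N 8 /* the test case is an N x N matrix with N = 8 */

/* Fill a row-major N x N matrix so that m[r][c] = r + c / 100. */
void fill_matrix(float *m)
{
    for (size_t r = 0; r < N; ++r)
        for (size_t c = 0; c < N; ++c)
            m[r * N + c] = (float)r + (float)c / 100.0f;
}

/* Copy column `col` into out[0..N-1]: the values the work item with
 * local x id `col` should get back from the block read. */
void expected_column(const float *m, size_t col, float *out)
{
    assert(col < N);
    for (size_t r = 0; r < N; ++r)
        out[r] = m[r * N + col];
}
```

Comparing the kernel's per-work-item result against expected_column for the same column index gives a quick pass/fail check on the host.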

For method 1, run "./test -b". Its output is:
matrix size: 8x8
group_xy(0,0) local_xy(00,00) data=0.00,0.00,2.00,2.00,4.00,4.00,6.00,6.00
group_xy(0,0) local_xy(01,00) data=0.01,0.01,2.01,2.01,4.01,4.01,6.01,6.01
group_xy(0,0) local_xy(02,00) data=0.02,0.02,2.02,2.02,4.02,4.02,6.02,6.02
group_xy(0,0) local_xy(03,00) data=0.03,0.03,2.03,2.03,4.03,4.03,6.03,6.03
group_xy(0,0) local_xy(04,00) data=0.04,0.04,2.04,2.04,4.04,4.04,6.04,6.04
group_xy(0,0) local_xy(05,00) data=0.05,0.05,2.05,2.05,4.05,4.05,6.05,6.05
group_xy(0,0) local_xy(06,00) data=0.06,0.06,2.06,2.06,4.06,4.06,6.06,6.06
group_xy(0,0) local_xy(07,00) data=0.07,0.07,2.07,2.07,4.07,4.07,6.07,6.07
This is incorrect: in every work item, each odd row repeats the value of the preceding even row.
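
Since every matrix element is its row index plus column/100, the source row of each returned value can be recovered from its integer part. A small sketch in C, assuming that encoding (the function name source_rows is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Recover the source row of each value from its integer part.
 * Valid only because every element is row + col / 100 with col / 100 < 1. */
void source_rows(const float *data, size_t n, int *rows)
{
    assert(data != NULL && rows != NULL);
    for (size_t i = 0; i < n; ++i)
        rows[i] = (int)data[i]; /* truncation drops the column part */
}
```

Applied to any line of the method 1 output above, this yields rows 0, 0, 2, 2, 4, 4, 6, 6 rather than 0 through 7.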

For method 2, run "./test". Its output is:
matrix size: 8x8
group_xy(0,0) local_xy(00,00) data=0.00,1.00,2.00,3.00,4.00,5.00,6.00,7.00
group_xy(0,0) local_xy(01,00) data=0.01,1.01,2.01,3.01,4.01,5.01,6.01,7.01
group_xy(0,0) local_xy(02,00) data=0.02,1.02,2.02,3.02,4.02,5.02,6.02,7.02
group_xy(0,0) local_xy(03,00) data=0.03,1.03,2.03,3.03,4.03,5.03,6.03,7.03
group_xy(0,0) local_xy(04,00) data=0.04,1.04,2.04,3.04,4.04,5.04,6.04,7.04
group_xy(0,0) local_xy(05,00) data=0.05,1.05,2.05,3.05,4.05,5.05,6.05,7.05
group_xy(0,0) local_xy(06,00) data=0.06,1.06,2.06,3.06,4.06,5.06,6.06,7.06
group_xy(0,0) local_xy(07,00) data=0.07,1.07,2.07,3.07,4.07,5.07,6.07,7.07
This is correct.

The relevant part of my clinfo output is:
Number of platforms 3
Platform Name Intel(R) OpenCL HD Graphics
Platform Vendor Intel(R) Corporation
Platform Version OpenCL 2.1
Platform Profile FULL_PROFILE
Platform Extensions cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_khr_subgroups cl_khr_il_program cl_intel_spirv_device_side_avc_motion_estimation cl_intel_spirv_media_block_io cl_intel_spirv_subgroups cl_khr_spirv_no_integer_wrap_decoration cl_khr_mipmap_image cl_khr_mipmap_image_writes cl_intel_unified_shared_memory_preview cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_device_side_avc_motion_estimation cl_intel_advanced_motion_estimation cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_intel_va_api_media_sharing
Platform Host timer resolution 1ns
Platform Extensions function suffix INTEL
...
Platform Name Intel(R) OpenCL HD Graphics
Number of devices 1
Device Name Intel(R) Gen9 HD Graphics NEO
Device Vendor Intel(R) Corporation
Device Vendor ID 0x8086
Device Version OpenCL 2.1 NEO
Driver Version 20.01.15264
Device OpenCL C Version OpenCL C 2.0
Device Type GPU
...

TCE Open Date: Monday, March 2, 2020 - 16:33
