Channel: Intel® Software - OpenCL*

Tip for people porting from CUDA: __syncthreads() != barrier(CLK_LOCAL_MEM_FENCE)

(The following is based on some recent experiments on a GEN8 IGP.)

FYI -- one gotcha to watch out for when porting from CUDA to an Intel IGP is that the OpenCL barrier()/work_group_barrier() operation doesn't support early exit by either individual work items or whole subgroups: every work item in the work group has to reach the barrier.

For example, if a subgroup returns early and the remaining work items in the work group then synchronize in a barrier(), your kernel is going to hang on the IGP.

Early exit of some threads (work items) at the end of a grid is a pretty common use case in CUDA.
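
To make the failure case concrete, here is a minimal host-side sketch in plain C that models which work groups end up with only some of their work items at the barrier. The kernel fragment in the comment and the helper names `items_reaching_barrier` and `group_exits_partially` are hypothetical, made up for illustration. This only models the bounds check; it says nothing about what any particular driver does with such a group.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pattern being modeled (OpenCL C, hypothetical kernel, shown only as a comment):
 *
 *   size_t gid = get_global_id(0);
 *   if (gid >= n) return;             // CUDA-style early exit
 *   tile[get_local_id(0)] = in[gid];
 *   barrier(CLK_LOCAL_MEM_FENCE);     // work items that stayed wait here
 */

/* Work items of `group` that pass the bounds check and reach the barrier. */
size_t items_reaching_barrier(size_t n, size_t local, size_t group)
{
    assert(local > 0);
    size_t first = group * local;      /* global id of the group's first item */
    if (first >= n)
        return 0;                      /* the whole group exits early */
    size_t remaining = n - first;
    return remaining < local ? remaining : local;
}

/* 1 if some, but not all, work items of the group reach the barrier. */
int group_exits_partially(size_t n, size_t local, size_t group)
{
    size_t arrived = items_reaching_barrier(n, local, group);
    return arrived > 0 && arrived < local;
}
```

With n = 1000 and a local size of 256, group 3 is the one to worry about: only part of it gets past the bounds check. If a launch has such a group, a common restructuring (not something from the experiments above) is to replace the `return` with a flag, so that every work item still reaches the barrier and only the loads and stores are skipped.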

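Fortunately, OpenCL 2.0 has a feature that doesn't exist in CUDA and that might help you work around this issue: Non-Uniform Work Groups. They let the global size be something other than a multiple of the work-group size, so the surplus work items that would otherwise exit early never have to be launched.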
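
A rough sketch of the sizing arithmetic, in plain C on the host side. The helper names are hypothetical. As I understand the OpenCL 2.0 spec, the exact global size can be passed to clEnqueueNDRangeKernel when the program is built with `-cl-std=CL2.0` and without `-cl-uniform-work-group-size`; inside the kernel, get_local_size() then reports the actual size of the (possibly smaller) trailing group and get_enqueued_local_size() reports the size given at enqueue time. Treat that as an assumption to verify against your own driver.

```c
#include <assert.h>
#include <stddef.h>

/* CUDA-style launch: round the global size up to a multiple of the local size. */
size_t padded_global_size(size_t n, size_t local)
{
    assert(local > 0);
    return ((n + local - 1) / local) * local;
}

/* Surplus work items a padded launch creates past the end of the data.
   These are the ones a CUDA port would typically make return early. */
size_t surplus_items(size_t n, size_t local)
{
    return padded_global_size(n, local) - n;
}

/* Non-uniform launch: the global size is exactly n, and the trailing group
   is smaller whenever n is not a multiple of local. Assumes n > 0. */
size_t trailing_group_size(size_t n, size_t local)
{
    assert(n > 0 && local > 0);
    size_t rem = n % local;
    return rem ? rem : local;
}
```

For n = 1000 and a local size of 256, the padded launch uses a global size of 1024 with 24 surplus work items, while the non-uniform launch uses a global size of 1000 with a trailing group of 232 work items and nothing left over to exit early.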

 

