Channel: Intel® Software - OpenCL*

barrier(CLK_LOCAL_MEM_FENCE): weird deadlock during a reduction over a kernel's local array


The code runs fine when the program is built with clCreateProgramWithSource, but it deadlocks when the program is built with clCreateProgramWithBinary. The work-group size is (8,8). I can reproduce the deadlock on Intel graphics with Windows 8.1 + HD 4600/530 and with Windows 10 + HD 630.

local int SumAlpha[16*16/4];                         // 64 ints, one per work-item of the (8,8) work-group
int index = 1*get_local_id(0)+ 16/2*get_local_id(1); // flattened local id: 0..63
......
SumAlpha[index] = sum_alpha;

barrier(CLK_LOCAL_MEM_FENCE);
// tree reduction in local (shared) memory
for(int i = (16*16/8); i > 0; i >>= 1)  // i = 32, 16, 8, 4, 2, 1
{
    if(index < i)
    {
        SumAlpha[index] += SumAlpha[index + i];
    }
    barrier(CLK_LOCAL_MEM_FENCE); // with the loop unrolled, the deadlock shows up at i == 8
}

The command that creates the SPIR binary is:

"..\vendors\OpenCL\Intel\bin\ioc32.exe -cmd=build -input="xxx" -device=gpu -spir32="FileName%" -bo="-cl-std=CL1.2 -I %ProjetcDir%\cl"

I also tried '-cl-opt-disable': same deadlock. I also tried the following equivalent form, and it deadlocks as well:

a = SumAlpha[index];
b = SumAlpha[index + i];
SumAlpha[index] = a + b;

I can use the dot() built-in to avoid the reduction, but I want to understand what is going on. Did I miss something? Or is there a way to dump the Gen assembly for both build paths so I can compare them? Many thanks.

