I have a kernel with a "required subgroup size" of 8.
My test launches 24 global work items with a local (work-group) size of 8 work items (only for testing purposes).
After much debugging, I determined that the sub_group_broadcast() function is the culprit.
Replacing it with work_group_broadcast() resulted in a working kernel.
Is this a known bug?
All of the other sub_group_XXX() functions appear to be working.
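For reference, here is a small host-side sketch in plain C of what I would expect either broadcast to return in this configuration. It is only a model, not my actual kernel: the helper names model_broadcast() and count_mismatches(), the choice of source lane, and the input values are all made up for illustration. The assumption is a 1D launch where the local size equals the required subgroup size, so each work-group should hold exactly one subgroup and sub_group_broadcast() and work_group_broadcast() should agree.

```c
#include <stddef.h>

/* Host-side model (hypothetical helper, not an OpenCL builtin):
   every work item receives the value held by lane `src_lane` of its
   own group of `group_size` consecutive work items (1D launch assumed). */
void model_broadcast(const int *in, int *out, size_t n,
                     size_t group_size, size_t src_lane)
{
    for (size_t gid = 0; gid < n; ++gid) {
        /* Global id of the first work item in this item's group. */
        size_t group_base = (gid / group_size) * group_size;
        out[gid] = in[group_base + src_lane];
    }
}

/* Counts positions where two result buffers differ, e.g. the buffer
   read back from the device versus the model above. */
size_t count_mismatches(const int *a, const int *b, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; ++i) {
        if (a[i] != b[i]) {
            ++bad;
        }
    }
    return bad;
}
```

With 24 work items, groups of 8, source lane 0 and in[i] = 100 + i, the model gives 100 for items 0-7, 108 for items 8-15 and 116 for items 16-23. Comparing the buffer read back from the device against this with count_mismatches() is one way to check which broadcast variant goes wrong.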
-Allan
Platform: Win10 x64, HD 530, driver 21.20.16.4552.