Given a kernel that uses no barriers, does this recommendation still hold for GEN8 and beyond?
https://software.intel.com/en-us/node/540442
NOTE
The bare minimum SLM allocation size is 4K per work-group, so even if your kernel requires fewer bytes per work-group, the actual allocation will still be 4K. To accommodate many potential execution scenarios, try to minimize local memory usage to fit the optimal value of 4K per work-group. Also note that the granularity of SLM allocation is 1K.
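
To make the arithmetic in that note concrete, here is how I read it, as a rough C sketch. The function name `slm_allocation_bytes` and the two constants are my own hypothetical names, not part of any Intel or OpenCL API; the numbers come straight from the quoted note (4K minimum, 1K granularity). Treating a request of zero bytes as "no SLM allocated" is my assumption, since the note does not cover that case.

```c
#include <stddef.h>

/* Values taken from the quoted note; the names are hypothetical. */
#define SLM_MIN_BYTES         4096u  /* bare minimum allocation per work-group */
#define SLM_GRANULARITY_BYTES 1024u  /* allocation granularity */

/* Estimate the SLM actually reserved for one work-group, given the
 * number of local-memory bytes the kernel asks for. */
static size_t slm_allocation_bytes(size_t requested_bytes)
{
    /* Assumption: a kernel that uses no local memory reserves none. */
    if (requested_bytes == 0) {
        return 0;
    }

    /* Round the request up to the next multiple of the 1K granularity. */
    size_t rounded = (requested_bytes + SLM_GRANULARITY_BYTES - 1)
                     / SLM_GRANULARITY_BYTES * SLM_GRANULARITY_BYTES;

    /* Anything below the 4K minimum is bumped up to 4K. */
    return rounded < SLM_MIN_BYTES ? SLM_MIN_BYTES : rounded;
}
```

Under this reading, a kernel asking for 100 bytes and one asking for 4096 bytes both cost 4K per work-group, while a request of 4097 bytes would cost 5K.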