Will the Intel HD Graphics OpenCL compiler support work groups of a single work item whose kernel operates on float8 vectors?
Example:
__kernel __attribute__((vec_type_hint(float8), reqd_work_group_size(1,1,1)))
void my_kernel(__global const float8* const restrict in,   // placeholder name; __kernel itself is a reserved keyword
               __global float8* const restrict out)
{
    ...
    // lots and lots of float8 vector registers
}
The goal is to occupy as many float8 registers as possible in a single work item. The kernel I'm designing can benefit from float4 swizzling ops, and I'm assuming float8 is the narrowest vector width that fills one row of the 128x8 register file (128 registers of 8 32-bit lanes each) found in the Ivy Bridge and Haswell architectures.
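To spell out the arithmetic behind that assumption, here is a rough sketch in plain C. The 128 x 8 x 32-bit layout is my reading of the register file, and the constant and function names are made up for illustration. It only shows the sizes I'm hoping to match, not what the compiler will actually allocate per work item.

```c
/* Assumed register-file layout (my assumption, not a confirmed figure):
 * 128 registers per hardware thread, each with 8 lanes of 32 bits. */
enum { GRF_COUNT = 128, LANES_PER_REG = 8, BYTES_PER_LANE = 4 };

/* Bytes in one register under the assumed layout. */
int grf_bytes_per_register(void) { return LANES_PER_REG * BYTES_PER_LANE; }

/* Bytes in the whole per-thread register file under the assumed layout. */
int grf_bytes_total(void) { return GRF_COUNT * grf_bytes_per_register(); }

/* Size of one float8 value: 8 floats (float is 4 bytes on the usual hosts). */
int float8_bytes(void) { return 8 * (int)sizeof(float); }

/* Upper bound on float8 values that could be live at once if a single
 * work item owned the entire register file. */
int max_float8_values(void) { return grf_bytes_total() / float8_bytes(); }
```

Under these assumptions a float8 is exactly one register wide, which is why I'm asking whether one work item can be given all 128 of them.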
Questions:
- Does the HD Graphics OpenCL compiler support allocating as many as 128 registers to a single work item on Ivy Bridge and Haswell?
- If this isn't supported, why not?
- If this isn't supported, what is the best work group size for acquiring the most registers per work item?
Thanks, I'm very impressed with the HD Graphics architecture. The EUs and sub-slices appear to have *huge* amounts of resources compared to other low-power GPUs.