A max work group size of 512 items looks too small for Haswell: a work group of 512 items cannot cover a Haswell half-slice.
The max work group size for Haswell should be at least 10 EUs x 7 h/w threads per EU x 8 work items per thread (SIMD8) = 560.
A 512-item SIMD8 work group is only 64 h/w threads (512 / 8), which is enough to fill just over 9 of the 10 EUs (64 / 7 ≈ 9.1).
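
To spell out the arithmetic, here is a quick sketch. The constants are the figures I am assuming in this post (10 EUs per half-slice, 7 h/w threads per EU), and the function names are made up for illustration; none of this queries the driver.

```python
# Assumed Haswell figures from this post, not values read from hardware.
EUS_PER_HALF_SLICE = 10
HW_THREADS_PER_EU = 7


def full_half_slice_items(simd_width):
    """Work items needed to put one work group on every h/w thread of a half-slice."""
    return EUS_PER_HALF_SLICE * HW_THREADS_PER_EU * simd_width


def threads_needed(work_group_size, simd_width):
    """H/w threads a work group occupies, rounding up for a partial thread."""
    return -(-work_group_size // simd_width)  # ceiling division


def eu_coverage(work_group_size, simd_width):
    """Return (fully occupied EUs, leftover threads on one more EU)."""
    return divmod(threads_needed(work_group_size, simd_width), HW_THREADS_PER_EU)


if __name__ == "__main__":
    print(full_half_slice_items(8))   # items needed for SIMD8
    print(threads_needed(512, 8))     # h/w threads used by 512 items
    print(eu_coverage(512, 8))        # (full EUs, leftover threads)
```

With these assumptions, a SIMD8 work group needs 560 items to fill the half-slice, while 512 items map to 64 threads, i.e. 9 full EUs plus one thread. Whether the hardware actually packs threads EU by EU like this is itself an assumption.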
What am I missing?