I have a question regarding the use of vector data types inside OpenCL kernels. Since I started working with OpenCL, I have heard about the advantages of using vector data types to bulk load/store data from/to device memory and to take advantage of the SSE and/or AVX instructions available on CPUs. However, looking at the CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT/LONG/FLOAT property of several GPUs (including Intel HD, AMD and NVIDIA graphics processors), all of them report a value of 1. So it seems that current GPU architectures do not take any advantage of vector data types, as pointed out in the following Stack Overflow discussion: https://stackoverflow.com/questions/16258930/speedup-when-using-float4-o...
Therefore, would you recommend (excluding CPUs) using vector data types to store data in global memory? I am currently working on a Monte Carlo code for particle transport and I use float4 data types to store particle information (position, energy, etc.). The particle attributes are encoded in the components of these vectors, so I usually have to extract them by addressing the vector components, for example:
// store is a float4: position in .xyz, energy in .w
float3 position = store.xyz;
float energy = store.w;
Would it be more advisable, from a performance point of view, to just use plain int or float data types instead?
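To make the comparison concrete, here is a minimal host-side sketch in plain C of the two layouts I am weighing. It is not my actual kernel code: `particle4` is a hypothetical stand-in for OpenCL's float4, and `particle_soa`, `energy_packed`, `energy_soa` and `packed_to_soa` are names made up for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side stand-in for OpenCL's float4 (x, y, z, w). */
typedef struct { float x, y, z, w; } particle4;

/* Hypothetical alternative: one plain float array per attribute. */
typedef struct {
    float *x, *y, *z, *energy;
    size_t count;
} particle_soa;

/* Packed layout: position lives in .x/.y/.z and energy in .w. */
static float energy_packed(const particle4 *store, size_t n, size_t i)
{
    assert(i < n);
    return store[i].w;
}

/* Plain-array layout: energy is read directly from its own array. */
static float energy_soa(const particle_soa *p, size_t i)
{
    assert(i < p->count);
    return p->energy[i];
}

/* Copy the packed particles into the plain-array layout. */
static void packed_to_soa(const particle4 *store, size_t n, particle_soa *out)
{
    assert(out->count >= n);
    for (size_t i = 0; i < n; ++i) {
        out->x[i] = store[i].x;
        out->y[i] = store[i].y;
        out->z[i] = store[i].z;
        out->energy[i] = store[i].w;
    }
}
```

Both layouts hold the same data. The difference is whether the four attributes of one particle sit next to each other in memory (packed) or each attribute is contiguous across particles (plain arrays).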
Thanks for your help!