Hello,
I have a question about using structs in kernels. My current kernel reads from two buffers through a struct, like this:
struct pair { float first; float second; };

inline const float f(const struct pair param)
{
    return param.first * param.second;
}

inline const struct pair access_func(__global float const * const a,
                                     __global float const * const b,
                                     const int i)
{
    struct pair res = { a[i], b[i] };
    return res;
}

// slow
__kernel ...(__global float const * const a, __global float const * const b)
{
    // ...
    x = f( access_func( a, b, i ) );
    // ...
}
When I change the kernel as follows, it runs much faster:
// fast
__kernel ...(__global float const * const a, __global float const * const b)
{
    // ...
    x = a[i] * b[i];
    // ...
}
Is there a way to get the compiler to perform this optimization on its own? The NVIDIA compiler seems to manage it, since I see no runtime difference between the two versions on a GPU.
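If the compiler does not eliminate the struct temporary, one possible workaround is to avoid it in the source. The sketch below is plain C, not OpenCL C: it mirrors the helpers from the slow kernel (with the `__global` qualifiers dropped) and adds a function-like macro that expands to the direct `a[i] * b[i]` expression. The names `F_AT`, `product_struct`, `product_macro` and `products_match` are hypothetical, and whether a given OpenCL compiler flattens the struct by itself is implementation-dependent; this only shows that the two forms compute the same values, so the rewrite is safe.

```c
#include <assert.h>
#include <stdbool.h>

/* Same aggregate as in the kernel above. */
struct pair { float first; float second; };

/* Struct-based helpers as in the "slow" kernel, with the OpenCL
   __global qualifiers dropped so this compiles as plain C. */
static inline float f(const struct pair param)
{
    return param.first * param.second;
}

static inline struct pair access_func(const float *a, const float *b, int i)
{
    struct pair res = { a[i], b[i] };
    return res;
}

/* Hypothetical workaround: a macro that expands to the direct
   expression, so no struct temporary appears in the source at all. */
#define F_AT(a, b, i) ((a)[(i)] * (b)[(i)])

/* Computes out[i] through the struct round-trip ("slow" pattern). */
static void product_struct(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = f(access_func(a, b, i));
}

/* Computes out[i] through the macro, i.e. the "fast" a[i] * b[i]. */
static void product_macro(const float *a, const float *b, float *out, int n)
{
    for (int i = 0; i < n; ++i)
        out[i] = F_AT(a, b, i);
}

/* Returns true if both variants produce equal results for n elements.
   Uses fixed-size scratch buffers, so n must be in [0, 64]. */
static bool products_match(const float *a, const float *b, int n)
{
    float s[64], m[64];
    if (n < 0 || n > 64)
        return false;
    product_struct(a, b, s, n);
    product_macro(a, b, m, n);
    for (int i = 0; i < n; ++i)
        if (s[i] != m[i])
            return false;
    return true;
}
```

In the kernel itself, the same idea would mean replacing `f( access_func( a, b, i ) )` with `F_AT(a, b, i)`, which the preprocessor turns into exactly the fast version's expression. The cost is losing the type-checked helper functions, so it is a trade-off rather than a fix for the compiler.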
Thanks in advance!