Hello,
I have a question concerning the usage of structs. My current kernel accesses two buffers using a struct in the following way:
struct pair {
    float first;
    float second;
};
inline const float f(const struct pair param) {
    return param.first * param.second;
}
inline const struct pair access_func(__global float const * const a, __global float const * const b, const int i) {
    struct pair res = {
        a[i],
        b[i]
    };
    return res;
}
// slow
__kernel ...(__global float const * const a, __global float const * const b)
{
    // ...
    x = f( access_func( a, b, i ) );
    // ...
}
When I alter the kernel in the following way it runs much faster:
// fast
__kernel ...(__global float const * const a, __global float const * const b)
{
    // ...
    x = a[i] * b[i];
    // ...
}
Is there a way to get the compiler to perform this optimization itself? The NVIDIA compiler seems to be able to do it, since I don't see a runtime difference between the two versions on a GPU.
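One middle ground that might be worth testing is to keep the helper functions but pass the two values as scalars, so that no struct temporary has to be built and returned. The sketch below is plain C so that it compiles on the host; in the kernel the pointers would carry the `__global` qualifier. The names `f_scalar` and `access_and_apply` are hypothetical, and whether this removes the overhead on the slower compiler is an untested assumption.

```c
/* Hypothetical scalar variant of f(): same product, no struct argument. */
static inline float f_scalar(const float first, const float second) {
    return first * second;
}

/* Hypothetical helper: loads a[i] and b[i] and multiplies them via f_scalar.
   In an OpenCL kernel, a and b would be declared __global. */
static inline float access_and_apply(const float *a, const float *b, const int i) {
    return f_scalar(a[i], b[i]);
}
```

In the kernel body the call would then read `x = access_and_apply( a, b, i );`, which keeps the access logic in one place while computing the same value as `a[i] * b[i]`.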
Thanks in advance!