Channel: Intel® Software - OpenCL*

structs and compiler optimizations

Hello,

I have a question about using structs. My current kernel reads from two buffers through a struct in the following way:

struct pair {
    float first;
    float second;
};

inline const float f(const struct pair param) {
    return param.first * param.second;
}

inline const struct pair access_func(__global float const * const a, __global float const * const b, const int i) {
    struct pair res = {
            a[i],
            b[i]
    };
    return res;
}

// slow
__kernel ...(__global float const * const a, __global float const * const b)
{
 // ...

 x = f( access_func( a, b, i ) );

 // ...
}

When I change the kernel to multiply the two buffer elements directly, it runs much faster:

// fast
__kernel ...(__global float const * const a, __global float const * const b)
{
 // ...

 x = a[i] * b[ i ];

 // ...
}

Is there a way to get the compiler to perform this optimization itself? The NVIDIA compiler seems to do so, since I see no difference in runtime between the two versions on a GPU.
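
To make clear that the two variants differ only in performance and not in result, here is a minimal plain-C sketch that mirrors both forms on the host. It assumes the `__global` qualifiers can be dropped for the comparison and uses `static inline` so it links as ordinary C; the names `via_struct`, `direct`, and `variants_agree` are hypothetical helpers that are not part of the kernel.

```c
#include <assert.h>

/* Plain-C mirror of the struct from the post. */
struct pair {
    float first;
    float second;
};

/* Multiplies the two members, as f() does in the kernel code. */
static inline float f(const struct pair param) {
    return param.first * param.second;
}

/* Loads a[i] and b[i] into a temporary struct, as access_func() does. */
static inline struct pair access_func(const float *a, const float *b, int i) {
    struct pair res = { a[i], b[i] };
    return res;
}

/* Hypothetical helper: the "slow" form, going through the struct. */
static float via_struct(const float *a, const float *b, int i) {
    return f(access_func(a, b, i));
}

/* Hypothetical helper: the "fast" form, multiplying the loads directly. */
static float direct(const float *a, const float *b, int i) {
    return a[i] * b[i];
}

/* Returns 1 if both forms agree for every index in [0, n), else 0. */
static int variants_agree(const float *a, const float *b, int n) {
    assert(a && b && n >= 0);
    for (int i = 0; i < n; ++i) {
        if (via_struct(a, b, i) != direct(a, b, i)) {
            return 0;
        }
    }
    return 1;
}
```

Since both forms perform the same two loads and one multiply, a compiler that inlines both helpers and removes the temporary struct should be able to turn the first form into the second. This sketch only checks the results on the host and says nothing about what any particular OpenCL compiler actually generates.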

Thanks in advance!

Thread Topic: Question