Channel: Intel® Software - OpenCL*

AVX2 and FMA3 support


The FAQ states "Yes, Intel OpenCL* SDK 2013 introduces performance improvements that include full code generation on the Intel Advanced Vector Extensions (Intel AVX and Intel AVX2)."

I'm trying to get it to produce code that utilises the AVX2 FMA3 instructions.

I'm using the Kernel Builder with the target set to the AVX2 instruction set (CPU - 64 bit AVX2).

-------------
__kernel void dofma(const global float *a, const global float *b, const global float *c, global float *out)
{
    uint gid = get_global_id(0);
    float fa = a[gid];
    float fb = b[gid];
    float fc = c[gid];
    fa = mad(fa, fb, fc);
    out[gid] = fa;
}
------------------

This produces code that uses vmulps and vaddps, but no VFMADD213-type instructions.

Using fa = fma(fa,fb,fc); instead
produces a lot more code plus a function call for the fma, which results in very low performance.

