Channel: Intel® Software - OpenCL*

Any way to coax HD Graphics IGP into SIMD4 or SIMD4x2 mode?

I asked a similar question last year and would like to know whether there is any way to coax the compiler into mapping "vectorized" code onto the IGP.

More specifically, I'd like to launch a workgroup where each work item is a SIMD4 or SIMD4x2 vector and the number of vector registers per work item might approach 128.

I have a few interesting, nearly embarrassingly parallel kernels that require a few rounds of inter-lane communication across SIMD lanes. The kernels map well to AVX2 and to GPU architectures with swizzle/permute/shuffle support. They also work fine without a swizzle, but then everything has to be bounced through local memory, and avoiding local memory provides a decent speedup.

With the current scalar-per-thread code generation (SIMD8/16/32?) and no access to inter-subgroup swizzles, all communication has to be bounced through shared local memory, whereas executing in SIMD4/4x2 mode would presumably let me use the hardware's SIMD swizzle support.

I assume the answer is still "no", but vectorizer "knobs" might be a useful feature to add to future versions of the IGP compiler.