Channel: Intel® Software - OpenCL*

performance of half2 vector vs. half scalars per SIMD8 lane?

Basic question for GEN8+ experts:

In a SIMD8 kernel, does the GEN8+ EU reach maximum fp16 throughput when each SIMD lane operates on a half2 vector, or will independent half scalars perform better, worse, or the same?

I am also wondering why assigning constants to 4 half2 vectors results in 8 separate half-typed (:hf) MOVs.

Given a struct made up of 4 half2 vectors:

       a.x = 0;
       a.y = 0;
       a.z = 0;
       a.w = 1;

this is what gets generated:

         mov      (8|M0)         r79.0<1>:hf   0x3C00:hf
         mov      (8|M0)         r79.8<1>:hf   0x3C00:hf
         mov      (8|M0)         r78.0<1>:hf   0x0:hf
         mov      (8|M0)         r78.8<1>:hf   0x0:hf
         mov      (8|M0)         r77.0<1>:hf   0x0:hf
         mov      (8|M0)         r77.8<1>:hf   0x0:hf
         mov      (8|M0)         r76.0<1>:hf   0x0:hf
         mov      (8|M0)         r76.8<1>:hf   0x0:hf

I was expecting to see a 32-bit MOV initializing each half2 member.
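
To make that expectation concrete, here is a small host-side C sketch of the 32-bit immediates I would have guessed for each half2 member. It is only an illustration: pack_half2, HALF_ZERO and HALF_ONE are names I made up, and it assumes both components of a half2 sit in one 32-bit word with the first component in the low 16 bits. The listing above suggests the compiler does not lay the data out that way.

```c
#include <stdint.h>

/* IEEE 754 binary16 bit patterns; these match the :hf immediates in the listing. */
#define HALF_ZERO 0x0000u   /* 0.0h */
#define HALF_ONE  0x3C00u   /* 1.0h */

/* Hypothetical helper: pack two binary16 bit patterns into one 32-bit word.
 * Assumption: the first half2 component occupies bits 0..15 and the second
 * occupies bits 16..31. */
static uint32_t pack_half2(uint16_t lo, uint16_t hi)
{
    return (uint32_t)lo | ((uint32_t)hi << 16);
}
```

Under that assumption, pack_half2(HALF_ONE, HALF_ONE) gives 0x3C003C00 and pack_half2(HALF_ZERO, HALF_ZERO) gives 0x0, so one 32-bit MOV per half2 member (4 in total) would cover the same initialization as the 8 :hf MOVs.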
