Channel: Intel® Software - OpenCL*

GEN instruction explanation?


I'm storing eight 64-bit quad-words (SIMD8) to shared local memory (SLM) and am trying to understand some curious GEN instruction sequences.

The OpenCL line of code in question is a store to a doubly indexed array in SLM:

shared.m0[2][local_id] = r1;

Why does this indexed store to SLM result in 4-6 "mov" operations and two sends?

I assume some MOV operations are necessary to prepare a SEND "message"?

But why are there two SEND ops? 

send     (8|M0)         null:ud       r27:ud            0xA       0x40F0020 //  hdc.dc0  wr:2h, rd:0, wr.scrdwfc: 0x70020
send     (8|M0)         null:ud       r59:ud            0xC       0x6026CFE //  hdc.dc1  wr:3, rd:0, wr.usurf msc:44, to SLM
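For anyone reading along, here is a minimal sketch in C that splits the two message descriptors above into fields so they can be compared against the disassembler's comments. The bit layout in the comment block is an assumption on my part (it is not stated anywhere in the listing), and the names `SendDesc`, `decode_send_desc` and `check_against_disassembly` are made up for illustration. The only check it performs is that the decoded values agree with the `wr:`, `rd:`, `msc:` and `0x70020` annotations printed next to each send.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* ASSUMED layout of the 32-bit send message descriptor (hypothetical,
 * cross-checked only against the disassembler comments in the listing):
 *   bits  0..7   binding table index (meaningful for surface messages)
 *   bits  8..13  message-specific control ("msc")
 *   bits 14..18  message type
 *   bit  19      header present (the "h" in "wr:2h")
 *   bits 20..24  response length in registers ("rd")
 *   bits 25..28  message length in registers ("wr")
 */
typedef struct {
    unsigned bti, msc, msg_type, header, rlen, mlen;
} SendDesc;

/* Split a raw descriptor into the assumed fields. */
static SendDesc decode_send_desc(uint32_t d)
{
    SendDesc s;
    s.bti      = d & 0xFFu;
    s.msc      = (d >> 8) & 0x3Fu;
    s.msg_type = (d >> 14) & 0x1Fu;
    s.header   = (d >> 19) & 0x1u;
    s.rlen     = (d >> 20) & 0x1Fu;
    s.mlen     = (d >> 25) & 0xFu;
    return s;
}

/* Print one decoded descriptor in roughly the disassembler's style. */
static void print_send_desc(uint32_t d)
{
    SendDesc s = decode_send_desc(d);
    printf("0x%08X  wr:%u%s rd:%u  type:%u msc:%u low8:0x%02X\n",
           (unsigned)d, s.mlen, s.header ? "h" : "", s.rlen,
           s.msg_type, s.msc, s.bti);
}

/* Compare the decoded fields with the comments in the listing. */
static void check_against_disassembly(void)
{
    SendDesc first  = decode_send_desc(0x040F0020u); /* "wr:2h, rd:0"         */
    SendDesc second = decode_send_desc(0x06026CFEu); /* "wr:3, rd:0, msc:44"  */

    assert(first.mlen == 2 && first.header == 1 && first.rlen == 0);
    assert((0x040F0020u & 0x7FFFFu) == 0x70020u);    /* value in the comment  */

    assert(second.mlen == 3 && second.header == 0 && second.rlen == 0);
    assert(second.msc == 44);
}
```

Calling `check_against_disassembly()` followed by `print_send_desc(0x040F0020u)` and `print_send_desc(0x06026CFEu)` from a `main` is enough to try it. Under this assumed layout the two sends differ in message type and in whether a header is present, which is where I would start looking when comparing them.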

I understand the second SEND but what is the first doing that's necessary?  Is it a queue barrier of some sort?

Also, why are there so many MOV operations for this 8x64-bit SIMD8 store?

 

