Channel: Intel® Software - OpenCL*

vload4 vs. 4 individual memory accesses: bank conflicts


What is the advantage of vload4 over 4 single memory accesses?

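Suppose I am loading data from local memory. Below are two kernels that read the same four consecutive ints per work-item. I expect the second kernel to exhibit no bank conflicts.

Does the first kernel have bank conflicts? If one vload4 is executed per clock, each work-item touches four consecutive words at once, so the work-items in a half-wave should conflict with each other.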

 

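__kernel void kernel1(__local const int *localBuffer) {
     // vload4(offset, p) reads the 4 ints at p + offset*4, so the offset
     // is the work-item index itself, not index*4.
     int4 test = vload4(get_global_id(0), localBuffer);
}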
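
For reference, the OpenCL C specification defines vload4(offset, p) as reading from p + offset*4, so the offset argument counts whole vectors rather than scalar elements. Below is a minimal host-side C sketch of that addressing. The names int4_model and model_vload4 are hypothetical, chosen for illustration; they are not part of the OpenCL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OpenCL's int4 on the host. */
typedef struct { int x, y, z, w; } int4_model;

/* Models vload4(offset, p) for int data: reads the four consecutive
   ints that start at p + offset * 4. */
static int4_model model_vload4(size_t offset, const int *p)
{
    assert(p != NULL);
    const int *base = p + offset * 4;   /* offset counts vectors, not ints */
    int4_model v = { base[0], base[1], base[2], base[3] };
    return v;
}
```

With a buffer where buf[i] == i, model_vload4(1, buf) yields 4, 5, 6, 7. Passing get_global_id(0)*4 as the offset would therefore start reading at element 16*gid rather than 4*gid.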

 

 

__kernel void kernel2(__local const int *localBuffer) {
     // Four scalar reads of the same four consecutive ints.
     int4 test;
     int start = get_global_id(0)*4;
     test.x = localBuffer[start];
     test.y = localBuffer[start+1];
     test.z = localBuffer[start+2];
     test.w = localBuffer[start+3];
}
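
One way to reason about the scalar reads is to count how many work-items land in the same bank for a single instruction. The following host-side C sketch does that under loudly stated assumptions: 4-byte words, word w stored in bank w % num_banks, and all work-items in the group issuing the read together. The function name max_conflict_degree and the bank counts are illustrative only; real devices differ, and the sketch does not model how hardware splits a wide vector load.

```c
#include <assert.h>

/* Returns the largest number of work-items in one group that hit the same
   bank when work-item i reads the 4-byte word at index i * stride + k.
   Assumes word w lives in bank w % num_banks (an illustrative model only). */
static int max_conflict_degree(int num_items, int stride, int k, int num_banks)
{
    assert(num_banks > 0 && num_banks <= 64);
    int hits[64] = {0};   /* per-bank access counts */
    int worst = 0;
    for (int i = 0; i < num_items; ++i) {
        int bank = (i * stride + k) % num_banks;
        if (++hits[bank] > worst)
            worst = hits[bank];
    }
    return worst;
}
```

For example, max_conflict_degree(16, 4, 0, 32) models the read of test.x by a 16-work-item half-wave on an assumed 32-bank memory. Under this model the stride of 4 maps two work-items to each bank used, so the second kernel may not be conflict-free either; a stride of 1 gives a degree of 1.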

