Channel: Intel® Software - OpenCL*

Memcpy performance using an OpenCL kernel


Hi,

I have written a simple memcpy kernel and am analyzing its performance on the GPU using VTune. The kernel is shown below:

__kernel void deinterlace_Y(__read_only image2d_t YIn, __write_only image2d_t YOut)
{
    /* Memcpy-style operation: each work-item copies exactly one pixel. */

    /* Pixel coordinate handled by this work-item. */
    int2 coord_src = (int2)(get_global_id(0), get_global_id(1));

    /* Unnormalized integer coordinates, nearest-pixel sampling, clamped at the edges. */
    const sampler_t smp = CLK_FILTER_NEAREST | CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE;

    uint4 pixel4 = read_imageui(YIn, smp, coord_src);

    write_imageui(YOut, coord_src, pixel4);
}

I observe the following stats for the execution units (EU Array):

    Active    Stalled    Idle
    24.6%     18.1%      57.2%

Also, the number of computing threads started is 24,525,023, which is quite high. I don't know how to reduce the number of threads started here in order to increase performance.

I can't understand how to improve its performance. I have gone through this article on optimizations: https://int2-software.intel.com/en-us/articles/optimizing-simple-opencl-kernels. All the optimizations described there relate to buffers, where 16 elements can be read from memory in one go. But since I am using texture memory reads (the image APIs), I don't know how to increase the performance.
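One direction that might be worth measuring, sketched here under assumptions rather than as a confirmed fix: let each work-item copy several horizontally adjacent pixels, so that fewer work-items (and possibly fewer threads) are launched for the same image. The kernel name copy_Y_multi, the macro PIXELS_PER_ITEM (assumed to be passed at build time, e.g. -DPIXELS_PER_ITEM=4), and the host helper global_size_x are all hypothetical names introduced for illustration. Whether this actually raises EU activity would need to be checked in VTune.

```c
#include <stddef.h>

/* Hypothetical variant of the copy kernel, kept as a source string for
 * clCreateProgramWithSource. Assumes the program is built with
 * -DPIXELS_PER_ITEM=<n>. Each work-item copies PIXELS_PER_ITEM adjacent
 * pixels in x instead of a single pixel. */
const char *const copy_kernel_src =
    "__constant sampler_t smp = CLK_NORMALIZED_COORDS_FALSE |\n"
    "                           CLK_ADDRESS_CLAMP_TO_EDGE |\n"
    "                           CLK_FILTER_NEAREST;\n"
    "\n"
    "__kernel void copy_Y_multi(__read_only image2d_t YIn,\n"
    "                           __write_only image2d_t YOut)\n"
    "{\n"
    "    int x0 = (int)get_global_id(0) * PIXELS_PER_ITEM;\n"
    "    int y = (int)get_global_id(1);\n"
    "    int width = get_image_width(YIn);\n"
    "\n"
    "    for (int i = 0; i < PIXELS_PER_ITEM; ++i) {\n"
    "        int2 c = (int2)(x0 + i, y);\n"
    "        /* Skip pixels past the right edge when width is not a\n"
    "           multiple of PIXELS_PER_ITEM. */\n"
    "        if (c.x < width)\n"
    "            write_imageui(YOut, c, read_imageui(YIn, smp, c));\n"
    "    }\n"
    "}\n";

/* Host-side helper (hypothetical): global work size in x when each
 * work-item covers pixels_per_item pixels, rounded up so the last
 * partial group of pixels is still covered. */
size_t global_size_x(size_t width, size_t pixels_per_item)
{
    return (width + pixels_per_item - 1) / pixels_per_item;
}
```

With this layout the global size passed to clEnqueueNDRangeKernel would be { global_size_x(width, PIXELS_PER_ITEM), height } instead of { width, height }, so the number of work-items drops by roughly a factor of PIXELS_PER_ITEM.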

