Channel: Intel® Software - OpenCL*

code optimization + while-loop + local memory counter + CPU = infinite run of the kernel, demo inside


Hi,

I encountered a strange bug and prepared a demonstration.

1. Simple kernel:
--------------------------

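#pragma OPENCL EXTENSION cl_intel_printf : enable

__kernel void glitch(
	__local uint *scratch	// local-memory buffer shared by the work-group
) {

	uint k,v;
	k=0;	// index of the counter inside scratch
	v=0;	// initial value of the counter

	printf(" START\n");
	uint id = (uint)get_global_id(0);
	// work-item 0 initialises the shared counter
	if (id==0) {
		scratch[k] = v;
	}
	barrier( CLK_LOCAL_MEM_FENCE );
	// every work-item polls the counter; only work-item 0 increments it
	while ( scratch[k]<2 ) {
		if (id==0) {
			scratch[k]++;
		}
		barrier( CLK_LOCAL_MEM_FENCE );
	}
	printf(" FINISH\n");
}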

---------------------------

2. Run it with ordinary grid dimensions: a single work-group of 16x1x1 or 256x1x1,
and a local memory size of, for example, 1024.

3. What I expected:
the work-items start (you see 16 lines of START),
work-item 0 increments the counter in local memory,
and all work-items finish (you see 16 lines of FINISH).

4. What happens instead: FINISH repeats forever!
Maybe some stack corruption?
When you remove the printf calls and the pragma,
the kernel simply never returns, which amounts to the same thing.

---------------------------
cases:

Intel CPU with "-cl-opt-disable" runs fine

Nvidia GPU - no problem with or without optimization.

Intel CPU with optimization runs fine with work-group sizes 8 or 1; sizes 2, 4, 16, 64 and 256 are bad.
(8 may be a special case: it is the number of hardware threads on my CPU, an i7-4790.)

I reinstalled the Intel OpenCL SDK and CPU driver today.
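One possible reading of the cases above, offered as an assumption rather than a confirmed diagnosis: the loop condition reads scratch[k] right after the barrier, and nothing stops work-item 0 from incrementing the counter again before the other work-items have read it. Some work-items would then leave the loop while others wait at the barrier. The plain C sketch below models this. Everything in it is hypothetical: the function names, the idea that the runtime executes work-items in batches between barriers, and the `lockstep` parameter standing in for whatever vectorization width the runtime might use. The second function models a two-barrier variant of the loop (shown in its comment), which is not the code from this post.

```c
#include <stdbool.h>

#define LIMIT 2u        /* loop bound used by the posted kernel */
#define MAX_ROUNDS 8    /* safety bound so the model always terminates */

/*
 * Hypothetical model of the posted loop:
 *   while (scratch[0] < 2) { if (id == 0) scratch[0]++; barrier(); }
 * Work-items run in batches of `lockstep` items between barriers. Inside a
 * batch every item evaluates the condition before item 0 increments.
 * Returns true if some items leave the loop while others wait at the barrier.
 */
bool diverges_single_barrier(int group_size, int lockstep)
{
    if (lockstep < 1) lockstep = 1;
    unsigned scratch = 0;   /* scratch[0] as seen after the first barrier */
    for (int round = 0; round < MAX_ROUNDS; ++round) {
        int waiting = 0, exited = 0;
        for (int base = 0; base < group_size; base += lockstep) {
            int left = group_size - base;
            int n = left < lockstep ? left : lockstep;
            if (scratch < LIMIT) {
                if (base == 0)
                    scratch++;      /* item 0 belongs to the first batch */
                waiting += n;       /* batch reaches the barrier in the loop */
            } else {
                exited += n;        /* batch saw the final value and left */
            }
        }
        if (exited == group_size) return false;  /* all left together */
        if (exited > 0) return true;             /* some left, some wait */
    }
    return true;
}

/*
 * Same model for a two-barrier variant (an assumption, not the posted code):
 *   for (;;) {
 *       uint cur = scratch[0];          // private snapshot
 *       barrier(CLK_LOCAL_MEM_FENCE);   // all reads done before any write
 *       if (cur >= 2) break;
 *       if (id == 0) scratch[0]++;
 *       barrier(CLK_LOCAL_MEM_FENCE);   // write visible before next read
 *   }
 */
bool diverges_two_barriers(int group_size, int lockstep)
{
    if (lockstep < 1) lockstep = 1;
    unsigned scratch = 0;
    for (int round = 0; round < MAX_ROUNDS; ++round) {
        /* Region 1: nobody writes, so every item snapshots the same value. */
        unsigned cur = scratch;
        /* Region 2: each batch decides from the snapshot, not from memory. */
        int waiting = 0, exited = 0;
        for (int base = 0; base < group_size; base += lockstep) {
            int left = group_size - base;
            int n = left < lockstep ? left : lockstep;
            if (cur < LIMIT) {
                if (base == 0)
                    scratch++;
                waiting += n;
            } else {
                exited += n;
            }
        }
        if (exited == group_size) return false;
        if (exited > 0) return true;
    }
    return true;
}
```

Under this model, `diverges_single_barrier(16, 1)` and `diverges_single_barrier(16, 8)` report divergence, while `diverges_single_barrier(8, 8)` and `diverges_single_barrier(1, 1)` do not. That would be consistent with sizes 8 and 1 running fine if the runtime happened to process 8 work-items in lockstep, but that width is a guess. `diverges_two_barriers` reports no divergence for any size, because every work-item decides from the same snapshot.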

 

---------------------------

Additional demonstration:
I added two #defines (see the 1st and 2nd lines of the code).

GLITCH=0 shows a workaround: use a nonzero index into the local memory array
and take the initial value of the counter from get_global_id(1).

A zero index is bad.

An initial counter value written explicitly as 0 is bad.

GLITCH2 shows that the work-item index never reaches 8:

 FINISH: z=0 y=0 x=0
 FINISH: z=0 y=0 x=1
 FINISH: z=0 y=0 x=2
 FINISH: z=0 y=0 x=3
 FINISH: z=0 y=0 x=4
 FINISH: z=0 y=0 x=5
 FINISH: z=0 y=0 x=6
 FINISH: z=0 y=0 x=7
 FINISH: z=0 y=0 x=0
 FINISH: z=0 y=0 x=1
 FINISH: z=0 y=0 x=2
 FINISH: z=0 y=0 x=3
 FINISH: z=0 y=0 x=4
 FINISH: z=0 y=0 x=5
 FINISH: z=0 y=0 x=6
 FINISH: z=0 y=0 x=7
 FINISH: z=0 y=0 x=0
...etc.

But if you don't call printf with x, y, z, everything is OK again.

-----
Rather strange.
I would like to know whether it works for you.
I don't see any obvious errors in my code...

Regards, Petr

 

Attachment: optimization-glitch.zip (application/zip, 1.91 KB)
