On Windows 8.1 64-bit with an Intel HD 4600 (both the latest release and beta drivers), the following snippet from an OpenCL kernel produces an incorrect result:

if (k_delta == 38443432 && jj == 4620)
    printf((__constant char *)"cl_barrett32_87_gs: jj=%x kdelta=%x mulhi=%x\n", (uint)jj, (uint)k_delta, (uint)mul_hi((uint)jj, (uint)k_delta));

The output is:
cl_barrett32_87_gs: jj=120c kdelta=24a99a8 mulhi=0
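It is my understanding that mul_hi should not produce a zero result here: the full product 4620 * 38443432 = 177608655840 exceeds 2^32, so the upper 32 bits should be 41 (0x29).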
I also have a (likely) related multiplication bug:
facdist = (ulong) (2 * NUM_CLASSES) * (ulong) exponent;
fails: the upper 32 bits of the result are zero. NUM_CLASSES is a #define for 4620 and exponent is a value around 50 million, so the true product does not fit in 32 bits.