dot product kernel doesn't work on CPUs

Hi,

I'm new to OpenCL and have implemented a program that computes the
dot product. It works as expected on a GPU, but it returns a wrong
result on a CPU as soon as a work-group contains more than one
work-item. I narrowed the problem down using only two work-items per
work-group and one work-group per NDRange. On the GPU, both
work-items print a message before and after the reduction. On the
CPU, both print before the reduction, but only one work-item
(local_id 1) prints after it, so work-item 0 never stores the partial
sum of the work-group. The program uses libOpenCL.so.1 from
opencl-1.2-sdk-6.3.0.1904, opencl_runtime_16.1.1_x64_sles_6.4.0.25,
and the OpenCL driver from CUDA-8.0. Does somebody know why only one
work-item is left after the reduction? Is something wrong with my
kernel (most likely), or have I found a problem in the Intel OpenCL
implementation for CPUs (very unlikely)?

loki introduction 230 gcc dot_prod_OpenCL_orig.c errorCodes.c -lOpenCL
loki introduction 231 a.out

Try to find first GPU on available platforms.
...
******** Using platform 1 ********
Use device Quadro K2200.

before reduction: local_id = 0
before reduction: local_id = 1
after reduction: local_id = 0
after reduction: local_id = 1
sum = 6.000000e+01

loki introduction 232 gcc dot_prod_OpenCL.c errorCodes.c -lOpenCL
loki introduction 233 a.out

Try to find first CPU on available platforms.
******** Using platform 0 ********
Use device Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz.

before reduction: local_id = 0
before reduction: local_id = 1
after reduction: local_id = 1
sum = 2.265776e-316

loki introduction 234 strace a.out |& grep ocl
open("/usr/local/intel/opencl-1.2-6.4.0.25/lib64/libintelocl.so", O_RDONLY|O_CLOEXEC) = 5
open("/usr/local/intel/opencl-1.2-6.4.0.25/lib64/__ocl_svml_l9.so", O_RDONLY|O_CLOEXEC) = 3
loki introduction 235

dot_prod_OpenCL.h
-----------------

#define VECTOR_SIZE               10
#define WORK_ITEMS_PER_WORK_GROUP 2   /* power of two required */
#define WORK_GROUPS_PER_NDRANGE   1

dotProdKernel.cl
----------------

#if defined (cl_khr_fp64) || defined (cl_amd_fp64)
#include "dot_prod_OpenCL.h"

__kernel void dotProdKernel (__global const double * restrict a,
                             __global const double * restrict b,
                             __global double * restrict partial_sum)
{
    /* Use local memory to store each work-item's running sum.      */
    __local double cache[WORK_ITEMS_PER_WORK_GROUP];

    double temp = 0.0;
    int cacheIdx = get_local_id (0);

    for (int tid = get_global_id (0);
         tid < VECTOR_SIZE;
         tid += get_global_size (0))
    {
      temp += a[tid] * b[tid];
    }
    cache[cacheIdx] = temp;

    /* Ensure that all work-items have completed, before you add up the
     * partial sums of each work-item to the sum of the work-group
     */
    barrier (CLK_LOCAL_MEM_FENCE);

    /* Each work-item will add two values and store the result back to
     * "cache". We need "log_2 (WORK_ITEMS_PER_WORK_GROUP)" steps to
     * reduce all partial values to one work-group value.
     * WORK_ITEMS_PER_WORK_GROUP must be a power of two for this
     * reduction.
     */
    printf ("before reduction: local_id = %u\n", get_local_id (0));
    for (int i = get_local_size (0) / 2; i > 0; i /= 2)
    {
      if (cacheIdx < i)
      {
        cache[cacheIdx] += cache[cacheIdx + i];
        barrier (CLK_LOCAL_MEM_FENCE);
      }
    }
    printf ("after reduction: local_id = %u\n", get_local_id (0));
    /* store the partial sum of this work-group               */
    if (cacheIdx == 0)
    {
      partial_sum[get_group_id (0)] = cache[0];
    }
}
#else
#error "Double precision floating point not supported."
#endif
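
A note on the reduction loop: the OpenCL specification requires that barrier() is reached by every work-item of a work-group, and if it sits inside a conditional, either all work-items must enter that conditional or none. In the kernel above the barrier is inside "if (cacheIdx < i)", so with two work-items only work-item 0 reaches it. That may explain why the two devices behave differently, although I have not verified it on this setup. The following host-side sketch in plain C models the intended reduction. The function name tree_reduce is hypothetical, and the inner loop stands in for the work-items of one work-group.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-side model of the work-group tree reduction.
 * "cache" plays the role of the __local array and "n" the role of
 * WORK_ITEMS_PER_WORK_GROUP, which must be a power of two.
 */
double tree_reduce (double *cache, size_t n)
{
    assert (n > 0 && (n & (n - 1)) == 0);   /* power of two required */

    for (size_t i = n / 2; i > 0; i /= 2)
    {
        /* Every "work-item" takes part in the step, but only the
         * lower half of them adds a value.
         */
        for (size_t idx = 0; idx < n; ++idx)
        {
            if (idx < i)
            {
                cache[idx] += cache[idx + i];
            }
        }
        /* Synchronisation point: the step is finished for ALL
         * work-items before the next step starts. In a kernel this
         * is where the barrier would have to be, outside the "if".
         */
    }
    return cache[0];
}
```

Calling tree_reduce on an array holding 1.0, 2.0, 3.0, 4.0 with n = 4 returns 10.0. In the kernel, the corresponding change would be to move "barrier (CLK_LOCAL_MEM_FENCE);" behind the closing brace of the "if" block, while keeping it inside the "for" loop.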

Thank you very much in advance for any help.

Kind regards

Siegmar
