Channel: Intel® Software - OpenCL*

Matrix multiplication - Hangs on Intel GPU


I am experimenting with OpenCL using examples from the book OpenCL in Action, and I see different behavior on different devices. Below is the matrix multiplication kernel I am having trouble with.

__kernel void matrix_mult(__global float4 *a_mat, 
      __global float4 *b_mat, __global float *c_mat) {

   float sum;

   int num_rows = get_global_size(0);
   int vectors_per_row = num_rows/4;
   int start = get_global_id(0) * vectors_per_row;
   a_mat += start;
   c_mat += start*4;

   for(int i=0; i<num_rows; i++) {
      sum = 0.0f;
      for(int j=0; j<vectors_per_row; j++) {
         sum += dot(a_mat[j], b_mat[i * vectors_per_row + j]);
      }
      c_mat[i] = sum;
   }
}
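
For checking results on small inputs, here is a rough host-side sketch in plain C of what a single work-item of the kernel above computes. It assumes square n x n row-major matrices with n divisible by 4, and it assumes that b_mat already holds B transposed, which is what the indexing `b_mat[i * vectors_per_row + j]` suggests but which the post does not state. The function name `matrix_mult_row_ref` and its parameters are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference for one work-item (gid) of matrix_mult.
 * a  : n x n row-major floats (the kernel's a_mat, viewed as scalars)
 * bt : n x n row-major floats (the kernel's b_mat, ASSUMED to be B transposed)
 * c  : n x n row-major floats (the kernel's c_mat)
 * The work-item with id gid fills row gid of c. */
static void matrix_mult_row_ref(const float *a, const float *bt, float *c,
                                int n, int gid)
{
    assert(n > 0 && n % 4 == 0);   /* the kernel reads whole float4 vectors */
    assert(gid >= 0 && gid < n);

    const float *a_row = a + (size_t)gid * n;  /* a_mat += start   */
    float *c_row = c + (size_t)gid * n;        /* c_mat += start*4 */

    for (int i = 0; i < n; i++) {
        float sum = 0.0f;
        /* Scalar form of the float4 dot() loop: n/4 vectors of 4 floats. */
        for (int k = 0; k < n; k++) {
            sum += a_row[k] * bt[(size_t)i * n + k];
        }
        c_row[i] = sum;
    }
}
```

Calling it for every gid from 0 to n-1 fills all of c, which can then be compared against the buffer read back from the device. Floating-point sums may differ slightly from the GPU result because the order of additions is not the same.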

I am testing with the Intel OpenCL SDK 2019 on an Intel i7-4600U CPU and an Intel HD Graphics 4400 GPU. Both devices complete the kernel successfully for a 1024x1024 float matrix (the kernel is executed with the global size set to 1024). If I increase the matrix size to 2048x2048 (the kernel is then executed with the global size set to 2048), the CPU still completes the kernel, but on the GPU the kernel execution hangs and never returns.
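
To put the two sizes side by side, here is a small C sketch that counts how many float4 dot() calls the kernel issues, derived only from its loop bounds (num_rows iterations of the outer loop, num_rows/4 of the inner loop, one work-item per row). The helper names are made up for illustration, and whether this count has anything to do with the hang is not established here.

```c
#include <stdint.h>

/* Hypothetical helper: float4 dot() calls issued by one work-item.
 * The outer loop runs n times and the inner loop n/4 times. */
static uint64_t dots_per_work_item(uint64_t n)
{
    return n * (n / 4);
}

/* Hypothetical helper: float4 dot() calls across a launch with global size n. */
static uint64_t dots_per_launch(uint64_t n)
{
    return n * dots_per_work_item(n);
}
```

For n = 1024 this gives 262,144 calls per work-item and 268,435,456 per launch. For n = 2048 it gives 1,048,576 per work-item and 2,147,483,648 per launch, so the single kernel launch does 8 times as much work.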

The issue seems device specific. If I comment out the line inside the inner for loop (i.e. the line with sum += dot…), the Intel GPU completes the kernel execution.

I wonder whether the issue is related to conflicting global memory accesses to a_mat and b_mat across different processing elements.

Could any experts offer advice on how to track this down and find a solution?

 

 

