Hi,
Again, I've been trying to characterize when it makes sense to offload computation from the CPU to the IGP (i7-5775C CPU vs. Iris Pro IGP). I noticed that for very simple kernels (e.g. a single fma or min/max operation) the CPU greatly outperforms the IGP, by up to 50%, and on investigation it seems that kernel launch overhead has a lot to do with it. Some results to explain:
FMA Kernel (using FMA_LOOP = 1):
void kernel fmaKernel(global float * out){
    float sum = out[get_global_id(0)];
    // FMA_LOOP is a compile-time define; cast it to float for use as an fma() argument
    for(int i = 0; i < FMA_LOOP; i++){
        sum = fma(1.02345f, (float)FMA_LOOP, sum);
    }
    out[get_global_id(0)] = sum;
}
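(For reference, FMA_LOOP is just a preprocessor define. A minimal sketch of how it might be injected at build time, assuming "program" and "device" are valid handles from the usual host setup, not my actual harness:)
/* Sketch only: pass FMA_LOOP as a compile-time define to the kernel compiler. */
clBuildProgram(program, 1, &device, "-D FMA_LOOP=1", NULL, NULL);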
Do Nothing Kernel:
void kernel doNothing(global float * out){
    return;
}
As a side note, I have forced my IGP to remain at its full clock speed (1.15 GHz), and likewise the CPU cores (3.7 GHz).
These results all reflect 2D square images (e.g. 32x32, 64x64, 128x128, 512x512, and so on), with one thread (work-item) per pixel.
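The launch itself is just a plain 2D NDRange over the image, roughly along these lines (names like "queue", "kernel", "width", and "height" are placeholders, not my actual benchmark code):
/* One work-item per pixel over a width x height image.
   Local work-group size is left to the runtime (NULL). */
size_t global[2] = { width, height };
cl_event evt;
clEnqueueNDRangeKernel(queue, kernel, 2, NULL, global, NULL, 0, NULL, &evt);
clWaitForEvents(1, &evt);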
So I'm led to believe that there is much higher overhead in creating IGP threads than CPU threads, which is surprising given that the IGP is supposed to excel at handling lots of threads, and this overhead is preventing the IGP from really shining in these experiments. I'm guessing I could have each OpenCL thread process more than one pixel (see the sketch below), but that complicates the kernels, which don't suffer from this overhead on the CPU.
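A rough sketch of that multi-pixel idea (PIXELS_PER_THREAD would be another hypothetical compile-time define; this is not what my benchmark currently does):
/* Hypothetical variant: each work-item handles PIXELS_PER_THREAD consecutive pixels,
   so the global size (and the number of launched threads) shrinks by that factor. */
void kernel fmaKernelBlocked(global float * out){
    int base = get_global_id(0) * PIXELS_PER_THREAD;
    for(int p = 0; p < PIXELS_PER_THREAD; p++){
        float sum = out[base + p];
        for(int i = 0; i < FMA_LOOP; i++){
            sum = fma(1.02345f, (float)FMA_LOOP, sum);
        }
        out[base + p] = sum;
    }
}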
So my question to the forum/Intel is: why is the launch overhead so much greater on the IGP than on the CPU? Or is there something about my experiments that simply makes it appear that way, when it can be explained otherwise? I've attempted measuring with OpenCL timers vs. wall clock and don't really see a difference.
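For the OpenCL-timer side I read the profiling counters off the kernel event, roughly as below (requires the queue to be created with CL_QUEUE_PROFILING_ENABLE; "evt" is the event from the enqueue above). I treat queued->submit and submit->start as launch/queue overhead and start->end as execution time:
/* Break the event timeline into queue, submit, and execution phases (nanoseconds). */
cl_ulong t_queued, t_submit, t_start, t_end;
clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_QUEUED, sizeof(cl_ulong), &t_queued, NULL);
clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_SUBMIT, sizeof(cl_ulong), &t_submit, NULL);
clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_START,  sizeof(cl_ulong), &t_start,  NULL);
clGetEventProfilingInfo(evt, CL_PROFILING_COMMAND_END,    sizeof(cl_ulong), &t_end,    NULL);
cl_ulong queue_ns  = t_submit - t_queued;  /* time sitting in the host-side queue */
cl_ulong launch_ns = t_start  - t_submit;  /* submit until the kernel actually starts */
cl_ulong exec_ns   = t_end    - t_start;   /* kernel execution time on the device */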
Config:
Ubuntu 14.04 LTS
Intel OpenCL 1.2-5.0.0.43 (CPU-x64)
Intel OpenCL 1.2-1.0 (Graphics Driver for HD Graphics, Iris, Iris Pro)
Run benchmark:
./runBench.sh
For my system, platform=0 is the IGP and platform=1 is the CPU.