ioc64: merge CL source files into one SPIR-V for x64 and x86
I have two OpenCL source files, a.cl and b.cl. I can compile them into four SPIR-V files. Is it possible to compile both into two SPIR-V files (one for x86 and one for x64)? I have tried compiling them into .ir files, linking those into one .ir, and then building again with -spirv32="end.spirv", but that didn't work...
OpenCL SDK 2017 R2: intercept layer appears second instead of first
FYI,
I've found that my Win10 + OpenCL SDK 2017 R2 system can't perform "Application Analysis" unless I explicitly force selection of the second (identical) Intel HD Graphics platform and device.
Finding the first matching platform ("Intel") and device ("Graphics") results in no profiling data.
Perhaps the platform should have a name like "Intel(R) OpenCL Interceptor" so it can be explicitly found.
I haven't looked at the github intercept layer repo to see if this feature exists as I'd like to keep using the SDK R2 as is.
Output of my "find by name" routine is attached.
-ASM
0: Intel(R) OpenCL  <-- NO PROFILING
   0: Intel(R) HD Graphics 630 [ 24.20.100.6025 ]
   1: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz [ 7.6.0.698 ]
1: Intel(R) OpenCL  <-- THIS ONE SUPPORTS HOST/DEVICE PROFILING
   0: Intel(R) HD Graphics 630 [ 24.20.100.6025 ]
   1: Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz [ 7.6.0.698 ]
2: NVIDIA CUDA
   0: Quadro M1200 [ 398.28 ]
Error opening OpenCL Code Builder in Eclipse Neon.3
I completed the Intel SDK for OpenCL installation by following https://software.intel.com/en-us/articles/sdk-for-opencl-gsg and installed
Eclipse IDE for C/C++ Developers
Version: Neon.3 Release (4.6.3)
When I run the Eclipse IDE, I get the following error: "Cannot get machine list. Could not load required libraries. Please make sure to set the correct path under the Code Builder for Opencl preference page". Has anyone run into this error? Any help solving it would be appreciated.
I also configured LD_LIBRARY_PATH as described in https://software.intel.com/en-us/openclsdk-devguide-configuring-the-offl...
Thanks
ARUN
How to Locally Debug OpenCL Kernels on an Intel GPU?
My goal is to debug OpenCL kernels on an Intel GPU using Visual Studio 2017. My Intel GPU is not being used for graphics, so I should be able to use a single machine according to the “Developer Guide for Intel SDK for OpenCL Applications.” Are there any instructions or guides on how to debug a kernel on an Intel GPU on a setup like mine? I have been trying to piece together relevant portions of old guides and stackoverflow posts, but nothing seems to work. It appears that there have been several changes to the Visual Studio support in the past couple years, so no source of information has a complete explanation of the current state of development for this scenario.
The “Debugging OpenCL Kernels on GPU” section of the Developer Guide for Intel SDK for OpenCL Applications only goes into detail for remote debug sessions. When I try to follow its steps, I just get errors about gdbserver not working:
INTEL_GT_DEBUGGER: (7594099) Starting gdbserver on localhost
INTEL_GT_DEBUGGER: (7594286) gdbserver exited with 0.
INTEL_GT_DEBUGGER: (7595134) Attempt 1/3 failed: One or more errors occurred.
INTEL_GT_DEBUGGER: (7596136) Attempt 2/3 failed: One or more errors occurred.
INTEL_GT_DEBUGGER: (7597138) Attempt 3/3 failed: One or more errors occurred.
INTEL_GT_DEBUGGER: (7597155) Unable to connect to remote target localhost.
Please make sure that the target machine is accessible.
I have also found that Intel OpenCL platforms are completely unavailable within debug sessions unless I disable “Enable OpenCL API Debugger” in Code-Builder. Is that expected behavior?
Where can I download the OpenCL SDK 2019 Beta?
I tried Intel System Studio, but it just wasted hours of my time. After the long installation, I couldn't find any OpenCL compiler.
OpenCL SDK setup fails - Universal C Runtime not identified
Hi,
I'm trying to install intel_sdk_for_opencl_setup_6.1.0.1600.exe on Windows 7 64 bit.
However, the setup says that https://support.microsoft.com/en-us/kb/2999226 is not installed.
The Universal C Runtime is already installed, but the Intel SDK setup somehow fails to detect it and cannot proceed.
Can anyone please provide some urgent assistance?
The installation log files are attached.
Best,
Mork.
Do the OpenCL drivers from the CV SDK (l_openvino_toolkit_p_2018.1.249.tgz) support the Atom(TM) x5-Z8350?
Hi,
After installing the OpenCL NEO driver with the install_NEO_OCL_driver.sh script from the SDK l_openvino_toolkit_p_2018.1.249.tgz, the clinfo tool produces the following output:
$ clinfo
Number of platforms 0
Kernel 4.14.20 was also installed using the install_4_14_kernel.sh script. We're using an up-to-date Ubuntu 16.04 LTS.
$ uname -a
Linux ubuntu-upboard 4.14.20-041420-generic #201802162247 SMP Fri Feb 16 22:48:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"
The CPU is Intel(R) Atom(TM) x5-Z8350 CPU @ 1.44GHz.
We tried the beignet-opencl-icd drivers instead, and they appear to work (at least according to clinfo). But with the beignet drivers we are still unable to load the model into the clDNN plugin. With beignet, the classification_sample in GPU mode fails with:
[ ERROR ] failed to create engine: No OpenCL device found which would match provided configuration:
Intel(R) HD Graphics Cherryview: missing out of order support
So do we have a chance to get classification_sample working in GPU mode with SDK l_openvino_toolkit_p_2018.1.249.tgz on our platform with x5-Z8350?
Thanks.
Offline compile fails with CL_DEVICE_NOT_FOUND
Hi everyone,
I am trying to play with the OpenCL SDK for Intel. I am able to compile an OpenCL program with gcc and list all the available platforms and run a simple addition kernel.
However, if I try to use the offline compiler ioc64, this is what I get:
Failed to get OpenCL device...: -1 (CL_DEVICE_NOT_FOUND)
Build failed!
As I said, the simpleAdd.c program works, and clinfo recognizes all the different platforms.
-- What I installed --
I installed the neo runtime driver from here (I also installed the libva support):
https://github.com/intel/compute-runtime
And the sdk from here:
https://registrationcenter.intel.com/en/products/postregistration/?sn=C6...
Worth mentioning: during the installation, the installer was warning me that my GPU is not supported.
-- My configuration --
I have a Coffee Lake i7-8700K CPU and I am running Ubuntu MATE 18.07.
Any ideas or suggestions about why the offline compiler is misbehaving? Both link against the same libraries, so I really don't understand why they behave differently.
I also saw this other thread:
https://software.intel.com/en-us/forums/opencl/topic/605224
But none of the solutions mentioned there helped me.
Thanks a lot for any suggestion,
Giuseppe
OpenCL SDK 7.0.0.2567 vs ieframe.dll
Hello
I installed this SDK yesterday and noticed that YouTube and other sites stopped working, in both IE11 and Edge. IE11 reports an ieframe.dll error; Edge complains about a wrong site without any explanation. My poor workaround was to uninstall the SDK: everything got fixed, except that the SDK is now missing... Is there a good solution for this?
i5-4690 CPU / GPU, Windows 10 PRO, x64
Edward
OpenCL source file does not compile with Intel OpenCL SDK
Hi to everyone,
I am facing (another) strange problem with one of my codes. I have been working for some time on a Monte Carlo (MC) code for particle transport using OpenCL. In the last few weeks we have been facing a strange problem with one of the source files, specifically with a function inside it.
The problem is that the compilation process hangs when an Intel SDK is used. We have tested on several systems, and whenever the Intel platform is targeted, the compilation takes forever and never finishes (I left the PC running over a weekend and the compilation still had not finished). More specifically, this happens when the Intel CPU is targeted (strangely, with the GPU it compiles and the program then executes without issues).
Doing some testing we realized that the problem is with the following function inside the *.cl source file:
void howfar(
    // Particle information
    particle_t *p,
    // Geometry data.
    __global int3 *ngrid,
    __global float *xbounds,
    __global float *ybounds,
    __global float *zbounds,
    // Output information
    int *idisc,
    int *irnew,
    float *ustep)
{
    float dist = 0.0f; // distance to boundary along particle trajectory

    if (p->ir == 0) { // the particle is outside the geometry
        *idisc = 1; // terminate history
        return;
    } else { // in the geometry, do transport checks
        int ijmax = ngrid[0].x * ngrid[0].y;
        int imax = ngrid[0].x;

        /* First we need to decode the region number of the particle in terms
           of the region indices in each direction. */
        int irx = (p->ir - 1) % imax;
        int irz = (p->ir - 1 - irx) / ijmax;
        int iry = ((p->ir - 1 - irx) - irz * ijmax) / imax;

        /* Check in z-direction. */
        if (p->u.z > 0.0f) { // going towards outer plane
            dist = (zbounds[irz + 1] - p->r.z) / p->u.z;
            if (dist < *ustep) {
                *ustep = dist;
                if (irz != (ngrid[0].z - 1)) {
                    *irnew = p->ir + ijmax;
                } else {
                    *irnew = 0; // leaving geometry
                }
            }
        } else if (p->u.z < 0.0f) { // going towards inner plane
            dist = -(p->r.z - zbounds[irz]) / p->u.z;
            if (dist < *ustep) {
                *ustep = dist;
                if (irz != 0) {
                    *irnew = p->ir - ijmax;
                } else {
                    *irnew = 0; // leaving geometry
                }
            }
        }

        /* Check in x-direction. */
        if (p->u.x > 0.0f) { // going towards positive plane
            dist = (xbounds[irx + 1] - p->r.x) / p->u.x;
            if (dist < *ustep) {
                *ustep = dist;
                if (irx != (ngrid[0].x - 1)) {
                    *irnew = p->ir + 1;
                } else {
                    *irnew = 0; // leaving geometry
                }
            }
        } else if (p->u.x < 0.0f) { // going towards negative plane
            dist = -(p->r.x - xbounds[irx]) / p->u.x;
            if (dist < *ustep) {
                *ustep = dist;
                if (irx != 0) {
                    *irnew = p->ir - 1;
                } else {
                    *irnew = 0; // leaving geometry
                }
            }
        }

        /* Check in y-direction. */
        if (p->u.y > 0.0f) { // going towards positive plane
            dist = (ybounds[iry + 1] - p->r.y) / p->u.y;
            if (dist < *ustep) {
                *ustep = dist;
                if (iry != (ngrid[0].y - 1)) {
                    *irnew = p->ir + imax;
                } else {
                    *irnew = 0; // leaving geometry
                }
            }
        } else if (p->u.y < 0.0f) { // going towards negative plane
            dist = -(p->r.y - ybounds[iry]) / p->u.y;
            if (dist < *ustep) {
                *ustep = dist;
                if (iry != 0) {
                    *irnew = p->ir - imax;
                } else {
                    *irnew = 0; // leaving geometry
                }
            }
        }
    }
    return;
}
For testing purposes, if I empty the function body and/or remove the call to the function in the kernel, the compilation finishes without problems. Additionally, if I target NVIDIA or AMD platforms, the code compiles and executes without issues, and even on macOS using the Apple OpenCL framework (with Intel CPU and GPU) the code compiles and executes without problems. I attached a sample code that can be executed and/or compiled using CodeBuilder.
Unfortunately, I have no clue what is going on. The function is not the most complex I have seen, I really cannot see the problem, and the compiler produces no output during compilation that could hint at what is happening. Thanks for your help!
Does an OpenCL context go into an "idle stage" when it has nothing to do?
Hi all, I have a project that uses OpenCL for computation. The behavior below is quite strange to me; any help is appreciated!
I can't post my code in detail here, but the pseudo-code is:
// STEP 1: Upload input from CPU to GPU (using clEnqueueWriteBuffer)
// STEP 2: Run several kernels for computation
// STEP 3: Run some CPU code (probably 100 ms or more)
// STEP 4: Upload another input from CPU to GPU (using clEnqueueWriteBuffer)
The input size (in bytes) in STEP 1 is the same as in STEP 4. Transferring the data took ~0.5 ms in STEP 1 but ~10 ms in STEP 4. I also call clFinish to synchronize before and after each step. Any ideas why this could happen? I suspect that the Intel driver puts my OpenCL context/queue into an "idle stage" and needs a little time to "wake" things up.
P.S.: STEP 1 and STEP 4 perform the same on NVIDIA and AMD devices.
Pointer arithmetic issue in new graphics driver
Hi
I have just tracked down a bug in an OpenCL kernel I have written. The code had been working fine until one of the users got a graphics driver update (version 20.19.15.4835).
The code had worked for about a year on a wide assortment of CPUs and integrated and dedicated GPUs, compiled for both x64 and x86. The old code still works on the CPU when compiled for either x64 or x86, and on the integrated GPU when compiled for x86. But when it runs on integrated graphics with the newest driver in x64 mode, it fails.
I have been able to track it down to this line of code:
float x1 = (xCoords + turbines * windDirIndex)[rel.downstream];
Seemingly randomly, this line would return 0 instead of the content in xCoords. Changing the code to the following fixes the bug.
float x1 = xCoords[rel.downstream + turbines * windDirIndex];
The variable types are as follows:
xCoords: global float*
turbines: ushort
windDirIndex: ushort
Can anybody explain why the two lines have different behaviour in this very specific case?
OpenCL 6.1.2 not supported on Ubuntu?
I'm trying to get a CPU-only version of OpenCL working on Ubuntu just so I can play around with the language, using an AWS Ubuntu 16.04.5 instance. Per https://software.intel.com/en-us/articles/opencl-drivers#cpu-lin-u I downloaded http://registrationcenter-download.intel.com/akdlm/irc_nas/12556/opencl_... but when I install it I get the error pop-up shown in the attached image (sorry, I don't know how to inline it ;^(). Basically, it says my OS is not supported; only Red Hat and CentOS are.
system info:
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.5 LTS
Release: 16.04
Codename: xenial
$ uname -a
Linux ip-172-31-41-172 4.4.0-1065-aws #75-Ubuntu SMP Fri Aug 10 11:14:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
I'm also puzzled that it says I need lsb-core >= 4.0, since it looks like I have 9.something (which I think should be >= 4??)
$ apt list lsb-core
Listing... Done
lsb-core/xenial-updates 9.20160110ubuntu0.2 amd64
I'm new to this OpenCL stuff, so any help/suggestions would be appreciated.
OpenCL implementation details
Hello.
I've been doing a lot of experiments with OpenCL in the last two months or so.
More specifically, I've been using the NOpenCL library (created by Tunnel Vision Labs) to perform OpenCL tasks in C# applications on a low-end laptop (Intel i7-4510U CPU / Intel HD Graphics 4400 + AMD Radeon R7 M260).
Being an application developer, most of my work won't fit a SIMD model. However, the performance gains of using the HD 4400 GPU instead of the CPU (even with kernels that have several branching points) are so significant that the issue becomes irrelevant.
Unfortunately, all the OpenCL-related documentation I've read so far is quite elusive about how its concepts relate to specific hardware, and I couldn't find a single piece of information about how Intel chose to implement OpenCL in its GPU line(s). As a result, I'm completely blind when I'm preparing the command queues. Are work-groups in some way related to the HD 4400's 20 pipelines? How do compute units fit into the picture? By setting a local work size of 1, am I in some way forcing the use of a single thread inside a compute unit?
I'd say that for SIMD problems these questions are probably moot, as long as one follows some general rule for dividing the task size. In any other case, it would be important to be aware of the penalties involved and the best strategies to minimize them, and to do that, one needs to understand some OpenCL implementation details on specific chips or architectures.
So, if someone could share one or more links to relevant documentation on these issues, I'd be very grateful.
Thanks,
Helder Vieira
Does Intel OpenCL on CPU require consecutive memory accesses of neighboring threads for vectorization?
Hello everyone,
Does Intel OpenCL on the CPU require consecutive memory accesses by neighboring threads (i.e., work-items in the same work-group) for vectorization?
I have a hashing-based OpenCL kernel with inherently non-consecutive memory accesses (the threads use a computed hash value as a memory index, and the hashing makes it unpredictable). So far, the OpenCL build log always reports
"Kernel <kernel_name> was not vectorized"
I suspect that this is due to adjacent threads not accessing consecutive memory addresses. Is that correct? Or can I get the Intel OpenCL platform to generate gather/scatter instructions (or intermittent scalar loops)?
A clarification on whether the Intel OpenCL platform can handle this kind of memory access pattern in general would be greatly appreciated.
Bizarre constant memory access for structure in Intel GPU
Hi all,
I see strange behavior when trying to access a memory location in the __constant memory space that holds an array of structs.
I reduced the case to minimal C++ host and OpenCL kernel code and attached both to this post.
However, let me give you some insights:
I have an OpenCL kernel with the following struct:
typedef struct __attribute__((packed)) buffer_1_struct {
    uint s2d1;
    uint s2d2;
    ulong s2d3;
    char s2d4[2];
    char s2d5[2];
    ulong s2d6;
    ulong s2d7;
} struct2_t;
From the host side, I create an array of this structure where each element is 36 bytes (packed) and pass it as buffer to the kernel. In the attached files, I create the array with two elements.
When I read the second array element and access the struct member s2d3 (at byte offset 8) on the GPU, I get zero. This is how I usually access it:
((__constant struct2_t*)buffer02)[get_global_offset(2)].s2d3
The problem is observed when get_global_offset(2) = 1.
However, when I access it through byte offsets, I retrieve the data correctly on the GPU. This is how I access it:
*((__constant ulong*)(((__constant char*)buffer02)+36+8))
Surprisingly, both expressions point to the same address and I cast them to the same pointer type, but when I read the values they differ.
Here is what happens as an OpenCL code snippet:
#define STRUCT_2_SIZE (sizeof(struct2_t))
#define STRUCT_2_s2d3_idx (2*sizeof(uint))
......
printf("z-offset=%d\n",get_global_offset(2));
printf("struct-2-size=%d\n",STRUCT_2_SIZE);

__constant ulong* adr1 = ((__constant ulong*)(((__constant char*)buffer02)+STRUCT_2_SIZE+STRUCT_2_s2d3_idx));
__constant ulong* adr2 = &((__constant struct2_t*)buffer02)[get_global_offset(2)].s2d3;

printf("adr1=%d\n",adr1);
printf("adr2=%d\n",adr2);
if(adr1 == adr2)
    printf("The two addresses are equal !\n");
else
    printf("The two addresses are different !\n");

printf("val1=%d\n",*adr1);
printf("val2=%d\n",*adr2);
if(*adr1 == *adr2)
    printf("The two values are equal !\n");
else
    printf("The two values are different !\n");
The full code is attached and here is the output:
z-offset=1
struct-2-size=36
adr1=1770913836
adr2=1770913836
The two addresses are equal !
val1=5632
val2=0
The two values are different !
This happens with the following notes:
1- It happens only on the GPU. If you run the attached code on the CPU, it works fine.
2- This code is in a kernel function (func2), and the problem happens only when I call some other functions, in a certain sequence, before it. Check the attached code.
3- The attached code shows the minimal case. Removing some code lines makes the problem disappear.
4- I use SDK version 7.0.0.2511 on Windows 10, building with the x64 OpenCL library.
5- My machine has an Intel Core i5 6200U CPU (with embedded Intel® HD Graphics 520 GPU).
I hope someone from Intel can advise on this case or confirm that it is a bug that will be resolved.
Regards,
Can the Intel OpenCL SDK support ATOM N2800 or E3825?
Can the Intel OpenCL SDK support the ATOM N2800 or E3825, CPU only, on a 32-bit Linux OS?
Thanks
How to use OpenCL GPU Kernel debugger for Windows?
Hi,
I want to get the GPU kernel debugger set up and running, but I'm not finding much information on how to do this or on what is and isn't supported.
Can you point to any articles, KBs or anything describing:
1. what version of Intel OpenCL SDK is required for GPU kernel debugging on Windows (Skylake chipset)?
2. description of how to set up the proper system configuration for GPU kernel debugging, and supported configurations (stand-alone, 2 systems, ...?)
3. any additional info on using the GPU kernel debugging feature to debug OpenCL kernels - tutorial, tech note, video??
Thanks, Colin
OpenCL SDK installation is the worst.
<RANT>
I've been trying to download everything and get a new system up and running with OpenCL for CPU and GPU on Ubuntu 16.04. I finally got some drivers running, or at least detected by clinfo; however, the GPU required superuser privileges. Now I am trying to install the SDK, as I hope that is where the useful OpenCL C++ header files are.
Except when I download the .gz file, it comes with a binary file that is not executable? WTF is this? When I look for installation instructions I get some sales pitch about all these great features but no usable recipe for installing the SDK!!!
As you may tell, I'm a little frustrated, as this is a garbage experience. There was another company that provided similar experiences with their products; I think it was called Microsoft. I don't use them anymore.
</RANT>
Why does clinfo require superuser privileges to display info about GPU devices?
Where is a relevant, working recipe for installing the SDK (or whatever provides the CL header files for C and C++)?
Why is it so hard for a company to communicate, clearly and succinctly, basic information for using its hardware?
***Runtime error: reached an uninitialized image function***
I installed intel_sdk_for_opencl_2017_7.0.0.2568_x64/ to test my kernels on the CPU (and with OpenCL 2.1), and I got the error in the title.
I must be doing something wrong, but I can't figure it out. Before reaching the kernel in the backtrace, I run other, more complex kernels that use read_imagef(...).
The code works fine with the Intel GPU (1.2) and NVIDIA GPU (1.2?) drivers.
Any ideas?
Trying to make debug info available (-g -s <SRC>) causes segfaults in earlier kernels (probably because I'm not using gdb correctly), and I don't get any debug symbols in that crash either.
The complete file is here https://github.com/dgud/wings/blob/master/shaders/img_lib.cl
__kernel void rgba_to_normal(__read_only image2d_t inImg, const int w, const int h, __global float4 *outImg)
{
const sampler_t sampler=CLK_NORMALIZED_COORDS_FALSE|CLK_ADDRESS_REPEAT;
int x = (int) get_global_id(0);
int y = (int) get_global_id(1);
if(x < w && y < h) {
float4 cx0 = read_imagef(inImg, sampler, (float2)(x, y));
outImg[y*w+x] = cx0*2.0f-1.0f;
}
}
(gdb) bt
#0 0x00007f77761ba6e1 in trap_function ()
#1 0x00007f7824e5dff7 in rgba_to_normal ()
#2 0x00007f7756b35dd9 in ?? () from /opt/intel/opencl/exp-runtime-2.1/lib64/libOclCpuBackEnd.so
#3 0x00007f775f125559 in ?? () from /opt/intel/opencl/exp-runtime-2.1/lib64/libcpu_device_2_1.so
#4 0x00007f775fa60c8a in ?? () from /opt/intel/opencl/exp-runtime-2.1/lib64/libtask_executor_2_1.so
#5 0x00007f775fa62280 in ?? () from /opt/intel/opencl/exp-runtime-2.1/lib64/libtask_executor_2_1.so
#6 0x00007f775fa62589 in ?? () from /opt/intel/opencl/exp-runtime-2.1/lib64/libtask_executor_2_1.so
#7 0x00007f775f5eb0c5 in tbb::internal::custom_scheduler<tbb::internal::IntelSchedulerTraits>::local_wait_for_all (this=0x7f775d3b3e80, parent=..., child=0x0) at ../../src/tbb/custom_scheduler.h:474
#8 0x00007f775f5e68f2 in tbb::internal::arena::process (this=0x7f7755577510, s=...) at ../../src/tbb/arena.cpp:96
#9 0x00007f775f5e4c48 in tbb::internal::market::process (this=0x7f7755577510, j=...) at ../../src/tbb/market.cpp:495
#10 0x00007f775f5e0949 in tbb::internal::rml::private_server::remove_server_ref (this=<optimized out>, $`6=<optimized out>) at ../../src/tbb/private_server.cpp:275
#11 tbb::internal::rml::private_server::request_close_connection (this=0x7f7755577510) at ../../src/tbb/private_server.cpp:192
#12 0x00007f775f5e08d6 in tbb::internal::rml::private_worker::thread_routine (arg=0x7f7755577510) at ../../src/tbb/private_server.cpp:228
#13 0x00007f7869fa66ba in start_thread (arg=0x7f7755578700) at pthread_create.c:333
#14 0x00007f7869ad441d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109