Just Released! Intel® SDK for OpenCL™ Applications 2017

August 10, 2017, 6:33 pm

Latest and popular articles on Intel Technologies

The new 2017 release is here! This update adds support for additional operating systems and platforms, and integrates with more recent IDEs so you can stay up to date. It also provides new tool features that help you speed up development and improve performance when creating high-performance image and video processing pipelines.

For more information, please see the What's New blog.

↧

OpenCL calls on Skylake leak handles (Showstopper)

August 14, 2017, 2:19 pm

We have a commercial product built on OpenGL that uses OpenGL-OpenCL interop for format conversion. We started on Haswell CPUs, where everything worked fine, and then ported it to Skylake with Intel HD 530, where our products initially showed no issues. Recently, however, we noticed a serious issue with newer display driver versions, and only on Skylake CPUs.

The issue is that calling clEnqueueAcquireGLObjects and clEnqueueReleaseGLObjects never releases handles. With every call the handle count increases; after 2 hours about 500K handles are held, and Windows slows down until it freezes.

To confirm this, I wrote a simple app that creates an OpenGL texture, sets up an OpenCL device, and calls the two functions above in an endless loop. Without doing anything else in OpenGL or OpenCL, the number of handles increased rapidly.

It is worth mentioning that the OpenCL documentation says we have to call these functions before and after every use of an OpenGL object. With the Intel implementation it seems we can avoid calling them every time: acquire the object once at the beginning and release it when we are done with it. Doing so would fix our problem, since the handle count would not grow with every call. Unfortunately this works on Haswell but not on Skylake; this issue might have the same cause.

Our products are deployed on Windows 10 but to gather more information on this issue, I also tested it on Windows 7 and witnessed the same result.

The same code still works fine on older CPUs with the latest drivers and on Skylake with driver version 15407.4279, but all newer driver versions on Skylake have this issue.

This is a showstopper for our product: many customers are already using it and we are worried that they will encounter this issue, and we are also concerned about releasing our new version with this problem.

↧

Do we still need to install the 16.1.1 CPU Runtime with OpenCL SDK 2017?

August 16, 2017, 9:19 am

Just wondering if the 16.1.1 runtime is still recommended or necessary with the OpenCL SDK 2017?

↧

Problems with Eclipse Code-Builder Plugin

August 17, 2017, 8:03 am

Hi everyone,

I am using the Code-Builder plugin for Eclipse for the first time. I use Eclipse CDT Oxygen for Windows and tried to add CodeBuilder_6.4.0. But after launching Eclipse I get multiple errors: "Cannot get machine list: Failed to initialize code builder API" and "Could not create the view: org.eclipse.linuxtools.callgraph.callgraphview". I can still load a session, but I cannot select any device (CPU/GPU).

So I switched to CodeBuilder_6.3.0 which I still had from a previous installation on Ubuntu on a different machine which did not give me any error messages. I tried to set everything up following your instructions (https://software.intel.com/en-us/code-builder-user-manual-configuring-bu...). Unfortunately I fail when changing the session options. I would like to add a few build options (e.g. -cl-mad-enable) but when I hit "OK" to close the dialog nothing happens. Even if I don't change anything inside the options at all I cannot confirm them. Fortunately I can change machine, platform and device in the menu bar directly, but for the build options I need the session options.

I am using a Dell Desktop PC with an Intel Core i7-7700 (which I want to use with OpenCL) with Windows 10.

If there is no way to fix the session options, can I enter them manually into the cbsession file? It looks like it but the TabOptX fields are a bit cryptic and I don't know what belongs where or if I have to stick to a specific syntax, e.g. when adding multiple build options.

Thanks a lot in advance!

Kind regards,

Alex

↧

Optimizer hangs on non-terminating while loop

August 21, 2017, 1:28 pm

Compilation of the following kernel loops indefinitely:

kernel void A() {
  int a = get_global_id(0);
  while (a < 512) {}
}

Passing argument -cl-opt-disable to clBuildProgram() prevents this.
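
As a sketch of how the workaround might be wired into host code: the fourth parameter of clBuildProgram() is a single options string, and multiple options in it are separated by spaces. The helper below only assembles that string. Its name, join_build_options, is hypothetical and not part of any SDK, and the second option in the usage note is purely illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not an OpenCL API): joins option strings with single
 * spaces into buf. Returns 0 on success, -1 if buf is too small. */
int join_build_options(char *buf, size_t buf_size,
                       const char *const *opts, size_t count)
{
    size_t used = 0;

    if (buf_size == 0)
        return -1;
    buf[0] = '\0';

    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(opts[i]);
        size_t sep = (i > 0) ? 1 : 0;   /* one space between options */

        /* Need room for separator, option text, and terminating NUL. */
        if (used + sep + len + 1 > buf_size)
            return -1;
        if (sep)
            buf[used++] = ' ';
        memcpy(buf + used, opts[i], len);
        used += len;
        buf[used] = '\0';
    }
    return 0;
}
```

With opts set to { "-cl-opt-disable", "-cl-mad-enable" }, the buffer would hold both options separated by one space and could then be passed as clBuildProgram(program, 1, &device, buf, NULL, NULL), where program and device are assumed to exist already.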

Reproduced on:

/opt/intel/opencl-1.2-6.4.0.25/lib64/libintelocl.so
/opt/intel/opencl-1.2-4.4.0.117/lib64/libintelocl.so

↧

Support for 7th Generation CPUs?

August 25, 2017, 8:53 am

When will there be support for 7th generation CPUs with the Intel OpenCL SDK/runtime? Code I have been running on an i7-6950X and on an i7-4800MQ is now unusable on an i9-7900X.

This is clearly an Intel issue, because the AMD APP SDK (3.0) is able to run the code on both processors as well as on GPU. I have tried various combinations of SDKs and runtimes (16.1.1 and 14.2). The platform/device is listed, but attempting to create a context fails.

Yes, I read the release notes for the 16.1.1 runtime, which say that 6th generation CPUs are supported and make no mention of 7th generation. However, it is somewhat amusing that AMD supports current Intel processors and Intel doesn't.
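
The post does not say which error code context creation returns, and that code would narrow things down considerably. A minimal sketch of a lookup that could be used when logging the errcode_ret value from clCreateContext() follows. The function name cl_error_name is hypothetical; the numeric values are the ones defined in the Khronos CL/cl.h header, and only a handful of codes are covered.

```c
#include <stddef.h>

/* Hypothetical helper: maps a few OpenCL status codes to their names.
 * Numeric values follow the Khronos CL/cl.h header. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -30: return "CL_INVALID_VALUE";
    case -32: return "CL_INVALID_PLATFORM";
    case -33: return "CL_INVALID_DEVICE";
    default:  return "UNKNOWN_CL_ERROR";   /* code not covered by this sketch */
    }
}
```

Including the printed name (or at least the raw number) in a report like this one makes it easier to tell an unsupported device apart from a resource or installation problem.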

↧

Problem Using OpenCL in Ubuntu with Detailed Report

August 27, 2017, 5:35 am

Hi everybody,

1--From the script "python sys_analyzer_linux.py -v" I got this output:

--------------------------
Hardware readiness checks:
--------------------------
 [ OK ] Processor name: Intel(R) Core(TM) i7-5675C CPU @ 3.10GHz
 [ INFO ] Intel Processor
 [ INFO ] Processor brand: Core
 [ INFO ] Processor arch: Broadwell
--------------------------
OS readiness checks:
--------------------------
 [ INFO ] GPU PCI id     : 1622
 [ INFO ] GPU description: BDW GT3 ULT
 [ OK ] GPU visible to OS
 [ INFO ] no nomodeset in GRUB cmdline (good)
 [ INFO ] Linux distro   : Ubuntu 16.04
 [ INFO ] Linux kernel   : 4.4.0-92-generic
 [ INFO ] glibc version  : 2.23
 [ INFO ] Linux distro suitable for Generic install
 [ INFO ] gcc version    : 20160609 (>=4.8.2 suggested)
--------------------------
Media Server Studio Install:
--------------------------
 [ OK ] user in video group
 [ OK ] libva.so.1 found
 [ INFO ] Intel iHD used by libva
 [ ERROR ] vainfo not reporting codec entry points
 [ INFO ] i915 driver in use by Intel video adapter
 [ ERROR ] could not open /dev/dri/renderD128
--------------------------
Component Smoke Tests:
--------------------------
 [ OK ] Media SDK HW API level:1.0
 [ OK ] Media SDK SW API level:1.19
 [ OK ] OpenCL check:platform:Intel(R) OpenCL GPU OK CPU OK
platform:Experimental OpenCL 2.1 CPU Only Platform GPU OK CPU OK

2--And from the script "sudo python sys_analyzer_linux.py -v" I got this output:

--------------------------
Hardware readiness checks:
--------------------------
 [ OK ] Processor name: Intel(R) Core(TM) i7-5675C CPU @ 3.10GHz
 [ INFO ] Intel Processor
 [ INFO ] Processor brand: Core
 [ INFO ] Processor arch: Broadwell
--------------------------
OS readiness checks:
--------------------------
 [ INFO ] GPU PCI id     : 1622
 [ INFO ] GPU description: BDW GT3 ULT
 [ OK ] GPU visible to OS
 [ INFO ] no nomodeset in GRUB cmdline (good)
 [ INFO ] Linux distro   : Ubuntu 16.04
 [ INFO ] Linux kernel   : 4.4.0-92-generic
 [ INFO ] glibc version  : 2.23
 [ INFO ] Linux distro suitable for Generic install
 [ INFO ] gcc version    : 20160609 (>=4.8.2 suggested)
--------------------------
Media Server Studio Install:
--------------------------
 [ OK ] user in video group
 [ OK ] libva.so.1 found
 [ ERROR ] libva not loading Intel iHD
 [ OK ] vainfo reports valid codec entry points
 [ INFO ] i915 driver in use by Intel video adapter
 [ OK ] /dev/dri/renderD128 connects to Intel i915
--------------------------
Component Smoke Tests:
--------------------------
 [ OK ] Media SDK HW API level:1.0
 [ OK ] Media SDK SW API level:1.19
 [ OK ] OpenCL check:platform:Intel(R) OpenCL GPU OK CPU OK
platform:Experimental OpenCL 2.1 CPU Only Platform GPU OK CPU OK

3--sam@sam-dev:~/Downloads$ which ioc64
/usr/bin/ioc64

4--When I run the command "ioc64 -input=22.cl -device=gpu", the log is as follows:
No command specified, using 'build' as default
OpenCL Intel(R) Graphics device was found!
Device name: Intel(R) HD Graphics
Device version: OpenCL 1.2
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
fcl build 1 succeeded.
fcl build 2 succeeded.
bcl build succeeded.

simpleAdd info:
	Maximum work-group size: 256
	Compiler work-group size: (0, 0, 0)
	Local memory size: 0
	Preferred multiple of work-group size: 32
	Minimum amount of private memory: 0
	Amount of spill memory used by the kernel: 0

Build succeeded!

5--When I use Eclipse Neon, it reports "please make sure to set the correct path under the Code Builder for OpenCl preference page".

6--sam@sam-dev:~/Downloads$ uname -r
4.4.0-92-generic

7--sam@sam-dev:/etc/alternatives/opencl-intel-tools$ ls
bin doc eclipse-plug-in include lib64 version.txt

I really need to solve the problem in Eclipse. If I missed something important, please let me know. Thanks.

Finally, I'm not sure whether my LD_LIBRARY_PATH is set correctly: when I print it with echo, it shows nothing.

sam@sam-dev:/$ echo $LD_LIBRARY_PATH

Uh... I don't know how to set LD_LIBRARY_PATH. When I set it in /etc/environment as "LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/lib/x86_64-linux-gnu:/usr/lib64:/usr/local/lib"", sys_analyzer_linux reports that it can't find libva.so and libva.so.1.
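
A possible explanation, stated as an assumption: /etc/environment is normally read as plain KEY=value lines rather than by a shell, so a reference such as $LD_LIBRARY_PATH may be stored literally instead of being expanded. The sketch below checks a path list for entries that still contain a literal '$'. The function name count_unexpanded_entries is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: counts ':'-separated entries in a path list that
 * still contain a literal '$', which suggests a variable reference was
 * stored as text instead of being expanded. */
int count_unexpanded_entries(const char *path_list)
{
    int count = 0;
    const char *p = path_list;

    while (*p != '\0') {
        size_t len = strcspn(p, ":");      /* length of the current entry */

        if (memchr(p, '$', len) != NULL)
            count++;
        p += len;
        if (*p == ':')
            p++;                           /* skip the separator */
    }
    return count;
}
```

In a real check the argument would come from getenv("LD_LIBRARY_PATH") after guarding against a NULL result. A non-zero count would hint that the variable should be set with literal directories only, or exported from a shell startup file where expansion does happen.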

Best regards

Sam Lu

↧

OpenCL memory leak on Intel HD530 graphics with driver 4664/4678

August 29, 2017, 12:24 am

Hi,

Please help me with this OpenCL issue:

I have a kernel program that runs fine on AMD and NVIDIA graphics cards, as well as on other Intel GPUs such as the Intel HD 4600. But when running it on Intel HD 530 graphics, I found a memory leak.

I traced the leak to calls to the OpenCL API function clSetKernelArg().
It occurs only on the Intel HD 530 with driver versions 4664 and 4678, not with driver version 4352.
It occurs only when clSetKernelArg() sets __local buffers of the kernel, not when it sets __global or __private arguments.

The leak I measured is 32 bytes per clSetKernelArg() call that sets a __local buffer. It leaks only host (CPU) memory, not GPU device memory, regardless of the size requested for the __local buffer.

Here is my test environment:
Hardware:
CPU: i5-6500
GPU: Intel HD Graphics 530

Software:
OS: Win7 64bit (build 7601)
GPU driver version: 15.45.19.4678 / 15.45.18.4664 (memory leak found) / 15.40.14.4352 (no memory leak found)

Here is part of my test code:

Kernel code:
__kernel void TestNull(
    __local uchar *localMemuchar,
    __local int *localMemint,
    __local float *localMemfloat)
{}

Host code:
// ...init opencl context, program and kernels
size_t szW = 64;
size_t szH = 64;
for (int i = 0; i < 1000000; i++)
{
    clStatus = clSetKernelArg(kernelTestNull, 0, sizeof(cl_uchar) *szW * szH, NULL);
    clStatus = clSetKernelArg(kernelTestNull, 1, sizeof(cl_int) *szW * szH, NULL);
    clStatus = clSetKernelArg(kernelTestNull, 2, sizeof(cl_float) *szW * szH, NULL);
}
// ...release opencl context, program and kernels
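
To put the reported rate in perspective, here is a small worked estimate for the loop above, assuming every __local clSetKernelArg() call leaks exactly the 32 bytes measured. The function name estimated_leak_bytes is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: total host memory leaked if each __local
 * clSetKernelArg() call leaks a fixed number of bytes. */
uint64_t estimated_leak_bytes(uint64_t iterations,
                              uint64_t local_args_per_iteration,
                              uint64_t bytes_per_call)
{
    return iterations * local_args_per_iteration * bytes_per_call;
}
```

For 1,000,000 iterations with 3 __local arguments at 32 bytes each, this gives 96,000,000 bytes, or roughly 91.6 MiB of host memory over the whole loop.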

Any insight into this issue is welcome and is greatly appreciated.

Thanks,

↧

OpenCL SDK Installation failed on Windows 10

August 29, 2017, 8:42 am

OpenCL SDK installation fails on Windows 10 because the prerequisite Universal C Runtime in Windows* is reported as not installed.
I am using Windows 10, and Microsoft offers no download of this package for it.

I tried the following versions, but both failed with the above error:

intel_sdk_for_opencl_2017_7.0.0.2511

intel_sdk_for_opencl_setup_6.3.0.1904

↧

I can not use tdm-gcc to compile opencl code in codeblocks, what's wrong?

September 2, 2017, 4:43 pm

Hi all,

I ran into a problem when using tdm-gcc to compile my code in Code::Blocks: the build fails, but the code works correctly when I use the default compiler, MinGW-gcc. I don't know why, and this problem has bugged me for three days. The example is simple:

#include <CL/cl.h>
using namespace std;

int main(int argc, char* argv[])
{
    cl_uint numPlatforms; // the number of platforms
    cl_int status = clGetPlatformIDs(0, NULL, &numPlatforms);
    return 0;
}

When I use the default compiler it works without error, but when I change to tdm-gcc things go wrong:

-------------- Build: Debug in openCL_example (compiler: GNU GCC Compiler)---------------

g++.exe -Wall -fexceptions -g -IC:\Intel\OpenCL\sdk\lib\x86 -IC:\Intel\OpenCL\sdk\include -c "D:\Program Files\codeblocks\projects\openCL_example\HelloWorld.cpp" -o obj\Debug\HelloWorld.o

g++.exe -Wall -fexceptions -g -IC:\Intel\OpenCL\sdk\lib\x86 -IC:\Intel\OpenCL\sdk\include -c "D:\Program Files\codeblocks\projects\openCL_example\tool.cpp" -o obj\Debug\tool.o

g++.exe -o bin\Debug\openCL_example.exe obj\Debug\HelloWorld.o obj\Debug\tool.o C:\Intel\OpenCL\sdk\lib\x86\OpenCL.lib
obj\Debug\HelloWorld.o: In function `main':
D:/Program Files/codeblocks/projects/openCL_example/HelloWorld.cpp:76: undefined reference to `clGetPlatformIDs'
obj\Debug\tool.o: In function `getPlatform(_cl_platform_id*&)':

.....

Please help! Thanks a lot.
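
One possible cause, offered as an assumption rather than a diagnosis: the link line uses the import library under lib\x86, which is a 32-bit library, and a toolchain targeting 64-bit may not be able to resolve symbols from it. The sketch below reports the pointer width the active compiler targets so it can be compared against the library directory. Both function names are hypothetical, and the lib\x64 directory name is an assumption about the SDK layout.

```c
#include <stddef.h>

/* Pointer width, in bits, of the target the compiler is building for. */
int target_pointer_bits(void)
{
    return (int)(sizeof(void *) * 8);
}

/* Hypothetical mapping from pointer width to an SDK library subdirectory.
 * "lib\\x64" is an assumed directory name; "lib\\x86" is taken from the
 * build log in the post. */
const char *matching_sdk_lib_dir(int pointer_bits)
{
    return (pointer_bits == 64) ? "lib\\x64" : "lib\\x86";
}
```

Compiling and printing target_pointer_bits() with each compiler would show whether the two toolchains target different architectures. If tdm-gcc reports 64, pointing the linker at a 64-bit OpenCL import library would be the first thing to try.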

↧

troubleshooting sys_analyzer_linux output

September 5, 2017, 2:38 pm

I've installed drivers, and I should be good to go, right?

But I run sys_analyzer_linux.py and my bottom line has FAIL in it:

[ OK ] OpenCL check:platform:Intel(R) OpenCL GPU FAIL CPU OK

All examples I see have GPU OK CPU OK

Where do I start to fix this?

Thanks

↧

is it possible to set breakpoint in kernel code for GPU

September 7, 2017, 1:30 pm

When debugging OpenCL kernel code on the GPU, is it possible to set a breakpoint in a .cl file?

Thanks,

Jeffrey

↧

Compiling kernel on HD 505 (Apollo Lake)

September 8, 2017, 8:13 pm

How should I interpret this error?

I'm compiling an OpenCL 1.2 kernel from source. It compiles fine on an HD 530 / Skylake device with an x64 executable.

But I'm seeing these messages when compiling the same kernel source on an HD 505 / N4200 device with an x64 executable.

warning: Linking two modules of different data layouts: '' is 'e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024' whereas '<origin>' is 'e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32:64'

warning: Linking two modules of different target triples: ' is 'spir' whereas '<origin>' is 'vISA_32'

warning: Linking two modules of different data layouts: '' is 'e-p:32:32-i64:64-v16:16-v24:32-v32:32-v48:64-v96:128-v192:256-v256:256-v512:512-v1024:1024' whereas '<origin>' is 'e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32:64'

warning: Linking two modules of different target triples: ' is 'spir' whereas '<origin>' is 'vISA_32'

↧

I used beignet on archlinux but met some errors.

September 11, 2017, 9:39 pm

[ailick@Ailick_Mj opencl]$ optirun clinfo
drm_intel_gem_bo_context_exec() failed: No space left on device
Number of platforms                               3
  Platform Name                                   Intel Gen OCL Driver
  Platform Vendor                                 Intel
  Platform Version                                OpenCL 2.0 beignet 1.3
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing
  Platform Extensions function suffix             Intel

  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 1.2 CUDA 8.0.0
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer
  Platform Extensions function suffix             NV

  Platform Name                                   Experimental OpenCL 2.1 CPU Only Platform
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.1 LINUX
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             INTEL

  Platform Name                                   Intel Gen OCL Driver
Number of devices                                 1
  Device Name                                     Intel(R) HD Graphics Haswell GT2 Mobile
  Device Vendor                                   Intel
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 beignet 1.3
  Driver Version                                  1.3
  Device OpenCL C Version                         OpenCL C 1.2 beignet 1.3
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Max compute units                               20
  Max clock frequency                             1000MHz
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None, None, None
  Max work item dimensions                        3
  Max work item sizes                             512x512x512
  Max work group size                             512
  Compiler Available                              Yes
  Linker Available                                Yes
  Preferred work group size multiple              16
  Preferred / native vector sizes
    char                                                16 / 8
    short                                                8 / 8
    int                                                  4 / 4
    long                                                 2 / 2
    half                                                 0 / 8        (n/a)
    float                                                4 / 4
    double                                               0 / 2        (n/a)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (n/a)
  Address bits                                    32, Little-Endian
  Global memory size                              2147483648 (2GiB)
  Error Correction support                        No
  Max memory allocation                           1610612736 (1.5GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        8192 (8KiB)
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            65536 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4096 bytes
    Pitch alignment for 2D image buffers          1 bytes
    Max 2D image size                             8192x8192 pixels
    Max 3D image size                             8192x8192x2048 pixels
    Max number of read image args                 128
    Max number of write image args                8
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max constant buffer size                        134217728 (128MiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties
    Out-of-order execution                        No
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      80ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                __cl_copy_region_align4;__cl_copy_region_align16;__cl_cpy_region_unalign_same_offset;__cl_copy_region_unalign_dst_offset;__cl_copy_region_unalign_src_offset;__cl_copy_buffer_rect;__cl_copy_image_1d_to_1d;__cl_copy_image_2d_to_2d;__cl_copy_image_3d_to_2d;__cl_copy_image_2d_to_3d;__cl_copy_image_3d_to_3d;__cl_copy_image_2d_to_buffer;__cl_copy_image_3d_to_buffer;__cl_copy_buffer_to_image_2d;__cl_copy_buffer_to_image_3d;__cl_fill_region_unalign;__cl_fill_region_align2;__cl_fill_region_align4;__cl_fill_region_align8_2;__cl_fill_region_align8_4;__cl_fill_region_align8_8;__cl_fill_region_align8_16;__cl_fill_region_align128;__cl_fill_image_1d;__cl_fill_image_1d_array;__cl_fill_image_2d;__cl_fill_image_2d_array;__cl_fill_image_3d;
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_khr_depth_images cl_khr_icd cl_intel_accelerator cl_intel_subgroups cl_intel_subgroups_short cl_khr_gl_sharing

  Platform Name                                   NVIDIA CUDA
Number of devices                                 1
  Device Name                                     GeForce GTX 760M
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 1.2 CUDA
  Driver Version                                  375.82
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Device Topology (NV)                            PCI-E, 01:00.0
  Max compute units                               4
  Max clock frequency                             719MHz
  Compute Capability (NV)                         3.0
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Compiler Available                              Yes
  Linker Available                                Yes
  Preferred work group size multiple              32
  Warp size (NV)                                  32
  Preferred / native vector sizes
    char                                                 1 / 1
    short                                                1 / 1
    int                                                  1 / 1
    long                                                 1 / 1
    half                                                 0 / 0        (n/a)
    float                                                1 / 1
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              2098724864 (1.955GiB)
  Error Correction support                        No
  Max memory allocation                           524681216 (500.4MiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        65536 (64KiB)
  Global Memory cache line                        128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             4096x4096x4096 pixels
    Max number of read image args                 256
    Max number of write image args                16
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max constant buffer size                        65536 (64KiB)
  Max number of constant args                     9
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  1
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer

  Platform Name                                   Experimental OpenCL 2.1 CPU Only Platform
Number of devices                                 1
  Device Name                                     Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 (Build 10)
  Driver Version                                  1.2.0.10
  Device OpenCL C Version                         OpenCL C 2.0
  Device Type                                     CPU
  Device Available                                Yes
  Device Profile                                  FULL_PROFILE
  Max compute units                               8
  Max clock frequency                             2500MHz
  Device Partition                                (core)
    Max number of sub-devices                     8
    Supported partition types                     by counts, equally, by names (Intel)
  Max work item dimensions                        3
  Max work item sizes                             8192x8192x8192
  Max work group size                             8192
  Compiler Available                              Yes
  Linker Available                                Yes
  Preferred work group size multiple              128
  Max sub-groups per work group                   1
  Preferred / native vector sizes
    char                                                 1 / 32
    short                                                1 / 16
    int                                                  1 / 8
    long                                                 1 / 4
    half                                                 0 / 0        (n/a)
    float                                                1 / 8
    double                                               1 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              12521263104 (11.66GiB)
  Error Correction support                        No
  Max memory allocation                           3130315776 (2.915GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   Yes
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             65536 (64KiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        262144 (256KiB)
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             480
    Max size for 1D images from buffer            195644736 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   64 bytes
    Pitch alignment for 2D image buffers          64 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 480
    Max number of write image args                480
    Max number of read/write image args           480
  Max number of pipe args                         16
  Max active pipe reservations                    32767
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        131072 (128KiB)
  Max number of constant args                     480
  Max size of kernel argument                     3840 (3.75KiB)
  Queue properties (on host)
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Local thread execution (Intel)                Yes
  Queue properties (on device)
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                4294967295 (4GiB)
    Max size                                      4294967295 (4GiB)
  Max queues on device                            4294967295
  Max events on device                            4294967295
  Prefer user sync for interop                    No
  Profiling timer resolution                      1ns
  Execution capabilities
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    Sub-group independent forward progress        No
    IL version                                    SPIR-V_1.0
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels
  Device Extensions                               cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  Intel Gen OCL Driver
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   Success [Intel]
  clCreateContext(NULL, ...) [default]            Success [Intel]
  clCreateContext(NULL, ...) [other]              Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Haswell GT2 Mobile
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  Success (1)
    Platform Name                                 Intel Gen OCL Driver
    Device Name                                   Intel(R) HD Graphics Haswell GT2 Mobile

ICD loader properties
  ICD loader Name                                 OpenCL ICD Loader
  ICD loader Vendor                               OCL Icd free software
  ICD loader Version                              2.2.11
  ICD loader Profile                              OpenCL 2.1

[ailick@Ailick_Mj ~]$ sudo pacman -Qs opencl
local/beignet 1.3.1-3
    An open source OpenCL implementation for Intel IvyBridge+ iGPUs
local/clinfo 2.1.17.02.09-1
    A simple OpenCL application that enumerates all available platform and device properties
local/intel-opencl-sdk 2017_7.0.0.2511-1
    Intel SDK for OpenCL Applications Linux* 2017 And OpenCL runtime for Intel Core and Xeon processors
local/ocl-icd 2.2.11-1
    OpenCL ICD Bindings
local/opencl-headers 2:2.2.20170516-1
    OpenCL (Open Computing Language) header files
local/opencl-nvidia 1:375.82-1
    OpenCL implemention for NVIDIA

Why does it always tell me that drm_intel_gem_bo_context_exec() failed: No space left on device? Could anyone help me?
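
A side note on the message itself: "No space left on device" is the standard text for the ENOSPC errno value, so the wording appears to come from an error code returned to drm_intel_gem_bo_context_exec() and does not by itself prove that a disk is full. A minimal Python sketch of that mapping, assuming a Linux host (the helper name `describe_errno` is made up for illustration):

```python
import errno
import os

def describe_errno(code):
    """Return (symbolic name, message text) for an errno value on this host."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)

# ENOSPC is the errno whose message reads "No space left on device" on Linux.
name, message = describe_errno(errno.ENOSPC)
print(f"{errno.ENOSPC}: {name} - {message}")
```

This only decodes the error text; it does not say which resource ran out.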

↧

Just Released Intel OpenCL Applications 2017 CPU-only runtime crashes

September 14, 2017, 6:26 am

Hi,

I just installed the latest update of the OpenCL Applications 2017 SDK (the one released a few days ago) and the experimental OpenCL CPU runtime is no longer working. When I run clinfo, I get a -11 error on clBuildProgram:

Platform Name: Experimental OpenCL 2.1 CPU Only Platform
Number of devices: 1
Device Type: CL_DEVICE_TYPE_CPU
Vendor ID: 8086h
Max compute units: 12
[...]
Preferred local atomic alignment: 0
5 errors generated.
ERROR: clBuildProgram(-11)

The previous Experimental runtime was working fine. I'm using Windows 10 Pro 64-bit with Visual Studio 2013 installed (I also installed the Intel OpenCL SDK plugins for VS), and I also have the Intel OpenCL CPU-only 16.1.1 runtime (in addition to the AMD drivers with their respective OpenCL runtimes).
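
For readers decoding the number: in the Khronos CL/cl.h header, -11 is CL_BUILD_PROGRAM_FAILURE. Below is a small sketch of a lookup table in Python, covering only a subset of the status codes; the dictionary and the helper name `cl_error_name` are illustrative and not part of any SDK.

```python
# Subset of OpenCL status codes as defined in the Khronos CL/cl.h header.
CL_ERROR_NAMES = {
    0: "CL_SUCCESS",
    -1: "CL_DEVICE_NOT_FOUND",
    -2: "CL_DEVICE_NOT_AVAILABLE",
    -3: "CL_COMPILER_NOT_AVAILABLE",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -30: "CL_INVALID_VALUE",
}

def cl_error_name(code):
    """Translate a numeric OpenCL status into its symbolic name, if known."""
    return CL_ERROR_NAMES.get(code, f"unknown OpenCL status ({code})")

print(cl_error_name(-11))
```

When a build fails with this status, the compiler diagnostics (here, the "5 errors generated." part) can normally be retrieved in full with clGetProgramBuildInfo and CL_PROGRAM_BUILD_LOG.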

↧

OpenCL OpenGL interop on Optimus IntelNVIDIA machine?

September 15, 2017, 8:43 am

Quick question...

I have an Optimus-enabled laptop with a Skylake HD 530 integrated GPU and a Quadro M1000M discrete GPU. The laptop has both HDMI and Thunderbolt 3 outputs.

I also have an OpenCL<>GL interop application.

Optimus lets you choose which GPU your app should use for rendering but, if I understand correctly, the final surface scan-out is still handled by the HD 530.

All this seems to work fine except when I'm driving an external monitor via Thunderbolt. When connected via TBOLT, the NVIDIA GPU appears to have full control of the monitor.

So my question is: should I be able to run my OpenCL kernels on the HD 530 and render via OpenCL-to-GL interop to the NVIDIA discrete GPU?

In this situation I don't care about performance so an implicit host-readback is OK.

Documentation on multi-GPU OpenCL interop seems sparse!

↧

some troubles in progress of Opencl SDK installation

September 15, 2017, 10:00 am

Hi,

My system is Ubuntu 16.04 LTS, and the device is Ivy Bridge.

I downloaded "intel_sdk_for_opencl_2017_7.0.0.2511_x64.tgz". After extracting it, the folder contains several items, such as the pset folder, the rpm folder, install.sh, install_GUI.sh, and so on.

The question is: how can I use these resources to install the SDK successfully?

Following some advice, I entered $(dirname $0)/install.sh --gui-mode $@ in a terminal, but it showed an error: bash: syntax error near unexpected token '/install.sh'

I really don't know what that means.

BTW, the "Getting Started" doc shows the following:

$ mv install_SDK_prereq_ubuntu.sh_.txt install_SDK_prereq_ubuntu.sh

$ sudo su

$ ./install_SDK_prereq_ubuntu.sh

but I can't find the file named "install_SDK_prereq_ubuntu.sh_.txt".

Thanks for any feedback!

↧

Install OpenCL SDK @ ubuntu 16.04.4

September 19, 2017, 8:47 am

Dear all,

I just installed the OpenCL SDK per the instructions at https://software.intel.com/en-us/articles/sdk-for-opencl-gsg#comment-191...

root@ubuntu:/home/ubuntu/Downloads# clinfo

Number of platforms                               1
  Platform Name                                   Intel(R) OpenCL
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.0
  Platform Extensions function suffix             INTEL
  Platform Name                                   Intel(R) OpenCL
Number of devices                                 2
  Device Name                                     Intel(R) HD Graphics
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.0
  Driver Version                                  r5.0.63503
  Device OpenCL C Version                         OpenCL C 2.0
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE

Then I tried to check whether the SDK is installed:

root@ubuntu:/home/ubuntu/Downloads# ioc64 -version
ioc64: command not found

I followed steps 1) and 2), but ioc64 is not present. Did the installation succeed? How can I get ioc64?

Thanks.

P.S.

root@ubuntu:/home/ubuntu/Downloads# uname -a
Linux ubuntu 4.10.0-35-generic #39~16.04.1-Ubuntu SMP Wed Sep 13 09:02:42 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

root@ubuntu:/home/ubuntu/Downloads# lspci | grep VGA
00:02.0 VGA compatible controller: Intel Corporation Sky Lake Integrated Graphics (rev 07)
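
One way to tell "not installed" apart from "installed but not on PATH" is to search for the binary directly. A minimal Python sketch, assuming the SDK might have been placed under /opt/intel (that prefix is a guess, and `find_tool` is a hypothetical helper, not an SDK utility):

```python
import os
import shutil

def find_tool(name, search_roots):
    """Return paths of executables called `name`: PATH first, then under search_roots."""
    found = []
    on_path = shutil.which(name)
    if on_path:
        found.append(on_path)
    for root in search_roots:
        # os.walk silently yields nothing for a missing or unreadable root.
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                candidate = os.path.join(dirpath, name)
                if os.access(candidate, os.X_OK) and candidate not in found:
                    found.append(candidate)
    return found

if __name__ == "__main__":
    # /opt/intel is only an assumed install prefix; adjust it as needed.
    print(find_tool("ioc64", ["/opt/intel"]))
```

An empty list suggests the offline compiler was not installed under the searched prefix; a non-empty list suggests it only needs to be added to PATH.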

↧

Internal Compiler Error (experimental 2.1 runtime)

September 19, 2017, 12:46 pm

I'm trying to build an OpenCL kernel from a SPIR-V binary (using clCreateProgramFromIL). In the call to clBuildProgram I get the following compiler error:

**Internal compiler error** Cannot select: 0x33f62d0: v8i32 = X86ISD::VBROADCAST 0x33f56f0
  0x33f56f0: i64,ch = CopyFromReg 0x353d310, Register:i64 %vreg5
    0x33f5f40: i64 = Register %vreg5
In function: test_int2_copy
Please report the issue on Intel OpenCL forum
https://software.intel.com/en-us/forums/opencl for assistance.
 Stack dump:
0.	Running pass 'Function Pass Manager' on module 'main'.
1.	Running pass 'X86 DAG->DAG Instruction Selection' on function '@test_int2_copy'

My SPIR-V kernel is (disassembled, binary attached):

; SPIR-V
; Version: 1.0
; Generator: Khronos; 0
; Bound: 20
; Schema: 0
               OpCapability Addresses
               OpCapability Linkage
               OpCapability Kernel
               OpCapability Int64
         %19 = OpExtInstImport "OpenCL.std"
               OpMemoryModel Physical64 OpenCL
               OpEntryPoint Kernel %6 "test_int2_copy"
               OpDecorate %13 BuiltIn GlobalInvocationId
               OpDecorate %13 Constant
               OpDecorate %13 LinkageAttributes "__spirv_BuiltInGlobalInvocationId" Import
          %1 = OpTypeVoid
          %2 = OpTypeInt 32 0
          %3 = OpTypeVector %2 2
          %4 = OpTypePointer CrossWorkgroup %3
          %5 = OpTypeFunction %1 %4 %4
         %11 = OpTypeVector %2 3
         %12 = OpTypePointer UniformConstant %11
         %13 = OpVariable %12 UniformConstant
         %10 = OpConstant %2 0
          %6 = OpFunction %1 None %5
          %7 = OpFunctionParameter %4
          %8 = OpFunctionParameter %4
          %9 = OpLabel
         %14 = OpLoad %11 %13
         %15 = OpVectorExtractDynamic %2 %14 %10
         %16 = OpPtrAccessChain %4 %7 %15
         %17 = OpLoad %3 %16
         %18 = OpPtrAccessChain %4 %8 %15
               OpStore %18 %17
               OpReturn
               OpFunctionEnd
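
For anyone who wants to check that a binary matches the header comments in the disassembly (version 1.0, generator Khronos; 0, bound 20, schema 0), here is a minimal Python sketch that decodes the five-word SPIR-V module header. The function name `parse_spirv_header` is illustrative, and the sample bytes are rebuilt from the comments above rather than read from the attached file.

```python
import struct

SPIRV_MAGIC = 0x07230203  # first word of every SPIR-V module

def parse_spirv_header(data):
    """Decode the five-word SPIR-V module header from raw bytes."""
    if len(data) < 20:
        raise ValueError("a SPIR-V module needs at least a 20-byte header")
    # The magic number reveals the byte order: try little-endian, then big-endian.
    for order in ("<", ">"):
        magic, version, generator, bound, schema = struct.unpack(order + "5I", data[:20])
        if magic == SPIRV_MAGIC:
            break
    else:
        raise ValueError("not a SPIR-V module (bad magic number)")
    return {
        "version": ((version >> 16) & 0xFF, (version >> 8) & 0xFF),
        "generator_id": generator >> 16,          # tool ID, 0 is Khronos
        "generator_version": generator & 0xFFFF,  # tool-specific version
        "bound": bound,                           # all result ids are below this
        "schema": schema,
    }

# Header rebuilt from the disassembly comments: version 1.0,
# generator Khronos (0) version 0, bound 20, schema 0.
sample = struct.pack("<5I", SPIRV_MAGIC, 0x00010000, 0, 20, 0)
print(parse_spirv_header(sample))
```

For a real module, the same function could be called on the bytes read from the file in binary mode.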