Channel: Intel® Software - OpenCL*

Problem with clCreateCommandQueueWithProperties on Linux for NVIDIA Quadro K2200


Hello,

I want to use CUDA 8.0 and OpenCL on SuSE Linux Enterprise 12.1. Unfortunately NVIDIA supports only OpenCL 1.2, so I've installed the latest version of the Intel OpenCL SDK as well. I get a warning about a deprecated function if I use clCreateCommandQueue(), and my program breaks with a segmentation fault if I call clCreateCommandQueueWithProperties() for my NVIDIA Quadro K2200 graphics card (it doesn't matter whether I use Sun C, icc, or gcc). I assume the reason for the segmentation fault is that the graphics device supports only OpenCL 1.2 and clCreateCommandQueueWithProperties() doesn't honour this fact.

I now have a more or less complicated piece of code to create a command queue so that my program works with different platforms (OpenCL 1.2 and OpenCL 2.x, Linux and Windows) and compilers. Why do I have to determine the OpenCL version of the device at run time when the OpenCL SDK reports CL_VERSION_2_0? In my opinion the library should automatically use the correct function and parameters if the device supports only OpenCL 1.2. I'm new to OpenCL, so I may have misunderstood something, or perhaps my program is even faulty.

How can I create a command queue for OpenCL 1.2 and 2.x in a portable way, so that I don't get warnings about deprecated functions and the program works when compiled on different platforms with different compilers? I don't know how clCreateCommandQueueWithProperties() is translated for the device. Is it possible that the function doesn't work as expected if the device supports only OpenCL 1.2?

#if !defined(CL_VERSION_2_0) || defined(__NVCC__)
  if (outOfOrderQueue == 1)
  {
    printf (" Using out-of-order clCreateCommandQueue (...)\n\n");
    command_queue = clCreateCommandQueue (context, device_id,
                      CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &errcode_ret);
  }
  else
  {
    printf (" Using in-order clCreateCommandQueue (...)\n\n");
    command_queue = clCreateCommandQueue (context, device_id, 0, &errcode_ret);
  }
#else
  /* "deviceOpenCL_MajorVersion - '0'" converts char to int */
  if ((deviceOpenCL_MajorVersion - '0') == 2)
  {
    /* combine the flags with bitwise OR; "&&" would collapse them
       to the logical value 1 instead of the intended bitmask */
    cl_queue_properties queueProps[] =
      { CL_QUEUE_PROPERTIES,
        (cl_queue_properties) (CL_QUEUE_ON_DEVICE |
                               CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE),
        0
      };
    printf (" Using out-of-order clCreateCommandQueueWithProperties (...)\n\n");
    command_queue = clCreateCommandQueueWithProperties (context,
                      device_id, queueProps, &errcode_ret);
  }
  else
  {
    /* CL_VERSION_2_0 reported, but device supports only OpenCL 1.2 */
    if (outOfOrderQueue == 1)
    {
      printf (" Using out-of-order clCreateCommandQueue (...)\n\n");
      command_queue = clCreateCommandQueue (context, device_id,
                        CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE, &errcode_ret);
    }
    else
    {
      printf (" Using in-order clCreateCommandQueue (...)\n\n");
      command_queue = clCreateCommandQueue (context, device_id, 0, &errcode_ret);
    }
  }
#endif
CheckRetValueOfOpenCLFunction (errcode_ret);

I would be grateful for any comments that enlighten my understanding of the problem and explain why I have to choose between clCreateCommandQueueWithProperties() and clCreateCommandQueue() myself when CL_VERSION_2_0 is reported. Thank you very much in advance for any help.

Siegmar


Intel OpenCL r3.0-57406 on Ubuntu 16.04 LTS


Hi everyone,

I'm going to summarize the results of my experiments with Intel OpenCL r3.0-57406 on Ubuntu 16.04 here, on the assumption that the community will benefit from it, save some CPU cycles, etc. All tests were performed on an Intel Core i7-6700 Skylake chip, and the CPU-only part has also been tested on an Intel Core i5-3570K IvyBridge.

So far, my findings are as follows:

  • The Intel OpenCL r3.0-57406 GPU ICD is installable and accessible using the kernel patch for 4.4 kernels supplied with Intel OpenCL 2.0-54425, even though its release notes state that support for 4.4 kernels is deprecated.
  • The contents of the intel-opencl-r3.0-57406.x86_64.tar.xz archive are perfectly relocatable to the standard Debian/Ubuntu tree (i.e. /usr/lib/x86_64-linux-gnu/intel-opencl), except for the /opt/intel/opencl/igdclbif.bin file, whose path is hardcoded and which must therefore be either kept in /opt/intel/opencl or symlinked there.
  • The libOpenCL.so.1 shared library inside this archive is modified by Intel and incompatible with the one shipped in the standard ocl-icd-libopencl1 package. When forced to use the packaged version, clinfo doesn't report the Intel OpenCL GPU driver as available, and when using the modified version, the clinfo output is somewhat incomplete.
  • The libcommon_clang.so shared library inside this archive is also needed by the Intel OpenCL CPU driver (intel-opencl-cpu-r3.0-57406.x86_64.tar.xz). This is the only dependency, though, so one can safely isolate this file into a standalone intel-opencl-common home-made package (as I did in my setup).
  • The Intel OpenCL CPU driver archive (mentioned above) is missing the appropriate etc/OpenCL/vendors/intel-cpu.icd announcement. Not a big deal, though, as one can simply run a shell command like 'echo /opt/intel/opencl/libintelocl.so > /etc/OpenCL/vendors/intel-cpu.icd' to fix this.
  • This archive also contains the TBB shared library, which is available through the standard package libtbb2, so one can safely remove the bundled copies with 'rm /opt/intel/opencl/libtbb*' and stay with the packaged version.
  • The contents of this archive are perfectly relocatable to /usr/lib/x86_64-linux-gnu/intel-opencl, with no hardcoded paths anywhere.
  • In fact, the Intel OpenCL CPU driver needs neither the modified libOpenCL.so.1 nor a modified Linux kernel; it runs smoothly with the generic Ubuntu Linux kernel and the packaged ocl-icd-libopencl1, and it does not conflict with the Beignet Intel OpenCL GPU ICD. (I was able to run our computations in parallel using both Beignet OpenCL on the GPU device and Intel OpenCL on the CPU.)
  • I tried to adjust the kernel-4.7.patch to make it applicable to 4.8 kernels, because Ubuntu Yakkety jumped straight from 4.4 on Xenial to 4.8 kernels. I was able to compile the patched kernel successfully, but it didn't complete the boot process, failing with some file-system errors (so I'm not sure whether that is the result of bad patching or a deeper difference between the Xenial and Yakkety distros). I don't have Yakkety here, so I'm ready to pass the token to those who do, if anyone dares to give it a try. Ping me if you want to try the patch yourself or install the patched kernel packages.


neural-style/torchcl with intel-opencl-r3.0


Hi,

I ran into an issue with "github.com/jcjohnson/neural-style" and "torchcl" (github.com/hughperkins/distro, branch distro-cl) using intel-opencl-r3.0.

It runs 7 times faster than Beignet 1.1.1, but processing stops after 90-100 iterations with error code CL_OUT_OF_HOST_MEMORY (-6), whereas Beignet works stably.

With an image size of 500x500, the computer has 32 GB of RAM, the OS uses ~1.5 GB, and torch uses ~10 GB (~5 GB resident), yet your driver returns "out of host memory". Can you explain that? In the same situation Beignet uses ~5 GB (~0.8 GB resident).

It looks like the error occurs at the same number of iterations regardless of image size (250x250 or 500x500 makes no difference). I don't see memory use growing significantly across iterations.

Does intel-opencl-r3.0 have its own logging system to figure out what triggers the "out of memory" error? And what else can I do in this situation?

P.S.: "torchcl" is written for GPUs and doesn't follow your recommendation to avoid duplicating all buffers in memory. (https://software.intel.com/en-us/articles/getting-the-most-from-opencl-1...)
It may also have memory/object leaks, but it somehow works with Beignet without "out of memory" errors.
So I suspect that "intel-opencl-r3.0" may have issues of its own besides that.

HW: i3-6300
OS: Ubuntu 16.04 with kernel 4.4 (also tried 4.8 with your patch for i915; nothing changed).

How to figure out the work group to EU mapping?


Hi everyone:

 

Is there a way to figure out the work-group to EU mapping? More specifically, are there any registers in the EU containing EU IDs that a kernel can read during execution?

Thanks!

Dong

 


RFE: OpenCL 2.x CPU Platform SSSE3/AVX2 instruction "[v]pmulhrsw"


Request for enhancement,

I'd like the OpenCL CPU compiler to generate a [v]pmulhrsw instruction when the proper idiom/sequence is detected.

It's an exotic but useful instruction.

The sequence that you would need to detect would be something like:

c = (((a * b) >> 14) + 1) >> 1;

where a, b, and c are all OpenCL shorts.

Intel intel-opencl-r3.0 (SRB3) Linux and sdk for OpenCL 2016 R2


Hi,

Will the SDK for OpenCL 2016 R2 work with the new Intel opencl-r3.0 Linux driver under both Ubuntu 14.04 with kernel 4.7 and Ubuntu 16.04 with kernel 4.7?

Thanks

Install OpenCL Eclipse plugin on Eclipse Neon


Hi, I'm trying to install the Intel OpenCL plugin on Eclipse Neon.

I'm using Kubuntu 16.04. The driver works fine, but I need a debugger...


Intel® SDK for OpenCL™ Applications 2016 R3 available for download!


We are glad to announce the availability of Intel® SDK for OpenCL™ Applications 2016 R3. Please visit http://software.intel.com/intel-opencl to download the latest version. This update adds support for new OSes and platforms, more performance analysis features and various SDK improvements.

New Platforms

  • 7th Generation Intel® Core™ Processor
  • 7th Generation Intel® Celeron® Processor J3000 Series
  • 7th Generation Intel® Pentium® Processor J4000 Series
  • 7th Generation Intel® Celeron® Processor N3000 Series
  • 7th Generation Intel® Pentium Processor N4000 Series

New OSes

  • Windows* 10 Anniversary Update
  • Yocto* Project

OpenCL™ 2.1 Support:

  • 7th Generation Intel® Core™ Processor (Windows*)

Code Analyzer:

  • Hardware Counter Support
  • Latency Analysis

For a complete list of new features and changes in this release, read the release notes.

Download the SDK and get started today! (Drivers and runtimes provided as a separate installer)

OpenCL and the OpenCL logo are trademarks of Apple Inc. used by permission by Khronos


Xeon E5-2690 v3 doesn't report correct OpenCL Max Compute Units


Here's a puzzling situation.

Different drivers and architectures (x86 vs. x64) report different "max compute units" values on a dual Xeon E5-2690 v3 workstation (2 x HyperThreaded 12-core Xeons, i.e. 48 logical processors).

The new Visual Studio Code-Builder "Platform Info Tree" reports 32 compute units on both the 1.2 and 2.1 devices (incorrect).

The older standalone _x86_ Code-Builder reports 32 on both the 1.2 and 2.1 devices (incorrect).

The _x64_ Code-Builder reports 48 on both the 1.2 and 2.1 devices (correct).

The Nsight OpenCL System Info window reports 48 on both devices (correct).

Unable to uninstall SDK


While trying to upgrade to SDK v6.3, the process was interrupted with an error message.

And then I found that it's impossible to uninstall SDK v6.1 at all.

I have Windows 10 with the Anniversary Update, and as far as I remember, SDK v6.1 was installed prior to this update.

Here is a quotation from the error log:

[t2b20 2016.11.05 18:21:50.940 000011ad] [billboard]: INFO: Activating: 1
[t2b20 2016.11.05 18:21:50.941 000011ae] [MSI processing]: INFO: request to configure msi: {BB7978A8-F4D5-4AED-AD20-E5A9DBC0AD6D}
[t2b20 2016.11.05 18:21:52.955 000011af] [MsiUIHandler]: ERROR: MSI Error: An installation package for the product Intel® SDK for OpenCL™ Applications 2016 R2 for Windows* cannot be found. Try the installation again using a valid copy of the installation package 'intel_sdk_for_opencl_2016_x64_setup.msi'.
[t2b20 2016.11.05 18:21:54.588 000011b0] [msistd]: INFO: Internal call: error code: 1603 (Fatal error during installation.
), call: MsiConfigureProductEx( product.c_str(), level, state, command.c_str() ), function:msistd::CInstaller::ConfigureProductEx, file:sources\stdInstallLib.cpp, line:1069

[t2b20 2016.11.05 18:21:54.588 000011b1] [MsiUIHandler]: INFO: MSI Exit code: 1603
[t2b20 2016.11.05 18:21:54.588 000011b2] [MsiUIHandler]: INFO: MSI Exit code: 1603
[t2b20 2016.11.05 18:21:54.588 000011b3] [MSI processing]: INFO: MSI remove failed, error code=1603
[t2b20 2016.11.05 18:21:54.590 000011b4] [error container]: INFO: Add error message:


Braswell iGPU OpenCL Inconsistencies


 

Hi, I just bought an Acer (Packard Bell) laptop with an N3060 SoC. According to Intel, Wikipedia, and Notebookcheck, it should have 8th-generation HD Graphics 400 (not 4000) of the Braswell architecture, with 12 compute units @ 320-600 MHz.

It is running Windows 10 64-bit Home edition (single language), build 10240, with 4 GB of single-channel RAM.

I'm also developing OpenCL programs on another machine and wanted to migrate them to this one.

Before that, I tested whether OpenCL is supported.

Observation:

  • The compubench benchmark and every other OpenCL program see it as 8 compute units @ 200 MHz constant frequency.
  • compubench also gives this N3060 iGPU only half the benchmark score of the N3050 iGPU (iGPU vs. iGPU, not other parts).
  • GPU-Z shows it stuck at 320 MHz even under heavy graphics load.
  • The GPU-Z render test starts heavily stuttered and freezes after 10 seconds.
  • The Intel OpenCL SDK says "opencl driver out of date" during the install phase.
  • No OpenCL load increases the CPU temperature (according to the Core Temp software), but it should, since half of the chip is a GPU, shouldn't it? Especially the T-Rex part of compubench should raise the temperature, but that's not happening.
  • The computer and drivers say "Intel HD Graphics" everywhere, but never "Intel HD Graphics 400".
  • It seems I'm the first person to run a compubench benchmark on an N3060 (its iGPU, actually).
  • AIDA64 GPGPU gives errors.

 

What I tried to solve:

  • uninstalling the current drivers (Add/Remove Programs -> uninstall, Device Manager -> uninstall) before each new install
  • installing the latest drivers: N-series 15.40, which installs 10.18 for the iGPU and xx.xx for audio
  • running the Intel driver updater software (which scans and offers the same driver as above, 15.40, installing the same 10.18 for the iGPU), 64-bit version of course
  • changing the power-saving mode to "performance" in both the Intel HD Graphics settings and the Windows power settings
  • checking an AMD GPU on compubench: it shows a frequency range instead of a single idle frequency, so I'm sure this isn't an OpenCL-side issue. It should be an Acer-side or Intel-side issue, but I'm not sure. GPU-Z also shows that AMD card boosting to a higher frequency.

I suspect:

  • Acer (Packard Bell)'s support page is not active; maybe they temporarily deactivated 4 compute units and disabled dynamic frequency scaling. But why didn't they write "gimped iGPU version" on the box? I bought this computer because I need the 12-compute-unit @ 320-600 MHz performance level, but there is no info about that in the laptop's spec sheet.
  • The latest Intel driver (15.40, which installs 10.18 for this N-series iGPU; it should say Intel HD Graphics 400, not 4000, to avoid confusion) has bugs, so I need to wait for the next driver update.
  • I just bought this laptop, so it will update to the Windows 10 Anniversary Update and later versions; perhaps that will include a fix for this?
  • It has single-channel RAM, so perhaps the GPU is auto-tuned to stay at the base/idle frequency instead of boosting, and with only 8 compute units?

Thank you for your time.

intel_sub_group_block_read8 gets unexpected column data with large work group size


Hi,

I hit a problem when using intel_sub_group_block_read8 to read an image2D. It is a very simple usage: just continuously reading uint8 values. The image2D is 64x32 (width x height), and each pixel is a uint. Part of the image2D is printed below for illustration:

 0     1     2     3     4     5     6     7
--------------------------------------------------
0|0x000 0x001 0x002 0x003 0x004 0x005 0x006 0x007
1|0x040 0x041 0x042 0x043 0x044 0x045 0x046 0x047
2|0x080 0x081 0x082 0x083 0x084 0x085 0x086 0x087
3|0x0c0 0x0c1 0x0c2 0x0c3 0x0c4 0x0c5 0x0c6 0x0c7
4|0x100 0x101 0x102 0x103 0x104 0x105 0x106 0x107
5|0x140 0x141 0x142 0x143 0x144 0x145 0x146 0x147
6|0x180 0x181 0x182 0x183 0x184 0x185 0x186 0x187
7|0x1c0 0x1c1 0x1c2 0x1c3 0x1c4 0x1c5 0x1c6 0x1c7

8|0x200 0x201 0x202 0x203 0x204 0x205 0x206 0x207
9|0x240 0x241 0x242 0x243 0x244 0x245 0x246 0x247
10|0x280 0x281 0x282 0x283 0x284 0x285 0x286 0x287
11|0x2c0 0x2c1 0x2c2 0x2c3 0x2c4 0x2c5 0x2c6 0x2c7
12|0x300 0x301 0x302 0x303 0x304 0x305 0x306 0x307
13|0x340 0x341 0x342 0x343 0x344 0x345 0x346 0x347
14|0x380 0x381 0x382 0x383 0x384 0x385 0x386 0x387
15|0x3c0 0x3c1 0x3c2 0x3c3 0x3c4 0x3c5 0x3c6 0x3c7

For this 64x32 image2D, I set the global work size to 64x4, and each work item reads a uint8.

The problem is:

If I use a large work-group size of 64x4 or 32x4 (work group size[1] is 4), I can't read the expected column data at some locations. E.g. I expect "0x200 0x240 0x280 0x2c0 0x300 0x340 0x380 0x3c0" at byte coordinates (0,8), but I actually get "0x4 0x44 0x84 0xc4 0x104 0x144 0x184 0x1c4".

If the work-group size is 64x2 or 32x2 (work group size[1] is 2), I do get "0x200 0x240 0x280 0x2c0 0x300 0x340 0x380 0x3c0" at byte coordinates (0,8).

 

1. ./transpose -y 4 | tee wg_4.log (work group is 64x4, which gets unexpected column data at byte coordinates (0,8))

br_src = 0x4 0x44 0x84 0xc4 0x104 0x144 0x184 0x1c4
|--subgrp_size(work items)=16, subgrp_size_max=16 subgrp_num_in_work_group=16 (group_size=64,4), subgrp_id=4
|--src img_coord(0,8)/byte_coord_xy(0,8)/group_xy(0,0)/local_xy(0,1),size (64,4)/

2. ./transpose | tee wg_2.log (work group is 64x2, which gets the expected column data at byte coordinates (0,8))

br_src = 0x200 0x240 0x280 0x2c0 0x300 0x340 0x380 0x3c0
|--subgrp_size(work items)=16, subgrp_size_max=16 subgrp_num_in_work_group=8 (group_size=64,2), subgrp_id=4
|--src img_coord(0,8)/byte_coord_xy(0,8)/group_xy(0,0)/local_xy(0,1),size (64,2)/

The sample is attached (block_read.zip). My environment: Intel(R) HD Graphics Skylake ULX GT2, driver version 16.5.56895.

Could someone help to look at it and point out where the problem is?

Thanks

-Austin

Attachment: block-read.zip (50.04 KB)

Getting Started


Hi all, I am new to hardware programming. Since OpenCL has C-like syntax, I will use it for my hardware development. I learned about the Intel SDK development kit, which has something called an emulator to run on the host. Does this mean I will not need any hardware to start with? I guess this emulator is from Quartus, so what about the licensing? To be precise: can I just download the SDK, start working, and run the kernels in an emulator?


GPU not found on a Joule module


Hi,

I tried several sample OpenCL programs, and none of them found the GPU on my Joule module. I am running an Ubuntu Linux distribution, and I installed the Intel SDK, which I use to compile those samples. They all find the CPU but not the GPU. Here is the output of one of them (taken from a GitHub post about installing the OpenCL SDK for Linux: https://gist.github.com/rmcgibbo/6314452):

clDeviceQuery Starting...

1 OpenCL Platforms found

CL_PLATFORM_NAME: Experimental OpenCL 2.1 CPU Only Platform
CL_PLATFORM_VERSION: OpenCL 2.1 LINUX
OpenCL Device Info:

1 devices found supporting OpenCL on: Experimental OpenCL 2.1 CPU Only Platform

Device Intel(R) Atom(TM) Processor T5700 @ 1.70GHz

CL_DEVICE_NAME: Intel(R) Atom(TM) Processor T5700 @ 1.70GHz
CL_DEVICE_VENDOR: Intel(R) Corporation
CL_DRIVER_VERSION: 1.2.0.18
CL_DEVICE_TYPE: CL_DEVICE_TYPE_CPU
CL_DEVICE_MAX_COMPUTE_UNITS: 4
CL_DEVICE_MAX_WORK_ITEM_DIMENSIONS: 3
CL_DEVICE_MAX_WORK_ITEM_SIZES: 8192 / 8192 / 8192
CL_DEVICE_MAX_WORK_GROUP_SIZE: 8192
CL_DEVICE_MAX_CLOCK_FREQUENCY: 1700 MHz
CL_DEVICE_ADDRESS_BITS: 64
CL_DEVICE_MAX_MEM_ALLOC_SIZE: 958 MByte
CL_DEVICE_GLOBAL_MEM_SIZE: 3833 MByte
CL_DEVICE_ERROR_CORRECTION_SUPPORT: no
CL_DEVICE_LOCAL_MEM_TYPE: global
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE: 128 KByte
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE
CL_DEVICE_QUEUE_PROPERTIES: CL_QUEUE_PROFILING_ENABLE
CL_DEVICE_IMAGE_SUPPORT: 1
CL_DEVICE_MAX_READ_IMAGE_ARGS: 480
CL_DEVICE_MAX_WRITE_IMAGE_ARGS: 480

CL_DEVICE_IMAGE 2D_MAX_WIDTH 16384
2D_MAX_HEIGHT 16384
3D_MAX_WIDTH 2048
3D_MAX_HEIGHT 2048
3D_MAX_DEPTH 2048
CL_DEVICE_PREFERRED_VECTOR_WIDTH_ CHAR 1, SHORT 1, INT 1, FLOAT 1, DOUBLE 1

clDeviceQuery, Platform Name = Experimental OpenCL 2.1 CPU Only Platform, Platform Version = OpenCL 2.1 LINUX, NumDevs = 1, Device = Intel(R) Atom(TM) Processor T5700 @ 1.70GHz

System Info:

Local Time/Date = 09:12:08, 11/15/2016
CPU Name: Intel(R) Atom(TM) Processor T5700 @ 1.70GHz

# of CPU processors: 4

Linux version 4.4.0-47-generic (buildd@lcy01-03) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #68-Ubuntu SMP Wed Oct 26 19:39:52 UTC 2016

TEST PASSED

Why did it not find the GPU? Thanks a lot!

Losing all autos and local symbols when single-stepping in CPU debugger


I'm using a VS2015 / Win10 workstation with the latest CPU driver and Code Builder installed (4.0.0.1).

I'm single-stepping through an OpenCL 2.1 + AVX2 CPU kernel and after a few lines all symbols in the Autos and Local windows disappear.

Single-stepping appears to continue to work because the line arrow is properly moving.

I've tried targeting both x86 and x64 and see no difference.

The build options for clBuildProgram(..) are "-g -s <kernel absolute path>".

Any ideas?


Performance hint: Null local workgroup size detected...


My application uses a 2D kernel that is called several times on different data. Running the Code Builder profiler, I got a lot of the following performance hints (on Intel HD 4600, Windows 10, Intel SDK for OpenCL Applications 2016 R3):

Performance hint: Null local workgroup size detected (kernel name: cellRasterize); following sizes will be used for execution: { 20, 22, 1 }

Performance hint: Null local workgroup size detected (kernel name: cellRasterize); following sizes will be used for execution: { 26, 16, 1 }

Performance hint: Null local workgroup size detected (kernel name: cellRasterize); following sizes will be used for execution: { 12, 12, 1 }

...

What does it mean? Many thanks in advance!

    Matteo.


Unable to Launch Debugger in VS2015 running OpenCL Samples


I've downloaded and successfully run the code samples from the Intel OpenCL website under "Release" configuration.

(for example, Sobel, CapsBasic, and others).

So, I'm confident that my SDK/drivers setup is mostly working.

However, I'm not able to launch these examples in "Debug" mode (to set breakpoints and step through code). When I run with Debug, I am told that a dependent DLL is missing. Here is the full output from a Debug/Win32 session when I click "Local Windows Debugger":

'CapsBasic.exe' (Win32): Loaded 'C:\_CVWork\OpenCL\intel_ocl_caps_basic_win\CapsBasic\Win32\Debug\CapsBasic.exe'. Symbols loaded.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\ntdll.dll'. Symbols loaded.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\kernel32.dll'. Symbols loaded.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\KernelBase.dll'. Symbols loaded.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\OpenCL.dll'. Cannot find or open the PDB file.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\advapi32.dll'. Symbols loaded.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\msvcrt.dll'. Symbols loaded.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\sechost.dll'. Symbols loaded.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\rpcrt4.dll'. Symbols loaded.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\sspicli.dll'. Symbols loaded.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\cryptbase.dll'. Symbols loaded.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\msvcp140d.dll'. Symbols loaded.
'CapsBasic.exe' (Win32): Loaded 'C:\Windows\SysWOW64\vcruntime140d.dll'. Symbols loaded.
The program '[11500] CapsBasic.exe' has exited with code -1073741515 (0xc0000135) 'A dependent dll was not found'.

To clarify, I do have the API debug options set in the Code-Builder->OpenCL Debugger->Options... window ("Enable OpenCL kernel debugging for CPU device" is checked, as is "Enable remote OpenCL kernel debugging for GPU device").

I'd appreciate any insight on how to get breakpoints and debug runs working with the sample code in VS2015.

 


iocXX -spirvXX keeps emitting old kernel


Here's a bizarre result...

I'm building a kernel with ioc64 along with the "-ir" and "-asm" options.

The kernel builds, loads and runs fine as an optimized binary or as source.

But I added the "-spirv64" option and ioc64 is emitting a SPIR-V file from a tiny kernel from last week.

Is there a directory that isn't being cleaned out somewhere?

Any tips?

Big kernel performance difference between the image created from HOST_PTR and the image created from Buffer Object


Hi,

We found that the same kernel's performance varies dramatically depending on how the input image is created. With the attached test tool:

  • If the input image is created from a host pointer directly, the performance is good, e.g. for an 8K x 8K input image:
    • ./blockread
    • Average kernel 2.033509 ms
  • If the input image is created from a buffer object (which is created from the same host pointer), the performance drops significantly for the same 8K x 8K process:
    • ./blockread -b
    • Average kernel 3.763424 ms

The buffer pitch and base address are aligned to 4K; I'm not sure why the performance difference is so big...

The code snippet for image creation is listed below:

    if (create_image_from_buf) {
        buf_from_hostptr = clCreateBuffer(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                          src_size, src_ptr, &errNum);
        if (buf_from_hostptr == NULL) {
            printf("clCreateBuffer failed\n");
            exit(1);
        }
        desc.buffer = buf_from_hostptr;

        /* flags are inherited from the buffer */
        img_from_buf = clCreateImage(context, 0, &format, &desc, NULL, &errNum);
        if (img_from_buf == NULL) {
            printf("clCreateImage failed\n");
            exit(1);
        }
    } else {
        img_from_hostptr = clCreateImage(context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                                         &format, &desc, src_ptr, &errNum);
        if (img_from_hostptr == NULL) {
            std::cerr << "Error creating memory objects." << std::endl;
            return false;
        }
    }

Thanks

-Austin

 

Performance of "intel_sub_group_block_readN/writeN" vs "vloadN/vstoreN"


Does the subgroup extension API intel_sub_group_block_readN/writeN have better performance than vloadN/vstoreN? I did some testing but don't see much difference between them. Can you elaborate on the expected read/write performance difference between them?
