Channel: Intel® Software - OpenCL*
Viewing all 1182 articles
Browse latest View live

Where can I download Linux OpenCL driver for Intel HD 4600 graphics?

Hi all,

Title says it all.  I am running Fedora 27 Linux on an i7-4790K Haswell (HD 4600 graphics).  It has two Nvidia GPUs and OpenCL works fine for them, but I cannot find any OpenCL library for the Intel integrated GPU, which is Intel HD 4600.   According to https://software.intel.com/en-us/node/540387, my CPU isn't supported, but is there an open source driver?  Or does Intel plan to develop one?

Thanks in advance!


[Advice] Efficient way to gather data from global memory

Hi to everyone,

I am working on an OpenCL Monte Carlo code for particle transport. To alleviate the thread divergence caused by the stochastic nature of the method, I am trying to implement an event-by-event simulation, i.e. to concurrently process particles that will undergo the same process (e.g. photo-electric effect, Compton scattering, etc.).

Several buffers containing particle information (energy, position, etc.) are stored in global memory. Additionally, a buffer containing an index to the next event (i.e. physical process) is associated with each particle. First, the most frequent next event is determined using a histogram-like kernel such as the following:

__kernel void freq_kernel(
	// Number of data per work-item to be analyzed
	int nperwork_item,

	// Particle stack
	__global int *gstack_stat,
	
	// Frequency of events
	__global int *gfreq_events) {

	int gid = get_global_id(0);
	int lid = get_local_id(0);

	int lsize = get_local_size(0);
	int gsize = get_global_size(0);

	// Local array containing frequency of events
	__local int lfreq_events[MX_EVENTS];

	// Initialize local frequency array.
	for(int i=lid; i<MX_EVENTS; i+=lsize) { 
		lfreq_events[i] = 0;
	}
	barrier(CLK_LOCAL_MEM_FENCE);

	// Compute the frequency of events local array.
	int ndata = gsize*nperwork_item;
	for(int i=gid; i<ndata; i+=gsize) { 
		atomic_add(&lfreq_events[gstack_stat[i]-1], 1);
	}
	barrier(CLK_LOCAL_MEM_FENCE);

	// Finally, merge local results into global memory (one atomic per bin).
	// No barrier is needed after this loop: nothing reads local memory afterwards.
	for(int i=lid; i<MX_EVENTS; i+=lsize) { 
		atomic_add(&gfreq_events[i], lfreq_events[i]);
	}

}

The result is an array with the frequency of each possible event. It is then transferred to the host, and the event with the highest frequency is selected as the next event to be processed. On the host we have the following code:

int next_event = 0;
int max_count = 0;

// Note: each iteration overwrites the result, so the last device's histogram decides next_event.
for (cl_uint idevice = 0; idevice < number_of_devices; idevice++) {
	// Search once and reuse the iterator for both the count and the index.
	auto it = std::max_element(h_freq_events[idevice].begin(), h_freq_events[idevice].end());
	max_count = *it;
	// The kernel fills bin gstack_stat[i]-1, so bin k holds event k+1.
	next_event = (int)std::distance(h_freq_events[idevice].begin(), it) + 1;
}

Now the idea is to determine which particles must be simulated in the next step. Essentially, I would like to obtain the indexes of those particles in order to gather their particle information (energy, position, etc.) from global memory. A serial alternative could be the following:

for (cl_uint idevice = 0; idevice < number_of_devices; idevice++) {
	for (int i = 0; i<global_work_size[idevice] * nperwork_item[idevice]; i++) {
		if (h_pstack_stat[idevice][i] == next_event) {
			h_mask[idevice].push_back(i);
		}
	}
}

So I need some advice: do you know an efficient way to obtain the needed indexes? It is clear that calculating them on the host is very expensive, and I am looking for a more efficient way. I have looked for information online without success. Maybe I should first sort the index array, together with the other arrays holding the particle information, and then carry out the gather operation? Thanks for your help!
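
One common way to build such an index list without a serial host loop is stream compaction: flag the matching particles, take an exclusive prefix sum of the flags, and scatter each matching index to the slot the scan assigns it. The sketch below is a serial C++ reference for that data flow, not a tuned OpenCL implementation. The function name compact_indices and its parameters are hypothetical, and on the device each phase would become its own kernel (or a library scan).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference for scan-based stream compaction (hypothetical helper).
// Phases 1 and 3 are independent per element; phase 2 is a prefix sum,
// which has well-known parallel formulations.
std::vector<int> compact_indices(const std::vector<int>& stack_stat, int next_event) {
    const std::size_t n = stack_stat.size();

    // Phase 1: flag[i] = 1 if particle i undergoes next_event, else 0.
    std::vector<int> flag(n);
    for (std::size_t i = 0; i < n; ++i)
        flag[i] = (stack_stat[i] == next_event) ? 1 : 0;

    // Phase 2: exclusive prefix sum; offset[i] is the output slot of particle i.
    std::vector<int> offset(n);
    int running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        offset[i] = running;
        running += flag[i];
    }

    // Phase 3: scatter the index of every flagged particle to its slot.
    std::vector<int> mask(static_cast<std::size_t>(running));
    for (std::size_t i = 0; i < n; ++i)
        if (flag[i])
            mask[static_cast<std::size_t>(offset[i])] = static_cast<int>(i);

    assert(mask.size() <= n);  // never more matches than particles
    return mask;
}
```

With stack_stat = {3, 1, 3, 2, 3} and next_event = 3 this yields the indexes 0, 2 and 4, in their original order. The resulting list can then drive the gather of energy, position, etc.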

IOC64.exe always crashes for a specific program on specific Intel Processor

We are facing a weird issue with the OpenCL SDK:

We have two very complex OpenCL programs, abc.cl and def.cl, that we are building using ioc64.exe on various Intel-based machines. So far the work was developed on slightly older Core i7 laptops, all i7-4720HQ or others from the same generation.

We are using Intel® SDK for OpenCL™ Applications version 2017 R2 (latest) on Windows 10.  We are compiling for CPU (-device=CPU). This was all working fine on those old laptops. Then we decided to try this on an Intel NUC with an Intel Core i7-6770HQ, and we found that ioc64.exe consistently crashes during the following command:

ioc64 -device=CPU -input=abc.cl

while on the same machine the other program builds fine, i.e.

ioc64 -device=CPU -input=def.cl

passes without any crash.

We got curious and tried them on a brand new MSI laptop with a 7th gen Core i7: still the same result. The build for abc.cl crashes the compiler but def.cl passes fine, and yet on the older laptops all is well.

With more analysis I found that the compile phase passes while the linking phase is what crashes it. 

We have the crash dump hosted here. https://www.dropbox.com/s/coeg84uo9tul9ct/ioc64.zip

It is really annoying that a program can crash the compiler. One would expect the compiler to tell us what is wrong with the program. So we are left with no option but to randomly remove lines from the program to find what is wrong, unless of course Intel can analyze the crash dump and tell us what we can fix.

I wonder if anyone has faced similar issues? And what could be a way out?

Thanks in advance.

Local histograms - one big kernel launch or multiple kernel launches ?

Hello,

I work on implementing local histograms on images in OpenCL. I was wondering if there is a speed penalty if I start a kernel for each histogram patch (subarray) instead of starting a single kernel that will go through all image pixels, find the current patch and calculate the histogram. From a programming point of view it seems simpler to launch something like 64 kernels each on a particular patch.

Thanks
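
For comparison, the single-pass variant can be sketched as follows. This is a serial C++ reference, assuming an 8-bit grayscale image and patches that tile the image on a regular grid with no remainder; patch_histograms and its parameters are hypothetical names. In a kernel, the body of the inner loop would be one work-item and the increment would have to be atomic. Whether this beats 64 small launches depends on the device and driver, so it is worth timing both.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: one pass over the image, one 256-bin histogram per patch.
// Histograms are stored back to back: patch p owns bins [p*256, p*256 + 255].
std::vector<unsigned> patch_histograms(const std::vector<std::uint8_t>& image,
                                       std::size_t width, std::size_t height,
                                       std::size_t patch_w, std::size_t patch_h) {
    // Assumptions of this sketch: non-empty patches that divide the image exactly.
    assert(patch_w > 0 && patch_h > 0);
    assert(width % patch_w == 0 && height % patch_h == 0);
    assert(image.size() == width * height);

    const std::size_t patches_x = width / patch_w;
    const std::size_t patches_y = height / patch_h;
    std::vector<unsigned> hist(patches_x * patches_y * 256, 0u);

    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            // Find the patch that owns pixel (x, y), then bump its bin.
            const std::size_t patch = (y / patch_h) * patches_x + (x / patch_w);
            ++hist[patch * 256 + image[y * width + x]];
        }
    }
    return hist;
}
```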

Frequent BSoD on Win10 -- 24.20.100.6025 NEO

When I execute my kernels, I'm frequently seeing a Blue Screen of Death with the .6025 driver on my Win10 workstation.

The processor is a Core i7-7820HQ (HD 630).  There is also a Quadro M1200 onboard.

The Win10 version is not the latest as this is a corporate machine: v1607 / 14393.2189.

The same kernels never BSoD on my home machine (NUC7i5BNK) with the latest Win10.

I'm going to revert but let me know if there is any log or dump that might help debug this.

 

There are 2 CPUs in my computer. How to use them simultaneously?

There are 2 CPUs in my computer. Each of them has several cores.

Now I can use one of them, but I want to know how to use both of them simultaneously for a single kernel.

Moreover, if my computer has several CPUs and several GPUs, how can I use all of them simultaneously for data-parallel computing?

BTW, I use Windows 7 and OpenCL 1.1.

Is using Windows threads the only way to go? That seems too complicated to implement!

Any suggestion will be appreciated.

Thank you in advance!

tdchen

if I can find several suitable devices

If I can find several suitable devices, how can I use them simultaneously?

It seems that I have to:

1) split my task into parts.

2) create one thread (Windows system thread) per device.

3) have each thread process one part, all at the same time.

Is that correct? Is there any better way?

thank you in advance.

tdchen

 

Possible? one thread, multiple devices

Hi.

Is it possible for a single-threaded program to use multiple OpenCL devices simultaneously?

If so, how to do it?

Thanks in advance.

tdchen
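
As far as the API goes, enqueue calls such as clEnqueueNDRangeKernel do not wait for the kernel to finish, so a single host thread can feed one command queue per device and block only at the end. What remains is splitting the index space. The sketch below shows one way to do that in plain C++; Slice and split_range are hypothetical names, and an even split is an assumption that ignores differences in device speed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One contiguous slice of a 1-D global range, intended for one device.
struct Slice {
    std::size_t offset;  // would be passed as global_work_offset (usable from OpenCL 1.1)
    std::size_t size;    // would be passed as global_work_size
};

// Hypothetical helper: split `total` work-items into `num_devices` nearly equal slices.
// The first (total % num_devices) slices get one extra work-item.
std::vector<Slice> split_range(std::size_t total, std::size_t num_devices) {
    assert(num_devices > 0);
    std::vector<Slice> slices;
    slices.reserve(num_devices);
    const std::size_t base = total / num_devices;
    const std::size_t extra = total % num_devices;
    std::size_t offset = 0;
    for (std::size_t d = 0; d < num_devices; ++d) {
        const std::size_t size = base + (d < extra ? 1 : 0);
        slices.push_back({offset, size});
        offset += size;
    }
    return slices;
}

// Intended host-side use from a single thread (OpenCL calls shown as comments only):
//   for each device d: clEnqueueNDRangeKernel(queue[d], ...) with slices[d],
//                      then clFlush(queue[d]) so the device starts working;
//   afterwards:        clFinish(queue[d]) for every d.
```

For 10 work-items on 3 devices this gives (offset 0, size 4), (offset 4, size 3) and (offset 7, size 3). Each slice size may still need rounding to a multiple of the chosen work-group size.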


.6025 driver: "The GFX driver doesn't support disassembly code display"

Why does the .6025 driver support dumping GEN assembly (via ioc64 or CodeBuilder) on an i5-7260U NUC but not on an i7-7820HQ laptop?

Both are Kaby Lake.

The laptop reports "The GFX driver doesn't support disassembly code display."

 

Out of Order Queues -- do they work? Enqueued Barriers with Events -- very slow?

Two questions:

(1) What is the expected behavior of out-of-order queues on GEN9 + NEO?

I'm issuing a number of small kernels into an out of order command queue with profiling enabled and no barriers between the NDRanges.

I'm not seeing kernels being run concurrently despite each kernel only using a fraction of a sub-slice (3 sub-slices available).

The benchmark is being run for one iteration.  

(2) What is the expected profiling behavior of enqueued barriers?

I'm enqueueing a barrier with no wait list between each NDRange and looking at the start and end time of both the barriers and kernels.

Barriers are reporting immensely long execution times (end - start), often in the 6-10 millisecond range, when an event is attached.

Furthermore, an enqueued barrier's start time appears to begin before kernels preceding it in the out of order command queue.

This is unintuitive and the durations seem impossibly long. 

Adding to the confusion, the interleaved kernel NDRanges seem to start and end back-to-back with only a few microseconds of delay, similar to (1).

Summary

What am I missing with out-of-order queues on GEN/NEO and are the reported durations of barriers correct?

Examples:

Each example lists the order in which the command was issued, its type, and its start/end/duration in nanoseconds (via profiling).

Out-of-order queue with no barriers (which is not what I want):

[0  ] CL_COMPLETE   CL_COMMAND_NDRANGE_KERNEL    :      275065828645407,      275065828856573,               211166
[1  ] CL_COMPLETE   CL_COMMAND_NDRANGE_KERNEL    :      275065828858044,      275065828867044,                 9000
[2  ] CL_COMPLETE   CL_COMMAND_NDRANGE_KERNEL    :      275065828867855,      275065828913105,                45250

Out-of-order queue with barriers between kernels but with NULL for the barrier's event:

[0  ] CL_COMPLETE   CL_COMMAND_NDRANGE_KERNEL    :      275228853506912,      275228853721495,               214583
[1  ] CL_COMPLETE   CL_COMMAND_NDRANGE_KERNEL    :      275228853722460,      275228853732710,                10250
[2  ] CL_COMPLETE   CL_COMMAND_NDRANGE_KERNEL    :      275228853736931,      275228853776931,                40000

Out-of-order queue with barriers that record an event for profiling:

[0  ] CL_COMPLETE   CL_COMMAND_NDRANGE_KERNEL    :      275372161923465,      275372162135631,               212166
[1  ] CL_COMPLETE   CL_COMMAND_BARRIER           :      275372158781086,      275372162447953,              3666867
[2  ] CL_COMPLETE   CL_COMMAND_NDRANGE_KERNEL    :      275372162137683,      275372162146683,                 9000
[3  ] CL_COMPLETE   CL_COMMAND_BARRIER           :      275372158807180,      275372164875723,              6068543
[4  ] CL_COMPLETE   CL_COMMAND_NDRANGE_KERNEL    :      275372162148451,      275372162192534,                44083
[5  ] CL_COMPLETE   CL_COMMAND_BARRIER           :      275372158836095,      275372167017520,              8181425

( Ignore any minor swings in kernel execution time )
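
A small helper makes the listings above easier to compare. The sketch assumes the two timestamps per command are the start and end values returned by clGetEventProfilingInfo, in nanoseconds; Span, gap_ns and overlaps are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Start/end profiling timestamps of one command, in nanoseconds.
struct Span {
    std::uint64_t start_ns;
    std::uint64_t end_ns;
};

// Idle time between the end of `earlier` and the start of `later`.
// A negative result means `later` started before `earlier` finished.
std::int64_t gap_ns(const Span& earlier, const Span& later) {
    assert(earlier.start_ns <= earlier.end_ns && later.start_ns <= later.end_ns);
    return static_cast<std::int64_t>(later.start_ns) -
           static_cast<std::int64_t>(earlier.end_ns);
}

// True if the two commands were in flight at the same time.
bool overlaps(const Span& a, const Span& b) {
    return a.start_ns < b.end_ns && b.start_ns < a.end_ns;
}
```

Applied to the first listing, the gaps between consecutive kernels are 1471 ns and 811 ns, so no two kernels overlap. In the third listing, barrier [1] starts 3142379 ns before kernel [0] and overlaps it.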

clGetPlatformIDs crashes if integrated graphic card disabled

I have an Intel(R) Core(TM) i5-6600K CPU with Intel(R) HD Graphics 530
and an NVIDIA GeForce GTX 1060 6GB as my main graphics card.

OS:
Microsoft Windows [Version 10.0.14393]

All latest drivers installed from Windows Update.

Intel(R) HD Graphics 530 driver:

Vendor: Intel Corporation
Date: 28.02.2018
Version: 23.20.16.4973

NVIDIA GeForce GTX 1060 6GB driver:
Vendor: NVIDIA
Date: 27.10.2017
Version: 23.21.13.8813

I manually disabled my Intel graphics card in Device Manager (my monitor is connected to the GTX 1060), and once I did that,
clGetPlatformIDs started crashing when trying to retrieve the number of platforms.

Dump copy for history/search index:
> igdrclneo32.dll!044a6204() Unknown
[Frames below may be incorrect and/or missing, no symbols loaded for igdrclneo32.dll]
igdrclneo32.dll!0449394b() Unknown
igdrclneo32.dll!044dc726() Unknown
igdrclneo32.dll!04494471() Unknown
igdrclneo32.dll!0449ad9f() Unknown
[External Code]
OpenCL.dll!5bd910c2() Unknown
OpenCL.dll!5bd960d0() Unknown
[External Code]
OpenCL.dll!5bd96184() Unknown
OpenCL.dll!5bd9527a() Unknown
IntelOpenCLTest.exe!main() Line 13 C++

IntelOpenCLTest.zip sample app attached (the exact version on which dump was made).
IntelOpenCLTest.dmp_.zip - full crash dump attached as well.

NOTE: I tested on other systems and the crash does not reproduce everywhere.
On another system (not sure what the Windows/driver versions are) it crashes with the integrated graphics disabled only when running as the SYSTEM user (e.g. started as a Windows service).

NOTE2: I did the same test with my GTX 1060 disabled and the monitor connected to the Intel integrated graphics: no issues or crashes.

New 24.20.100.6094 Win10 driver performance regression from .6025

My suite of kernels compiled (binaries) with the .6094 driver on Win10/x64 take almost twice the amount of time to execute as those compiled with .6025.

Compiling on .6025 and executing on .6094 shows no regression.

Compiling on .6094 and executing on .6094 or .6025 shows the huge performance drop.

Furthermore, the .6094 driver has reenabled support for dumping GEN assembly via the IOC64 -asm switch.

Inspection of the .6094 produced assembly shows long sequences of MOV operations that I believe are unnecessary. 

I wish there was a better way to report performance regressions (and reproducers) than here or the GitHub issues page (which is very quiet).

-ASM

How to Locally Debug OpenCL Kernels on an Intel GPU?

My goal is to debug OpenCL kernels on an Intel GPU using Visual Studio 2017. My Intel GPU is not being used for graphics, so I should be able to use a single machine according to the “Developer Guide for Intel SDK for OpenCL Applications.” Are there any instructions or guides on how to debug a kernel on an Intel GPU on a setup like mine? I have been trying to piece together relevant portions of old guides and Stack Overflow posts, but nothing seems to work. It appears that there have been several changes to the Visual Studio support in the past couple of years, so no single source of information has a complete explanation of the current state of development for this scenario.

The “Debugging OpenCL Kernels on GPU” steps in the Developer Guide for Intel SDK for OpenCL Applications only go into detail for remote debug sessions. When I try to follow those steps, I just get errors about gdbserver not working:

INTEL_GT_DEBUGGER: (7594099) Starting gdbserver on localhost
INTEL_GT_DEBUGGER: (7594286) gdbserver exited with 0.
INTEL_GT_DEBUGGER: (7595134) Attempt 1/3 failed: One or more errors occurred.
INTEL_GT_DEBUGGER: (7596136) Attempt 2/3 failed: One or more errors occurred.
INTEL_GT_DEBUGGER: (7597138) Attempt 3/3 failed: One or more errors occurred.
INTEL_GT_DEBUGGER: (7597155) Unable to connect to remote target localhost.
Please make sure that the target machine is accessible.

I have also found that Intel OpenCL platforms are completely unavailable within debug sessions unless I disable “Enable OpenCL API Debugger” in Code-Builder. Is that expected behavior?

Does cv sdk (l_openvino_toolkit_p_2018.1.249.tgz) support Atom(TM) x5-Z8350 OpenCL?

Hi,

After installing the OpenCL NEO driver using the script install_NEO_OCL_driver.sh from the SDK l_openvino_toolkit_p_2018.1.249.tgz, we observe the following output from the clinfo tool:

$ clinfo
Number of platforms                               0

Kernel 4.14.20 was also installed, using the script install_4_14_kernel.sh. We're using an up-to-date Ubuntu 16.04 LTS.

$ uname -a
Linux ubuntu-upboard 4.14.20-041420-generic #201802162247 SMP Fri Feb 16 22:48:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.4 LTS"

The CPU is Intel(R) Atom(TM) x5-Z8350  CPU @ 1.44GHz.

We tried the beignet-opencl-icd drivers instead and they appear to work (at least for clinfo). But with the beignet drivers we are still unable to load a model into the clDNN plugin. In the beignet case, the classification_sample in GPU mode finishes with:

[ ERROR ] failed to create engine: No OpenCL device found which would match provided configuration:
    Intel(R) HD Graphics Cherryview: missing out of order support

So do we have a chance to get classification_sample working in GPU mode with SDK l_openvino_toolkit_p_2018.1.249.tgz on our platform with x5-Z8350?

Thanks.

Intel OpenCL Platform Disappears in Debug Mode

I am using Visual Studio 2017 version 15.4.5 with “Intel SDK for OpenCL for Windows” version 7.0.0.2567.

If I launch an OpenCL project from Visual Studio without debugging, clGetPlatformIDs(0, NULL, &n) gives 3 platforms. If I get details on the 3 platforms, they are:
1) Nvidia
2) AMD
3) Intel

The Intel platform contains the Intel CPU and integrated Intel GPU for this machine.

However, if I launch an OpenCL project from Visual Studio with debugging, clGetPlatformIDs(0, NULL, &n) only gives 2 platforms. The platforms in that case are:
1) Nvidia
2) AMD

Strangely, the OpenCL API trace view shows that clGetPlatformIDs returned an error code of -1001, but my code sees a return value of CL_SUCCESS. I see that happen even in the simplest possible program:

cl_int clErr = 9999; // Initialize to something invalid
cl_uint n;
clErr = clGetPlatformIDs(0, NULL, &n);
// n is now 2 if running in debug mode, and clErr is now CL_SUCCESS, but
// the Trace View says that clGetPlatformIDs returned error code -1001

How does clGetPlatformIDs simultaneously return both CL_SUCCESS and -1001 (CL_PLATFORM_NOT_FOUND_KHR) in a single call?

Why does the Intel OpenCL platform not appear when debugging? I want to debug code that uses the Intel OpenCL platform. I have the same issue even when running Intel’s OpenCL template project, so it does not appear to be a Visual Studio project setting problem.

I have found that if I disable “OpenCL API Debugger” in the Code-Builder plugin in Visual Studio, the Intel platform appears when debugging. However, disabling the OpenCL API debugger seems to defeat the purpose of wanting to debug OpenCL.


Installation problem openCL sdk

Install intel sdk for opencl applications

Hi,

Do I need to use the script shared at https://software.intel.com/en-us/articles/sdk-for-opencl-gsg to install the Intel SDK even though my Ubuntu kernel version is 4.15 (Ubuntu 18.04)?

I looked into the script; it installs kernel 4.7 and applies an Intel-related patch set. Can you point me to a script that can be used on an Ubuntu 18.04 setup to install the Intel SDK (OpenCL)?

Thanks.

SVM Fine-Grained Buffer Atomics Regression with NEO Drivers for Intel GPU on Win10

I wrote a simple test to prove that SVM fine-grained buffer atomics work on my system. The Intel OpenCL driver says that my GPU (HD Graphics 630) supports fine-grained buffers with atomics.

If I use any NEO driver version, including version 24.20.100.6136 currently listed on Intel’s website, the test fails. With NEO drivers, memory is not synchronizing between the CPU and GPU when I make an atomic access. If a GPU kernel makes an atomic modification to shared virtual memory, the CPU cannot see the change until the entire GPU kernel (not just the atomic access) has completed.

If I downgrade to a pre-NEO driver (confirmed with 22.20.16.4836 and 21.20.16.4590), I see that memory is synchronizing with each atomic access as expected. When the GPU makes an atomic modification to SVM, the CPU sees the change immediately while the GPU kernel is still running.

I have been trying to implement an approach like Intel’s presentation “GPU Daemon: Road to Zero Cost Submission,” but that is impossible without working shared atomics.

When will SVM atomics be supported by the NEO driver?

Install Intel OpenCL SDK on Ubuntu 16.04

I'd like to ask for help installing the Intel OpenCL SDK as described here: https://software.intel.com/en-us/articles/sdk-for-opencl-gsg

I have followed those steps, but next I need to run the SDK installer from this archive, which I downloaded from https://software.intel.com/en-us/intel-opencl/download

> intel_sdk_for_opencl_2017_7.0.0.2568_x64

This file is not executable, so I mounted it as an archive with "Archive Mounter", but when I ran the install script it quit:

 

    ubuntu@ubuntu:/run/user/1000/gvfs/archive:host=file%253A%252F%252F%252Fhome%252Fubuntu%252FDownloads%252Fintel%252Fintel_sdk_for_opencl_2017_7.0.0.2568_x64/intel_sdk_for_opencl_2017_7.0.0.2568_x64$ ./install.sh 
    Error: Incorrect path to setup script. Setup can not be started
    if the path contains ':, ~, @, #, %, &, [, ], $, =, ), (, *' symbols.

    Quitting!

Does anyone have experience installing the Intel OpenCL SDK under Linux (Ubuntu)?

 

 


how to compile genFFT

Hi, I downloaded the source code of genFFT from this link: https://software.intel.com/en-us/articles/genFFT

But how do I compile it? I opened the solution in VS and compiled it, but got many errors such as: cannot open "mkl_dfti.h". Is there anything I need to install to compile genFFT?
