Channel: Intel® Software - OpenCL*

OpenCL driver / INDE issue on Win10

I have a HP Z230 PC with an integrated Intel HD 4600 GPU.

On this PC with Windows 7 64bit, my OpenCL program runs correctly. After upgrading to Windows 10 64bit, the call to clSetKernelArg fails and returns CL_INVALID_ARG_SIZE. No code changes on my side.

On Windows 10, the driver version is 10.18.15.4256. I tried to upgrade to the newest driver version (15.40.4.64.4256). The installation seems to succeed, but it installs version 10.18.15.4256 again.

I also installed INDE 2015 Update 2. The installer complains that the OpenCL driver is missing or out of date, but gives the option to install INDE anyway, which I did. However, the command-line OpenCL compiler, ioc64, is not working: it just doesn't produce any output. In the INDE OpenCL Code Builder, building the OpenCL kernel produces the error message "IOC engine exited with code -1073740791".
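As a reading aid for the exit code above: Windows tools often print a process exit code as a signed 32-bit decimal, while NTSTATUS values are usually looked up in hexadecimal. The sketch below only does that conversion; the helper name `exit_code_to_hex` is made up for illustration. For -1073740791 it yields 0xC0000409, which Windows defines as STATUS_STACK_BUFFER_OVERRUN. This decodes the number only and does not by itself identify the root cause.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: reinterpret a signed 32-bit process exit code as an
// unsigned value and format it the way NTSTATUS codes are usually written.
std::string exit_code_to_hex(std::int32_t exit_code) {
    char buf[16];
    const unsigned value =
        static_cast<unsigned>(static_cast<std::uint32_t>(exit_code));
    std::snprintf(buf, sizeof buf, "0x%08X", value);
    return std::string(buf);
}
```

The same conversion maps -1073741819 to 0xC0000005, the access violation code.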

So the current status is:

  • OpenCL code that runs fine on Win7, does not run on the same hardware on Win10: clSetKernelArg returns CL_INVALID_ARG_SIZE.
  • Trying to install the newest OpenCL driver for the HD 4600 on Win10 just reinstalls the current driver
  • INDE 2015 update 2 does not compile an OpenCL kernel on Win10: it doesn't produce any output using the command line compiler. Using the INDE IDE it produces the error message above.

My suspicion is that there is something not right with the driver. Can anyone confirm that, or point me to another possible cause? Thanks in advance!


Unhandled exception IntelOpenCLProfiler.dll

Hi,

I have just installed OpenCL Code Builder on top of Visual Studio 2013 and followed the user manual steps to run OCL kernel development and application analysis. However, I have trouble debugging with the OpenCL API debugger: when debugging the template project in CPU mode, it stops at this line in the FindOpenCLPlatform function of OpenCLProjectCodeBuilder.cpp:

    err = clGetPlatformIDs(0, NULL, &numPlatforms);

with the following message:

Unhandled exception at 0x56B1AD75 (IntelOpenCLProfiler.dll) in OpenCLProjectCodeBuilder.exe: 0xC0000005: Access violation reading location 0x00000000.

My configuration is as follows:

CPU: Core i5-4300M

GPU: AMD Hainan and Intel HD Graphics 4600 (the latter is not listed in the Code Builder Platform Info tree)

Visual Studio 2013

Code Builder installed from MSS Pro 2015R6 (Code Builder API Debugger 4.0.0.1)

Device: Intel(R) CPU (-device=CPU)

Thanks for your help!

Marc

OpenCL SDK and the Atom E3845 on Linux

From what I've read, it seems that the Atom E3800 line of processors is not currently supported by the OpenCL SDK or the OpenCL drivers for doing work on the GPU under Linux. Is that correct? If so, is there a future release that may add support?

Thanks.

Critical Update to Intel(R) INDE OpenCL Code Builder

A new update of OpenCL Code Builder (ver. 5.2) was released in Intel INDE Update 2. You can download it directly from https://registrationcenter.intel.com/download.aspx?ProductID=2376.

The new update includes critical bug fixes for Windows 10 and Visual Studio 2015:

  • Build failures with Intel OpenCL Offline Compiler on Windows 10

  • Empty Code Builder project generation failure with Jump start kit on Visual Studio 2015

  • Stability improvements in API debugger and Code Analyzer

Enqueuing many kernels leads to a hang on HD 4600

The reproducer is attached.

In it, I am enqueuing 1000 instances of the same kernel, each subsequent instance being made dependent on the previous one. Then I just wait for the event associated with the last kernel enqueued. The kernel just zeroes out a buffer. This issue doesn't reproduce on a no-op kernel.

Expected result: application finishes successfully.

Actual result: application hangs.

I get the expected result on Intel CPU and NVidia GPU devices, but a hang on Intel HD 4600 GPU.

I am running Windows 8.1 with the latest 10.18.14.4264 Intel graphics driver.

Attachment: test.cpp (2.1 KB)

Poor performance with opencl CPU driver

Link to source
http://pastebin.com/FyZkMrvQ

The Intel® software used was the OpenCL CPU driver opencl_runtime_15.1_x64_5.0.0.57 from https://software.intel.com/en-us/articles/opencl-drivers#lin64

Compare Beignet (GPU, id 0) vs Intel® proprietary driver (CPU, id 1) vs pocl (CPU, id 2)

user@host:~/.dev/OpenCL$ gcc perftest.c -std=c11 -O2 -lOpenCL -o perftest
user@host:~/.dev/OpenCL$ for id in 0 1 2; do time ./perftest $id; done
Succeeded to create a device group!
    Device: 0
        Name:                Intel(R) HD Graphics Haswell Ultrabook GT2 Mobile
        Vendor:                Intel
        Available:            Yes
        Compute Units:            20
        Clock Frequency:        1000 mHz
        Global Memory:            2048 mb
        Max Allocateable Memory:    1024 mb
        Local Memory:            65536 kb

Succeeded to create a compute context!
Succeeded to create a command commands!
Succeeded to create compute program!
Succeeded to create program executable!
Succeeded to create compute kernel!

real    0m25.741s
user    0m0.604s
sys    0m17.796s
Succeeded to create a device group!
    Device: 1
        Name:                Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
        Vendor:                Intel(R) Corporation
        Available:            Yes
        Compute Units:            4
        Clock Frequency:        1600 mHz
        Global Memory:            5664 mb
        Max Allocateable Memory:    1416 mb
        Local Memory:            32768 kb

Succeeded to create a compute context!
Succeeded to create a command commands!
Succeeded to create compute program!
Succeeded to create program executable!
Succeeded to create compute kernel!

real    0m50.082s
user    1m21.951s
sys    0m40.065s
Succeeded to create a device group!
    Device: 2
        Name:                pthread-Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
        Vendor:                GenuineIntel
        Available:            Yes
        Compute Units:            4
        Clock Frequency:        2600 mHz
        Global Memory:            5664 mb
        Max Allocateable Memory:    5664 mb
        Local Memory:            1643847680 kb

Succeeded to create a compute context!
Succeeded to create a command commands!
Succeeded to create compute program!
Succeeded to create program executable!
Succeeded to create compute kernel!

real    0m28.620s
user    0m49.843s
sys    0m4.252s

My clinfo output: http://pastebin.com/30jkBzzs
This looks strange: the open source library pocl (http://portablecl.org) beats the official Intel® software in such a simple test case (ignore the reported "Clock Frequency"; under load the CPU runs at 2300 MHz in both cases). If this isn't a bug in my system, maybe it would be better for Intel® to support pocl (which still has a lot of problems with standards support and stability) instead of developing its own driver?
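To put a number on the comparison, the `real` fields from the `time` output above can be turned into seconds and divided. This is a minimal sketch; the helper names `parse_time_field` and `slowdown` are made up for illustration, and it assumes the `<minutes>m<seconds>s` format shown in the log. With the posted values, the Intel CPU runtime (0m50.082s) took about 1.75 times as long as pocl (0m28.620s) in wall-clock time.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: convert a `time`-style field such as "1m21.951s"
// into seconds. Assumes the "<minutes>m<seconds>s" layout from the log.
double parse_time_field(const std::string& field) {
    const std::string::size_type m = field.find('m');
    assert(m != std::string::npos && "expected <minutes>m<seconds>s");
    const double minutes = std::stod(field.substr(0, m));
    // std::stod stops parsing at the trailing 's'.
    const double seconds = std::stod(field.substr(m + 1));
    return minutes * 60.0 + seconds;
}

// Ratio of two wall-clock times: how many times longer run A took than run B.
double slowdown(const std::string& real_a, const std::string& real_b) {
    return parse_time_field(real_a) / parse_time_field(real_b);
}
```

For example, `slowdown("0m50.082s", "0m28.620s")` compares the Intel CPU run against the pocl run.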

Trouble installing Linux GPU drivers for OpenCL

I've been trying for hours to install the Intel Media Server Studio 2015 R6 – Community Edition for Linux so that I can use the GPU on my Xeon E3-1246v3, but no matter what I try I can't get it to work. I'm using Fedora 22 64-bit and am following the instructions for Generic (really, is CentOS 7.1 the only officially supported distribution?). I extracted mediaserverstudioessentials2015r6.tar.gz, then within that extracted SDK2015Production16.4.2.1.tar.gz, but the install_media.sh script doesn't seem to support Fedora (it gives no warning about which dependencies I may be missing). I tried to do everything in the install script manually, but after I finished and rebooted, running a simple clinfo doesn't show any Intel GPU device. Please help; this shouldn't be so difficult, and the install should be much cleaner than a tar containing many tars and rpms that leaves all the burden on the user. Also, it would be important to know whether and how the driver can be installed to work with Secure Boot enabled. Thanks!

P.S. Please provide all CPU/GPU/Phi drivers and runtimes in a single standalone installation with the option to select target devices, rather than some as individual standalone installations (and old ones, e.g., 14.2 and 15.1 found at https://software.intel.com/en-us/articles/opencl-drivers) and others packaged in large development suites that differ substantially by OS (e.g., Intel Integrated Native Developer Experience and Intel Media Server Studio). Ideally there should be a simple driver/runtime install file and then an SDK file for all the tools and static libraries.

OpenCL application blocked from accessing graphics hardware

Hi,

I've been writing some OpenCL applications for my research project. These applications contain kernels which can run for a long time (~15 seconds). I am running into issues when running these kernels on my Intel HD 5500 on Windows 8.1.

I've modified the TdrDelay registry value to try to keep the system from killing the kernels for running too long. I've set it to 30 seconds for now.

Now I am able to run my kernels successfully about 1/10 of the time. The rest of the time I get an error:

"Application has been blocked from accessing Graphics hardware"

This error seems to occur within 2 seconds of launching the kernel.

My driver version is: 10.18.14.4029

When the application runs to completion, the results are very promising so I'm really hoping to get this working more reliably.

Thanks a lot!


No image objects can be created with Intel HD 4400 and 4600

Hello,

As you can see from the subject, I can't create image objects with my Intel HD graphics.

I have 3 computers, each with a different graphics chipset: NVIDIA, Intel HD 4400, and Intel HD 4600. I initially developed the program and kernels on the NVIDIA machine. Now I have tried to run my application on the other computers, but it failed.

The Intel HD graphics chipsets return 'CL_INVALID_IMAGE_DESCRIPTOR(-65)' when creating an image with cl::Image2D(...).

I am using Windows 7, the Intel OpenCL SDK for the build, and the 10.18.10.3496 driver for Intel HD graphics. I have seen that the Intel HD graphics OpenCL driver version is 1.2, and image creation through OpenGL texture sharing succeeds. So I tested which image formats are supported by Intel HD graphics.

Here is the source code for the test:

const cl_channel_order channel_order_value[] = {
CL_R,
CL_A,
CL_RG,
CL_RA,
CL_RGB,
CL_RGBA,
CL_BGRA,
CL_ARGB,
CL_INTENSITY,
CL_LUMINANCE,
CL_Rx,
CL_RGx,
CL_RGBx,
CL_DEPTH,
CL_DEPTH_STENCIL
};

const cl_channel_type channel_type_value[] = {
CL_SNORM_INT8,
CL_SNORM_INT16,
CL_UNORM_INT8,
CL_UNORM_INT16,
CL_UNORM_SHORT_565,
CL_UNORM_SHORT_555,
CL_UNORM_INT_101010,
CL_SIGNED_INT8,
CL_SIGNED_INT16,
CL_SIGNED_INT32,
CL_UNSIGNED_INT8,
CL_UNSIGNED_INT16,
CL_UNSIGNED_INT32,
CL_HALF_FLOAT,
CL_FLOAT,
CL_UNORM_INT24
};

const char* channel_order_name[] = { "CL_R","CL_A","CL_RG","CL_RA","CL_RGB","CL_RGBA","CL_BGRA","CL_ARGB","CL_INTENSITY","CL_LUMINANCE","CL_Rx","CL_RGx","CL_RGBx","CL_DEPTH","CL_DEPTH_STENCIL"
};

const char* channel_type_name[] = { "CL_SNORM_INT8","CL_SNORM_INT16","CL_UNORM_INT8","CL_UNORM_INT16","CL_UNORM_SHORT_565","CL_UNORM_SHORT_555","CL_UNORM_INT_101010","CL_SIGNED_INT8","CL_SIGNED_INT16","CL_SIGNED_INT32","CL_UNSIGNED_INT8","CL_UNSIGNED_INT16","CL_UNSIGNED_INT32","CL_HALF_FLOAT","CL_FLOAT","CL_UNORM_INT24"
};

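void doImageCreationTest()
{
    // Derive the loop bounds from the tables instead of hard-coding 15 and 16,
    // so the loops stay correct if a format is added or removed.
    const int num_orders = sizeof(channel_order_value) / sizeof(channel_order_value[0]);
    const int num_types = sizeof(channel_type_value) / sizeof(channel_type_value[0]);

    for (int order = 0; order < num_orders; ++order)
    {
        for (int type = 0; type < num_types; ++type)
        {
            // Try to create a 512x512 image in this format. `context` is the
            // cl::Context created elsewhere in the application.
            cl_int err = CL_SUCCESS;
            cl::Image2D img = cl::Image2D(context, CL_MEM_READ_WRITE, cl::ImageFormat(channel_order_value[order], channel_type_value[type]), 512, 512, 0, 0, &err);
            if (err == CL_SUCCESS)
            {
                // Print only the combinations that could be created.
                cout << "=> " << channel_order_name[order] << " with " << channel_type_name[type] << endl;
            }
        }
    }
}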

result:

[NVIDIA]

=> CL_R with CL_SNORM_INT8
=> CL_R with CL_SNORM_INT16
=> CL_R with CL_UNORM_INT8
=> CL_R with CL_UNORM_INT16
=> CL_R with CL_SIGNED_INT8
=> CL_R with CL_SIGNED_INT16
=> CL_R with CL_SIGNED_INT32
=> CL_R with CL_UNSIGNED_INT8
=> CL_R with CL_UNSIGNED_INT16
=> CL_R with CL_UNSIGNED_INT32
=> CL_R with CL_HALF_FLOAT
=> CL_R with CL_FLOAT
=> CL_A with CL_SNORM_INT8
=> CL_A with CL_SNORM_INT16
=> CL_A with CL_UNORM_INT8
=> CL_A with CL_UNORM_INT16
=> CL_A with CL_SIGNED_INT8
=> CL_A with CL_SIGNED_INT16
=> CL_A with CL_SIGNED_INT32
=> CL_A with CL_UNSIGNED_INT8
=> CL_A with CL_UNSIGNED_INT16
=> CL_A with CL_UNSIGNED_INT32
=> CL_A with CL_HALF_FLOAT
=> CL_A with CL_FLOAT
=> CL_RG with CL_SNORM_INT8
=> CL_RG with CL_SNORM_INT16
=> CL_RG with CL_UNORM_INT8
=> CL_RG with CL_UNORM_INT16
=> CL_RG with CL_SIGNED_INT8
=> CL_RG with CL_SIGNED_INT16
=> CL_RG with CL_SIGNED_INT32
=> CL_RG with CL_UNSIGNED_INT8
=> CL_RG with CL_UNSIGNED_INT16
=> CL_RG with CL_UNSIGNED_INT32
=> CL_RG with CL_HALF_FLOAT
=> CL_RG with CL_FLOAT
=> CL_RA with CL_SNORM_INT8
=> CL_RA with CL_SNORM_INT16
=> CL_RA with CL_UNORM_INT8
=> CL_RA with CL_UNORM_INT16
=> CL_RA with CL_SIGNED_INT8
=> CL_RA with CL_SIGNED_INT16
=> CL_RA with CL_SIGNED_INT32
=> CL_RA with CL_UNSIGNED_INT8
=> CL_RA with CL_UNSIGNED_INT16
=> CL_RA with CL_UNSIGNED_INT32
=> CL_RA with CL_HALF_FLOAT
=> CL_RA with CL_FLOAT
=> CL_RGB with CL_SNORM_INT8
=> CL_RGB with CL_SNORM_INT16
=> CL_RGB with CL_UNORM_INT8
=> CL_RGB with CL_UNORM_INT16
=> CL_RGB with CL_UNORM_SHORT_565
=> CL_RGB with CL_UNORM_SHORT_555
=> CL_RGB with CL_UNORM_INT_101010
=> CL_RGB with CL_SIGNED_INT8
=> CL_RGB with CL_SIGNED_INT16
=> CL_RGB with CL_SIGNED_INT32
=> CL_RGB with CL_UNSIGNED_INT8
=> CL_RGB with CL_UNSIGNED_INT16
=> CL_RGB with CL_UNSIGNED_INT32
=> CL_RGB with CL_HALF_FLOAT
=> CL_RGB with CL_FLOAT
=> CL_RGBA with CL_SNORM_INT8
=> CL_RGBA with CL_SNORM_INT16
=> CL_RGBA with CL_UNORM_INT8
=> CL_RGBA with CL_UNORM_INT16
=> CL_RGBA with CL_SIGNED_INT8
=> CL_RGBA with CL_SIGNED_INT16
=> CL_RGBA with CL_SIGNED_INT32
=> CL_RGBA with CL_UNSIGNED_INT8
=> CL_RGBA with CL_UNSIGNED_INT16
=> CL_RGBA with CL_UNSIGNED_INT32
=> CL_RGBA with CL_HALF_FLOAT
=> CL_RGBA with CL_FLOAT
=> CL_BGRA with CL_SNORM_INT8
=> CL_BGRA with CL_SNORM_INT16
=> CL_BGRA with CL_UNORM_INT8
=> CL_BGRA with CL_UNORM_INT16
=> CL_BGRA with CL_SIGNED_INT8
=> CL_BGRA with CL_SIGNED_INT16
=> CL_BGRA with CL_SIGNED_INT32
=> CL_BGRA with CL_UNSIGNED_INT8
=> CL_BGRA with CL_UNSIGNED_INT16
=> CL_BGRA with CL_UNSIGNED_INT32
=> CL_BGRA with CL_HALF_FLOAT
=> CL_BGRA with CL_FLOAT
=> CL_ARGB with CL_SNORM_INT8
=> CL_ARGB with CL_SNORM_INT16
=> CL_ARGB with CL_UNORM_INT8
=> CL_ARGB with CL_UNORM_INT16
=> CL_ARGB with CL_SIGNED_INT8
=> CL_ARGB with CL_SIGNED_INT16
=> CL_ARGB with CL_SIGNED_INT32
=> CL_ARGB with CL_UNSIGNED_INT8
=> CL_ARGB with CL_UNSIGNED_INT16
=> CL_ARGB with CL_UNSIGNED_INT32
=> CL_ARGB with CL_HALF_FLOAT
=> CL_ARGB with CL_FLOAT
=> CL_INTENSITY with CL_SNORM_INT8
=> CL_INTENSITY with CL_SNORM_INT16
=> CL_INTENSITY with CL_UNORM_INT8
=> CL_INTENSITY with CL_UNORM_INT16
=> CL_INTENSITY with CL_SIGNED_INT8
=> CL_INTENSITY with CL_SIGNED_INT16
=> CL_INTENSITY with CL_SIGNED_INT32
=> CL_INTENSITY with CL_UNSIGNED_INT8
=> CL_INTENSITY with CL_UNSIGNED_INT16
=> CL_INTENSITY with CL_UNSIGNED_INT32
=> CL_INTENSITY with CL_HALF_FLOAT
=> CL_INTENSITY with CL_FLOAT
=> CL_LUMINANCE with CL_SNORM_INT8
=> CL_LUMINANCE with CL_SNORM_INT16
=> CL_LUMINANCE with CL_UNORM_INT8
=> CL_LUMINANCE with CL_UNORM_INT16
=> CL_LUMINANCE with CL_SIGNED_INT8
=> CL_LUMINANCE with CL_SIGNED_INT16
=> CL_LUMINANCE with CL_SIGNED_INT32
=> CL_LUMINANCE with CL_UNSIGNED_INT8
=> CL_LUMINANCE with CL_UNSIGNED_INT16
=> CL_LUMINANCE with CL_UNSIGNED_INT32
=> CL_LUMINANCE with CL_HALF_FLOAT
=> CL_LUMINANCE with CL_FLOAT
=> CL_DEPTH with CL_SNORM_INT8
=> CL_DEPTH with CL_SNORM_INT16
=> CL_DEPTH with CL_UNORM_INT8
=> CL_DEPTH with CL_UNORM_INT16
=> CL_DEPTH with CL_SIGNED_INT8
=> CL_DEPTH with CL_SIGNED_INT16
=> CL_DEPTH with CL_SIGNED_INT32
=> CL_DEPTH with CL_UNSIGNED_INT8
=> CL_DEPTH with CL_UNSIGNED_INT16
=> CL_DEPTH with CL_UNSIGNED_INT32
=> CL_DEPTH with CL_HALF_FLOAT
=> CL_DEPTH with CL_FLOAT

But Intel HD cannot create images of any type.

What should I do for Intel HD graphics?

I am sorry for my poor English.

Out of memory error from clBuildProgram

After adding too many lines to my kernels, clBuildProgram() is returning the error CL_BUILD_PROGRAM_FAILURE from the driver. clGetProgramBuildInfo() returns the string "Error: out of memory." and nothing else. If I remove enough lines of code from my OpenCL code, the error goes away. If I change the device from CL_DEVICE_TYPE_GPU to CL_DEVICE_TYPE_CPU the error goes away. The total number of lines of code in my program is about 900.

Processor: 2.2 GHz Intel Core i7

Graphics: Intel Iris Pro 1536MB

OS X 10.10.5

I'm using whatever comes standard with that version of OS X; I haven't installed anything else, not even Xcode.

My problem seems to be identical to that of this person: https://software.intel.com/en-us/forums/opencl/topic/559994

I'm trying to get clearance to post the source code, but in the meantime any ideas?

Problem with profiling

Hello!

I have an issue with an OpenCL application that computes a matrix multiplication.

In particular, I think the problem is related to the "clGetEventProfilingInfo" function. If I execute the program on the CPU (Intel Core i5-4300U), everything works fine and "clGetEventProfilingInfo" reports the execution time correctly.

If I use the GPU (Intel HD 4400) instead, everything works fine as long as I don't use "clGetEventProfilingInfo". When I use "clGetEventProfilingInfo" to calculate the execution time and set a local work size in "clEnqueueNDRangeKernel", the program crashes and I don't understand why (if I pass "NULL" for the local work size parameter of "clEnqueueNDRangeKernel", everything seems to work). From the Visual Studio debugger I think it's an access violation, but I'm not sure.

This is the code of the application: https://www.friendpaste.com/2NIpYvk8R96S01kFD3H3Gl

Can someone help me?
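For reference, the timing arithmetic itself is small. clGetEventProfilingInfo reports CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END as 64-bit counters in nanoseconds, so the execution time is their difference. The sketch below shows only that step, with `std::uint64_t` standing in for `cl_ulong` so that it compiles without the OpenCL headers; the helper name `elapsed_ms` is made up for illustration, and nothing here addresses the crash itself.

```cpp
#include <cassert>
#include <cstdint>

// Stand-in for cl_ulong (assumption: used so the sketch needs no OpenCL headers).
using profiling_ns = std::uint64_t;

// Hypothetical helper: convert a pair of profiling timestamps, as read with
// CL_PROFILING_COMMAND_START and CL_PROFILING_COMMAND_END, into milliseconds.
double elapsed_ms(profiling_ns start_ns, profiling_ns end_ns) {
    assert(end_ns >= start_ns && "end timestamp must not precede start");
    return static_cast<double>(end_ns - start_ns) / 1.0e6;
}
```

The two timestamps are only meaningful once the event has completed and the command queue was created with CL_QUEUE_PROFILING_ENABLE.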

State of Headless Mode HD Graphics OpenCL

Hi,

What's the current (driver) state of accessing the compute capabilities of an integrated HD graphics processor with OpenCL when there is NO display connected? I do not see my HD 4000 when the display is connected to the discrete GPU.

Driver version: 10.18.10.4252

Is there a software workaround for this (other than the dummy plug solution)?

Thanks,

Florian

Applications/examples which use OpenCL 2.0 features (SVM, GAS, C11)

Hello,

Does anyone know of third-party applications on the market that use OpenCL 2.0 features (SVM, GAS, C11)?

Applications like Adobe Photoshop CC, Autodesk Maya, Sony Vegas Pro 12, etc. come to mind, but I guess these were not implemented using OpenCL 2.0 features.

Are there any other applications/examples implemented with the 2.0 features?
University examples are also fine for me.

Windows 7 32 Bit

Is there any way to find an older SDK that will install on Windows 7 32-bit?

Thanks,

Darren

 

Interpreting the timeline in Platform analyzer?

Hi,

I am running my OpenCL application on an Intel HD 530 graphics device and am seeing excessive execution times for some kernels. On all previous devices I have tried (including earlier Intel HD), the measured time of my kernel over repeated calls has been fairly constant. On the HD 530 graphics, the execution time jumps between 3 ms (normal) and 40 ms (!).

I have two questions:

  1. In the attached screenshot, the Platform Analyzer shows me the execution times for my kernels, but I don't know how to interpret this, as each kernel is displayed several times, on different lines, with different execution times. Is the top line of colored boxes the "actual execution time", and do the lower ones indicate when the command was issued from the CPU?
  2. Do you have any idea why my execution time would jump between 3 ms, which is normal, and 40 ms? When I enable more computation in my kernel I can see the 3 ms increase, but the 40 ms stays constant. As I mentioned above, this has never happened on other devices (NVIDIA, AMD, Intel), which makes me think there is something else going on on the GPU.

Thanks

 

Attachment: OpenCL profiler.png (21.88 KB)

Linux Repository Request

If Intel doesn't do so already (a link would be greatly appreciated if it does), I'd like to ask Intel to put its proprietary Linux OpenCL drivers and runtimes for Intel CPUs, GPUs, and Phis in a repository like rpmfusion, similar to NVIDIA's akmod and xorg packages, so I can install hassle-free using yum or dnf and stay up to date with the latest official versions. Thanks!

OpenCL CPU-only runtime version 5.2.*?

Dear all,

Some of our users reported that they can no longer use our OpenCL-based software on their machines. After some investigation, we noticed that they were using the CPU-only OpenCL runtime with driver version 5.2.*. By "5.2.*" I mean the version number reported by OpenCL as the "Driver version" and not the runtime version listed here for download (like 14.2 and 15.1):

https://software.intel.com/en-us/articles/opencl-drivers

The same binaries of our software work fine under all previous versions of the Intel CPU-only runtime (as well as on platforms other than Intel). I therefore have a couple of questions:

- Where can I download the CPU-only runtime with driver version 5.2.*? At best, I can download runtime version 15.1 which contains the driver version 5.0.0.57 (and it works nicely with our software). It seems that this updated version is installed only with some Intel HD Graphics driver.

- Did you make some specific upgrades from driver version 5.0 to 5.2 that make the newer CPU-only runtime non-backward compatible?

Thanks a lot,
Achille

image2d_t direct pixel access with vload/vstore

Hi all, I have a question regarding reading and writing image2d_t pixels and hope someone can post a solution.

I am using MediaSDK to decompress images. After decompression the picture (NV12) resides in an IDirect3DSurface9.
With
    cl_mem memY = clCreateFromDX9MediaSurfaceKHR(context, CL_MEM_READ_ONLY, CL_ADAPTER_D3D9EX_KHR, &surfaceIn, 0, &err);
and    
    clEnqueueAcquireDX9MediaSurfacesKHR(queue, 1, &memY, 0, NULL, NULL);
I get a cl_mem handle (of type image2d_t) that can be passed to my kernel:
    clSetKernelArg(m_kernel, 1, sizeof(cl_mem), (void*)&memY); // srcImg
    
Now it's possible to use it in my kernel:
__kernel void Dummy(__read_only image2d_t srcY)
{
    ...
    uint16 pix;
    for (int i = 0; i < 16; i++)
    {
        float4 val = read_imagef(srcY, CLK_FILTER_NEAREST, sCoord);
        pix[0] = convert_uint(val.x * 255); // val.x is the Y value
    }
    ...
}
This works fine, but the performance of read_imagef (single-pixel access) is very low.

As explained in the Sobel tutorial (https://software.intel.com/en-us/videos/optimizing-simple-opencl-kernels...), I would like to access the pixels in the form of uchar* like:

    __global uchar* pSrcImage;
    uint16 pix = convert_uint16(vload16(0, pSrcImage));
    
to read 16 pixels (128 bits) in a single memory access from the Y plane of the NV12 surface. This is possible when I create a cl_mem with clCreateBuffer(), but I did not find a way to get access to the image2d_t data. The only way I found to read the pixels from an image2d_t is read_imagef(), which is very slow.
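To make the intended access pattern concrete, here is a host-side C++ emulation (not OpenCL C) of what `convert_uint16(vload16(0, p))` does on a flat NV12 buffer: it reads 16 consecutive bytes of the Y plane and widens each to 32 bits. It is only a sketch under assumptions: the names `load16_y` and `nv12_size_bytes` and the `pitch` parameter are made up for illustration, and the layout assumed is the usual NV12 one, a full-resolution Y plane followed by an interleaved UV plane with half as many rows. It does not answer whether the DirectX surface can be exposed as such a buffer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Size of a flat NV12 buffer: Y plane (pitch * height bytes) followed by the
// interleaved UV plane (pitch * height / 2 bytes).
std::size_t nv12_size_bytes(std::size_t pitch, std::size_t height) {
    return pitch * height + pitch * (height / 2);
}

// Hypothetical helper: read 16 consecutive luma bytes starting at pixel
// (x, y) and widen them to 32-bit values, like convert_uint16(vload16(...)).
std::array<std::uint32_t, 16> load16_y(const std::vector<std::uint8_t>& nv12,
                                       std::size_t pitch,
                                       std::size_t x, std::size_t y) {
    const std::size_t offset = y * pitch + x;
    assert(x + 16 <= pitch && offset + 16 <= nv12.size());
    std::array<std::uint32_t, 16> pix{};
    for (std::size_t i = 0; i < 16; ++i) {
        pix[i] = nv12[offset + i];  // each uchar is zero-extended to uint
    }
    return pix;
}
```

In a kernel, the equivalent address computation would be done on a `__global uchar*` argument, which is only available when the data lives in a buffer object.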

My questions are:

How can I read the pixels of an image2d_t with vload()?

or

Can I convert the DirectX surface to a cl_mem which is a "flat" buffer, and not a image2d_t?

Thanks for any help

 

Line of code that fails with Intel driver, but works with AMD and NVIDIA.

Hello,

The following line of code (OpenCL C++ bindings 1.2) fails with a segmentation fault on Intel runtime 14.2 (on Arch Linux), but works with AMD and NVIDIA.

`cl_context_properties cps = getContext(queue).getInfo<CL_CONTEXT_PROPERTIES>()[1];`

Is this a bug in the Intel driver?

The context is created in this constructor:

https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17...

Context properties are passed here:

https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17...
https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17...

Here is the getContext():

https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17...

Everything fails on this line:
https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17...

once called here with the queues[i] being an Intel device
https://github.com/AvtechScientific/ASL/blob/a14703fa5e4ec933248a5b9ed17...

It looks like `getInfo<CL_CONTEXT_PROPERTIES>()` returns garbage...
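One thing worth ruling out: CL_CONTEXT_PROPERTIES is a zero-terminated list of key/value pairs, and the OpenCL specification allows an implementation to return an empty list (or only the terminating 0) when the context was created with a NULL properties argument. In that case an unguarded `[1]` reads past the end of the returned vector. Whether that is what happens here is not established by this post. The sketch below shows a guarded lookup using `std::intptr_t` as a stand-in for `cl_context_properties` so that it compiles without the OpenCL headers; the helper name `find_context_property` is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-ins so the sketch needs no OpenCL headers (assumption for illustration).
using context_property = std::intptr_t;                 // cl_context_properties
constexpr context_property kContextPlatform = 0x1084;   // value of CL_CONTEXT_PLATFORM

// Hypothetical helper: walk the key/value pairs of a property list and fetch
// the value stored for `key`. Returns false for an empty list, a list that
// holds only the terminating 0, or a list without the key.
bool find_context_property(const std::vector<context_property>& props,
                           context_property key,
                           context_property& value) {
    for (std::size_t i = 0; i + 1 < props.size() && props[i] != 0; i += 2) {
        if (props[i] == key) {
            value = props[i + 1];
            return true;
        }
    }
    return false;
}
```

With the C++ bindings, the vector returned by `getInfo<CL_CONTEXT_PROPERTIES>()` could be passed to such a helper instead of being indexed directly.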

Thank you.

 

An OpenCL kernel that reads/writes textures causes access violation when jitted for CPUs that support up to SSE4

Hi,

I have encountered a possible code generation bug in the OpenCL runtime compiler for Intel CPUs on Windows platforms.

Please find attached an archive of source code (ocltest.zip) that reproduces the bug. It is a CMake project and you can build it with, e.g., the following commands:

$ unzip ocltest.zip
$ mkdir ocltest-build
$ cd ocltest-build/
$ cmake -G "Visual Studio 12 2013" -A "x64" ../ocltest/
$ cmake --build . --config RelWithDebInfo

Note that you need CMake, Visual Studio 2013 (or 2015), and an OpenCL SDK (Intel INDE or CUDA).

If I run the resulting executable (oclellipticpde.exe) on a PC with the following configuration

Intel Core i7 3770K @ 3.50 GHz, 8 GB RAM, Windows 10 x64

it terminates normally and we obtain the following output:

CL_PLATFORM_NAME: Intel(R) OpenCL
CL_DEVICE_NAME:        Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
CL_DEVICE_VERSION: OpenCL 1.2 (Build 57)
CL_DRIVER_VERSION: 5.0.0.57
done

However, if I run the same executable on another PC with

Dual Intel Xeon X5650 @ 2.67 GHz, 24 GB RAM, Windows 7 x64

it crashes after the following output:

CL_PLATFORM_NAME: Intel(R) OpenCL
CL_DEVICE_NAME: Intel(R) Xeon(R) CPU           X5650  @ 2.67GHz
CL_DEVICE_VERSION: OpenCL 1.2 (Build 57)
CL_DRIVER_VERSION: 5.0.0.57

I have also attached a WinDbg log for this. The faulting code is located at a rather low address that does not correspond to any module, so it is likely to be the jitted code.

Since the kernel works on other CPUs (and also on GPUs), I suppose it is correct. I am not sure exactly which configuration causes the crash, but I suspect the CPU architecture (i.e., SSE4) matters.

Can anyone reproduce the problem, point out what is wrong with the code, or suggest a workaround?

Thank you.

Yousuke

Attachments: ocltest.zip (3.27 KB), windbg_log.txt (17.38 KB)