Channel: Intel® Software - OpenCL*

Request: provide Linux OpenCL 2.1 support for Skylake iGPU, like the new Windows 15.47 drivers

Hi, I was impressed to see OpenCL 2.1 support exposed on the Intel HD 530 iGPU (Skylake) in the new Intel 22.20.16.4785 WHQL drivers. It also supports SPIR-V; I checked. Previously only Kaby Lake iGPUs were supported. Does that mean that future SRB5.x/6.0 Linux OpenCL graphics drivers will also add OpenCL 2.1 support for Skylake iGPUs like the HD 530? Thanks.

What to do when kernel hangs OpenCL on Win10?

I've been debugging some new kernels and have managed to hang IOC (or something deeper).

After the debugging session I can't compile any more kernels.

Is there a convenient way to restart/reset the OpenCL subsystem so that I don't have to reboot?

MS VS2017 warning message after installing OpenCL package

After installing the recent OpenCL package by executing intel_aocl_setup_7.0.0.2519.exe, I receive the warning message

'The Scc Display Information' package did not load correctly

after starting either MS Visual Studio 2015 or 2017. Both have all recent updates installed.

The computer is an MS Surface Pro 4, i5 6300, HD 520 graphics, Windows 10.

Following https://stackoverflow.com/questions/36358630/visual-studio-2015-with-upd... I looked for the HKLM registry key SOFTWARE\WOW6432Node\SourceCodeControlProvider, but could not find it.

I have not noticed any negative effects so far. The test package intel_ocl_caps_basic_win runs well.

What might cause this message, and how can I avoid it?

Holger

Intel® SDK for OpenCL™ for Windows* Intel® Software Setup Assistant ended prematurely due to error(s).

Intel® SDK for OpenCL™ for Windows* Intel® Software Setup Assistant ended prematurely due to error(s).
 
Installer logs location: C:\Users\SP\AppData\Local\Temp\intel_tmp_SP\2017.10.10_15.33.17_00002798\log\
To install this program at a later time, run Intel® Software Setup Assistant again. Click the Finish button to exit the Intel® Software Setup Assistant.

I got this message when I tried installing this software. It would be really helpful if anyone could help me fix this issue.

Thank You

How to debug segfault in an OpenCL kernel?

My kernel works on AMD and NVIDIA OpenCL, but for one specific test it fails on Intel OpenCL with an i7-7700K: it throws a segfault and crashes.

I want to find out where in my code the problem is. I wish there were a debugging tool, like valgrind, that prints the offending lines.

I installed oclgrind on my Ubuntu 16.04 box. For a small workload it ran without any issue and reported no errors, but for a large enough workload oclgrind crashed too, without printing anything useful about my kernel:

simulation run# 1 ...  /usr/bin/oclgrind: line 145:  7204 Killed                  LD_LIBRARY_PATH=$LIBDIR:$LD_LIBRARY_PATH LD_PRELOAD=$LIBDIR/liboclgrind-rt.so "$@"

Does the Intel OpenCL SDK have a tool for this purpose? I don't have a Windows machine with Intel OpenCL installed, so a command-line tool is preferred.

Thanks

PS: if you are interested in testing, here is my code:

git clone https://github.com/fangq/mcxcl.git
cd mcxcl/src
make
cd ../example/benchmark
../../bin/mcxcl -L
./run_benchmark2.sh -G ???

where ??? is a string of 0s and 1s that selects the CPU. For example, if the mcxcl -L command above lists 3 devices and the CPU is the 1st, use -G 1; if it is the 2nd, use -G 01; if it is the 3rd, use -G 001; and so on. You should see a segfault when running the last command.
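The selection string described above can be built mechanically. The following is a minimal Python sketch; `device_mask` is a hypothetical helper (not part of mcxcl), and it assumes the pattern follows the three examples in the post (a single 1 at the device's 1-based position, trailing zeros omitted).

```python
def device_mask(index, total):
    """Return a 0/1 string with a single '1' at the 1-based position `index`.

    `total` is the number of devices listed by `mcxcl -L`; it is only
    used to validate `index`.
    """
    if not 1 <= index <= total:
        raise ValueError("index must be between 1 and total")
    # Leading zeros skip the devices before the selected one.
    return "0" * (index - 1) + "1"


# Example: three listed devices, CPU is the second one.
print("-G", device_mask(2, 3))
```

With three devices, positions 1, 2, and 3 give `1`, `01`, and `001`, matching the examples in the post.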

Problem handling DICOM files under Android Studio and OpenCL

Hello,

I am having trouble handling DICOM images under Android Studio and OpenCL: reading, writing, extraction, and so on.

Below is some code that produces compilation errors:

public DICOM(java.io.InputStream is)
DICOM dcm = new DICOM(is);
dcm.run("Name");
dcm.show();

Thanks,

clEnqueueReadBuffer sometimes too slow (2 seconds)

Hello, I'm building a real-time application, but clEnqueueReadBuffer is sometimes too slow.
I tested clEnqueueReadBuffer with the attached code, but I don't know why it is slow. Please help.

This is my environment:

OS: Windows 10 Pro 64 bits
CPU: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
GPU : Intel(R) HD Graphics 630
OpenCL: 1.2 version.
Intel OpenCL SDK:
Version=6.3.0.1904
InternalVersion=dkdnfngdfkjndfkjgndfndfgk
Visual Studio Professional 2015.

Below is my test code. The full project is attached.

	cl_mem					d_buf;
	unsigned char *			h_in;
	unsigned char *			h_out;
	int						byte;

	unsigned long			tick_start;
	unsigned long			tick_end;
	int						idx;


	// initialize
	byte	= 4096;

	h_in	= new	unsigned char[4096];
	h_out	= new	unsigned char[4096];
	d_buf	= clCreateBuffer( ocl.context, CL_MEM_READ_WRITE, byte, NULL, &err );

	::memset( h_in, 0, byte );
	err = clEnqueueWriteBuffer( ocl.commandQueue, d_buf, CL_TRUE, 0, byte, h_in, 0,	NULL,NULL );
	if ( CL_SUCCESS != err ) {
		printf( "WriteError %d \r\n", err );
		DebugBreak();
	}


	// main loop
	for ( idx = 0 ; idx <= 500000 ; idx ++ ) {
		tick_start = ::GetTickCount();

		err = clEnqueueReadBuffer( ocl.commandQueue, d_buf,	CL_TRUE, 0, byte, h_out, 0,	NULL, NULL );
		if ( CL_SUCCESS != err ) {
			printf( "ReadError %d \r\n", err );
			DebugBreak();
		}

		tick_end = ::GetTickCount();

		// report progress
		if ( idx %10000 == 0 ) {
			printf( "idx: %d \r\n", idx );
		}

		// report large delays
		if ( tick_end - tick_start > 100 ) {
			printf( "idx: %d, Elapsed: %d ms \r\n", idx,  (int)(tick_end - tick_start));
		}
	}

	// release memory
	clReleaseMemObject( d_buf );
	delete [] h_in;
	delete [] h_out;
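
One thing worth keeping in mind with the loop above: GetTickCount is limited to the system timer resolution, typically around 10 to 16 ms, so it only catches large stalls reliably. The same outlier hunt can be sketched with a high-resolution monotonic clock. This is a minimal Python sketch, not the poster's code: `find_slow_calls` is a hypothetical helper, and `op` stands in for the blocking read.

```python
import time


def find_slow_calls(op, iterations, threshold_s, clock=time.perf_counter):
    """Call op() `iterations` times; return (index, elapsed_seconds) for
    every call that took longer than `threshold_s` seconds."""
    slow = []
    for idx in range(iterations):
        start = clock()
        op()  # stand-in for the blocking clEnqueueReadBuffer call
        elapsed = clock() - start
        if elapsed > threshold_s:
            slow.append((idx, elapsed))
    return slow


# Example with a dummy operation; a real test would pass the read call.
print(find_slow_calls(lambda: None, 1000, 0.1))
```

Recording the index of each slow call, as the original loop does, helps show whether the stalls are periodic or random.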

 

Attachment: OpenCLProject.zip (application/zip, 13.6 KB)

OpenCL SDK can't be installed for VS 2017 15.4.1


OpenCL kernel code behaves differently (incorrectly) when wrapped in a function

(Sorry if a similar problem has already been posted - I wasn't sure what to search for.)

Hello,

I'm using PyOpenCL, and my OpenCL kernel code behaves differently (wrongly) when I put it in a function than when it's part of the main kernel body. It works correctly on one of my Dell laptops with an Nvidia GPU, but not on my other one with a Dell GPU. Both are running Ubuntu 16.04.

I'm writing a very crude image filtering function to smoothly offset pixels by a non-integer value, e.g. translating the image by (.25, 0) pixels will have this effect: newPixelHere = oldPixelHere * .75 + oldPixelToTheLeft * .25.
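
To make the formula concrete, here is a minimal one-dimensional Python sketch of that blend. It is an illustration only, not the poster's code: `shift_row` is a hypothetical name, and treating pixels outside the row as 0 is an assumption.

```python
def shift_row(row, frac):
    """Blend each pixel with its left neighbour.

    `frac` is the shift in pixels (between 0 and 1):
    new[i] = old[i] * (1 - frac) + old[i - 1] * frac
    """
    out = []
    for i, value in enumerate(row):
        # Assumption: anything left of the first pixel counts as 0.
        left = row[i - 1] if i > 0 else 0
        out.append(value * (1.0 - frac) + left * frac)
    return out


print(shift_row([0, 100, 100, 0], 0.25))  # -> [0.0, 75.0, 100.0, 25.0]
```

The leading edge of the bright region fades in at 75, and a 25 tail appears one pixel past its trailing edge, which is the smooth sub-pixel shift described above.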

I've attached a reduced version of the code, where a simple 3D array is used as a stand-in for a colour image. As I've noted in the code comments, the function filterImg() contains code identical to the commented-out code in the main body, but the main body works, whereas the function does not: it just sets "ret" to a constant value of 49. Moreover, the function behaves as expected if I change "i<4" to "i<3" in the main loop, or if I avoid using integer / and %.

I'm just starting with OpenCL programming and am admittedly out of my depth, but I've read some things about using loops with caution in OpenCL. Again, this works on my Nvidia GPU, but not on my Dell one.

Finally, although OpenCL had been working properly on my Dell GPU up until this problem, I have been receiving the following warning when I run OpenCL:

beignet-opencl-icd: no supported GPU found, this is probably the wrong opencl-icd package for this hardware
(If you have multiple ICDs installed and OpenCL works, you can ignore this message)

I hadn't worried much about this, because I assumed I have multiple ICDs installed: there are multiple entries in /etc/OpenCL/vendors/:


~:ls -1 /etc/OpenCL/vendors/
intel-beignet-x86_64-linux-gnu.icd
mesa.icd
~:
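
On Linux, the ICD loader discovers vendor implementations through the `.icd` files in that directory, each of which holds the name or path of a vendor library. A minimal Python sketch for seeing which library each entry points to follows; `list_icds` is a hypothetical helper name, and the directory is passed as a parameter so other locations can be checked.

```python
import glob
import os


def list_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_name_or_path} for each *.icd file."""
    entries = {}
    for path in sorted(glob.glob(os.path.join(vendors_dir, "*.icd"))):
        with open(path) as f:
            # Each .icd file is a small text file naming one library.
            entries[os.path.basename(path)] = f.read().strip()
    return entries


for name, library in list_icds().items():
    print(name, "->", library)
```

Comparing this listing with the platforms that PyOpenCL reports can help show which ICD the warning refers to and which one is actually being used.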

... but again, I don't really know what I'm doing. Any help is appreciated; thanks very much in advance.

Jeremy.

Python code:

#!/usr/bin/python
import pyopencl as cl
import numpy as np

def printLs3d(ls3d):
    # Utility to print "red" channel of 3d array.
    for y in range(len(ls3d[0])):
        print
        for x in range(len(ls3d)):
            if ls3d[x][y][0] == 0:
                print "...",
            else:
                print "%03d" % ls3d[x][y][0],

def shadeImg(lsIn):
    printLs3d(lsIn)

    cntxt = cl.create_some_context()
    queue = cl.CommandQueue(cntxt)
    res = (len(lsIn)-1, len(lsIn[0])-1)
    print

    # Inputs
    srcImgAr_buf =  cl.Buffer(cntxt, cl.mem_flags.READ_ONLY |
        #cl.mem_flags.COPY_HOST_PTR,hostbuf=np.array(list(pygame.surfarray.array3d(srcImg))))
        cl.mem_flags.COPY_HOST_PTR,hostbuf=np.array(lsIn))

    # Outputs
    shadedImg = np.zeros((len(lsIn), len(lsIn[0]), len(lsIn[0][0])), dtype=np.uint8)
    shadedImg_buf = cl.Buffer(cntxt, cl.mem_flags.WRITE_ONLY |
        cl.mem_flags.COPY_HOST_PTR,hostbuf=shadedImg)

    kernelPath = "/home/jeremy/dev/warp/testOpenClLoops/testOpenClLoops.c"
    with open(kernelPath) as f:
        kernel = "".join(f.readlines())

    bld = cl.Program(cntxt, kernel).build()
    launch = bld.krShadeImg(
            queue,
            #srcImgAr.shape,
            res,
            None,
            np.int32(res[0]),
            np.int32(res[1]),
            srcImgAr_buf,
            shadedImg_buf)
    launch.wait()


    cl.enqueue_read_buffer(queue, shadedImg_buf, shadedImg).wait()
    printLs3d(shadedImg)



testIn = [
    [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [100, 100, 100], [100, 100, 100], [100, 100, 100], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [100, 100, 100], [100, 100, 100], [100, 100, 100], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [100, 100, 100], [100, 100, 100], [100, 100, 100], [0, 0, 0], [100, 100, 100], [100, 100, 100], [0, 0, 0]],
    [[0, 0, 0], [100, 100, 100], [100, 100, 100], [100, 100, 100], [0, 0, 0], [100, 100, 100], [100, 100, 100], [0, 0, 0]],
    [[0, 0, 0], [100, 100, 100], [100, 100, 100], [100, 100, 100], [0, 0, 0], [100, 100, 100], [100, 100, 100], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [100, 100, 100], [100, 100, 100], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]],
    [[0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0]]]


testAr = np.array(testIn, dtype=np.uint8)


shadeImg(testAr)


 

OpenCL C kernel code contained in "/home/jeremy/dev/warp/testOpenClLoops/testOpenClLoops.c":

void setArrayCell(int x, int y, int xres, int yres,
  uchar* val,
  __global uchar* ret)
{
    if (x >= 0 && x < xres && y >= 0 && y < yres) {
        int i = (x * yres + y) * 3;
        ret[i] = val[0];
        ret[i+1] = val[1];
        ret[i+2] = val[2];
    }
}


void filterImg  (unsigned int x, unsigned int y, int xres, int yres,
    __global uchar* img,
    uchar* ret) {
    // This function contains identical code to the commented-out
    // block in the main body, except some comments.
    // The main body code works; this function doesn't.  Why???

    // Offset xy lookup to weigh influence of each neighbour.
    // If xOfs == yOfs == 0, pixel (x,y) gets full weight.
    // If xOfs == yOfs == 1, pixel (x+1,y+1) gets full weight.
    float xOfs = .75f;
    float yOfs = .5f;

    for (int i=0; i<4; i++) { // ***WORKS IF YOU CHANGE 4 TO 3!
        // Sample 2x2 grid of neighbouring pixels in this order:
        // (x,y), (x+1,y), (x,y+1), (x+1,y+1)
        int dx = i%2; // ***WORKS IF YOU SET dx AND dy TO CONSTANT 0 OR 1
        int dy = i/2;
        int xx = x + dx;
        int yy = y + dy;

        // Calculate weight of this pixel (not 100% sure this is correct)
        float wx = dx == 0 ? xOfs : 1.0f-xOfs;
        float wy = dy == 0 ? yOfs : 1.0f-yOfs;
        float k = wy*wx;


        int address = (xx * yres + yy) * 3;
        for (int j=0; j<3; j++) {
            ret[j] += img[address+j]*k;
        }

    }

}

__kernel void krShadeImg(
            int xres,
            int yres,
            __global uchar* img,
            __global uchar* shadedImg)
{
    unsigned int x = get_global_id(0);
    unsigned int y = get_global_id(1);


    if (x < xres-1 && y < yres-1) {
        uchar ret[3] = {0, 0, 0};

        filterImg(x, y, xres, yres, img, ret);
        // *** TO MAKE THE CODE WORK, comment out the above
        // *** line and uncomment the following block.
        /*
        // Offset xy lookup to weigh influence of each neighbour.
        // If xOfs == yOfs == 0, pixel (x,y) gets full weight.
        // If xOfs == yOfs == 1, pixel (x+1,y+1) gets full weight.
        float xOfs = .75f;
        float yOfs = .5f;

        for (int i=0; i<4; i++) {
            // Sample 2x2 grid of neighbouring pixels in this order:
            // (x,y), (x+1,y), (x,y+1), (x+1,y+1)
            int dx = i%2;
            int dy = i/2;
            int xx = x + dx;
            int yy = y + dy;

            // Calculate weight of this pixel (not 100% sure this is correct)
            float wx = dx == 0 ? xOfs : 1.0f-xOfs;
            float wy = dy == 0 ? yOfs : 1.0f-yOfs;
            float k = wy*wx;


            int address = (xx * yres + yy) * 3;
            for (int j=0; j<3; j++) {
                ret[j] += img[address+j]*k;
            }

        }
        */
        setArrayCell(x, y, xres, yres, ret, shadedImg);
    }
}

 

Output when using filterImg function (broken):

Before OpenCL process:

... ... ... ... ... ... ... ... ... ...
... 100 100 100 100 100 ... ... ... ...
... 100 100 100 100 100 ... ... ... ...
... 100 100 100 100 100 ... ... ... ...
... ... ... ... ... ... ... ... ... ...
... ... ... 100 100 100 100 ... ... ...
... ... ... 100 100 100 100 ... ... ...
... ... ... ... ... ... ... ... ... ...

After OpenCL process:

049 049 049 049 049 049 049 049 049 ...
049 049 049 049 049 049 049 049 049 ...
049 049 049 049 049 049 049 049 049 ...
049 049 049 049 049 049 049 049 049 ...
049 049 049 049 049 049 049 049 049 ...
049 049 049 049 049 049 049 049 049 ...
049 049 049 049 049 049 049 049 049 ...
... ... ... ... ... ... ... ... ... ...

 

 

Output when using main body code (correct):

Before OpenCL process:

... ... ... ... ... ... ... ... ... ...
... 100 100 100 100 100 ... ... ... ...
... 100 100 100 100 100 ... ... ... ...
... 100 100 100 100 100 ... ... ... ...
... ... ... ... ... ... ... ... ... ...
... ... ... 100 100 100 100 ... ... ...
... ... ... 100 100 100 100 ... ... ...
... ... ... ... ... ... ... ... ... ...

After OpenCL process:

012 049 049 049 049 037 ... ... ... ...
024 098 098 098 098 074 ... ... ... ...
024 098 098 098 098 074 ... ... ... ...
012 049 049 049 049 037 ... ... ... ...
... ... 012 049 049 049 037 ... ... ...
... ... 024 098 098 098 074 ... ... ...
... ... 012 049 049 049 037 ... ... ...
... ... ... ... ... ... ... ... ... ...
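
The values in the correct output (098 rather than 100, and 012, 024, 037, 049, 074 at the edges) are consistent with the accumulator being a uchar: each `ret[j] += img[...]*k` converts the float sum back to an integer and drops the fraction. A minimal Python reference for a single channel follows. It is a sketch for checking expected values, not the poster's code: `filter_pixel` is a hypothetical name, and indexing the image as `img[x][y]` is an assumption.

```python
def filter_pixel(img, x, y, x_ofs=0.75, y_ofs=0.5):
    """Weighted sum of the 2x2 neighbourhood at (x, y) for one channel.

    The accumulator is truncated to an integer after every addition,
    mimicking `uchar ret += float` in the kernel.
    """
    ret = 0
    for i in range(4):
        # Same sampling order as the kernel: (x,y), (x+1,y), (x,y+1), (x+1,y+1)
        dx, dy = i % 2, i // 2
        wx = x_ofs if dx == 0 else 1.0 - x_ofs
        wy = y_ofs if dy == 0 else 1.0 - y_ofs
        ret = int(ret + img[x + dx][y + dy] * (wy * wx))
    return ret


# A fully bright neighbourhood accumulates 37 -> 49 -> 86 -> 98.
print(filter_pixel([[100, 100], [100, 100]], 0, 0))
```

Running a reference like this over the whole test array gives a device-independent baseline to compare against the output of each OpenCL implementation.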

Code Builder initialization failed

Dear all,

I am facing an issue with the Code Builder in the latest Intel OpenCL SDK (version 7.0.0.2519) with Visual Studio 2017. When I start VS the following message appears: "Code Builder initialization failed: Failed to get platform info from server.". As a result, I cannot access any functionality of the Code Builder plug-in (Kernel Development, API debugger, etc).

My system consists of an Intel Core i3-3220T CPU with an NVIDIA GeForce GTX 550 Ti. I am using the latest drivers for both devices and running Windows 10. Strangely, this issue did not happen when I had an AMD GPU (HD7970) in this PC; the problem started once I replaced the AMD GPU with the NVIDIA one. I tried a fresh install of both the OS and the development tools, but the issue remains. Maybe there is a conflict with NVIDIA devices? I also tried an installation on a second machine with an Intel CPU and an NVIDIA Quadro GPU and got the same issue.

Thanks for your help!

New Computer Vision SDK Beta R3 Released

Just released: new Intel® Computer Vision SDK Beta R3 with FPGA support, deep learning enhancements, and traditional computer vision improvements

We’re proud to announce the launch of Intel® Computer Vision SDK Beta R3.  Software developers and data scientists working on computer vision, neural network inference, and deep learning deployment capabilities for smart cameras, robotics, office automation, and autonomous vehicles can accelerate their solutions across multiple types of platforms: CPU, GPU, and now FPGA.

For the latest information, documentation, and updates, see the Intel® Computer Vision SDK Beta R3 product site.

How to read / write pipe in order with different workgroups and CUs?

Hello,

I am trying to use pipes in OpenCL 2.0, but my code uses many different workgroups and CUs.

I have defined the attribute below:

__attribute__((num_compute_units(2))) 

My code produces random values and distributes them across different workgroups (for example, 2 workgroups of 4x4 each), writes them into the pipe, and then reads them back from the pipe.

I found that when CU = 1, values are written to and read from the pipe in order, but when CU > 2, the writes and reads are not in order.

Does anyone have an idea how to read from and write to a pipe in order with many different workgroups and CUs?

Thanks in advance!

Install OpenCL on CentOS 6.9

Hi all,

I want to install the OpenCL driver on CentOS 6, but I got a message saying that the driver does not support this OS.

Is there any version that supports this OS?

Thanks!

Memory leak in clSetKernelArg with arg_value=NULL

I am using the "block_advanced_motion_estimate_check_intel" kernel for motion estimation. According to the OpenCL extensions for AVME, some arg_value parameters can be set to NULL if the corresponding parameter should be ignored. When I set arg_value to NULL, there is a memory leak, and valgrind reports the leak at the lines that pass the NULL argument. Is there any way to pass NULL and still avoid the memory leak? Is there a setting required so that this scenario is not treated as a memory leak?

No GPU device found on SkyLake + Ubuntu 16.04

Hi,

Supposedly the OpenCL driver for Intel HD Graphics on Skylake is available for Ubuntu 16.04, but unfortunately it does not work for me.

I downloaded SRB5.0_linux64.zip (with BUILD_ID 63503 in it) and installed it this way -

Install the intel-opencl-r5.0 driver
$ sudo apt-get update
$ sudo apt-get install xz-utils
$ mkdir intel-opencl
$ tar -C intel-opencl -Jxf intel-opencl-r5.0-BUILD_ID.x86_64.tar.xz
$ tar -C intel-opencl -Jxf intel-opencl-devel-r5.0-BUILD_ID.x86_64.tar.xz
$ tar -C intel-opencl -Jxf intel-opencl-cpu-r5.0-BUILD_ID.x86_64.tar.xz
$ sudo cp -R intel-opencl/* /
$ sudo ldconfig
 

But no GPU device is available when I check with a sample like this:

$ ./CapsBasic
Number of available platforms: 1
Platform names:
    [0] Intel(R) OpenCL [Selected]
Number of devices available for each type:
    CL_DEVICE_TYPE_CPU: 1
    CL_DEVICE_TYPE_GPU: 0
    CL_DEVICE_TYPE_ACCELERATOR: 0

----------------------------------------------------------------------------------------------------------------------------------------------------------

Below is the configuration of my desktop:

$ grep -m 1 name /proc/cpuinfo
model name      : Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz

$ lspci -nn | grep VGA
00:02.0 VGA compatible controller [0300]: Intel Corporation Sky Lake Integrated Graphics [8086:1912] (rev 06)

$ uname -a
Linux iot-demo 4.10.0-37-generic #41~16.04.1-Ubuntu SMP Fri Oct 6 22:42:59 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Please help! Thanks!


Performance issue with clCreateKernel()

Hello,

I am experiencing performance issues with Intel OpenCL drivers when calling clCreateKernel().
Attached you can find a test code (using Boost Compute) with reference values taken from various OpenCL devices.
Here, the Intel drivers show by far the worst results by a factor of 10 to 600 compared to other vendors.
Please let me know which further details you need, or where I can file a bug report, to get this fixed.

Best regards

Attachment: clCreateKernel.cpp (text/x-c++src, 5.67 KB)

CentOS 6.5 Support

I need to install Intel OpenCL support under CentOS 6.5, and after considerable research, I believe the answer is the deprecated 'OpenCL Applications XE 2013 R3' package. The problem is that, after two days of searching, I cannot find it anywhere. Surely Intel has an archive somewhere? Can someone direct me to where I might download this package? I only need OCL 1.2.

Thanks!

Steve

A couple of problems using latest OpenCL SDK

Hi, I have a fresh PC build and wanted to try out OpenCL SDK, but some functionality seems to be broken.

System I'm using:

  • Intel I7-8700 with UHD Graphics 630
  • Intel Graphics Driver 15.47.02.4815
  • Intel SDK for OpenCL Applications 2017 Beta (7.0.0.2519)
  • Windows 10 Pro
  • Visual Studio 2017

Things that don't work:

  • OpenCL Kernel Development
    • Run
      • It seems to be running something successfully, then it opens tab "CodeBuilder Run Results" which only says "loading...", and a popup appears with the message: "Error, run results are missing.".
    • Debug
      • Nothing is run, instead an error message appears:
        • "Found the following unexpected registry settings on localhost: Key HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\TdrLevel. Expected: 0 (DWORD), Actual: null Key HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers\Scheduler\EnablePreemption. Expected: 0 (DWORD), Actual: null Do you want the debugger to fix these settings for you? Press yes to fix them automatically, and No if you plan todo it manually. Note: a reboot is required after updating your registry settings.". Pressing Yes and rebooting does nothing.
      • And then another error message appears:
        • KDF debugging not available: Failed to start debugger on target host localhost
    • Analyze
      • Seems to be doing something, but the generated html file contains only: "Error: can not find main menu data"
  • OpenCL Application Analysis
    • In the generated view, the main page says "Error: unable to retrieve "Home Page": "SyntaxError: Syntax error"."
    • The other views seem to be okay, however.

Something to note: When installing SDK, I wasn't able to select option "Debugger target support". I tried to install some of the packages separately, and when trying out gen_debugger_target_release_1.1.63081.0.msi, an error appeared: "Your system doesn't meet the requirements to support OpenCL (TM) kernel debugging. OpenCL(TM) kernel debugging is supported only on 6(th) Generation or above of Intel(R) Core(TM) Processors with Intel(R) Iris(TM), Intel(R) Iris Pro and HD Graphics. Please also make sure you have installed latest Intel(R) Graphics ..."

And just to clarify, the rest of functionality does seem to work - GPU is correctly reported (supporting OpenCL 2.0), the example from project template runs fine (I think), and the OpenCL API debugger looks to be showing proper data (or at least realistic).

SegFault in 'PrepareKernelArgs' caused by function call within __kernel

Hi,

I have encountered an issue compiling my OpenCL kernel using the Intel CPU-only runtime (v16.1.1) on  RHEL 7 (x64).

I've created a MWE in a github gist here

(using pyopencl to compile / setup args, etc.)

Essentially, if my kernel "testkernel" calls the "bad" subkernel (defined as a __kernel function with required work group size), I get this segfault.  If I change the kernel definition to the "good" subkernel (a simple "void" function call) it compiles and runs correctly.  However, the OpenCL standard seems to imply the two should be identical:

>The __kernel (or kernel) qualifier declares a function to be a kernel that can be executed by an application on an OpenCL device(s). The following rules apply to functions that are declared with this qualifier:

>It is just a regular function call if a __kernel function is called by another kernel function.

In any case, it probably shouldn't be giving me segfaults!

------------------------------------------------------------------------------------------

output:

PYOPENCL_CTX='' PYOPENCL_COMPILER_OUTPUT=1 python2.7 test.py broken.opencl

Stack dump:
0.      Running pass 'PrepareKernelArgs' on module 'main'.
Segmentation fault (core dumped)

PYOPENCL_CTX='' PYOPENCL_COMPILER_OUTPUT=1 python2.7 test.py working.opencl

/home/ncurtis/.local/lib/python2.7/site-packages/pyopencl/cffi_cl.py:1502: CompilerWarning: From-source build succeeded, but resulted in non-empty logs:
Build on <pyopencl.Device 'Intel(R) Xeon(R) CPU E5-4640 v2 @ 2.20GHz' on 'Intel(R) OpenCL' at 0x31b5678> succeeded, but said:

Compilation started
Compilation done
Linking started
Linking done
Device build started
Device build done
Kernel <bad> was successfully vectorized (4)
Kernel <testkernel> was not vectorized
Done.
 warn(text, CompilerWarning
 

------------------------------------------------------------------------------------------------------

Thanks,

Nick

P.S. Any update on a new release of the CPU-only runtime? It would be nice to get some of the bugfixes applied (and even better to try out OCL 2.0+).

Wrong alignment when passing structs to a kernel

Hello,

I am using clSetKernelArg to pass a struct with cl_float3 values to the kernel.
cl_float3 is the same as cl_float4 and must be aligned on a 16-byte boundary according to the OpenCL specification.
This works fine with various vendors, but not with Intel, where the members are misaligned.
Please fix the alignment so that it matches the required alignment of the structure in the kernel argument.

Attached you can find an example application. The expected alignment values are the smallest values that work without vload/vstore.

using device:   Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
device version: OpenCL 1.2 (Build 63463)
driver version: 1.2
result:          4,  4,  4,  8, 16
expected:        4, 16, 16,  4, 16
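
For readers unfamiliar with how a 16-byte-aligned member affects struct layout, here is a minimal Python sketch of the usual C-style rule (round each member's offset up to its alignment). The struct used is hypothetical, not the one from the attachment, and `layout` is an illustrative helper name.

```python
def layout(fields):
    """Compute C-style member offsets.

    fields: list of (name, size, alignment), all in bytes.
    Returns (offsets_by_name, total_size).
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size, align in fields:
        # Round the running offset up to this member's alignment.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    # The struct size is padded to a multiple of the strictest alignment.
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# Hypothetical struct: struct { float a; float3 b; float c; }
# float3 is assumed to occupy 16 bytes with 16-byte alignment, like float4.
print(layout([("a", 4, 4), ("b", 16, 16), ("c", 4, 4)]))
# -> ({'a': 0, 'b': 16, 'c': 32}, 48)
```

If the host compiler and the device compiler disagree on the alignment of the vector member, every member after it lands at a different offset on each side, which would match the kind of mismatch reported above.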

 

Attachment: Alignment.cpp (text/x-c++src, 2.98 KB)