Channel: Intel® Software - OpenCL*

Compiler crash on byte swap function for ulong

Hi, everybody!

I found a rather strange bug in the OpenCL compiler for the Intel HD 4000 GPU (and maybe for all Intel GPUs).

Consider this kernel code:

ulong reverse( ulong x )
{
    ulong y = x >> 0x38;
    y |= ( ( x >> 0x28 ) & 0x000000000000FF00 );
    y |= ( ( x >> 0x18 ) & 0x0000000000FF0000 );
    y |= ( ( x >> 0x08 ) & 0x00000000FF000000 );
    y |= ( ( x << 0x08 ) & 0x000000FF00000000 );
    y |= ( ( x << 0x18 ) & 0x0000FF0000000000 );
    y |= ( ( x << 0x28 ) & 0x00FF000000000000 );
    y |= x << 0x38;

    return y;
}

ulong reverseHigh( ulong x )
{
    ulong y = x >> 0x38;
    y |= ( ( x >> 0x28 ) & 0x000000000000FF00 );
    y |= ( ( x >> 0x18 ) & 0x0000000000FF0000 );
    y |= ( ( x >> 0x08 ) & 0x00000000FF000000 );

    return y;
}

ulong reverseLow( ulong x )
{
    ulong y = x << 0x38;
    y |= ( ( x << 0x08 ) & 0x000000FF00000000 );
    y |= ( ( x << 0x18 ) & 0x0000FF0000000000 );
    y |= ( ( x << 0x28 ) & 0x00FF000000000000 );

    return y;
}

ulong rev( ulong x )
{
    return reverseHigh( x ) | reverseLow( x );
}

__kernel void func( __global ulong* input, __global ulong* output )
{
    uint gid = get_global_id( 0 );

    // neither of these two functions compiles
    output[ gid ] = reverse( input[ gid ] );
    //output[ gid ] = rev( input[ gid ] );

    // this code works!
    //output[ gid ] = reverseHigh( input[ gid ] ) | reverseLow( input[ gid ] );
}

The compiler crashes with a very strange error:

fcl build 1 succeeded.
error: Cannot yet select: 0x5e82160: i64 = bswap 0x5f26550 [ORD=5] [ID=25]
  0x5f26550: i64,ch = GHAL3DISD::LOAD64 0x5f262a8, 0x5f26440, 0x5f264c8 [ID=24]
    0x5f262a8: i32,ch = load 0x5e81d20:1, 0x5f26220, 0x5e82050<LD4[%2](align=8)> [ID=21]
      0x5e81d20: i32,ch = llvm.GHAL3D.get.global.id 0x5e81770, 0x5e81fc8, 0x5e81c98 [ORD=1] [ID=11]
        0x5e81770: ch = EntryToken [ORD=1] [ID=0]
        0x5e81fc8: i32 = Constant<77> [ID=5]
        0x5e81c98: i32 = Constant<0> [ORD=1] [ID=3]
// a lot more lines...

This happens when reverse() or rev() is used in the kernel. It does not happen with the last (commented-out) line of the kernel, which is simply rev() inlined by hand.

The inline keyword doesn't help.

I tested this code on my Asus notebook, where I can't install the Intel SDK: the installer complains that the OpenCL driver is too old. I downloaded the latest beta driver from the official site and copied Intel_OpenCL_ICD64.dll (ver. 2.0.0.0), IntelOpenCL64.dll (ver. 10.18.10.3652), Intel_OpenCL_ICD32.dll (ver. 2.0.0.0) and IntelOpenCL32.dll (ver. 10.18.10.3652) to the System32 and SysWOW64 folders. The error remains.

Any help will be appreciated.
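
A possible workaround sketch, not confirmed against this driver: the error log suggests the compiler recognizes the shift-and-mask pattern as a single 64-bit bswap that the GPU back end cannot select. Swapping the two 32-bit halves separately avoids writing the 64-bit pattern at all. The version below is plain C so that it can be checked on the host. The function names are hypothetical, and porting to OpenCL C is assumed to be a type substitution (uint for uint32_t, ulong for uint64_t). Whether the compiler still folds this back into a 64-bit bswap is untested.

```c
#include <stdint.h>

/* Swap the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return (v >> 24)
         | ((v >> 8) & 0x0000FF00u)
         | ((v << 8) & 0x00FF0000u)
         | (v << 24);
}

/* Byte-swap a 64-bit value by swapping each 32-bit half
 * and exchanging the halves. Hypothetical helper, not from the post. */
uint64_t reverse_by_halves(uint64_t x)
{
    uint32_t lo = (uint32_t)x;          /* low 32 bits  */
    uint32_t hi = (uint32_t)(x >> 32);  /* high 32 bits */
    return ((uint64_t)swap32(lo) << 32) | (uint64_t)swap32(hi);
}
```

For any input, reverse_by_halves() should agree with reverse() from the post, which makes it easy to compare the two on the CPU device before trying the GPU.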

 


Device becomes unavailable after attempting to create context

My CPU device is reported as available by clGetDeviceInfo. But when I call clCreateContext, it returns error -2 (CL_DEVICE_NOT_AVAILABLE). After that, clGetDeviceInfo reports the device as unavailable.

I am running Ubuntu 14.04 in VirtualBox on Windows 7. From what I gathered, running in a VM is not an issue for OpenCL on a CPU. My CPU is an i5-2520M.

There might be a solution (http://stackoverflow.com/questions/18193320 and http://stackoverflow.com/questions/17346647), but I thought I would try this forum first.  The solution in those posts is to install the AMD graphics driver, even if one does not have an AMD video card.
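
When reporting failures like this one, it helps to log the symbolic name next to the number. Below is a minimal sketch of such a lookup in plain C. The function name cl_error_name is hypothetical, and the numeric values are copied by hand from the OpenCL 1.x cl.h header, so the sketch compiles without the OpenCL headers. Only a handful of codes are covered.

```c
/* Map a few OpenCL status codes to their names.
 * Values mirror CL/cl.h (OpenCL 1.x); the list is deliberately partial. */
const char *cl_error_name(int code)
{
    switch (code) {
    case 0:   return "CL_SUCCESS";
    case -1:  return "CL_DEVICE_NOT_FOUND";
    case -2:  return "CL_DEVICE_NOT_AVAILABLE";
    case -3:  return "CL_COMPILER_NOT_AVAILABLE";
    case -4:  return "CL_MEM_OBJECT_ALLOCATION_FAILURE";
    case -5:  return "CL_OUT_OF_RESOURCES";
    case -6:  return "CL_OUT_OF_HOST_MEMORY";
    case -11: return "CL_BUILD_PROGRAM_FAILURE";
    case -30: return "CL_INVALID_VALUE";
    case -33: return "CL_INVALID_DEVICE";
    case -34: return "CL_INVALID_CONTEXT";
    default:  return "UNKNOWN_CL_ERROR";
    }
}
```

In host code the value passed in would be the errcode_ret written by clCreateContext.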

Compiler hangs up when image is used

Hi, everybody!

I have found 2 problems with OpenCL kernel compilation on Intel HD4600:

1) Kernel Builder can't build some of my kernels and reports this:

OpenCL Intel(R) Graphics device was found!
Device name: Intel(R) HD Graphics 4600
Device version: OpenCL 1.2
Device vendor: Intel(R) Corporation
Device profile: FULL_PROFILE
fcl build 1 succeeded.

Build failed!

If I reduce the kernel code, the build finishes successfully. But the whole kernel can't be built, and my program can't build it either.

At the same time this kernel can be compiled by Kernel Builder. I can't share my kernel code, because it is proprietary. While trying to write a test kernel to reproduce this problem, I found another one:

2) Kernel Builder hangs while building the following kernel:

__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;

#define ROTR( x, n )    ( ( (x) >> (n) ) | ( (x) << ( 32 - (n) ) ) )
#define SHR( x, n )     ( (x) >> (n) )

#define Ch( x, y, z )   ( ( (x) & (y) ) ^ ( ~(x) & (z) ) )
#define Maj( x, y, z )  ( ( (x) & (y) ) ^ ( (x) & (z) ) ^ ( (y) & (z) ) )
#define SIGMA0( x )     ( ROTR( (x), 2  ) ^ ROTR( (x), 13 ) ^ ROTR( (x), 22 ) )
#define SIGMA1( x )     ( ROTR( (x), 6  ) ^ ROTR( (x), 11 ) ^ ROTR( (x), 25 ) )
#define sigma0( x )     ( ROTR( (x), 7  ) ^ ROTR( (x), 18 ) ^  SHR( (x),  3 ) )
#define sigma1( x )     ( ROTR( (x), 17 ) ^ ROTR( (x), 19 ) ^  SHR( (x), 10 ) )

#define ROUND( A, B, C, D, E, F, G, H, W, k )     {         \
    (H) += SIGMA1( (E) ) + Ch( (E), (F), (G) ) + (k) + (W); \
    (D) += (H);                                             \
    (H) += SIGMA0( (A) ) + Maj( (A), (B), (C) );            }

void test( __read_only image2d_t image, uint word, uint* digest )
{
    uint4 storage = read_imageui( image, sampler, (int2)( get_local_id( 0 ), get_group_id( 0 ) ) );
    uint a = storage.x;
    uint b = storage.y;
    uint c = storage.z;
    uint d = storage.w;

    uint e = 5;
    uint f = 6;
    uint g = 7;
    uint h = 8;

    uint w0 = word;
    uint w1 = 1;
    uint w2 = 0;
    uint w3 = 0;
    uint w4 = 0;
    uint w5 = 0;
    uint w6 = 0;
    uint w7 = 0;
    uint w8 = 0;
    uint w9 = 0;
    uint wA = 0;
    uint wB = 0;
    uint wC = 0;
    uint wD = 0;
    uint wE = 0;
    uint wF = 1;

    ROUND( a, b, c, d, e, f, g, h, w0, 1 );
    ROUND( h, a, b, c, d, e, f, g, w1, 2 );
    ROUND( g, h, a, b, c, d, e, f, w2, 3 );
    ROUND( f, g, h, a, b, c, d, e, w3, 4 );
    ROUND( e, f, g, h, a, b, c, d, w4, 5 );
    ROUND( d, e, f, g, h, a, b, c, w5, 6 );
    ROUND( c, d, e, f, g, h, a, b, w6, 7 );
    ROUND( b, c, d, e, f, g, h, a, w7, 8 );
    ROUND( a, b, c, d, e, f, g, h, w8, 9 );
    ROUND( h, a, b, c, d, e, f, g, w9, 10 );
    ROUND( g, h, a, b, c, d, e, f, wA, 11 );
    ROUND( f, g, h, a, b, c, d, e, wB, 12 );
    ROUND( e, f, g, h, a, b, c, d, wC, 13 );
    ROUND( d, e, f, g, h, a, b, c, wD, 14 );
    ROUND( c, d, e, f, g, h, a, b, wE, 15 );
    ROUND( b, c, d, e, f, g, h, a, wF, 16 );

    digest[ 0 ] = 1 + a;
    digest[ 1 ] = 2 + b;
    digest[ 2 ] = 3 + c;
    digest[ 3 ] = 4 + d;
    digest[ 4 ] = 5 + e;
    digest[ 5 ] = 6 + f;
    digest[ 6 ] = 7 + g;
    digest[ 7 ] = 8 + h;
}

__kernel void hangup( __read_only image2d_t image,
    __constant uint* cdata, __global uint* data )
{
    const uint gid = get_global_id( 0 );
    uint digest[ 8 ];

    test( image, data[ gid ], digest );
    for( uint index = 0; index < 8; ++index )
    {
        data[ gid ] ^= digest[ index ];
    }
}

And again, this kernel can be compiled by Kernel Builder. I assume the hang occurs at the link stage.

If I don't use a 2D image, the build completes without problems.

The OpenCL driver version is 10.18.10.3652. Any help will be appreciated.
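
For anyone trying to reproduce this, a host-side reference of the round helpers makes it possible to validate device output once a build succeeds (or to compare against another device). The sketch below is plain C that mirrors the ROTR, Ch, Maj, SIGMA0, SIGMA1 and ROUND macros from the kernel above. The function names are hypothetical, and it says nothing about why the build hangs.

```c
#include <stdint.h>

/* Rotate right; n must be in 1..31, as in the kernel's ROTR uses. */
uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32u - n)); }

uint32_t ch(uint32_t x, uint32_t y, uint32_t z)  { return (x & y) ^ (~x & z); }
uint32_t maj(uint32_t x, uint32_t y, uint32_t z) { return (x & y) ^ (x & z) ^ (y & z); }

uint32_t big_sigma0(uint32_t x) { return rotr32(x, 2) ^ rotr32(x, 13) ^ rotr32(x, 22); }
uint32_t big_sigma1(uint32_t x) { return rotr32(x, 6) ^ rotr32(x, 11) ^ rotr32(x, 25); }

/* One round with the same update order as the ROUND macro:
 * h is updated, then d, then h again. */
void round_step(uint32_t a, uint32_t b, uint32_t c, uint32_t *d,
                uint32_t e, uint32_t f, uint32_t g, uint32_t *h,
                uint32_t w, uint32_t k)
{
    *h += big_sigma1(e) + ch(e, f, g) + k + w;
    *d += *h;
    *h += big_sigma0(a) + maj(a, b, c);
}
```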

TBB scheduling issue on Xeon and Xeon Phi

Hello!

I am running code that uses the latest Intel OpenCL SDK on Linux, on both Xeon and Xeon Phi, and I am profiling it with Intel VTune Amplifier. According to the analysis, the main limiting factor is some unidentified TBB scheduling overhead, whereas I would expect the kernels ([Dynamic Code]) to be the main hotspots. I would like to know why so much execution time is spent in the highlighted functions, which are library calls rather than parts of my code. Because VTune Amplifier provides no stack trace for them, I cannot identify what leads to those TBB scheduling calls, and so I cannot optimize the code further.

I observe the same behaviour on both Xeon and Xeon Phi, but on Xeon Phi it escalates significantly. In the case of Xeon Phi, does the analysis mean that the host CPU is limiting the performance? The difference grows with the number of workgroups. According to the OpenCL Optimization Guide this might indicate issues with workgroup scheduling, but I let the OpenCL implementation pick the local workgroup size when I call clEnqueueNDRangeKernel. When I specify the local workgroup size explicitly, I get a 10-15% performance boost, but the issue above is still by far the main limiting factor.
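
Since the explicit local size already made a measurable difference here, a small helper for choosing it may be useful. The sketch below is plain C and assumes OpenCL 1.x rules, where the global size must be divisible by the local size. The function name pick_local_size is hypothetical, and max_local is assumed to come from a query such as CL_KERNEL_WORK_GROUP_SIZE. It does not address the TBB scheduling overhead itself.

```c
#include <stddef.h>

/* Return the largest local size that divides global_size and does not
 * exceed max_local. Returns 0 for invalid input and 1 when no larger
 * divisor exists (for example a prime global size). */
size_t pick_local_size(size_t global_size, size_t max_local)
{
    if (global_size == 0 || max_local == 0)
        return 0;

    size_t limit = global_size < max_local ? global_size : max_local;
    for (size_t local = limit; local > 1; --local) {
        if (global_size % local == 0)
            return local;
    }
    return 1;
}
```

Fewer, larger workgroups mean fewer scheduling units for the runtime, so trying a few candidate sizes around the returned value is a cheap experiment.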

I do not observe anything resembling this issue on Windows with Core i7 processors. In general, because of this issue, the application performs better on an Intel(R) Xeon(R) CPU E5-2665 than on a Xeon Phi 5110P. Hyperthreading is turned off on the Xeon. The analysis shows that all cores are used on both platforms.

I would like to ask for guidance with this issue. I attach screenshots from the profiler. VTune analysis: Basic Hotspots for Xeon, General Exploration for Xeon Phi.

Error when creating OpenCL Project with Intel OpenCL SDK on VS2012 and VS2013

Dear all,

Unfortunately I have a problem when creating a project using the "OpenCL" option offered with the Intel OpenCL SDK 2014.

The system gives the following message:

Not implemented (Exception from HRESULT : 0x80004001 (E_NOTIMPL))

I have tried uninstalling, reinstalling and redownloading VS2012, VS2013 and the Intel OpenCL SDK, without success. The same happens on different computers. The error occurs when I try to add an OpenCL project to a solution. The workaround for the moment is to add the project as a C++ one, then convert it to the OpenCL API and edit the project properties manually.

Thanks for your help!

In some cases opencl runtime can't create context in DLL

I am trying to create an OpenCL-based DLL for use with MATLAB. clCreateContext fails to create a context when it is called from the DLL loaded by MATLAB (I tried clCreateContextFromType too). It returns error -6 (CL_OUT_OF_HOST_MEMORY), although at that moment Task Manager shows more than 1.5 GB of free memory.

Strangely, if I use this DLL with the Intel platform from Visual Studio or from Excel, everything works.

The DLL also works with MATLAB when I use another platform (AMD on the same machine or Nvidia on another machine).

The system is: CPU Intel Core i5 660, RAM 4 GB, Windows 7 64-bit, MATLAB 8.3 64-bit, Intel SDK 4.4.0.117, Intel OpenCL runtime for CPU 4.5.0.5 (I checked runtime 4.4.0.117 and it doesn't work either), AMD APP SDK 2.9.599.381, and an Nvidia version I can't remember (I think it is a recent one).

I attached a small project for Visual Studio 2013. The project writes the error code to the Windows application event log. The archive also includes a matlab.m file.

Attachment: VS.rar (9.54 KB)

New Haswell IGP OpenCL Windows drivers are available

Just installed 10.18.10.3907.  The release notes indicate there are OpenCL performance improvements.

 

Some questions on no. of threads and work groups

I have some background with NVIDIA GPUs, so to learn OpenCL for Intel I would like to map what I know onto Intel hardware.

For NVIDIA, the following rules apply:

1- Warp size: 32 (or in some cases 64)

2- Maximum no. of resident blocks per multiprocessor: 8

3- Maximum no. of threads that can be resident on a multiprocessor: 768 (on older cards)

4- Amount of shared memory available per workgroup: 64 KB (48 + 16 KB)

5- No. of threads per workgroup: 512 (1024 on the latest cards)

6- A workgroup runs on a single multiprocessor only, i.e. running half on MP#1 and the other half on MP#2 is not possible.
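
To make clear how rules 2, 3 and 5 interact, here is a small arithmetic sketch in plain C. It uses only the limits quoted in the list above (8 resident blocks, 768 resident threads, 512 threads per workgroup) and deliberately ignores shared memory and register pressure. The function name resident_blocks is hypothetical.

```c
/* Limits copied from rules 2, 3 and 5 in the list above. */
enum {
    MAX_RESIDENT_BLOCKS   = 8,
    MAX_RESIDENT_THREADS  = 768,
    MAX_THREADS_PER_BLOCK = 512
};

/* How many workgroups of the given size can be resident on one
 * multiprocessor, considering only the block and thread limits.
 * Returns 0 for sizes that are not allowed at all. */
int resident_blocks(int threads_per_block)
{
    if (threads_per_block <= 0 || threads_per_block > MAX_THREADS_PER_BLOCK)
        return 0;

    int by_threads = MAX_RESIDENT_THREADS / threads_per_block;
    return by_threads < MAX_RESIDENT_BLOCKS ? by_threads : MAX_RESIDENT_BLOCKS;
}
```

With 256 threads per workgroup the thread limit allows three resident workgroups, while with 64 threads the block limit of eight applies first. The corresponding numbers for Intel HD Graphics are exactly what the question asks for, so the same calculation would need Intel's limits plugged in.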

I would like to know these values for Intel HD Graphics.

I would really appreciate it if somebody could point me to some links where I can educate myself on this.

Where would I get such info for Intel HD Graphics? Specifically, I have an Intel NUC with a Celeron N2820 SoC.

 

 


OpenCL™ 2.0 is here! Download the Release 2 of Intel® SDK for OpenCL™ Applications 2014

Dear Developers,

We are happy to announce the availability of our latest and most advanced SDK for OpenCL: Release 2 of Intel® SDK for OpenCL™ Applications 2014 is the industry’s first SDK to provide an OpenCL 2.0 development environment with the new Intel® Core™ M Processors.

This major advance in graphics programmability and accessibility will help you make greater use of the graphics engine to deliver new experiences on Intel-based platforms.

New with SDK 2014 Release 2:

  • OpenCL™ 2.0: Support for latest OpenCL standard version 2.0 with shared virtual memory. Start now and write your first OpenCL 2.0 code on your current development platform, simply by using the new OpenCL 2.0 development environment that is installed with the SDK.
  • Intel® Core™ M Processors: Take advantage of OpenCL 2.0 with the latest Intel® Core™ M processors and future generations of Intel Core processors.
  • SPIR* 1.2: Improve portability and drive innovation with a standard, non-source intermediate representation (IR) for device programs. Supported on both CPU and Intel Graphics.
  • Greater development experience: The SDK provides everything you need to build, debug, and analyze OpenCL applications. This release not only adds OpenCL 2.0 and SPIR 1.2 development support, but also adds new preview features for debugging and analyzing applications.


Regards,

Arnon Peleg, OpenCL SDK Product Manager

 

OpenCL GPU compiler fails on valid code

This code fails to compile when using Intel SDK for OpenCL Applications 2014 with an Intel HD 4000 GPU:

kernel void f1(global double* W)
{
    const double t = 1.;
    min(W[0], t);
}

It complains about the call to min being ambiguous. Changing it to fmin fixes the compilation. The same code compiles fine when targeting the CPU device (Core i7-3820QM), or an NVIDIA GPU.

convert_double function unimplemented on GPU

The following code fails to compile:

kernel void f2(global double* W)
{
    uint w;
    convert_double(w);
}

The error is: Error, unimplemented function called: _Z14convert_doublej()

 

Kickstart the GPU clock boost to reach maximum frequency

To whom it may concern:
(Maybe this is not the right forum, or it has been asked before ... )


Background: Power saving at the CPU / GPU level, as on a mobile device, is not required for my application.
My application uses the Intel integrated GPU as a high-performance coprocessor via OpenCL.
The application hands over work with low-frequency kernel calls, and each kernel finishes well before the next call is issued.
Hence the GPU does not get enough workload to raise / boost the GPU clock.
So the OpenCL kernels are not running at their potential maximum performance.


Question: Is there a dedicated call to kickstart the GPU clock boost to reach the maximum clock frequency, even while the Intel HD GPU is idle?
My application would take the latency of reaching the maximum GPU clock into account.
Then my kernels would reach maximum performance.


By the way: Is this an issue at all for an Intel HD GPU in a workstation with relaxed power saving constraints?


Otherwise: I think I need to implement a workaround like:
- Let some low-priority dummy kernels run all the time on the GPU to keep the clock at maximum
- Send real work to the GPU with high priority
OR
- Collect work and send it as a burst
- In the meantime, send dummy kernels to keep the GPU busy.



Thanks in advance for the answer. Best regards, Stephan

Debugging issues with Visual Studio 2012 on Windows 8

I have written an OpenCL program for vector addition for the Intel HD graphics processor. The code builds, but during debugging many of the files can't be found. The message window shows the following:

'OpenCLProject2.exe' (Win32): Loaded 'C:\Users\lenovo PC\Dropbox\Daily Work\Fundamental matrix test code\c to mex\OpenCLProject2\x64\Debug\OpenCLProject2.exe'. Symbols loaded.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\ntdll.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\kernel32.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\KernelBase.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\apphelp.dll'. Cannot find or open the PDB file.

SHIMVIEW: ShimInfo(Complete)

FTH: (7364): *** Fault tolerant heap shim applied to current process. This is usually due to previous crashes. ***

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\apppatch\apppatch64\AcLayers.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\msvcrt.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\user32.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\shlwapi.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\sfc.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\winspool.drv'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\gdi32.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\sfc_os.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\SortWindows61.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\imm32.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\msctf.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\PROGRA~3\ASSIST~1\ASSIST~2.DLL'. Module was built without symbols.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\advapi32.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\sechost.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\rpcrt4.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Unloaded 'C:\PROGRA~3\ASSIST~1\ASSIST~2.DLL'

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\OpenCL.dll'. Module was built without symbols.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\msvcp110d.dll'. Symbols loaded.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\msvcr110d.dll'. Symbols loaded.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\IntelOpenCL64.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\OpenCL\bin\x64\intelocl64.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\OpenCL\bin\x64\task_executor64.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\opengl32.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\glu32.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\ddraw.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\dciman32.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\ntmarta.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\OpenCL\bin\x64\cpu_device64.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\version.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\OpenCL\bin\x64\tbb\tbb.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\igdrcl64.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\igdfcl64.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\dbghelp.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\igdbcl64.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\igdusc64.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\Intelopencl64_2_0.dll'. Module was built without symbols.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\OpenCL2.0\bin\x64\Intelocl64.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\OpenCL2.0\bin\x64\task_executor64_2_0.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Program Files (x86)\Common Files\Intel\OpenCL2.0\bin\x64\cpu_device64.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\IntelOpenCLProfiler.dll'. Module was built without symbols.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\shell32.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\ws2_32.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\combase.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\nsi.dll'. Cannot find or open the PDB file.

'OpenCLProject2.exe' (Win32): Loaded 'C:\Windows\System32\psapi.dll'. Cannot find or open the PDB file.

The program '[7364] OpenCLProject2.exe' has exited with code 0 (0x0).

 

Kernel crashes latest SDK with "access violation reading location ......"

Here is the offending kernel. The task is to find the maximum number of bits in a block of pixels.

 

 

/////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////

#define CODEBLOCKX 32
#define CODEBLOCKY 32

CONSTANT sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_FILTER_NEAREST;

void KERNEL run(write_only image2d_t R,
                write_only image2d_t G,
                write_only image2d_t B,
                write_only image2d_t A, const unsigned int width, const unsigned int height) {
    // Red channel
    //find maximum number of bits in code block
    LOCAL char msbScratch[CODEBLOCKX];
    // between one and 32 - zero value indicates that this code block is identically zero
    int2 posIn = (int2)(getLocalId(0) + getGlobalId(0)*CODEBLOCKX, getGlobalId(1)*CODEBLOCKY);
    int maxVal = -2147483647-1;
    for (int i = 0; i < CODEBLOCKY; ++i) {
        maxVal = max(maxVal, read_imagei(R, sampler, posIn).x);
        posIn.y++;
    }
    char msbWI = 32 - clz(maxVal);
    msbScratch[getLocalId(0)] = msbWI;
    localMemoryFence();

    //group by twos
    if ( (getLocalId(0)&1) == 0) {
        msbWI = max(msbWI, msbScratch[getLocalId(0)+1]);
    }
    localMemoryFence();

    //group by fours
    if ( (getLocalId(0)&3) == 0) {
        msbWI = max(msbWI, msbScratch[getLocalId(0)+2]);
    }
    localMemoryFence();

    //group by eights
    if ( (getLocalId(0)&7) == 0) {
        msbWI = max(msbWI, msbScratch[getLocalId(0)+4]);
    }
    localMemoryFence();

    //group by 16ths
    if ( (getLocalId(0)&15) == 0) {
        msbWI = max(msbWI, msbScratch[getLocalId(0)+8]);
    }
    localMemoryFence();

    if (getLocalId(0) == 0) {
        msbScratch[0] = max(msbWI, msbScratch[16]);  //crashes here with access violation while reading location .....
    }
    localMemoryFence();
}
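
Two things in the kernel above may be worth checking, although neither is confirmed as the cause of the access violation. First, R is declared write_only but is read with read_imagei. Second, every reduction stage updates only the private msbWI, while msbScratch is written once, so later stages read values that were never combined. The sketch below is a host-side reference in plain C for what the reduction is presumably meant to compute, with each stage writing its partial maximum back. The function names are hypothetical.

```c
#include <stdint.h>

#define CODEBLOCKX 32

/* Number of significant bits in v, i.e. 32 - clz(v), with 0 for v == 0. */
int msb_count(uint32_t v)
{
    int bits = 0;
    while (v != 0) {
        ++bits;
        v >>= 1;
    }
    return bits;
}

/* Tree reduction over the per-work-item results of one row.
 * Each stage stores the partial maximum back into scratch, so the next
 * stage combines partial results instead of the original values.
 * On the device a local-memory barrier would separate the stages. */
int reduce_max(const int8_t in[CODEBLOCKX])
{
    int8_t scratch[CODEBLOCKX];
    for (int i = 0; i < CODEBLOCKX; ++i)
        scratch[i] = in[i];

    for (int stride = 1; stride < CODEBLOCKX; stride *= 2) {
        for (int id = 0; id < CODEBLOCKX; id += 2 * stride) {
            if (scratch[id + stride] > scratch[id])
                scratch[id] = scratch[id + stride];
        }
    }
    return scratch[0];
}
```

Comparing the kernel's msbScratch[0] against reduce_max() on the CPU device would show whether the stale reads change the result.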

OpenCL: Intel -Generating Intermediate Program Binaries with Offline Compiler

I am using the Intel OpenCL SDK on Windows with Intel HD Graphics. I would like to compile my kernel offline and then use it in host code with:

clCreateProgramWithBinary(…)

This link says:

OpenCL™ API Offline Compiler plug-in for Microsoft Visual Studio* IDE enables you to develop OpenCL applications with Visual Studio IDE.

The plug-in supports the following features:

New project templates
New OpenCL file (*.cl) template
Syntax highlighting
Types and functions auto-completion
Offline compilation and build of OpenCL kernels
LLVM code view
Assembly code view
program IR generation
Selection of target OpenCL device – CPU or Intel Graphics

NOTE

To work with the plug-in features, create an OpenCL project template or convert an existing project into the OpenCL project.

I want to use this feature, so I would like to know what I have to install.

As per the note above, I should create an OpenCL project template. How do I do this? Also, what is meant by "or convert an existing project into the OpenCL project"?

 

Also, the things under this link are not working:

 

https://software.intel.com/en-us/articles/programming-with-the-intel-sdk...

 

Try opening the How To Guide in the above link:

 

How To Guide

Demo

Getting Started Video

Case Study

Using Offline Compiler Integration with Microsoft Visual Studio*

It will say 404 not found.

 

So please help us in getting started with offline compilation.   
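
On the host side, the first step of using an offline-compiled kernel is getting the binary file into memory. A minimal sketch in plain C follows. The function name read_binary_file is hypothetical, and the file is assumed to be whatever the offline compiler wrote for the target device.

```c
#include <stdio.h>
#include <stdlib.h>

/* Read a whole file into a malloc'd buffer.
 * On success returns the buffer and stores its length in *size_out;
 * on failure returns NULL. The caller frees the buffer. */
unsigned char *read_binary_file(const char *path, size_t *size_out)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    if (fseek(f, 0, SEEK_END) != 0) {
        fclose(f);
        return NULL;
    }
    long len = ftell(f);
    if (len < 0) {
        fclose(f);
        return NULL;
    }
    rewind(f);

    /* Allocate at least one byte so an empty file still yields a buffer. */
    unsigned char *buf = malloc(len > 0 ? (size_t)len : 1);
    if (buf == NULL) {
        fclose(f);
        return NULL;
    }

    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    if (got != (size_t)len) {
        free(buf);
        return NULL;
    }

    *size_out = (size_t)len;
    return buf;
}
```

The returned pointer and size would then go into the binaries and lengths arguments of clCreateProgramWithBinary, followed by clBuildProgram for the same device.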

 

Another kernel crash, with reproducer

This one is very simple - just reading in blocks of an image and storing in LDS.

Crashes with access violation on read.

Windows 7, latest SDK, CPU device.

//////////////////////////////////////////////////////////////////////////////////////////////////////////

// image is of dimension 512 x 512
//size_t local_work_size[3] = {32, 32/4, 1};
//size_t global_work_size[3] = {512, 512/4,1};

#define CODEBLOCKX 32
#define CODEBLOCKY 32
#define CODEBLOCKY_QUARTER 8
#define BOUNDARY 1
#define STATE_BUFFER_SIZE 1156
#define STATE_BUFFER_SIZE_QUARTER 289

void kernel run(read_only image2d_t R) {
    local int state[STATE_BUFFER_SIZE];
    //initialize pixels (including top and bottom boundary pixels)
    int2 posIn = (int2)(get_global_id(0) + get_global_id(0)*CODEBLOCKX, get_global_id(1)*CODEBLOCKY);
    local int* statePtr = state + BOUNDARY + get_local_id(0);
    for (int i = 0; i < 4; ++i) {
        *statePtr = read_imagei(R, sampler, posIn).x;
        posIn.y += CODEBLOCKY_QUARTER;
        statePtr += STATE_BUFFER_SIZE_QUARTER;
    }
}
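
One thing worth checking in this reproducer is the range of image coordinates it reads. The x coordinate is computed from get_global_id(0) twice, so with the global size given in the comments it goes far beyond the 512 x 512 image. What happens then depends on the sampler's addressing mode, and the sampler is not shown in the post. The plain C sketch below just reproduces the coordinate arithmetic on the host. The function names are hypothetical, and whether this is the actual cause of the crash is not confirmed.

```c
#define CODEBLOCKX 32
#define CODEBLOCKY 32
#define CODEBLOCKY_QUARTER 8

/* Largest x read by the kernel: posIn.x = gid0 + gid0 * CODEBLOCKX,
 * evaluated for the last work-item in dimension 0. */
int max_read_x(int global_size_x)
{
    int gid = global_size_x - 1;
    return gid + gid * CODEBLOCKX;
}

/* Largest y read by the kernel: the start row gid1 * CODEBLOCKY plus
 * three increments of CODEBLOCKY_QUARTER (the fourth increment happens
 * after the last read). */
int max_read_y(int global_size_y)
{
    return (global_size_y - 1) * CODEBLOCKY + 3 * CODEBLOCKY_QUARTER;
}

/* True when coord is a valid pixel index for an image of this extent. */
int in_bounds(int coord, int extent)
{
    return coord >= 0 && coord < extent;
}
```

For the stated global size of 512 x 128, both maxima fall outside the image. If the intent was one 32-wide block per workgroup, an index built from get_local_id(0) and get_group_id(0) would stay in range, but that is a guess about the author's intent.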

Offline compiler web link not working

Gen 8 and fp64

In the document GVCS001-The Compute Architecture of Intel Processor Graphics Gen8.pdf it states:

"Finally, one of the FPUs provides extended math capability to support high-throughput transcendental math functions and double precision 64-bit floating-point."

Does this mean it's possible for Intel GPU OpenCL to one day fully support cl_khr_fp64? :)

Does the FPU also do high-throughput double precision transcendental math functions?

Before someone mentions Xeons and AVX: there is no reason Intel cannot give both options and let the market decide.

Perhaps you could release a pro part (Xeon with IGP) with both of the two FPUs supporting double precision...
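
Whether a given driver exposes cl_khr_fp64 can be checked at run time from the device's extension string. Below is a minimal sketch in plain C of the string check only. The function name has_extension is hypothetical, and the input is assumed to be the space-separated list returned for CL_DEVICE_EXTENSIONS by clGetDeviceInfo.

```c
#include <string.h>

/* Return 1 if name appears as a whole space-separated token in
 * ext_list, 0 otherwise. Matching whole tokens avoids false hits on
 * longer names that merely start with the same text. */
int has_extension(const char *ext_list, const char *name)
{
    size_t len = strlen(name);
    if (len == 0)
        return 0;

    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        int ends_token   = (p[len] == '\0') || (p[len] == ' ');
        if (starts_token && ends_token)
            return 1;
        p += len;
    }
    return 0;
}
```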

 

 

 

NULL image from clCreateFromGLTexture2D

I'm getting a NULL cl_mem value.. but CL_SUCCESS as an error code. I'm on x86_64 Arch Linux with Intel® Iris™ Pro Graphics 5200. This is the code I'm trying to run:


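/* The header names were lost from the original post. The four below are an assumption based on the calls used. */
#include <stdio.h>
#include <SDL2/SDL.h>
#include <GL/gl.h>
#include <CL/cl_gl.h>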

static int WINDOW_WIDTH = 800;
static int WINDOW_HEIGHT = 600;

int main() {
cl_int err;

cl_platform_id platform;
if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS) {
fprintf(stderr, "error getting platforms\n");
return 1;
}

cl_device_id device;
if(clGetDeviceIDs(platform, CL_DEVICE_TYPE_ALL, 1, &device, NULL) != CL_SUCCESS) {
fprintf(stderr, "error getting devices\n");
return 1;
}

cl_context context = clCreateContext(NULL, 1, &device, NULL, NULL, &err);

SDL_Window* w = SDL_CreateWindow("Oh hello, mandelbrot", 0, 0, WINDOW_WIDTH, WINDOW_HEIGHT, 0);
SDL_GL_CreateContext(w);

GLint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexImage2D(
GL_TEXTURE_2D,
0,
GL_RGBA,
WINDOW_WIDTH,
WINDOW_HEIGHT,
0,
GL_RGBA,
GL_FLOAT,
NULL
);

if(glGetError() != GL_NO_ERROR) {
fprintf(stderr, "OpenGL error\n");
return 2;
}

cl_mem img = clCreateFromGLTexture2D(
context,
CL_MEM_WRITE_ONLY,
GL_TEXTURE_2D,
0,
tex,
&err
);

if(err != CL_SUCCESS) {
fprintf(stderr, "OpenCL error creating image: %d\n", err);
return 1;
} else if (img == NULL) {
fprintf(stderr, "NULL image\n");
return 1;
}

return 0;
}

Is this a bug?

Thanks!
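
One detail that stands out in the code above: the OpenCL context is created with NULL properties, and before the OpenGL context exists. The cl_khr_gl_sharing extension expects the CL context to be created with properties that name the current GL context and display. Whether that explains a NULL result together with CL_SUCCESS on this driver is not verified. The sketch below, in plain C, only shows the layout of such a property list for GLX. The key values are copied by hand from CL/cl.h and CL/cl_gl.h so that it compiles without the OpenCL headers, and the SKETCH_ names and the function name are hypothetical.

```c
#include <stdint.h>

/* Property keys mirrored from CL/cl.h and CL/cl_gl.h. */
#define SKETCH_CL_CONTEXT_PLATFORM 0x1084
#define SKETCH_CL_GL_CONTEXT_KHR   0x2008
#define SKETCH_CL_GLX_DISPLAY_KHR  0x200A

/* Fill a zero-terminated key/value list for GL sharing on GLX.
 * cl_context_properties is intptr_t in cl.h, so intptr_t is used here.
 * The handles are passed in already cast to intptr_t. */
void fill_glx_sharing_props(intptr_t props[7], intptr_t platform,
                            intptr_t glx_context, intptr_t x_display)
{
    props[0] = SKETCH_CL_CONTEXT_PLATFORM;
    props[1] = platform;
    props[2] = SKETCH_CL_GL_CONTEXT_KHR;
    props[3] = glx_context;
    props[4] = SKETCH_CL_GLX_DISPLAY_KHR;
    props[5] = x_display;
    props[6] = 0; /* terminator */
}
```

In the real program the handles would come from glXGetCurrentContext() and glXGetCurrentDisplay() after SDL_GL_CreateContext has run, assuming SDL is using X11/GLX, and the list would be passed as the first argument of clCreateContext.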

OpenCL vs Intel Cilk Plus Issues, Differences and Capabilities

I am curious about the differences between OpenCL and Intel Cilk Plus. Both are parallel programming paradigms that are receiving wide recognition, but technically speaking, is one better than the other, or are they simply different? Also, what yardstick should I use to choose between the two when solving an embarrassingly parallel problem? Please, I need answers.

Thanks!

Yaknan
