Channel: Intel® Software - OpenCL*
Viewing all 1182 articles
Browse latest View live

Some problems in HTML report when using latest OpenCL SDK

Hi, I installed the newest OpenCL SDK, but some functionality seems to be broken.

System I'm using:

  • Intel SDK for OpenCL Applications 2017 Beta (7.0.0.2567)
  • Windows 10 Pro
  • Visual Studio 2017

Things that don't work:

OpenCL Application Analysis

  • OpenCL Application Analysis
    • Analyze
      • In the generated view, the main page says "Error: unable to retrieve "Home Page": "SyntaxError: Syntax error"."
         
  • OpenCL Application Analysis
    • In the generated view, the main page says "Error: unable to retrieve "Home Page": "SyntaxError: Syntax error"."

fail to verify number of compute units

I cannot verify the number of compute units of my GPU device (24 compute units as reported via 'CL_DEVICE_MAX_COMPUTE_UNITS'). Test results are as follows. What's wrong?

Case 1:

-- local work size (1, 1, 1)

-- global work size (1, 1, 1)

-- duration 107.375ms (difference between 'CL_PROFILING_COMMAND_START' and 'CL_PROFILING_COMMAND_END')

Case 2:

-- local work size (1, 1, 1)

-- global work size (1, 1, 12)

-- duration 109.577ms

Case 3:

-- local work size (1, 1, 1)

-- global work size (1, 1, 13)

-- duration 212.974ms

1>OpenCL Intel(R) Graphics device was found!
1>Device name: Intel(R) HD Graphics 520
1>Device version: OpenCL 2.0
1>Device vendor: Intel(R) Corporation
1>Device profile: FULL_PROFILE
1>fcl build 1 succeeded.
1>bcl build succeeded.
1>
1>CNN_MNIST_Infer info:
1> Maximum work-group size: 256
1> Compiler work-group size: (1, 1, 1)
1> Local memory size: 12064
1> Preferred multiple of work-group size: 8
1> Minimum amount of private memory: 288
1> Amount of spill memory used by the kernel: 0
1>
1>Build succeeded!
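
One way to read the three timings is as a count of sequential passes over the device. The sketch below is only an arithmetic model, not an explanation of the hardware: it assumes (hypothetically) that 12 single-item work-groups run at the same time and that each pass costs as much as Case 1. The value 12 is inferred from the measurements in the post, not from any documentation, and the function names are made up for illustration.

```c
#include <assert.h>

/* Number of sequential passes needed to run `groups` work-groups when at
   most `slots` of them can execute at once (ceiling division). */
unsigned passes_needed(unsigned groups, unsigned slots)
{
    assert(slots > 0);
    return (groups + slots - 1) / slots;
}

/* Predicted duration in ms, assuming every pass costs the same as the
   single-work-group run (Case 1). `slots` is an assumption, see above. */
double predicted_ms(unsigned groups, unsigned slots, double one_pass_ms)
{
    return passes_needed(groups, slots) * one_pass_ms;
}
```

With one_pass_ms = 107.375 and slots = 12, the model gives 107.375 ms for 12 work-groups (measured: 109.577 ms) and 214.75 ms for 13 work-groups (measured: 212.974 ms). It does not say why the step appears at 12 rather than at the 24 compute units the device reports.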
 

VEBox CCM module - input/output

 

I am trying to run VEBox modules on an Intel Core M3 processor, which is supposed to have Skylake HD Graphics. I have installed the latest OpenCL drivers and OpenCL SDK on Ubuntu 16.04.

The following note on the Color Correction Matrix module is a bit confusing:

    Color Correction Matrix stage is available if CL_DEVICE_VE_COLOR_PIPE_VERSION_INTEL is equal or greater than CL_VE_VERSION_VER_2_INTEL and by default accepts only Bayer inputs. If CL_DEVICE_VE_COLOR_PIPE_VERSION_INTEL is equal or greater than CL_VE_VERSION_VER_3_INTEL it accepts also RGBA and NV12 inputs.

On my system 

CL_DEVICE_VE_COLOR_PIPE_VERSION_INTEL == CL_VE_VERSION_VER_2_INTEL

which means CCM accepts Bayer inputs. My understanding is that the 3x3 matrix of CCM works on an RGB pixel. How does it get applied to a Bayer pattern, which has two different green pixels along with R and B?

Further, if CCM indeed accepts Bayer, it would mean disabling the upstream Demosaic module, in which case how does one do demosaicing after CCM?
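
To make the "3x3 matrix works on an RGB pixel" reading concrete, here is a minimal plain-C sketch of a color correction matrix applied to one pixel. It is not the VEBox API: the function name and the matrix values used with it are hypothetical and only illustrate the arithmetic, which needs all three channels of a pixel at once.

```c
#include <assert.h>

/* Apply a 3x3 color correction matrix to one RGB pixel: out = m * in.
   Each output channel is a weighted sum of all three input channels. */
void apply_ccm(const float m[3][3], const float in[3], float out[3])
{
    assert(in != out);  /* the output must not alias the input */
    for (int r = 0; r < 3; ++r)
        out[r] = m[r][0] * in[0] + m[r][1] * in[1] + m[r][2] * in[2];
}
```

A raw Bayer sample carries only one of the three channels per pixel position, which is why this form of the operation is usually described for data that has already been demosaiced.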

 

As for VE_COLOR_PIPE_VERSION  -  which version of VEBox/Driver combination would show up as CL_VE_VERSION_VER_3_INTEL?

 

Thanks,

Puneet

OpenCL Code Builder "Server" still crashes on startup (twice)

Intel OpenCL support for FPGA ...

VEBox - Denoise module on Camera low light noise

Hi,

I am running VEBox modules on an Intel Core M3 processor with Skylake HD Graphics. I have installed the latest OpenCL drivers and OpenCL SDK on Ubuntu 16.04.

I have enabled the Denoise module of the "ve_enhance_intel" kernel with NV12 input from a camera. In low light conditions the camera sensor noise is visible (as randomly moving pixels in dark areas of the video). I am not able to see any difference in the output of the Denoise module. I tried all settings of the module, but there is no visible reduction of noise.

Any suggestions/ideas?

Thanks,

Puneet

parallel execution of kernels on EU's through OOQ is possible?

Hi,

I am developing an OpenCL kernel with an out-of-order execution queue. I have read this article

https://software.intel.com/en-us/articles/opencl-out-of-order-queue-on-i... which describes the OOQ and its performance implications.

I want to understand: when two kernels are enqueued into an OOQ, will these two kernels get executed simultaneously on different EUs? I am not able to conclude that from the article. What I understand is that even in the OOQ the kernels are executed serially, not simultaneously, on the EUs.

Please clarify this confusion.

 

Best Regards,

Rajesh

 

 

how many work item that a EU have

I read in another topic that:

Each Execution Unit (EU) in our integrated graphics has seven hardware threads, each hardware thread is capable of running 8, 16, or 32 work items depending on whether compiler chose to build your kernel SIMD8, SIMD16 or SIMD32.

Does that mean that when I call get_global_size it will return a different value depending on how the compiler built the kernel (SIMD8, SIMD16 or SIMD32)?
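
The figures quoted above can be turned into a small calculation. This is a sketch that takes the "seven hardware threads per EU" number from the quoted post at face value; the function name is made up for illustration.

```c
#include <assert.h>

#define HW_THREADS_PER_EU 7u  /* figure quoted in the post above */

/* Work items that can be resident on one EU for a given SIMD width. */
unsigned work_items_per_eu(unsigned simd_width)
{
    assert(simd_width == 8 || simd_width == 16 || simd_width == 32);
    return HW_THREADS_PER_EU * simd_width;
}
```

This gives 56, 112 and 224 work items per EU for SIMD8, SIMD16 and SIMD32. Note that, per the OpenCL specification, get_global_size(d) reports the global work size the host passed for dimension d when enqueuing the kernel, which is a separate quantity from how many work items the compiler packs into one hardware thread.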

 


[Announcement]: Codebuilder plugin for Eclipse runtime dependencies

Hello Intel® SDK for OpenCL™ Applications 2017 Users,

Some Intel® SDK for OpenCL™ Applications 2017 R2 users have observed an error box when their Eclipse instance loads up the Codebuilder plugin. The error box prohibits the user’s ability to use the plugin. Here is a screen shot of an example error box:

We’ve confirmed some mitigations. In short, Ubuntu 16.04.4 was used with Eclipse 4.7 (Oxygen) to confirm libicu52 dynamic runtimes are needed for Codebuilder as distributed in Intel® SDK for OpenCL™ Applications 2017 R2.

Please see this updated getting started guide page for details:

https://software.intel.com/en-us/articles/sdk-for-opencl-gsg

Please see these updated release notes that also address the issue:

https://software.intel.com/en-us/articles/opencl-code-builder-release-notes

 

Thank you,

 

-MichaelC

Intel HD Graphics 4600 GPU + ternary operator on pointers + while-loop = freeze

Hi!

I found and narrowed down a new, strange cause of an OpenCL kernel freezing.

This is an absolutely minimal demonstration:

__kernel void freeze(
    __global int *a,
    __global int *b,
    int parity )
{
    __global int* c = (parity>0) ? a : b;
    uint id = (uint) get_global_id(0);
    while (id<256) {
        c[id] = -1;
        id += (uint) get_local_size(0);
    }
}

(The ZIP archive with complete code of this minimal demonstration is attached to the message)
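
For reference, the result the kernel is expected to produce can be emulated on the host in plain C. This is a sketch under the assumption that a single work-group is launched (so the global ID equals the local ID); the function name emulate_freeze is hypothetical and is not part of the attached archive.

```c
#include <assert.h>

#define N 256u  /* loop bound used in the kernel */

/* Host-side emulation of the `freeze` kernel for one work-group:
   work item `gid` writes c[gid], c[gid + local_size], ... while the
   index stays below N. */
void emulate_freeze(int *a, int *b, int parity, unsigned local_size)
{
    assert(local_size > 0);
    int *c = (parity > 0) ? a : b;  /* same ternary selection as the kernel */
    for (unsigned gid = 0; gid < local_size; ++gid) {
        unsigned id = gid;
        while (id < N) {
            c[id] = -1;
            id += local_size;
        }
    }
}
```

With parity greater than zero and a work-group size of 32, every work item leaves its loop after 8 iterations, the first 256 elements of a end up as -1, and b is untouched.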

Conditions:

1. Intel HD Graphics 4600 GPU (while all others I checked -- Intel CPU, NVIDIA GPU, AMD GPU -- run fine)

2. work group size: starting from 32

3. kernel like this:

  •   ternary operator choosing from one of two global arrays of the same type
  •   while-loop to write into this array (size 256 can be anything else)

4. the condition of the operator is met (in this code, parity is greater than zero)

 

Then the program locks up, and several seconds later the computer freezes completely, so you need to press Reset.

-----------------

I hope there are no dumb errors:

  •  both arrays exist,
  •  their size is large enough: 65536*sizeof(cl_int),
  •  you can explicitly write to either of them: a[id]=-1 and b[id]=-1 work OK, so only access to c[] in a loop causes the freeze,
  •  the initial (rather complex) program works well on many other devices.

P.S.
if printed, the pointer 'c' looks OK:

  if (id==0) printf(" a=%p b=%p c=%p\n", a, b, c);

it prints, for example:
 

     a=0x100000000 b=0x200000000 c=0x100000000
    or
     a=0x100000000 b=0x200000000 c=0x200000000

The code with printf() does not freeze. But I don't need any printf() in the production code :)

Looks like bu... well... imperfection :)
And I don't see any restriction on ternary operator in documentation.

 

Best regards,

Petr

Convenience of vector data types on current GPUs

I have a question regarding the use of vector data types inside OpenCL kernels. Since I started working with OpenCL, I have heard about the advantages of using vector data types to bulk load/store data from/to device memory and to take advantage of the SSE and/or AVX instructions available on CPUs. However, looking at the CL_DEVICE_PREFERRED_VECTOR_WIDTH_INT/LONG/FLOAT property of several GPUs (including Intel HD, AMD and NVIDIA graphics processors), all of them report a value of 1. So it seems that current GPU architectures do not take any advantage of vector data types, as pointed out in the following stackoverflow discussion: https://stackoverflow.com/questions/16258930/speedup-when-using-float4-o...

Therefore, would you recommend (excluding CPUs) using vector data types to store data in global memory? I am currently working on a Monte Carlo code for particle transport and I use float4 data types to store particle information (position, energy, etc.). The attributes are "codified" in these data types, so I usually have to extract them by addressing the vector components, for example:

// store is a float4 data type!!
float3 position = store.xyz;  // .xyz is a three-component swizzle, so it needs a float3
float energy = store.w;

Maybe it would be more advisable (from a performance point of view) to just use plain int or float data types?
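
The two layouts being compared can be written down in plain C. This is only a sketch of the data arrangement: float4_t is a stand-in for OpenCL's float4, split_energy is a hypothetical helper, and nothing here says which layout is faster on a given GPU.

```c
#include <assert.h>
#include <stddef.h>

/* Plain-C stand-in for OpenCL's float4; fields mirror .x/.y/.z/.w. */
typedef struct { float x, y, z, w; } float4_t;

/* Copy the energy (.w) of each packed particle into a separate plain
   float array, i.e. move one attribute from the float4 layout to a
   scalar layout. */
void split_energy(const float4_t *store, size_t n, float *energy)
{
    assert(store != NULL && energy != NULL);
    for (size_t i = 0; i < n; ++i)
        energy[i] = store[i].w;
}
```

In the packed layout one load brings in position and energy together; in the split layout a kernel that only needs the energy reads a contiguous float array. Which one wins depends on the access pattern, so it is worth measuring both.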

Thanks for your help!

(Linux) Incompatibility between SDK experimental ICD and CPU runtime 6.4.0.25

I have recently installed the 2017 R2 SDK on a Linux machine, alongside an existing installation of the 2016 CPU runtime. At installation time, I have opted to install the experimental runtime with OpenCL 2.1 support. Now, when I run clinfo I randomly come across a segmentation fault

Stack dump:
0.      Running pass 'ChannelPipeTransformation' on module 'main'.

due to the 2016 runtime finding the experimental libOclCpuBackEnd.so instead of its own. I can work around this by selectively hiding one of the two platforms, but an “official” fix (e.g. by renaming the libraries that differ between experimental and official runtimes) would be preferable.

Bug report: clCreateKernelsInProgram is not thread-safe

When clCreateKernelsInProgram is invoked on the same cl_program from many different threads, the Intel OpenCL runtime will randomly raise a CL_INVALID_VALUE error.
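
Until the runtime is fixed, one possible application-side workaround is to serialize the call behind a mutex. The sketch below assumes POSIX threads; create_fn and count_calls are hypothetical stand-ins so the example compiles without the OpenCL headers. In a real program the callback would wrap clCreateKernelsInProgram(program, num_kernels, kernels, &num_kernels_ret).

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the real call; returns 0 on success. */
typedef int (*create_fn)(void *program);

static pthread_mutex_t g_create_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run `create` while holding a process-wide lock so that only one
   thread at a time is inside the kernel-creation call. */
int create_kernels_serialized(create_fn create, void *program)
{
    assert(create != NULL);
    pthread_mutex_lock(&g_create_lock);
    int err = create(program);
    pthread_mutex_unlock(&g_create_lock);
    return err;
}

/* Demo stub only: counts how often it was called via `program`. */
int count_calls(void *program)
{
    ++*(int *)program;
    return 0;
}
```

This only hides the race from the application; it does not confirm where the problem sits inside the runtime.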

Neo crashes

OS: Ubuntu 17.10

Gdb clinfo output:

...@...:~$ gdb clinfo  
GNU gdb (Ubuntu 8.0.1-0ubuntu1) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from clinfo...(no debugging symbols found)...done.
(gdb) run
Starting program: /usr/bin/clinfo 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Number of platforms                               2
  Platform Name                                   Experimental OpenCL 2.1 CPU Only Platform
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.1 LINUX
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             INTEL

  Platform Name                                   Intel(R) OpenCL HD Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 2.1 
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_subgroups cl_khr_il_program cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             INTEL

  Platform Name                                   Experimental OpenCL 2.1 CPU Only Platform
Number of devices                                 1
  Device Name                                     Intel(R) Core(TM) i3-6100 CPU @ 3.70GHz
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 (Build 10)
  Driver Version                                  1.2.0.10
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     CPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               4
  Max clock frequency                             3700MHz
  Device Partition                                (core)
    Max number of sub-devices                     4
    Supported partition types                     by counts, equally, by names (Intel)
  Max work item dimensions                        3
  Max work item sizes                             8192x8192x8192
  Max work group size                             8192
[New Thread 0x7ffef2b3a700 (LWP 14210)]
[New Thread 0x7ffeebfff700 (LWP 14211)]
[New Thread 0x7ffef2739700 (LWP 14212)]
  Preferred work group size multiple              128
  Max sub-groups per work group                   1
  Preferred / native vector sizes                 
    char                                                 1 / 32      
    short                                                1 / 16      
    int                                                  1 / 8       
    long                                                 1 / 4       
    half                                                 0 / 0        (n/a)
    float                                                1 / 8       
    double                                               1 / 4        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              16704724992 (15.56GiB)
  Error Correction support                        No
  Max memory allocation                           4176181248 (3.889GiB)
  Unified memory for Host and Device              Yes
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   Yes
    Atomics                                       Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Preferred alignment for atomics                 
    SVM                                           64 bytes
    Global                                        64 bytes
    Local                                         0 bytes
  Max size for global variable                    65536 (64KiB)
  Preferred total size of global vars             65536 (64KiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        262144
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             480
    Max size for 1D images from buffer            261011328 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   64 bytes
    Pitch alignment for 2D image buffers          64 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 480
    Max number of write image args                480
    Max number of read/write image args           480
  Max number of pipe args                         16
  Max active pipe reservations                    65535
  Max pipe packet size                            1024
  Local memory type                               Global
  Local memory size                               32768 (32KiB)
  Max constant buffer size                        131072 (128KiB)
  Max number of constant args                     480
  Max size of kernel argument                     3840 (3.75KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Local thread execution (Intel)                Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                4294967295 (4GiB)
    Max size                                      4294967295 (4GiB)
  Max queues on device                            4294967295
  Max events on device                            4294967295
  Prefer user sync for interop                    No
  Profiling timer resolution                      1ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            Yes
    Sub-group independent forward progress        No
    IL version                                    SPIR-V_1.0
    SPIR versions                                 1.2
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_depth_images cl_khr_3d_image_writes cl_intel_exec_by_local_thread cl_khr_spir cl_khr_fp64 cl_khr_image2d_from_buffer 

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 2.1 NEO 
  Driver Version                                  1.0
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               23
  Max clock frequency                             1050MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256

Thread 1 "clinfo" received signal SIGSEGV, Segmentation fault.
TC::Utils::FillOutputArgs (pFEBinaryResult=0x0, pOutputArgs=0x7fffffffd940, exceptString="") at /home/.../neo/igc/IGC/OCLFE/igd_fcl_mcl/source/clang_tb.cpp:139
139                             pOutputArgs->ErrorStringSize = (uint32_t)strlen(pFEBinaryResult->GetErrorLog());
(gdb) bt
#0  TC::Utils::FillOutputArgs (pFEBinaryResult=0x0, pOutputArgs=0x7fffffffd940, exceptString="") at /home/.../neo/igc/IGC/OCLFE/igd_fcl_mcl/source/clang_tb.cpp:139
#1  0x00007ffef2c8dde6 in TC::CClangTranslationBlock::TranslateClang (this=this@entry=0x170ab00, pInputArgs=pInputArgs@entry=0x7fffffffd7c0, 
    pOutputArgs=pOutputArgs@entry=0x7fffffffd940, exceptString="", 
    pInternalOptions=0xbae6f0 "-ocl-version=210 -fpreserve-vec3-type -cl-ext=-all,+cl_khr_3d_image_writes,+cl_khr_byte_addressable_store,+cl_khr_fp16,+cl_khr_depth_images,+cl_khr_global_int32_base_atomics,+cl_khr_global_int32_exten"...) at /home/.../neo/igc/IGC/OCLFE/igd_fcl_mcl/source/clang_tb.cpp:824
#2  0x00007ffef2c8fe73 in TC::CClangTranslationBlock::Translate (this=0x170ab00, pInputArgs=0x7fffffffd970, pOutputArgs=0x7fffffffd940)
    at /home/.../neo/igc/IGC/OCLFE/igd_fcl_mcl/source/clang_tb.cpp:1055
#3  0x00007ffef2c93a96 in IGC::FclOclTranslationCtx<0ul>::Impl::Translate (tracingOptionsCount=0, tracingOptions=0x0, internalOptions=0x637b50, options=0x16ebaa0, 
    src=<optimized out>, outVersion=1, this=0x17797a0) at /home/.../neo/igc/IGC/AdaptorOCL/ocl_igc_interface/impl/fcl_ocl_translation_ctx_impl.h:115
#4  IGC::FclOclTranslationCtx<1ul>::TranslateImpl (this=<optimized out>, outVersion=1, src=<optimized out>, options=0x16ebaa0, internalOptions=0x637b50, 
    tracingOptions=0x0, tracingOptionsCount=0) at /home/.../neo/igc/IGC/AdaptorOCL/ocl_igc_interface/impl/fcl_ocl_translation_ctx_impl.cpp:42
#5  0x00007ffff539eef4 in OCLRT::CompilerInterface::build(OCLRT::Program&, OCLRT::TranslationArgs const&, bool) () from /opt/intel/opencl/libigdrcl.so
#6  0x00007ffff53d24c3 in OCLRT::Program::build(unsigned int, _cl_device_id* const*, char const*, void (*)(_cl_program*, void*), void*, bool) ()
   from /opt/intel/opencl/libigdrcl.so
#7  0x00007ffff5375156 in clBuildProgram () from /opt/intel/opencl/libigdrcl.so
#8  0x00000000004057f8 in ?? ()
#9  0x0000000000405d0c in ?? ()
#10 0x0000000000406a57 in ?? ()
#11 0x0000000000407642 in ?? ()
#12 0x00000000004014c4 in ?? ()
#13 0x00007ffff760a1c1 in __libc_start_main (main=0x401180, argc=1, argv=0x7fffffffdff8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, 
    stack_end=0x7fffffffdfe8) at ../csu/libc-start.c:308
#14 0x00000000004015a9 in ?? ()
(gdb) 

Ask your questions, please.

opencl_runtime_16.1.2 not for ubuntu

Hi all

On this page:

https://software.intel.com/en-us/articles/opencl-drivers#latest_CPU_runtime

the link mentions both RH and Ubuntu:

but the installer opencl_runtime_16.1.2_x64_rh_6.4.0.37

is not compatible with Ubuntu:

"

Unsupported OS
Detected operating system is not supported. Supported operating systems for this release are:
CentOS* 7 (Intel(R) 64), 7 (Intel(R) 64)
Red Hat Enterprise Linux* 7 (Intel(R) 64), 6.5 (Intel(R) 64), 6.6 (Intel(R) 64)

"

Kind

WT

 

 

 


Getting Started in Linux with Intel® SDK for OpenCL™ Applications

I followed the instructions here: https://software.intel.com/en-us/articles/sdk-for-opencl-gsg

I have successfully completed all the steps listed there. When running Eclipse Neon.3, I got the following message:

"Cannot get platform list:

Could not load required libraries; please make sure to set the correct path under the Code Builder for OpenCL preference page."

python sys_analyzer -v outputs:

"

--------------------------
Hardware readiness checks:
--------------------------
 [ OK ] Processor name: Intel(R) Core(TM) i3-4010U CPU @ 1.70GHz
 [ INFO ] Intel Processor
 [ INFO ] Processor brand: Core
 [ INFO ] Processor arch: Haswell
--------------------------
OS readiness checks:
--------------------------
 [ INFO ] GPU PCI id     : 0A16
 [ INFO ] GPU description: HSW ULT GT2 mobile
 [ OK ] GPU visible to OS
 [ INFO ] no nomodeset in GRUB cmdline (good)
 [ INFO ] Linux distro   : Ubuntu 16.04
 [ INFO ] Linux kernel   : 4.13.0-36-generic
 [ INFO ] glibc version  : 2.23
 [ INFO ] Linux distro suitable for Generic install
 [ INFO ] gcc version    : 20160609 (>=4.8.2 suggested)
--------------------------
Media Server Studio Install:
--------------------------
 [ OK ] user in video group
 [ OK ] libva.so.1 found
 [ ERROR ] libva not loading Intel iHD
 [ OK ] vainfo reports valid codec entry points
 [ INFO ] i915 driver in use by Intel video adapter
 [ OK ] /dev/dri/renderD128 connects to Intel i915
--------------------------
Component Smoke Tests:
--------------------------
 [ ERROR ] no Media SDK include files.  Are Intel components installed?
 [ OK ] OpenCL check:platform:Intel(R) OpenCL GPU OK CPU OK
platform:Experimental OpenCL 2.1 CPU Only Platform GPU OK CPU OK

"

I was also able to successfully run

ioc64 -input=simpleAdd.cl -asm

 

What is missing? Do I have to install anything else?

Thank you...

 

 

cannot download opencl sdk

Hello, everyone.

I want to get the OpenCL SDK, but I can't.

I clicked the download button, and all I got was this message.

But I didn't even get a spam e-mail...

Does anyone know this problem?

Thanks

Intel® SDK for OpenCL™ Applications for Windows*

2017

Thank you for registering for Intel® SDK for OpenCL™ Applications for Windows*.
Please check your email for instructions to download your product. Note that this may take up to two business days. If you have any questions please refer to our support page.

How can I move the intel OpenCL SDK files to another directory?

Hi, my computer specifications are as follows.

computer : new surface pro 

Cpu : intel Core m3-7Y30

Gpu : intel HD graphics 615

IDE: Visual Studio 2017 Community

 

I installed "intel_sdk_for_opencl_2017_7.0.0.2567" to use OpenCL, and it worked.

But I want to move the SDK folder from C:\Intel to another directory, C:\usr\local.

So I cut the whole SDK folder (which was in C:\Intel) and pasted it into the other directory (C:\usr\local).

But when I open my OpenCL solution file in Visual Studio 2017 Community,

the following error messages appear:

        Code Builder initialization failed:

        Unable to load DLL 'kernelBuilderAPI.dll': The specified module could not be found. (Exception from HRESULT: 0x8007007E)

 

How can I fix this problem?

I just want another directory for OpenCL

 

OpenCL on multicore processor

Hi everyone!

I'm new here and my first question is very basic. I was not able to find the clear answer anywhere. I have some knowledge in working with OpenCL on FPGA platform, but now I'm trying to implement the same algorithm on the CPU platform (Core).

My question is: Can I implement an OpenCL application only on CPU cores (without using GPUs)? My goal is to run an application similar to multithreaded concept, but using OpenCL. In this configuration, one CPU core should be the host, and other cores should be devices. Is this possible at all? Or is OpenCL based on the concept: CPU acts as host and GPU cores act as devices, and it cannot function in any other way?

Thanks in advance!

Windows 10 driver 23.20.16.5018 - NEO CL + Vulkan 1.1

The 23.20.16.5018 driver appeared in Windows Update a few days ago.

I see that it is the first Windows driver with a device version string of "OpenCL 2.1 NEO".

The driver also supports Vulkan 1.1 and enables all subgroup operations across all stages.  Impressive!

One question, will Vulkan shaders always be compiled with a subgroup width of 32 or will the compiler have latitude to select a subgroup size (8, 16, or 32) based on register occupancy like the OpenCL kernel compiler?

Presumably this OpenCL runtime is based on the recently open-sourced NEO codebase?

This is very exciting!
