Channel: Intel® Software - OpenCL*

Strange behavior when using "managed c++" memory.


In our project we mix Microsoft .NET code with native code, and we're trying to speed up areas using OpenCL. Here is a block of code I'm working on: 

array<System::Byte>^ OpenCLBase::DoIt(array<System::UInt16>^ toDo, int maxDiff, int width, int height)
{
    array<System::Byte>^ retManaged = gcnew array<System::Byte> (toDo->Length);

    pin_ptr<unsigned char> retPinned = &retManaged[0];
    unsigned char* retBuffer = retPinned;

    pin_ptr<unsigned short> managedPin = &toDo[0];
    unsigned short* pinnedData = managedPin;

    cl_int error;
    int bufferSizeInBytes = 512 * 424 * 2;

    cl_mem inputBuffer = clCreateBuffer(_context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
        bufferSizeInBytes, pinnedData, &error);

    /* IF THIS IS UNCOMMENTED, THEN 'V' BELOW IS 55. AND THE PERFORMANCE IS HIGH. */
    /* ========================================================================== */
    cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_WRITE_ONLY,
        retManaged->Length, NULL, &error);
    /* ========================================================================== */

    /* IF THIS IS UNCOMMENTED, THEN 'V' BELOW IS 55. THE PERFORMANCE IS NOT AS GOOD. */
    /* ============================================================================= */
    //unsigned char* mask = (unsigned char*)_aligned_malloc(512 * 424, 4096);
    //cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    //    retManaged->Length, mask, &error);
    /* ============================================================================= */

    /* IF THIS IS UNCOMMENTED, THEN 'V' BELOW IS 0. */
    /* ============================================ */
    //cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    //    retManaged->Length, retBuffer, &error);
    /* ============================================ */

    if (error == CL_SUCCESS)
    {
        cl_event clearCompleted;

        void* mappedBuffer = ::clEnqueueMapBuffer(
            _queue, // the command queue
            outputBuffer, // the output buffer
            CL_TRUE, // blocking map: the call returns only once the mapping is ready
            CL_MAP_WRITE, // mapped for writing.
            0, // no offset
            retManaged->Length, // the size
            0, // no events on the waiting list
            NULL, // no event list
            &clearCompleted, // event to wait on
            &error);

        error = clWaitForEvents(1, &clearCompleted);

        if (error == CL_SUCCESS)
        {
            error = clSetKernelArg(_kernel, 0, sizeof(cl_mem), (void*)&inputBuffer);
            error |= clSetKernelArg(_kernel, 1, sizeof(cl_mem), (void*)&outputBuffer);
            error |= clSetKernelArg(_kernel, 2, sizeof(height), &height);
            error |= clSetKernelArg(_kernel, 3, sizeof(width), &width);
            error |= clSetKernelArg(_kernel, 4, sizeof(maxDiff), &maxDiff);

            if (error == CL_SUCCESS)
            {
                size_t workgroupDims[2];
                workgroupDims[0] = width;
                workgroupDims[1] = height;

                //https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clEnqueueNDRangeKernel.html

                DWORD dwStart = ::GetTickCount();
                error = clEnqueueNDRangeKernel(_queue, // the queue
                    _kernel, // the kernel
                    2, // work_dim: the number of dimensions in the global range
                    NULL, // global_work_offset: must be NULL in OpenCL 1.0
                    workgroupDims, // global_work_size: total work-items per dimension (width x height)
                    NULL, // local_work_size: NULL lets OpenCL pick the work-group size
                    0, 0, 0); // no events to wait on, and no event returned
                clFinish(_queue); // blocks until the queue has finished

                char* test = (char*)mappedBuffer;
                char v = test[0];

                // v is 55.
            }
        }

        clReleaseMemObject(inputBuffer);
        clReleaseMemObject(outputBuffer);
    }

    return retManaged;
}

I also have a toy kernel which just sets the memory to 55 (ignore the extra parameters as I had to shave this kernel function down to illustrate my point): 

__kernel void FindEdges(__global ushort* iterateValues,
    __global char* writeValues,
    int height, int width, int maxDiff)
{
    const int x     = get_global_id(0);
    const int y     = get_global_id(1);
    const int stride = get_global_size(0);

    int i = y*stride+x;

    ushort val = iterateValues[i];
    int minval = val - maxDiff;
    int maxval = val + maxDiff;

    writeValues[i] = 55;
}
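
When checking values like "v", it can help to have a host-side reference for what the kernel is expected to write. The following is only a minimal sketch in plain C++, assuming a width x height global range (so that get_global_size(0) equals width). The name findEdgesReference is hypothetical and not part of the project:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for the toy FindEdges kernel above.
// Each loop iteration stands in for one work-item with global id (x, y).
std::vector<unsigned char> findEdgesReference(const std::vector<std::uint16_t>& iterateValues,
                                              int width, int height, int maxDiff)
{
    assert(width > 0 && height > 0);
    assert(iterateValues.size() == static_cast<std::size_t>(width) * height);

    std::vector<unsigned char> writeValues(iterateValues.size(), 0);
    const int stride = width; // assumption: get_global_size(0) == width

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const int i = y * stride + x;      // same indexing as the kernel
            const int val = iterateValues[i];
            const int minval = val - maxDiff;  // computed but unused, as in the kernel
            const int maxval = val + maxDiff;
            (void)minval;
            (void)maxval;
            writeValues[i] = 55;
        }
    }
    return writeValues;
}
```

Comparing this output against the mapped buffer byte by byte, rather than looking only at element 0, would show whether the kernel's writes are missing entirely or only partly.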

For those who may not know much about Microsoft's managed C++: it lets you write native code that interacts closely with "managed" .NET code. To access raw memory from managed C++, you need to "pin" it. Pinning prevents the .NET garbage collector from moving the memory, which would break native code that expects it to stay in one place. You can see this in the "pin_ptr<unsigned char>" lines in the code above (pin_ptr<> is a pinning pointer). To access the raw memory, you then assign the pinned pointer to a native pointer type. This snippet from above shows it: 

    pin_ptr<unsigned char> retPinned = &retManaged[0];
    unsigned char* retBuffer = retPinned;

The problem is that if I create a buffer over this pinned pointer and map it with clEnqueueMapBuffer, it doesn't appear to work. I use the value of "char v = test[0]" above to check whether it's working. The code above has three regions that can be commented/uncommented. If I want everything to run fine, I uncomment this code: 
 

   cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_WRITE_ONLY,
        retManaged->Length, NULL, &error);

The "char v" at the bottom of the code snippet above is 55 (which is expected). If I uncomment this, 

unsigned char* mask = (unsigned char*)_aligned_malloc(512 * 424, 4096);
cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    retManaged->Length, mask, &error);

...then I also get the expected behavior, but it's slower. (As a side note, why is that? Why is using CL_MEM_USE_HOST_PTR slower than my first example?) 

However, if I uncomment this, then nothing works at all: 

cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    retManaged->Length, retBuffer, &error);

Meaning the "char v" at the bottom of the code is never set to 55 (it reads 0), almost as if the kernel's writes never reached the memory. 

I'm really new at this, but I've pored over the code and can't find anything I'm doing wrong. I've previously run clCreateBuffer against natively allocated (malloc) buffers that weren't page-aligned, and it worked, so I don't think alignment is the cause (though maybe it still is?). 
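
To rule alignment in or out, the pinned pointer could be checked before it is handed to clCreateBuffer. The sketch below is plain C++ with hypothetical helper names (isAligned, looksZeroCopyFriendly). The 4096-byte alignment and 64-byte size granularity are assumptions based on commonly cited guidance for sharing CL_MEM_USE_HOST_PTR buffers without a copy on Intel devices, and should be checked against the documentation for the actual device:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// True if p is a multiple of `alignment`, which must be a power of two.
inline bool isAligned(const void* p, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Assumed thresholds (not taken from the original post): 4096-byte pointer
// alignment and a size that is a multiple of 64 bytes.
inline bool looksZeroCopyFriendly(const void* p, std::size_t sizeInBytes,
                                  std::size_t pageAlignment = 4096,
                                  std::size_t sizeGranularity = 64)
{
    return isAligned(p, pageAlignment) && (sizeInBytes % sizeGranularity) == 0;
}
```

Calling looksZeroCopyFriendly(retBuffer, retManaged->Length) and looksZeroCopyFriendly(mask, 512 * 424) would at least show whether the pinned managed array and the _aligned_malloc buffer differ in this respect. A managed array's first element is unlikely to sit on a page boundary, but that alone would not prove alignment is the cause.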

The width/height are 512/424, respectively. 

Any help would be appreciated. 

 

