Strange behavior when using "managed c++" memory.

In our project we mix Microsoft .NET code with native code, and we're trying to speed up areas using OpenCL. Here is a block of code I'm working on:

array<System::Byte>^ OpenCLBase::DoIt(array<System::UInt16>^ toDo, int maxDiff, int width, int height)
{
    array<System::Byte>^ retManaged = gcnew array<System::Byte> (toDo->Length);

    pin_ptr<unsigned char> retPinned = &retManaged[0];
    unsigned char* retBuffer = retPinned;

    pin_ptr<unsigned short> managedPin = &toDo[0];
    unsigned short* pinnedData = managedPin;

    cl_int error;
    int bufferSizeInBytes = 512 * 424 * 2;

    cl_mem inputBuffer = clCreateBuffer(_context, CL_MEM_READ_ONLY | CL_MEM_USE_HOST_PTR,
        bufferSizeInBytes, pinnedData, &error);

    /* IF THIS IS UNCOMMENTED, THEN 'V' BELOW IS 55. AND THE PERFORMANCE IS HIGH. */
    /* ========================================================================== */
    cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_WRITE_ONLY,
        retManaged->Length, NULL, &error);
    /* ========================================================================== */

    /* IF THIS IS UNCOMMENTED, THEN 'V' BELOW IS 55. THE PERFORMANCE IS NOT AS GOOD. */
    /* ============================================================================= */
    //unsigned char* mask = (unsigned char*)_aligned_malloc(512 * 424, 4096);
    //cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    //    retManaged->Length, mask, &error);
    /* ============================================================================= */

    /* IF THIS IS UNCOMMENTED, THEN 'V' BELOW IS 0. */
    /* ============================================ */
    //cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    //    retManaged->Length, retBuffer, &error);
    /* ============================================ */

    if (error == CL_SUCCESS)
    {
        cl_event clearCompleted;

        void* mappedBuffer = ::clEnqueueMapBuffer(
            _queue, // the command queue
            outputBuffer, // the output buffer
            CL_TRUE, // can't be unmapped before read
            CL_MAP_WRITE, // mapped for writing.
            0, // no offset
            retManaged->Length, // the size
            0, // no events on the waiting list
            NULL, // no event list
            &clearCompleted, // event to wait on
            &error);

        error = clWaitForEvents(1, &clearCompleted);

        if (error == CL_SUCCESS)
        {
            error = clSetKernelArg(_kernel, 0, sizeof(cl_mem), (void*)&inputBuffer);
            error |= clSetKernelArg(_kernel, 1, sizeof(cl_mem), (void*)&outputBuffer);
            error |= clSetKernelArg(_kernel, 2, sizeof(height), &height);
            error |= clSetKernelArg(_kernel, 3, sizeof(width), &width);
            error |= clSetKernelArg(_kernel, 4, sizeof(maxDiff), &maxDiff);

            if (error == CL_SUCCESS)
            {
                size_t workgroupDims[2];
                workgroupDims[0] = width;
                workgroupDims[1] = height;

                //https://www.khronos.org/registry/cl/sdk/1.0/docs/man/xhtml/clEnqueueNDRangeKernel.html

                DWORD dwStart = ::GetTickCount();
                error = clEnqueueNDRangeKernel(_queue, // the queue
                    _kernel, // the kernel
                    2, // the number of dimensions of this work-group
                    NULL, // always NULL
                    workgroupDims, // the global work size in each dimension
                    NULL, // # of work-items in a work group. NULL = letting OpenCL figure it out.
                    0, 0, 0); // no events to wait on, nor are we waiting.
                clFinish(_queue); // blocks until the queue has finished

                char* test = (char*)mappedBuffer;
                char v = test[0];

                // v is 55.
            }
        }

        clReleaseMemObject(inputBuffer);
        clReleaseMemObject(outputBuffer);
    }

    return retManaged;
}

I also have a toy kernel which just sets the memory to 55 (ignore the extra parameters as I had to shave this kernel function down to illustrate my point):

__kernel void FindEdges(__global ushort* iterateValues,
    __global char* writeValues,
    int height, int width, int maxDiff)
{
    const int x     = get_global_id(0);
    const int y     = get_global_id(1);
    const int stride = get_global_size(0);

    int i = y*stride+x;

    ushort val = iterateValues[i];
    int minval = val - maxDiff;
    int maxval = val + maxDiff;

    writeValues[i] = 55;
}
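
To make clear what I expect the kernel to produce, here is a rough CPU-side equivalent in plain C++. It's only a sketch: findEdgesReference is a name I made up for illustration, and it assumes the global work size is {width, height}, so that get_global_size(0) equals width.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference for the toy kernel above (not part of the project).
// Assumes one work-item per pixel, i.e. a global work size of {width, height}.
std::vector<std::int8_t> findEdgesReference(const std::vector<std::uint16_t>& iterateValues,
                                            int height, int width, int maxDiff)
{
    const std::size_t pixelCount =
        static_cast<std::size_t>(width) * static_cast<std::size_t>(height);
    assert(iterateValues.size() >= pixelCount); // the input must cover every pixel

    std::vector<std::int8_t> writeValues(pixelCount, 0);
    const std::size_t stride = static_cast<std::size_t>(width); // get_global_size(0)

    for (int y = 0; y < height; ++y)
    {
        for (int x = 0; x < width; ++x)
        {
            const std::size_t i = static_cast<std::size_t>(y) * stride
                                + static_cast<std::size_t>(x);

            // Same reads as the kernel; the results are unused in the toy version.
            const int val = iterateValues[i];
            const int minval = val - maxDiff;
            const int maxval = val + maxDiff;
            (void)minval;
            (void)maxval;

            writeValues[i] = 55; // every element should end up as 55
        }
    }
    return writeValues;
}
```

So after the kernel runs, I'd expect every byte of the output, including test[0], to be 55.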

For those who don't know much about Microsoft's managed C++ (C++/CLI), it allows you to write native code that interacts closely with "managed" .NET code. If you want to access raw memory from managed C++, you need to "pin" it. This prevents the .NET garbage collector from moving the memory around and causing issues with native code that expects memory to stay in one spot. You can see what I'm doing if you look at "pin_ptr<unsigned char>" in the above code (pin_ptr<> is a pinned pointer). To access the raw memory, one then just assigns the pinned pointer to a native pointer type. This is a snippet from above that shows this:

    pin_ptr<unsigned char> retPinned = &retManaged[0];
    unsigned char* retBuffer = retPinned;

Problem is, if I create a buffer from this pointer and map it with clEnqueueMapBuffer, it doesn't appear to work. I use the value of "char v = test[0]" above to tell whether it's working. In the code above I have three regions that can be commented/uncommented. If I want everything to run fine, then I uncomment this code:

    cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_WRITE_ONLY,
        retManaged->Length, NULL, &error);

The "char v" at the bottom of the code snippet above is 55 (which is expected). If I uncomment this,

unsigned char* mask = (unsigned char*)_aligned_malloc(512 * 424, 4096);
cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    retManaged->Length, mask, &error);

...then I also get expected behavior. But it's slower (as a side note, why is this? Why is using CL_MEM_USE_HOST_PTR slower than in my first example?).

However, if I uncomment this, then nothing works at all:

cl_mem outputBuffer = clCreateBuffer(_context, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
    retManaged->Length, retBuffer, &error);

Meaning the "char v" at the bottom of the code is 0 rather than 55, almost as if the memory never got set.

I'm really new at this, but I've pored over the code and can't find anything I'm doing wrong. I've previously called clCreateBuffer with natively allocated (malloc) buffers that weren't page-aligned and it worked, so I don't think that's the cause (though maybe it still is?).
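
To rule alignment in or out, a check like the following could be run on retBuffer and mask just before clCreateBuffer. This is only a sketch: isAligned is a helper name I made up, and 4096 is assumed as the boundary of interest simply because that's what I passed to _aligned_malloc.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true when `ptr` sits on an `alignment`-byte boundary.
// `alignment` must be a non-zero power of two (for example 64 or 4096).
inline bool isAligned(const void* ptr, std::size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(ptr) & (alignment - 1)) == 0;
}
```

Calling isAligned(retBuffer, 4096) and isAligned(mask, 4096) would at least show whether the pinned managed array and the _aligned_malloc buffer differ in this respect.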

The width/height are 512/424, respectively.
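
For reference, this is how those dimensions translate into the buffer sizes used above. It's a sketch under the assumption that toDo holds exactly one 16-bit value per pixel, so that retManaged->Length equals width * height; the constant names are mine.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical constants, named for illustration only.
constexpr std::size_t kWidth      = 512;
constexpr std::size_t kHeight     = 424;
constexpr std::size_t kPixelCount = kWidth * kHeight;

// Input: one unsigned 16-bit value per pixel (matches bufferSizeInBytes = 512 * 424 * 2).
constexpr std::size_t kInputBytes  = kPixelCount * sizeof(std::uint16_t);
// Output: one byte per pixel (matches retManaged->Length under the assumption above).
constexpr std::size_t kOutputBytes = kPixelCount * sizeof(std::uint8_t);

static_assert(kInputBytes == 434176, "input buffer is 512 * 424 * 2 bytes");
static_assert(kOutputBytes == 217088, "output buffer is 512 * 424 bytes");
```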

Any help would be appreciated.
