This article gives recommendations on how to use OpenCL properly to achieve zero-copy behavior on Intel HD Graphics. In particular, it recommends using CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR in the following cases:
- You want the OpenCL runtime to handle the size and alignment requirements.
- In cases when you may be reading or writing data from a file or another I/O stream and aren't allowed to write to the buffer you are given.
- Buffer is not already in a properly aligned and sized allocation and you want it to be.
- You are okay with the performance cost of the copy relative to the length of time your application executes, for example at initialization.
- Porting existing application code where you don't know if it has been aligned and sized properly.
- The buffer used to create the OpenCL buffer needs the data to be unmodified and you want to write to the buffer.
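To make the "aligned and sized properly" condition from the list concrete, here is a minimal C sketch of how a host pointer could be checked before deciding which flags to use. The numbers are an assumption on my side and are not stated in the list above: as far as I recall, Intel's guidance for zero copy with CL_MEM_USE_HOST_PTR asks for a 4096-byte aligned pointer and a size that is a multiple of 64 bytes, so verify them for your device. The helper names `is_zero_copy_candidate` and `round_up_size` are hypothetical, not part of OpenCL.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed requirements (not taken from the list above; verify for your device):
 * base address aligned to 4096 bytes, size a multiple of 64 bytes. */
#define ZC_ALIGNMENT     4096u
#define ZC_SIZE_MULTIPLE 64u

/* Hypothetical helper: returns 1 if the host allocation already meets the
 * assumed alignment and size requirements, 0 otherwise.
 * The pointer is only inspected, never dereferenced. */
static int is_zero_copy_candidate(const void *ptr, size_t size)
{
    return ptr != NULL
        && ((uintptr_t)ptr % ZC_ALIGNMENT) == 0
        && size != 0
        && (size % ZC_SIZE_MULTIPLE) == 0;
}

/* Hypothetical helper: rounds a byte count up to the next multiple of
 * ZC_SIZE_MULTIPLE, e.g. before allocating a buffer by hand. */
static size_t round_up_size(size_t size)
{
    return (size + ZC_SIZE_MULTIPLE - 1) / ZC_SIZE_MULTIPLE * ZC_SIZE_MULTIPLE;
}
```

If the check fails, or the origin of the pointer is unknown (as in the porting case), the listed cases suggest letting the runtime allocate and copy rather than passing the pointer through as is.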
But what is the point of using CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR in these cases? Why can't we just use CL_MEM_COPY_HOST_PTR on its own? Intel HD Graphics doesn't have its own memory, so the new buffer will definitely be allocated in RAM. And it seems reasonable to expect that CL_MEM_COPY_HOST_PTR alone already takes care of alignment and size.
The only argument that comes to mind is that Intel HD Graphics sometimes does have its own relatively small memory, and that CL_MEM_ALLOC_HOST_PTR guarantees the allocation is made in RAM. That doesn't seem very convincing, though, so maybe I'm missing something about how CL_MEM_ALLOC_HOST_PTR behaves.