I enqueued 5 kernels for in-order execution in my application and noticed unusual behavior after timing the program. One kernel's enqueue API call took a lot more time than the others, almost as much as the actual kernel running time. In my experience the enqueue API should be very efficient, so what might cause this stall?