Hi,
In the Intel VTune Amplifier profiler, there is no counter for the number of instructions executed on integrated GPUs.
Instead, the profiler provides three metrics indicating the ratio of EUs (execution units) in the active, stalled, and idle states.
So if my kernel (written in OpenCL) is highly divergent and the divergence is input-dependent, it is difficult to measure the GFLOPS it achieves.
Any ideas?
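
For concreteness, the figure I am after is just the number of floating-point operations executed divided by the kernel time. Below is a minimal Python sketch of that arithmetic. The function name and the numbers are hypothetical and only for illustration. The elapsed time could come from something like OpenCL event profiling, but `flop_count` is exactly the value I cannot get when the executed path depends on the input.

```python
def gflops(flop_count: int, elapsed_seconds: float) -> float:
    """Return throughput in GFLOPS for flop_count operations in elapsed_seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    # GFLOPS = floating-point operations per second, in units of 1e9.
    return flop_count / elapsed_seconds / 1e9


# Hypothetical numbers: 2e9 floating-point operations in 0.5 s of kernel time.
print(gflops(2_000_000_000, 0.5))
```

The EU active/stall/idle ratios do not supply `flop_count`, which is why I cannot fill in the first argument for a divergent kernel.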