I have an Intel HD 4600 GPU and noticed some performance discrepancies when running a microbenchmark that calls built-in math functions over a large number of loop iterations (arithmetic operators are fine). I compared the results against the same microbenchmark run on the CPU, and against the standard C math functions called in a loop (with vectorisation and optimizations avoided). So my question is this: is there a large loop or math-function overhead when executing a kernel on an Intel HD GPU?