Channel: Intel® Software - OpenCL*

unknown optimization on x64


I have written a benchmarking application for OpenCL: https://github.com/krrishnarraj/clpeak . One of the tests measures the compute capacity (GFLOPS) of the device. When run on 32-bit Windows, it gives the expected results on Sandy Bridge:

Platform: Intel(R) OpenCL
  Device:       Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
    Driver version: 1.2 (Win32)

    Single-precision compute (GFLOPS)
      float   : 25.19
      float2  : 50.48
      float4  : 50.37
      float8  : 51.75
      float16 : 51.85

The theoretical peak of this device is 76.8 GFLOPS.
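For reference, the 76.8 figure can be reproduced with a small calculation. This is only a sketch: it assumes 4 cores, the 2.4 GHz base clock from the device name, and 8 single-precision FLOPs per cycle per core (one 8-wide AVX operation per cycle). That last number is an assumption chosen because it matches the figure quoted above. The function name is made up for illustration.

```python
def theoretical_peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Peak GFLOPS = cores x clock (GHz) x FLOPs per cycle per core."""
    return cores * clock_ghz * flops_per_cycle

# Assumed inputs: 4 cores, 2.4 GHz base clock, and 8 single-precision
# FLOPs per cycle per core (one 8-wide AVX operation per cycle).
peak = theoretical_peak_gflops(cores=4, clock_ghz=2.4, flops_per_cycle=8)
print(f"{peak:.1f} GFLOPS")
```

If the hardware can issue more than one vector operation per cycle, or the clock runs above base frequency, the real ceiling would be higher than this estimate.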

But when the same code runs on 64-bit, it gives a different result:

Platform: Intel(R) OpenCL
  Device:       Intel(R) Core(TM) i7-3630QM CPU @ 2.40GHz
    Driver version: 1.2 (Win64)

    Single-precision compute (GFLOPS)
      float   : 25.15
      float2  : 99.25
      float4  : 172.25
      float8  : 80.07
      float16 : 96.42

It looks like the vector code (float2, float4) has been optimized down to float, or some out-of-order optimization has happened. I am not sure what is happening.
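To make the anomaly concrete, here is a rough sanity check that compares the Win64 numbers above against the 76.8 GFLOPS peak. The helper name is hypothetical and not part of clpeak. A result above the peak suggests that either the benchmark is not executing all the work it counts or the peak estimate itself is too low. It does not say which.

```python
# Win64 results copied from the output above (GFLOPS).
WIN64_RESULTS = {
    "float": 25.15,
    "float2": 99.25,
    "float4": 172.25,
    "float8": 80.07,
    "float16": 96.42,
}

PEAK_GFLOPS = 76.8  # theoretical peak quoted in this post

def exceeds_peak(results, peak):
    """Return the test names whose measured GFLOPS exceed the peak."""
    return [name for name, gflops in results.items() if gflops > peak]

suspicious = exceeds_peak(WIN64_RESULTS, PEAK_GFLOPS)
for name in suspicious:
    ratio = WIN64_RESULTS[name] / PEAK_GFLOPS
    print(f"{name}: {WIN64_RESULTS[name]} GFLOPS ({ratio:.2f}x peak)")
```

Every vector width comes out above the peak on Win64, while the scalar float result stays close to the Win32 number.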

The ASM output from Kernel Analyzer shows that all the fmad and fmul instructions were generated properly. Is there any optimization that is specific to x64? Anything advanced?

