I read in another topic that:
Each Execution Unit (EU) in our integrated graphics has seven hardware threads, and each hardware thread is capable of running 8, 16, or 32 work items, depending on whether the compiler chose to build your kernel SIMD8, SIMD16, or SIMD32.
Does that mean that when I call get_global_size, it will return a different value depending on how the compiler compiles the kernel (SIMD8, SIMD16, or SIMD32)?