Does subgroup extension API "intel_sub_group_block_readN/writeN" have better performance than "vloadN/vstoreN"? I did some testing, but don't see much difference between them. Can you elaborate the read/write performance expectation between them?
↧
Does subgroup extension API "intel_sub_group_block_readN/writeN" have better performance than "vloadN/vstoreN"? I did some testing, but don't see much difference between them. Can you elaborate the read/write performance expectation between them?