I wrote an OpenCL kernel that multiplies a row vector by a matrix stored in Compressed Sparse Row (CSR) format, but it gives me a different answer each time I run it.
I have built a small repro case based on my matrices. As you can see, the program breaks at different values of j, even though it is expected to print Success. I think the problem is related to cache flushing around atomic_cmpxchg, since the loop containing it always runs only once, which is a little strange.
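
For comparison, here is a minimal serial sketch in plain C of the result the kernel is expected to produce. It is not the kernel itself: the function name `csr_row_vec_mul` and the array names `row_ptr`, `col_idx`, `vals`, `x` and `y` are hypothetical and assume the usual three-array CSR layout with 0-based indices.

```c
#include <stddef.h>

/* Serial reference for y = x * A, where A (n_rows x n_cols) is stored in CSR.
 * Assumed layout (hypothetical names, not taken from the kernel):
 *   row_ptr : n_rows + 1 offsets into col_idx / vals
 *   col_idx : column index of each stored nonzero
 *   vals    : value of each stored nonzero
 *   x       : input row vector of length n_rows
 *   y       : output row vector of length n_cols */
void csr_row_vec_mul(size_t n_rows, size_t n_cols,
                     const int *row_ptr, const int *col_idx,
                     const float *vals, const float *x, float *y)
{
    /* Clear the output before accumulating. */
    for (size_t c = 0; c < n_cols; ++c)
        y[c] = 0.0f;

    /* Each row r scatters x[r] * A[r][col] into y[col]. */
    for (size_t r = 0; r < n_rows; ++r)
        for (int k = row_ptr[r]; k < row_ptr[r + 1]; ++k)
            y[col_idx[k]] += x[r] * vals[k];
}
```

Several rows can contribute to the same `y[col]`, which is why a version that parallelizes over rows needs an atomic update at that line. With floating-point values the order of those updates can also differ between runs, so comparing against this reference with a small tolerance is probably safer than testing for exact equality.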
Can anybody help me with this, please?
Thanks.