Threadgroup atomics require all-atomic operation #217
Comments
Hmm, this seems to apply to other atomics as well:

using Metal

function local_kernel(f, a, val::T) where T
    i = thread_position_in_grid_1d()
    b = MtlThreadGroupArray(T, 128)
    #b[i] = a[i]
    val = Metal.atomic_load_explicit(pointer(a, i))
    Metal.atomic_store_explicit(pointer(b, i), val)
    f(pointer(b, i), val)
    #a[i] = b[i]
    val = Metal.atomic_load_explicit(pointer(b, i))
    Metal.atomic_store_explicit(pointer(a, i), val)
    return
end

function main(; T=Int32, n=16)
    a = ones(T, n)
    b = MtlArray(a)
    val = one(T)
    @metal threads=n local_kernel(Metal.atomic_fetch_add_explicit, b, val)
    @show .+(a, val)
    @show Array(b)
    return
end
I think this is a general truth (and why we don't have atomics for arrays yet): if you mix atomic operations with non-atomic operations, you will get issues. But I would have expected that the load and stores to thread-local so
Why is that? Every thread is accessing its own memory locations, so why would mixing atomics with regular loads and stores not work? Note that removing atomics altogether works fine here.
MWE:
Note how the loads and stores that initialize the threadgroup memory and copy it back to global memory need to be atomic for this example to work, even though every thread has its own dedicated memory address to act upon. Demoting those operations to regular array operations results in the final array containing all zeros.
This smells like an upstream bug, especially because the above pattern is impossible to replicate in Metal C (where atomic_int is used as element type, promoting all operations to atomic):