Isolate the deepmd GPU kernels that use floating-point atomicAdd, study them, and compare them to their CPU counterparts. Random input data are enough for this comparison. Then remove the atomic operations and test again.

The atomic operations are present because the kernels have multiple levels of parallelization. Contrary to my initial thoughts, the atomic add is needed because one parallelization level runs over the neighbors. It can be removed completely with a minimal amount of change.
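A minimal sketch of why the CPU/GPU comparison needs care: floating-point atomicAdd commits additions in whatever order the hardware schedules the threads, and float addition is not associative, so the GPU sum can differ bitwise from the serial CPU sum. The snippet below (plain Python, not deepmd code) models this by summing the same random values in two different orders; names are illustrative only.

```python
import random

# Illustrative only: float additions performed in different orders, as
# happens with atomicAdd from concurrently scheduled threads, generally
# give bitwise-different sums even though the math is "the same".
random.seed(0)
vals = [random.uniform(-1.0, 1.0) for _ in range(10000)]

forward = 0.0
for v in vals:
    forward += v          # one fixed order (like a serial CPU loop)

shuffled = vals[:]
random.shuffle(shuffled)
reordered = 0.0
for v in shuffled:
    reordered += v        # another order (like nondeterministic atomics)

# The two sums agree only up to rounding, not bitwise.
print(abs(forward - reordered))
```

This is why a comparison on random data should use a tolerance rather than exact equality, and why removing the atomics also makes the kernel output reproducible run to run.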
For example, the y dimension of the block is used for parallelization over the neighbors; that is why the atomic add is needed. There is a better way to do this, but it requires different launch parameters, so it should be tested.
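The two schemes can be modeled outside CUDA. In the current layout, one "thread" exists per (atom, neighbor) pair and all neighbor contributions race on the same output slot, which forces atomicAdd; in the alternative, one thread per atom loops over its own neighbors, so each slot has a single writer and no atomics are needed, at the cost of a different launch geometry. The sketch below is a hypothetical Python model, not deepmd code; `contrib` stands in for a per-(atom, neighbor) force contribution.

```python
# Hypothetical stand-in for a per-(atom, neighbor) contribution.
def contrib(i, j):
    return 0.1 * i + 0.01 * j

n_atoms, n_neigh = 4, 8

# Scheme A: one "thread" per (atom, neighbor) pair (block y-dimension
# over neighbors). Many writers share out_atomic[i], so on the GPU the
# += below would have to be an atomicAdd.
out_atomic = [0.0] * n_atoms
for i in range(n_atoms):
    for j in range(n_neigh):          # these iterations run concurrently on GPU
        out_atomic[i] += contrib(i, j)

# Scheme B: one "thread" per atom; the neighbor loop moves inside the
# thread, so each output slot has a single writer and the atomic goes
# away. This is the change that needs different launch parameters.
out_plain = []
for i in range(n_atoms):
    acc = 0.0
    for j in range(n_neigh):
        acc += contrib(i, j)
    out_plain.append(acc)
```

In this serial model both schemes sum in the same order and agree exactly; on real hardware only Scheme B is deterministic, which is what the testing step would verify.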