Commit
Fix incorrect use of POSIX atomics (GitHub Issue 91)
The nvtxInit.h macro definitions for NVTX_ATOMIC_WRITE_32 and NVTX_ATOMIC_CAS_32 were correct on Windows, but not on POSIX. For the CAS (compare and swap), the "newval" and "oldval" arguments were reversed. As a result, the NVTX init was not threadsafe. The problem went unnoticed because the CAS operation with the arguments reversed happened to return the FRESH value even though it did not perform the swap, so the caller saw the same return value a successful swap would have produced.

The reporter of Issue 91 on the NVTX GitHub page correctly described that __sync_val_compare_and_swap takes the comparand (oldval) and exchange (newval) arguments in the reverse of the order we were passing them. The fix here is simply to swap the order in which the macro passes them to __sync_val_compare_and_swap.

While reviewing the docs for __sync_val_compare_and_swap, I noticed that the function we used in the implementation of NVTX_ATOMIC_WRITE_32 may not do what we want on all platforms. __sync_lock_test_and_set is only an acquire barrier, not a full memory barrier, and here we are using it as a release instead. Also, some platforms apparently only support writing the value 1, and we are writing the value 2. So I changed this to simply do a memory barrier, a volatile write, and another memory barrier.

This change also removes an unnecessary __sync_synchronize in the macro before __sync_val_compare_and_swap. I reviewed the generated assembly using godbolt, and confirmed that on x86 and ARM the correct memory-ordering behavior is produced by __sync_val_compare_and_swap by itself.

This raises the issue that we need much better threadsafety testing. We will work on ways to validate threadsafe initialization programmatically. A simple test using multiple threads with an intentionally long init routine would have caught this error, since the init calls would not have serialized.
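To illustrate the argument-order problem and the barrier/write/barrier replacement described above, here is a minimal sketch in C. It is not the actual nvtxInit.h code: the DEMO_* macro names, the STATE_* enum, the values 0 and 1 for the first two states, and the try_claim_* helpers are all hypothetical and exist only for this illustration. It assumes a GCC- or Clang-compatible compiler that provides the legacy __sync builtins.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical init states. Only the value 2 for the final state is
   taken from the commit message; 0 and 1 are assumptions. */
enum { STATE_FRESH = 0, STATE_STARTED = 1, STATE_COMPLETE = 2 };

/* Buggy form: exchange is passed where the builtin expects oldval,
   and comparand where it expects newval. */
#define DEMO_CAS_32_REVERSED(old, address, exchange, comparand) \
    old = __sync_val_compare_and_swap(address, exchange, comparand)

/* Fixed form: __sync_val_compare_and_swap(ptr, oldval, newval). */
#define DEMO_CAS_32(old, address, exchange, comparand) \
    old = __sync_val_compare_and_swap(address, comparand, exchange)

/* Full barrier, volatile write, full barrier. */
#define DEMO_WRITE_32(address, value)                  \
    do {                                               \
        __sync_synchronize();                          \
        *(volatile int32_t *)(address) = (value);      \
        __sync_synchronize();                          \
    } while (0)

/* Returns 1 if the caller believes it won the right to run init.
   With reversed arguments the state is compared against STARTED,
   never matches while FRESH, and FRESH is returned every time. */
static int try_claim_reversed(volatile int32_t *state)
{
    int32_t old;
    DEMO_CAS_32_REVERSED(old, state, STATE_STARTED, STATE_FRESH);
    return old == STATE_FRESH;
}

/* Same check with the corrected order: only the first caller sees
   FRESH, because the state really moves to STARTED. */
static int try_claim_fixed(volatile int32_t *state)
{
    int32_t old;
    DEMO_CAS_32(old, state, STATE_STARTED, STATE_FRESH);
    return old == STATE_FRESH;
}

/* Publishes completion using the barrier/write/barrier pattern. */
static void mark_complete(volatile int32_t *state)
{
    DEMO_WRITE_32(state, STATE_COMPLETE);
}
```

Calling try_claim_reversed twice on a FRESH state reports a win both times and leaves the state unchanged, which is why every thread would proceed into init. With try_claim_fixed, the second call reports a loss because the state is already STARTED.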