Merge back changes from ROCm 6.1 (#448)

* Bulk fast-forward merge develop to 6.1 staging branch (#414)

* Hard coded path (/opt/rocm-version) references removed from cmake files (#407)

* Remove codecov.groovy (#402)

* readme and changelog updates (#410)

* readme and changelog updates

* Update README.md

Co-authored-by: Saad Rahim (AMD) <[email protected]>

---------

Co-authored-by: Saad Rahim (AMD) <[email protected]>

---------

Co-authored-by: arvindcheru <[email protected]>
Co-authored-by: Sam Wu <[email protected]>
Co-authored-by: Lisa <[email protected]>
Co-authored-by: Saad Rahim (AMD) <[email protected]>

* Add a workaround for regressions in XORWOW and LFSR on MI200 (#432)

---------

Co-authored-by: Stanley Tsang <[email protected]>
Co-authored-by: arvindcheru <[email protected]>
Co-authored-by: Sam Wu <[email protected]>
Co-authored-by: Lisa <[email protected]>
Co-authored-by: Saad Rahim (AMD) <[email protected]>
Co-authored-by: Anton Gorenko <[email protected]>
ROCm · Feb 21, 2024 · commit d26de79 · 1 parent 909c1c7

Showing 4 changed files with 18 additions and 4 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,7 +3,7 @@
 Documentation for rocRAND is available at
 [https://rocm.docs.amd.com/projects/rocRAND/en/latest/](https://rocm.docs.amd.com/projects/rocRAND/en/latest/)
 
-## (Unreleased) rocRAND-3.0.0 for ROCm 6.0.0
+## rocRAND-3.0.0 for ROCm 6.0.0
 
 ### Changes
 

diff --git a/README.md b/README.md
@@ -1,7 +1,6 @@
 # rocRAND
 
 The rocRAND project provides functions that generate pseudorandom and quasirandom numbers.
-
 The rocRAND library is implemented in the [HIP](https://github.com/ROCm/HIP)
 programming language and optimized for AMD's latest discrete GPUs. It is designed to run on top
 of AMD's [ROCm](https://rocm.docs.amd.com) runtime, but it also works on CUDA-enabled GPUs.

diff --git a/library/src/rng/lfsr113.hpp b/library/src/rng/lfsr113.hpp
@@ -1,4 +1,4 @@
-// Copyright (c) 2022-2023 Advanced Micro Devices, Inc. All rights reserved.
+// Copyright (c) 2022-2024 Advanced Micro Devices, Inc. All rights reserved.
 //
 // Permission is hereby granted, free of charge, to any person obtaining a copy
 // of this software and associated documentation files (the "Software"), to deal
@@ -85,6 +85,13 @@ ROCRAND_KERNEL __launch_bounds__(ROCRAND_DEFAULT_MAX_BLOCK_SIZE) void generate_k
 
         distribution(input, output);
 
+#if defined(__gfx90a__)
+        // Workaround: The compiler hoists s_waitcnt vmcnt(..) out of the loops.
+        // For some reason this optimization decreases performance of uniform distributions
+        // on MI200. MI100 and MI300 are not affected.
+        // Here we add s_waitcnt vmcnt(0)
+        __builtin_amdgcn_s_waitcnt(/*vmcnt*/ 0 | (/*exp_cnt*/ 0x7 << 4) | (/*lgkmcnt*/ 0xf << 8));
+#endif
         vec_data[index] = *reinterpret_cast<vec_type*>(output);
         index += stride;
     }
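The constant passed to `__builtin_amdgcn_s_waitcnt` packs several counter thresholds into one 16-bit immediate. The sketch below decodes it, assuming the GFX9 field layout that gfx90a uses (vmcnt split across bits 3:0 and 15:14, expcnt in bits 6:4, lgkmcnt in bits 11:8). The helper `encode_waitcnt_gfx9` is hypothetical and not part of rocRAND or HIP. It shows that the patch's expression evaluates to `0x0F70`: wait until no vector memory operations are outstanding (vmcnt 0), while expcnt and lgkmcnt sit at their maxima and therefore impose no wait.

```cpp
#include <cstdint>

// Hypothetical helper (not rocRAND/HIP API). Assumed GFX9 (gfx90a) layout of
// the s_waitcnt 16-bit immediate:
//   bits [3:0]   vmcnt, low 4 bits
//   bits [6:4]   expcnt
//   bits [11:8]  lgkmcnt
//   bits [15:14] vmcnt, high 2 bits
// A counter field at its maximum value means "do not wait on this counter".
constexpr std::uint16_t encode_waitcnt_gfx9(unsigned vmcnt, unsigned expcnt, unsigned lgkmcnt)
{
    return static_cast<std::uint16_t>(
        (vmcnt & 0xFu)                    // vmcnt[3:0]  -> bits [3:0]
        | ((expcnt & 0x7u) << 4)          // expcnt      -> bits [6:4]
        | ((lgkmcnt & 0xFu) << 8)         // lgkmcnt     -> bits [11:8]
        | (((vmcnt >> 4) & 0x3u) << 14)); // vmcnt[5:4]  -> bits [15:14]
}

// The expression used in the patch: vmcnt = 0, expcnt = 7 (max), lgkmcnt = 15 (max).
constexpr std::uint16_t patch_imm = 0 | (0x7 << 4) | (0xf << 8);

static_assert(patch_imm == 0x0F70, "the patch immediate is 0x0F70");
static_assert(encode_waitcnt_gfx9(0, 0x7, 0xF) == patch_imm,
              "equivalent to s_waitcnt vmcnt(0) with expcnt/lgkmcnt unconstrained");
```

Because vmcnt is 0, both of its bit fields are zero, so the two-part split does not affect this particular constant.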

diff --git a/library/src/rng/xorwow.hpp b/library/src/rng/xorwow.hpp
@@ -1,4 +1,4 @@
-// Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved.
+// Copyright (c) 2017-2024 Advanced Micro Devices, Inc. All rights reserved.
 //
 // Permission is hereby granted, free of charge, to any person obtaining a copy
 // of this software and associated documentation files (the "Software"), to deal
@@ -89,6 +89,14 @@ namespace detail {
             }
             distribution(input, output);
 
+#if defined(__gfx90a__)
+            // Workaround: The compiler hoists s_waitcnt vmcnt(..) out of the loops.
+            // For some reason this optimization decreases performance of uniform distributions
+            // on MI200. MI100 and MI300 are not affected.
+            // Here we add s_waitcnt vmcnt(0)
+            __builtin_amdgcn_s_waitcnt(/*vmcnt*/ 0 | (/*exp_cnt*/ 0x7 << 4)
+                                       | (/*lgkmcnt*/ 0xf << 8));
+#endif
             vec_data[index] = *reinterpret_cast<vec_type *>(output);
             // Next position
             index += stride;
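The same workaround now appears verbatim in both `lfsr113.hpp` and `xorwow.hpp`. A shared macro is one way to keep the two copies in sync. The following is a hypothetical sketch: the macro name `HYPOTHETICAL_GFX90A_WAIT_VMCNT0` and the `fill_strided` loop are illustrative stand-ins, not rocRAND code, and a placeholder value replaces the real `distribution(input, output)` call. Because `__gfx90a__` is defined only when the device compiler targets gfx90a, the macro expands to a no-op for every other target and for host compilation, so the sketch also builds as ordinary C++.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical shared helper (not part of rocRAND). __gfx90a__ is defined only
// while the device compiler targets gfx90a (MI200); on all other targets,
// including host-only builds, the macro expands to a no-op.
#if defined(__gfx90a__)
    #define HYPOTHETICAL_GFX90A_WAIT_VMCNT0() \
        __builtin_amdgcn_s_waitcnt(/*vmcnt*/ 0 | (/*exp_cnt*/ 0x7 << 4) | (/*lgkmcnt*/ 0xf << 8))
#else
    #define HYPOTHETICAL_GFX90A_WAIT_VMCNT0() ((void)0)
#endif

// Simplified, host-compilable stand-in for the generator loops in the patch:
// produce a value, issue the (possibly empty) wait, store, then advance by stride.
inline void fill_strided(unsigned* data, std::size_t size, std::size_t start, std::size_t stride)
{
    assert(stride > 0 && "a zero stride would never terminate");
    for(std::size_t index = start; index < size; index += stride)
    {
        // Placeholder for the real distribution(input, output) call.
        const unsigned output = static_cast<unsigned>(index) * 2u;
        // Mirrors the patch: on gfx90a, wait for outstanding vector memory ops before the store.
        HYPOTHETICAL_GFX90A_WAIT_VMCNT0();
        data[index] = output;
    }
}
```

Placing the wait immediately before the store mirrors the patch, which issues it after `distribution(input, output)` and before writing `vec_data[index]`.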