Merge back changes from ROCm 6.1 (#448)
* Bulk fast-forward merge develop to 6.1 staging branch (#414)

* Hard coded path (/opt/rocm-version) references removed from cmake files (#407)

* Remove codecov.groovy (#402)

* readme and changelog updates (#410)

* readme and changelog updates

* Update README.md

Co-authored-by: Saad Rahim (AMD) <[email protected]>

---------

Co-authored-by: Saad Rahim (AMD) <[email protected]>

---------

Co-authored-by: arvindcheru <[email protected]>
Co-authored-by: Sam Wu <[email protected]>
Co-authored-by: Lisa <[email protected]>
Co-authored-by: Saad Rahim (AMD) <[email protected]>

* Add a workaround for regressions in XORWOW and LFSR on MI200 (#432)

---------

Co-authored-by: Stanley Tsang <[email protected]>
Co-authored-by: arvindcheru <[email protected]>
Co-authored-by: Sam Wu <[email protected]>
Co-authored-by: Lisa <[email protected]>
Co-authored-by: Saad Rahim (AMD) <[email protected]>
Co-authored-by: Anton Gorenko <[email protected]>
7 people authored Feb 21, 2024
1 parent 909c1c7 commit d26de79
Showing 4 changed files with 18 additions and 4 deletions.
2 changes: 1 addition & 1 deletion CHANGELOG.md
@@ -3,7 +3,7 @@
Documentation for rocRAND is available at
[https://rocm.docs.amd.com/projects/rocRAND/en/latest/](https://rocm.docs.amd.com/projects/rocRAND/en/latest/)

-## (Unreleased) rocRAND-3.0.0 for ROCm 6.0.0
+## rocRAND-3.0.0 for ROCm 6.0.0

### Changes

1 change: 0 additions & 1 deletion README.md
@@ -1,7 +1,6 @@
# rocRAND

The rocRAND project provides functions that generate pseudorandom and quasirandom numbers.

The rocRAND library is implemented in the [HIP](https://github.com/ROCm/HIP)
programming language and optimized for AMD's latest discrete GPUs. It is designed to run on top
of AMD's [ROCm](https://rocm.docs.amd.com) runtime, but it also works on CUDA-enabled GPUs.
9 changes: 8 additions & 1 deletion library/src/rng/lfsr113.hpp
@@ -1,4 +1,4 @@
-// Copyright (c) 2022-2023 Advanced Micro Devices, Inc. All rights reserved.
+// Copyright (c) 2022-2024 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
@@ -85,6 +85,13 @@ ROCRAND_KERNEL __launch_bounds__(ROCRAND_DEFAULT_MAX_BLOCK_SIZE) void generate_k

distribution(input, output);

+#if defined(__gfx90a__)
+// Workaround: The compiler hoists s_waitcnt vmcnt(..) out of the loops.
+// For some reason this optimization decreases performance of uniform distributions
+// on MI200. MI100 and MI300 are not affected.
+// Here we add s_waitcnt vmcnt(0)
+__builtin_amdgcn_s_waitcnt(/*vmcnt*/ 0 | (/*exp_cnt*/ 0x7 << 4) | (/*lgkmcnt*/ 0xf << 8));
+#endif
vec_data[index] = *reinterpret_cast<vec_type*>(output);
index += stride;
}
10 changes: 9 additions & 1 deletion library/src/rng/xorwow.hpp
@@ -1,4 +1,4 @@
-// Copyright (c) 2017-2023 Advanced Micro Devices, Inc. All rights reserved.
+// Copyright (c) 2017-2024 Advanced Micro Devices, Inc. All rights reserved.
//
// Permission is hereby granted, free of charge, to any person obtaining a copy
// of this software and associated documentation files (the "Software"), to deal
@@ -89,6 +89,14 @@ namespace detail {
}
distribution(input, output);

+#if defined(__gfx90a__)
+// Workaround: The compiler hoists s_waitcnt vmcnt(..) out of the loops.
+// For some reason this optimization decreases performance of uniform distributions
+// on MI200. MI100 and MI300 are not affected.
+// Here we add s_waitcnt vmcnt(0)
+__builtin_amdgcn_s_waitcnt(/*vmcnt*/ 0 | (/*exp_cnt*/ 0x7 << 4)
+| (/*lgkmcnt*/ 0xf << 8));
+#endif
vec_data[index] = *reinterpret_cast<vec_type *>(output);
// Next position
index += stride;
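
Note on the immediate passed to `__builtin_amdgcn_s_waitcnt` in both hunks: the expression packs three counters into one integer. The host-side sketch below reproduces that packing so the value can be checked without a GPU. The helper `encode_s_waitcnt` and the constant names are hypothetical and not part of rocRAND. The field positions are read directly off the expression in the diff (vmcnt in the low 4 bits, exp_cnt shifted by 4, lgkmcnt shifted by 8) and are assumed to hold for gfx90a only. Treating the maximum field values as "do not wait on this counter" is likewise an interpretation of the comment, which says only vmcnt(0) is being enforced.

```cpp
#include <cstdint>

// Hypothetical helper (not a rocRAND or HIP API): packs the counters the same
// way the workaround's expression does. Assumed layout, taken from the diff:
//   vmcnt   -> bits [3:0]
//   exp_cnt -> bits [6:4]
//   lgkmcnt -> bits [11:8]
constexpr std::uint32_t encode_s_waitcnt(std::uint32_t vmcnt,
                                         std::uint32_t exp_cnt,
                                         std::uint32_t lgkmcnt)
{
    return (vmcnt & 0xfu) | ((exp_cnt & 0x7u) << 4) | ((lgkmcnt & 0xfu) << 8);
}

// Maximum field values, matching the 0x7 and 0xf literals in the diff.
// Assumption: at their maximum these counters impose no wait.
constexpr std::uint32_t exp_cnt_max = 0x7u;
constexpr std::uint32_t lgkmcnt_max = 0xfu;

// The value the workaround passes: wait until vmcnt reaches 0 and leave the
// other two counters at their maximum.
constexpr std::uint32_t workaround_imm = encode_s_waitcnt(0u, exp_cnt_max, lgkmcnt_max);

static_assert(workaround_imm == 0x0f70u,
              "matches 0 | (0x7 << 4) | (0xf << 8) from the diff");
```

A named helper like this would make the intent ("wait for outstanding vector memory operations only") visible at the call site. The commit itself keeps the inline expression with comments.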