From ef4ec607b532e9f468dfbc575be1e0d0f236d083 Mon Sep 17 00:00:00 2001
From: srawat <120587655+SwRaw@users.noreply.github.com>
Date: Mon, 11 Nov 2024 19:14:45 +0530
Subject: [PATCH 1/5] Update CHANGELOG.md

---
 CHANGELOG.md | 47 ++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 44 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 34c0b01..02d1c1e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -335,9 +335,50 @@ Example for file plugin output:
 
 - pcsampler sample code has been removed due to deprecation from v2.
 
-## ROCprofiler for rocm 6.3
+## ROCProfiler for ROCm 6.3
 
 ### Added
 
-- JSON output plugin for rocprofv2, the JSON file matches Google Trace Format, so it should be easily loaded to perfetto, chrome tracing or speedscope. For Speedscope, `--disable-json-data-flows` option will be needed as speedscope doesn't work with data flows.
-- Added `--no-serialization` flag to disable kernel serialization when rocprofv2 is in counter-collection mode. This change was added to allow rocprofv2 to avoid deadlock when profiling certain programs in counter-collection mode.
\ No newline at end of file
+- JSON output plugin for `rocprofv2`. The JSON file matches Google Trace Format making it easy to load on Perfetto, Chrome tracing, or Speedscope. For Speedscope, use `--disable-json-data-flows` option as speedscope doesn't work with data flows.
+- `--no-serialization` flag to disable kernel serialization when `rocprofv2` is in counter collection mode. This allows `rocprofv2` to avoid deadlock when profiling certain programs in counter collection mode.
+- `FP64_ACTIVE` and `ENGINE_ACTIVE`
+- New HIP APIs with struct defined inside union.
+- Early checks for ELF file
+- Support for kernel name filtering in `rocprofv2`
+- Barrier bit to read and stop packets
+- ROCProfiler support for gfx1150 and gfx1151
+- ATT support for gfx12
+- gfx12 support
+
+### Changed
+
+- Extended lifetime for proxy queues
+- Setting the `trace-start` option for `rocprof` to `off` now disables kernel tracing
+- Load `libpciaccess-dev` functions with `dlopen`
+- Initialize `PcieAccessApi*` api and `void* libpciaccess_handle` to `nullptr`
+
+### Removed
+
+- Extra licenses
+- `libsystemd-dev` from `CMakeLists.txt`
+
+### Optimized
+
+- ROCProfiler Performance improved to reduce profiling time for large workloads of counter collection
+
+### Resolved issues
+
+- Fixed bandwidth measurement in MI300
+- Fix for `s_delay_alu` followed by `s_waitcnt` on gfx11
+- Fixed Perfetto plugin issue of `roctx` trace not getting displayed
+- Fixed `--help` for counter collection
+- Fixed signal management issues in `queue.cpp`
+- Fixed Perfetto tracks for multi-GPU
+- Fixed `rocsys`
+- Fixed incorrect number of columns in the CSV file
+- Fixed the ROCProfiler hang issue when running kernel trace, thread trace, or counter collection on Iree benchmark for AMD Instinct MI300 accelerator
+- Fixed build errors thrown during parsing of unions
+- Fixed the system hang caused while running `--kernel-trace` with Perfetto for certain applications.
+- Fixed missing profiler records issue caused while running `--trace-period`
+- Fixed the hang issue of `ProfilerAPITest` of `runFeatureTests` on AMD Instinct MI300 accelerator
+- Fixed segment fault on Navi32

From 586da2c48d337c0527585ace0e6ef2a2ea26ef82 Mon Sep 17 00:00:00 2001
From: srawat <120587655+SwRaw@users.noreply.github.com>
Date: Mon, 11 Nov 2024 19:27:30 +0530
Subject: [PATCH 2/5] Update CHANGELOG.md

---
 CHANGELOG.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 02d1c1e..e06f776 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -354,8 +354,8 @@ Example for file plugin output:
 
 - Extended lifetime for proxy queues
 - Setting the `trace-start` option for `rocprof` to `off` now disables kernel tracing
-- Load `libpciaccess-dev` functions with `dlopen`
-- Initialize `PcieAccessApi*` api and `void* libpciaccess_handle` to `nullptr`
+- `libpciaccess-dev` functions now load with `dlopen`
+- `PcieAccessApi*` api and `void* libpciaccess_handle` are now initialized to `nullptr`
 
 ### Removed
 
@@ -369,7 +369,7 @@ Example for file plugin output:
 ### Resolved issues
 
 - Fixed bandwidth measurement in MI300
-- Fix for `s_delay_alu` followed by `s_waitcnt` on gfx11
+- Fix implemented for `s_delay_alu` followed by `s_waitcnt` on gfx11
 - Fixed Perfetto plugin issue of `roctx` trace not getting displayed
 - Fixed `--help` for counter collection
 - Fixed signal management issues in `queue.cpp`

From a596f083833aca7b9ead1031c4a9ef85f0427b18 Mon Sep 17 00:00:00 2001
From: srawat <120587655+SwRaw@users.noreply.github.com>
Date: Thu, 14 Nov 2024 15:55:34 +0530
Subject: [PATCH 3/5] Update CHANGELOG.md

---
 CHANGELOG.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index e06f776..ee7f6f7 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -341,9 +341,9 @@ Example for file plugin output:
 
 - JSON output plugin for `rocprofv2`. The JSON file matches Google Trace Format making it easy to load on Perfetto, Chrome tracing, or Speedscope. For Speedscope, use `--disable-json-data-flows` option as speedscope doesn't work with data flows.
 - `--no-serialization` flag to disable kernel serialization when `rocprofv2` is in counter collection mode. This allows `rocprofv2` to avoid deadlock when profiling certain programs in counter collection mode.
-- `FP64_ACTIVE` and `ENGINE_ACTIVE`
+- `FP64_ACTIVE` and `ENGINE_ACTIVE` metrics to AMD Instinct MI300 accelerator
 - New HIP APIs with struct defined inside union.
-- Early checks for ELF file
+- Early checks to confirm the eligibility of ELF file in ATT plugin
 - Support for kernel name filtering in `rocprofv2`
 - Barrier bit to read and stop packets
 - ROCProfiler support for gfx1150 and gfx1151
@@ -359,7 +359,7 @@ Example for file plugin output:
 
 ### Removed
 
-- Extra licenses
+- Obsolete BSD and GPL licenses
 - `libsystemd-dev` from `CMakeLists.txt`
 
 ### Optimized

From e2894051c4d642b101e82e4bf357cbcc7171d3c5 Mon Sep 17 00:00:00 2001
From: srawat <120587655+SwRaw@users.noreply.github.com>
Date: Fri, 15 Nov 2024 21:39:54 +0530
Subject: [PATCH 4/5] Update CHANGELOG.md

---
 CHANGELOG.md | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index ee7f6f7..05c48a9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -369,16 +369,15 @@ Example for file plugin output:
 ### Resolved issues
 
 - Fixed bandwidth measurement in MI300
-- Fix implemented for `s_delay_alu` followed by `s_waitcnt` on gfx11
 - Fixed Perfetto plugin issue of `roctx` trace not getting displayed
 - Fixed `--help` for counter collection
 - Fixed signal management issues in `queue.cpp`
 - Fixed Perfetto tracks for multi-GPU
-- Fixed `rocsys`
-- Fixed incorrect number of columns in the CSV file
+- Fixed Perfetto plugin usage with `rocsys`
+- Fixed incorrect number of columns in the output CSV files for counter collection and kernel tracing
 - Fixed the ROCProfiler hang issue when running kernel trace, thread trace, or counter collection on Iree benchmark for AMD Instinct MI300 accelerator
 - Fixed build errors thrown during parsing of unions
-- Fixed the system hang caused while running `--kernel-trace` with Perfetto for certain applications.
+- Fixed the system hang caused while running `--kernel-trace` with Perfetto for certain applications
 - Fixed missing profiler records issue caused while running `--trace-period`
 - Fixed the hang issue of `ProfilerAPITest` of `runFeatureTests` on AMD Instinct MI300 accelerator
 - Fixed segment fault on Navi32

From 04daffb1af510ed750fc10b43de6dca617047b21 Mon Sep 17 00:00:00 2001
From: srawat <120587655+SwRaw@users.noreply.github.com>
Date: Fri, 15 Nov 2024 22:02:47 +0530
Subject: [PATCH 5/5] Update CHANGELOG.md

---
 CHANGELOG.md | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 05c48a9..4b101d0 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -341,7 +341,7 @@ Example for file plugin output:
 
 - JSON output plugin for `rocprofv2`. The JSON file matches Google Trace Format making it easy to load on Perfetto, Chrome tracing, or Speedscope. For Speedscope, use `--disable-json-data-flows` option as speedscope doesn't work with data flows.
 - `--no-serialization` flag to disable kernel serialization when `rocprofv2` is in counter collection mode. This allows `rocprofv2` to avoid deadlock when profiling certain programs in counter collection mode.
-- `FP64_ACTIVE` and `ENGINE_ACTIVE` metrics to AMD Instinct MI300 accelerator
+- `FP64_ACTIVE` and `ENGINE_ACTIVE` metrics to AMD Instinct MI300 accelerator
 - New HIP APIs with struct defined inside union.
 - Early checks to confirm the eligibility of ELF file in ATT plugin
 - Support for kernel name filtering in `rocprofv2`
@@ -369,15 +369,15 @@ Example for file plugin output:
 ### Resolved issues
 
 - Fixed bandwidth measurement in MI300
-- Fixed Perfetto plugin issue of `roctx` trace not getting displayed
-- Fixed `--help` for counter collection
-- Fixed signal management issues in `queue.cpp`
-- Fixed Perfetto tracks for multi-GPU
-- Fixed Perfetto plugin usage with `rocsys`
-- Fixed incorrect number of columns in the output CSV files for counter collection and kernel tracing
-- Fixed the ROCProfiler hang issue when running kernel trace, thread trace, or counter collection on Iree benchmark for AMD Instinct MI300 accelerator
-- Fixed build errors thrown during parsing of unions
-- Fixed the system hang caused while running `--kernel-trace` with Perfetto for certain applications
+- Fixed Perfetto plugin issue of `roctx` trace not getting displayed
+- Fixed `--help` for counter collection
+- Fixed signal management issues in `queue.cpp`
+- Fixed Perfetto tracks for multi-GPU
+- Fixed Perfetto plugin usage with `rocsys`
+- Fixed incorrect number of columns in the output CSV files for counter collection and kernel tracing
+- Fixed the ROCProfiler hang issue when running kernel trace, thread trace, or counter collection on Iree benchmark for AMD Instinct MI300 accelerator
+- Fixed build errors thrown during parsing of unions
+- Fixed the system hang caused while running `--kernel-trace` with Perfetto for certain applications
 - Fixed missing profiler records issue caused while running `--trace-period`
-- Fixed the hang issue of `ProfilerAPITest` of `runFeatureTests` on AMD Instinct MI300 accelerator
+- Fixed the hang issue of `ProfilerAPITest` of `runFeatureTests` on AMD Instinct MI300 accelerator
 - Fixed segment fault on Navi32