Update CHANGELOG.md #149

Open: wants to merge 5 commits into base: amd-staging (changes from 1 commit shown)
CHANGELOG.md: 47 changes (44 additions, 3 deletions)
@@ -335,9 +335,50 @@ Example for file plugin output:

- The `pcsampler` sample code has been removed because it was deprecated in v2.

## ROCProfiler for ROCm 6.3
bgopesh marked this conversation as resolved.

### Added

- JSON output plugin for `rocprofv2`. The JSON file follows the Google Trace Event Format, making it easy to load in Perfetto, Chrome tracing, or Speedscope. For Speedscope, use the `--disable-json-data-flows` option, because Speedscope doesn't support data flows.
- `--no-serialization` flag to disable kernel serialization when `rocprofv2` is in counter collection mode. This allows `rocprofv2` to avoid deadlocks when profiling certain programs in counter collection mode.
- `FP64_ACTIVE` and `ENGINE_ACTIVE` metrics
bgopesh marked this conversation as resolved.
- New HIP APIs with a struct defined inside a union
- Early checks for ELF files
bgopesh marked this conversation as resolved.
- Support for kernel name filtering in `rocprofv2`
- Barrier bit to read and stop packets
- ROCProfiler support for gfx1150 and gfx1151
- ATT support for gfx12
- gfx12 support
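
To illustrate what "data flows" means in this format: ordinary timed slices are complete events (`"ph": "X"`), while flow events (`"ph": "s"`, `"t"`, `"f"`) draw arrows that link slices, for example from a host API call to the kernel it launched. The sketch below builds a small hypothetical trace and a flow-free copy of it. It assumes, without having checked the `rocprofv2` source, that `--disable-json-data-flows` has a similar effect of omitting flow events; the event names, categories, and the `without_flows` helper are illustrative only, not `rocprofv2` output or API.

```python
import json

# Small hypothetical trace in the Trace Event Format. Timestamps ("ts") and
# durations ("dur") are in microseconds. Names and categories are made up
# for illustration; this is NOT actual rocprofv2 output.
trace = {
    "traceEvents": [
        # "X" = complete event: a host API call and the kernel it launched.
        {"name": "hipLaunchKernel", "cat": "hip_api", "ph": "X",
         "ts": 0, "dur": 5, "pid": 1, "tid": 1},
        {"name": "my_kernel", "cat": "kernel", "ph": "X",
         "ts": 10, "dur": 40, "pid": 2, "tid": 1},
        # "s"/"f" = flow start/finish: an arrow from the API call to the kernel.
        {"name": "dispatch", "cat": "flow", "ph": "s", "id": 1,
         "ts": 2, "pid": 1, "tid": 1},
        {"name": "dispatch", "cat": "flow", "ph": "f", "bp": "e", "id": 1,
         "ts": 10, "pid": 2, "tid": 1},
    ]
}

# Flow event phases: start, step, finish.
FLOW_PHASES = {"s", "t", "f"}


def without_flows(trace: dict) -> dict:
    """Hypothetical helper: return a copy of the trace with flow events dropped."""
    events = [e for e in trace["traceEvents"] if e.get("ph") not in FLOW_PHASES]
    return {**trace, "traceEvents": events}


full_json = json.dumps(trace)                       # e.g. for Perfetto or Chrome tracing
speedscope_json = json.dumps(without_flows(trace))  # flow-free variant for Speedscope
print(len(trace["traceEvents"]), "->", len(without_flows(trace)["traceEvents"]))
```

Filtering on the `ph` field keeps every timed slice intact, so the flow-free file still shows the same API calls and kernels, just without the connecting arrows.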

### Changed

- Extended lifetime for proxy queues
- Setting the `trace-start` option for `rocprof` to `off` now disables kernel tracing
- Load `libpciaccess-dev` functions with `dlopen`
- Initialize `PcieAccessApi* api` and `void* libpciaccess_handle` to `nullptr`
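
The two entries above describe a common pattern: resolve an optional shared library at run time with `dlopen` instead of linking against it, and start the handle out empty so that code can tell whether the library was loaded. The sketch below is a Python analogue of that pattern, not the ROCProfiler C++ change itself; it relies on `ctypes.CDLL`, which calls `dlopen()` on Linux. The function name `load_libpciaccess` is hypothetical.

```python
import ctypes
import ctypes.util
from typing import Optional

# Handle starts out empty, analogous to initializing a C++
# `void* libpciaccess_handle` to nullptr.
_libpciaccess_handle: Optional[ctypes.CDLL] = None


def load_libpciaccess() -> Optional[ctypes.CDLL]:
    """Hypothetical loader: open libpciaccess on first use, or return None.

    On Linux, ctypes.CDLL calls dlopen() under the hood, so the library is
    resolved at run time rather than being a hard link-time dependency.
    """
    global _libpciaccess_handle
    if _libpciaccess_handle is None:
        soname = ctypes.util.find_library("pciaccess")
        if soname is None:
            return None  # library not installed: caller skips PCIe features
        try:
            _libpciaccess_handle = ctypes.CDLL(soname)
        except OSError:
            return None  # found but could not be loaded
    return _libpciaccess_handle


lib = load_libpciaccess()
if lib is None:
    print("libpciaccess not found; PCIe-related features disabled")
else:
    # pci_system_init is part of the libpciaccess public API.
    print("pci_system_init available:", hasattr(lib, "pci_system_init"))
```

Because a missing library yields `None` instead of a load-time failure, the host program can still start on systems without libpciaccess and simply skip the features that need it.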

### Removed

- Extra licenses
bgopesh marked this conversation as resolved.
- `libsystemd-dev` from `CMakeLists.txt`

### Optimized

- Improved ROCProfiler performance to reduce profiling time for large counter collection workloads

### Resolved issues

- Fixed bandwidth measurement in MI300
- Fix for `s_delay_alu` followed by `s_waitcnt` on gfx11
Author: @bgopesh What was the exact issue?

Collaborator: @ApoKalipse-V Can you please help Swati with a more detailed description?

Contributor: `s_delay_alu` doesn't generate tokens, so if `s_delay_alu` was followed by `s_waitcnt`, the attribution would be incorrect.

Contributor: That said, I'm not sure we should comment on it.

Collaborator: OK, maybe we can generalize by saying "Fixing token parsing in ATT".

Author: @bgopesh @ApoKalipse-V So, can I write: "Fixed token parsing in ATT for gfx11 where s_delay_alu is followed by s_waitcnt"?

Collaborator: That also reveals some details. I would recommend: "Fixed token parsing in ATT for gfx11".

Contributor: Maybe not say anything?

Author: Then I will remove it.

- Fixed the Perfetto plugin issue where `roctx` traces were not displayed
- Fixed `--help` for counter collection
- Fixed signal management issues in `queue.cpp`
- Fixed Perfetto tracks for multi-GPU
- Fixed `rocsys`
bgopesh marked this conversation as resolved.
- Fixed incorrect number of columns in the CSV file
bgopesh marked this conversation as resolved.
- Fixed a ROCProfiler hang when running kernel trace, thread trace, or counter collection with the IREE benchmark on AMD Instinct MI300 accelerators
- Fixed build errors thrown during parsing of unions
- Fixed a system hang when running `--kernel-trace` with Perfetto for certain applications
- Fixed missing profiler records when running `--trace-period`
- Fixed a hang in `ProfilerAPITest` of `runFeatureTests` on AMD Instinct MI300 accelerators
- Fixed a segmentation fault on Navi32