Describe the bug
I am using Omniperf to profile Stable Diffusion XL (SDXL) on an MI300X, where a single matmul_transpose_b kernel is executed 180 times. My focus is on the performance of this matmul_transpose_b kernel. However, when I filter the results by kernel and then by dispatch, the reported metrics are inconsistent. Please see the screenshots below, which show the difference in the reported L2 cache hit rate.
Development Environment:
Linux Distribution: Ubuntu 22.04.2 LTS
Omniperf Version: 2.0.1 (release)
GPU: MI300X
Cluster (if applicable): [e.g. Crusher, ]
To Reproduce
Steps to reproduce the behavior:
One way may be to find any application that runs the same kernel many times on the GPU, and then compare the results of filtering by dispatch with the results of filtering by kernel.
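As a rough sketch of that comparison, the snippet below only builds and prints the two `omniperf analyze` command lines, one filtered by kernel and one filtered by dispatch. The workload path, kernel id, and dispatch ids are hypothetical placeholders, and the `-p`, `-k`, and `-d` options should be checked against `omniperf analyze --help` for the installed version.

```python
import shlex

# Hypothetical values: replace with your own workload directory and the
# kernel id shown by `omniperf analyze -p <path> --list-stats`.
WORKLOAD_DIR = "path/to/workload"  # assumed path, not from the report
KERNEL_ID = 0                      # assumed id of matmul_transpose_b
DISPATCH_IDS = [0, 1, 2]           # assumed dispatches of that kernel


def analyze_cmd(workload_dir, kernel_ids=None, dispatch_ids=None):
    """Build an `omniperf analyze` argument list with optional filters."""
    cmd = ["omniperf", "analyze", "-p", workload_dir]
    if kernel_ids:
        cmd += ["-k", *map(str, kernel_ids)]
    if dispatch_ids:
        cmd += ["-d", *map(str, dispatch_ids)]
    return cmd


# One view filtered by kernel, one filtered by dispatch.
by_kernel = analyze_cmd(WORKLOAD_DIR, kernel_ids=[KERNEL_ID])
by_dispatch = analyze_cmd(WORKLOAD_DIR, dispatch_ids=DISPATCH_IDS)

# Print the commands so they can be reviewed and run by hand.
print(shlex.join(by_kernel))
print(shlex.join(by_dispatch))
```

Running the two printed commands on the same workload and comparing the L2 cache hit rate in each output should show whether the two filters disagree.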
Screenshots
Thanks @bangtianliu. For the record, I tried reproducing this issue on an MI250 with the latest version of Omniperf (i.e., dev) and could not reproduce it. The next step for this ticket is to try reproducing it on an MI300X.