Mo/8223 fd2 dispatch core profiler support #8609
Merged: mo-tenstorrent merged 1 commit into `main` from `mo/8223_FD2_dispatch_core_profiler_support_2` on Jun 5, 2024
mo-tenstorrent force-pushed the `mo/8223_FD2_dispatch_core_profiler_support_2` branch 4 times, most recently from 49ec429 to 5a43477 (May 28, 2024 22:32)
mo-tenstorrent force-pushed the `mo/8223_FD2_dispatch_core_profiler_support_2` branch 5 times, most recently from 4280a0f to a263835 (June 5, 2024 14:27)
pgkeller approved these changes (Jun 5, 2024)
tt-aho approved these changes (Jun 5, 2024)
TT-BrianLiu approved these changes (Jun 5, 2024)
Minor comments.
`tt_metal/programming_examples/profiler/test_dispatch_cores/test_dispatch_cores.cpp` (resolved)
`tt_metal/programming_examples/profiler/test_dispatch_cores/test_dispatch_cores.cpp` (outdated, resolved)
aliuTT approved these changes (Jun 5, 2024)
mo-tenstorrent force-pushed the `mo/8223_FD2_dispatch_core_profiler_support_2` branch from a263835 to 809f5c5 (June 5, 2024 14:54)
Dispatch kernels can be profiled using the `DeviceZoneScopedND( name, nocBuffer, nocIndex )` macro. The NOC buffer and index are globals in the dispatch and prefetch kernels. Dispatch profiling is disabled by default to avoid its overhead; enable it with the environment variable `TT_METAL_DEVICE_PROFILER_DISPATCH=1`.
mo-tenstorrent force-pushed the `mo/8223_FD2_dispatch_core_profiler_support_2` branch from 809f5c5 to 181458d (June 5, 2024 16:06)
This PR adds profiling support for dispatch cores.
Both `cq_prefetch` and `cq_dispatch` can now be profiled with a stack of nested parent and child zones. The `DeviceZoneScopedND( name, nocBuffer, nocIndex )` macro is dedicated to dispatch core profiling. The NOC buffer and index are globals in the dispatch and prefetch kernels and must be passed to the macro.
For example, the main `while` loops of the prefetcher and dispatcher are instrumented with the profiling macro in this commit.
Dispatch profiling is disabled by default to avoid its overhead; enable it by setting the environment variable `TT_METAL_DEVICE_PROFILER_DISPATCH=1`. Because dispatch cores are much more active, their profiling overhead can add up and slow down the entire model run.
The dispatch kernel now runs on NCRISC, so profiler push to DRAM had to be supported for NCRISC as well.
For much more efficient use of the NOC, `quick_send` was introduced: it pushes L1 data to DRAM only when the profiler L1 buffer is full. This allows about 100 iterations of the dispatch loops to run before each costly L1-to-DRAM NOC transaction. `quick_send` is marked in red in Tracy.

Green CI
Post Commit: https://github.com/tenstorrent/tt-metal/actions/runs/9375214697
T3K Profiler: https://github.com/tenstorrent/tt-metal/actions/runs/9386432350
Device Perf: https://github.com/tenstorrent/tt-metal/actions/runs/9375217111
uBenchmark: https://github.com/tenstorrent/tt-metal/actions/runs/9375219316