Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(nvidia): Add cupti support #323

Merged
merged 1 commit into from
Oct 22, 2024
Merged

feat(nvidia): Add cupti support #323

merged 1 commit into from
Oct 22, 2024

Conversation

cvonelm
Copy link
Member

@cvonelm cvonelm commented Mar 11, 2024

This commit adds a --nvidia option, which injects a library into the program under measurement, which records entry and exit into CUDA kernels via CUPTI

We might think about bumping the CMake requirement to 3.24 with this version, as older FindCUDAToolkit.cmake fail to correctly detect CUPTI headers[1].

This implements #294

[1] https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7608

CMakeLists.txt Show resolved Hide resolved
include/lo2s/monitor/process_monitor_main.hpp.in Outdated Show resolved Hide resolved
src/config.cpp Outdated Show resolved Hide resolved
src/config.cpp Show resolved Hide resolved
src/cupti/lib.cpp Outdated Show resolved Hide resolved
include/lo2s/cupti/ringbuf.hpp Outdated Show resolved Hide resolved
include/lo2s/cupti/ringbuf.hpp Outdated Show resolved Hide resolved
include/lo2s/cupti/ringbuf.hpp Outdated Show resolved Hide resolved
include/lo2s/cupti/ringbuf.hpp Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file need comments. I have no idea what is happening here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document, what the functions bufferRequested, bufferCompleted, and callbackHandler do and when they will be called from cupti.

Copy link
Member

@bmario bmario left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After disccussing the ringbuffer a bit with @tilsche, we think it would be better to not let individual events split up on the wrap up, thus removing the need for all the mallocs, tracking thereof and simplifying the code at the same time. Also, fill shouldn't be necessary.

include/lo2s/cupti/ringbuf.hpp Outdated Show resolved Hide resolved
Copy link
Member

@bmario bmario left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have these comments still pending. Don't know if they still apply.

CMakeLists.txt Show resolved Hide resolved
include/lo2s/bfd_resolve.hpp Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
include/lo2s/config.hpp Outdated Show resolved Hide resolved
include/lo2s/monitor/poll_monitor.hpp Outdated Show resolved Hide resolved
src/config.cpp Outdated Show resolved Hide resolved
src/config.cpp Outdated Show resolved Hide resolved
src/cupti/lib.cpp Show resolved Hide resolved
src/cupti/lib.cpp Outdated Show resolved Hide resolved
@cvonelm cvonelm force-pushed the issue-294-cupti branch 2 times, most recently from f5b30d8 to 3701440 Compare July 26, 2024 13:54
src/cupti/lib.cpp Outdated Show resolved Hide resolved
include/lo2s/ringbuf.hpp Show resolved Hide resolved
include/lo2s/ringbuf.hpp Show resolved Hide resolved
include/lo2s/ringbuf.hpp Show resolved Hide resolved
include/lo2s/ringbuf.hpp Outdated Show resolved Hide resolved
include/lo2s/ringbuf.hpp Outdated Show resolved Hide resolved
Copy link
Member

@bmario bmario left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I blame vscode

@cvonelm cvonelm force-pushed the issue-294-cupti branch 2 times, most recently from 08da629 to 1bddf27 Compare August 2, 2024 08:24
@cvonelm cvonelm force-pushed the issue-294-cupti branch 2 times, most recently from 732d27c to c950a74 Compare August 16, 2024 08:40
CMakeLists.txt Outdated Show resolved Hide resolved
src/cupti/lib.cpp Outdated Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please document, what the functions bufferRequested, bufferCompleted, and callbackHandler do and when they will be called from cupti.

src/monitor/process_monitor_main.cpp Outdated Show resolved Hide resolved
src/trace/trace.cpp Outdated Show resolved Hide resolved
include/lo2s/shared_memory.hpp Outdated Show resolved Hide resolved
include/lo2s/shared_memory.hpp Outdated Show resolved Hide resolved
include/lo2s/shared_memory.hpp Outdated Show resolved Hide resolved
include/lo2s/shared_memory.hpp Outdated Show resolved Hide resolved
include/lo2s/ringbuf.hpp Outdated Show resolved Hide resolved
@cvonelm cvonelm force-pushed the issue-294-cupti branch 3 times, most recently from 7884ac2 to 4485359 Compare August 21, 2024 13:04
include/lo2s/ringbuf.hpp Show resolved Hide resolved
include/lo2s/ringbuf.hpp Show resolved Hide resolved
include/lo2s/ringbuf.hpp Show resolved Hide resolved
src/monitor/process_monitor_main.cpp Outdated Show resolved Hide resolved
This commit adds a --nvidia option, which injects a library into the program under measurement, which records entry and exit into CUDA kernels via CUPTI
@cvonelm cvonelm merged commit 975d2d8 into master Oct 22, 2024
35 of 36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants