Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement NVTXW export #54

Merged
merged 2 commits into from
Oct 10, 2024

Conversation

evanramos-nvidia
Copy link
Contributor

This PR contains the second iteration of our NVTXW export feature. The new code looks significantly different than before, keeping in mind the feedback you've given so far. We may still revise some details of these changes, particularly where the Rust code is perhaps too C-like, but overall it is ready for review. It is at a point where getting the code reviewed by someone with more Rust experience will help inform us how we want to finalize it.

I've marked this PR as a draft while we determine where we will put the crate. For review and CI purposes, I've set it up in the subdir crate-tmp, but this is temporary. If you have any feedback to offer on the crate in addition to the Legion Prof Viewer code itself, we would definitely appreciate that, since we are new to Rust and still learning.

Copy link
Contributor

@elliottslaughter elliottslaughter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't had time to look at the NVTXW code yet, but here are my preliminary thoughts on the part of this that integrates into the prof-viewer code.

Overall this is dramatically better, thanks for doing this work. I appreciate that you work with the existing abstractions rather than working around them. There are a handful of places where I think it would be better to avoid so much hard-coding, but this is hopefully a relatively minor change from what you've got now.

There is one structural change you might consider, which I left a comment about in the source code. At the moment, you essentially read the entire data set into memory, interleave it into one massive table, and then de-interleave it as you write it back out. First of all this uses memory proportional to the size of the profile. Second, it just seems like a lot of work to do this (and probably inefficient). I think that an approach in which you match tiles with their corresponding meta tile, and then write that as a unit, would probably be more efficient. As a bonus it would avoid needing to have so many streams open on the NVTXW side. I think you can do this with fixed memory usage while still allowing a degree of parallelism by configuring the number of outstanding calls you allow.

src/nvtxw.rs Outdated Show resolved Hide resolved
src/nvtxw.rs Outdated Show resolved Hide resolved
src/nvtxw.rs Outdated Show resolved Hide resolved
src/nvtxw.rs Outdated Show resolved Hide resolved
src/nvtxw.rs Outdated Show resolved Hide resolved
src/nvtxw.rs Outdated Show resolved Hide resolved
src/nvtxw.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@elliottslaughter elliottslaughter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments on the NVTXW code itself.

crate-tmp/src/nvtxw.rs Outdated Show resolved Hide resolved
crate-tmp/src/nvtxw.rs Outdated Show resolved Hide resolved
crate-tmp/src/nvtxw.rs Outdated Show resolved Hide resolved
@evanramos-nvidia
Copy link
Contributor Author

Updated to fix introduced clippy diagnostics

@elliottslaughter
Copy link
Contributor

I'm on vacation but I will hit the "run" button as long as you keep pushing changes.

I'll get back to you with a review on August 26.

src/nvtxw.rs Outdated Show resolved Hide resolved
src/nvtxw.rs Outdated Show resolved Hide resolved
src/nvtxw.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@elliottslaughter elliottslaughter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reviewed the Rust NVTXW crate.

Overall my highest level comment is that you may want to think about exactly how high- or low-level you want this interface to be. The current interface is serviceable and does, strictly speaking, remove unsafe from user code. But the code that users have to write is still a bit questionable: ideally, users wouldn't have to think that hard about struct packing, CString, and raw pointers at all. So this may be fine for now, but just recognize that as written it's not a great example of best-in-class Rust API wrappers for C.

I added some minor code comments to improve quality in various ways throughout the code, and suggested some more idiomatic practices.

crate-tmp/Cargo.toml Outdated Show resolved Hide resolved
crate-tmp/src/nvtxw.rs Outdated Show resolved Hide resolved
crate-tmp/src/nvtxw.rs Outdated Show resolved Hide resolved
crate-tmp/src/nvtxw.rs Outdated Show resolved Hide resolved
crate-tmp/src/nvtxw.rs Outdated Show resolved Hide resolved
crate-tmp/src/nvtxw.rs Outdated Show resolved Hide resolved
src/nvtxw.rs Show resolved Hide resolved
crate-tmp/src/nvtxw.rs Outdated Show resolved Hide resolved
@elliottslaughter
Copy link
Contributor

Some other things you might think about for NVTXW, which I did not specifically address:

  • Can you add tests for NVTXW? Rust has a pretty good test infrastructure, and you can write both unit and integration tests in the crate (which will both be run by cargo test). Here's the documentation on how to do this: https://doc.rust-lang.org/book/ch11-01-writing-tests.html
  • Once you have tests it's considered a best practice to run your code through Miri, which is sort of like asan/ubsan for Rust. This matters because of the unsafe code in the crate; otherwise Rust's type system would make it unnecessary to do anything of the sort: https://github.com/rust-lang/miri
  • If you were to deploy this in a standalone repo, you would want to copy the test infrastructure inside .github/workflows/rust.yml (or implement something like it). It has a bunch of standard CI jobs to check the build, tests, format, lints, etc.

@evanramos-nvidia
Copy link
Contributor Author

Thanks for your feedback, it's really helpful both to make the code better and to build my Rust skills.

I've updated the branch to address some items, and I will continue with the rest and update it further. Right now, I am particularly interested in feedback on my revised implementation of tile matching, using a single BTreeMap where the value is a pair of Options.

src/nvtxw.rs Outdated Show resolved Hide resolved
@evanramos-nvidia
Copy link
Contributor Author

The CI is now failing with

Clippy: nvtxw_bindings.rs#L3
use of unstable library feature 'offset_of'

The only thing I can think of that might have introduced this is specifying the version of bindgen in the crate's Cargo.toml.

@elliottslaughter
Copy link
Contributor

Try reverting to bindgen 0.69? That seems to be what you used in the last successful build:

https://github.com/StanfordLegion/prof-viewer/actions/runs/10411469581/job/28835561058

@evanramos-nvidia evanramos-nvidia force-pushed the nvtxw branch 3 times, most recently from d6a8e73 to 6d41688 Compare September 5, 2024 22:42
src/nvtxw.rs Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
Cargo.toml Outdated Show resolved Hide resolved
@evanramos-nvidia evanramos-nvidia force-pushed the nvtxw branch 2 times, most recently from 947b741 to 8ed87b9 Compare September 10, 2024 23:53
@evanramos-nvidia
Copy link
Contributor Author

@elliottslaughter I've fixed the build on macOS. Thanks for bringing that to my attention.

@evanramos-nvidia
Copy link
Contributor Author

I've implemented this request.

@evanramos-nvidia evanramos-nvidia force-pushed the nvtxw branch 3 times, most recently from 84c51df to 3726193 Compare September 16, 2024 23:26
@evanramos-nvidia
Copy link
Contributor Author

@elliottslaughter

In order to get this into the release, here's what we need to do:

Other than the discussion about the enums in the NVTXW crate, I believe I have addressed all review feedback. Once I have resolved it and have your approval, I am prepared to proceed with the remaining steps for publishing the crate and updating the branches. We should be able to complete this within the week, in time for your merge window.

@evanramos-nvidia evanramos-nvidia force-pushed the nvtxw branch 3 times, most recently from 45af5c7 to 9583a40 Compare September 17, 2024 22:44
@elliottslaughter
Copy link
Contributor

@evanramos-nvidia I wanted to address one point mentioned in your previous comment:

We should be able to complete this within the week, in time for your merge window.
#54 (comment)

I posted the release timeline here on August 29:
https://legion.zulipchat.com/#narrow/stream/187787-general/topic/legion.20release.2024.2E09.2E0/near/466074958

In particular, feature freeze was September 16.

The intent of our release system is to follow a train model. The goals are (a) to run frequent releases so it matters less what specific release a feature gets into, and (b) merge features when they're ready (rather than sprint to get them ready by some deadline). Even though we have a cutoff date we usually shy away from large merges toward the end of the cycle anyway.

Based on the fact that this is new code being added and we're past the feature freeze date, it would be my plan to merge this right after the release so that it's early in the next cycle. My understanding is that cuNumeric tracks the master branch so this will make it available to cuNumeric as soon as they update, so it will not delay adoption on their side.

Let me know if you have any questions.

@elliottslaughter
Copy link
Contributor

@evanramos-nvidia are you ready to move forward with this? We can make this available to the cuNumeric folks as soon as it hits master.

@evanramos-nvidia
Copy link
Contributor Author

  • Split the nvtxw crate back out into its own repository and publish it (not necessarily to crates.io, can be just Github for now)
  • Update this PR to reference that (either via git URL or version, depending on what you chose for the last step)

Updated to complete these steps.

Cargo.toml Outdated Show resolved Hide resolved
@evanramos-nvidia evanramos-nvidia force-pushed the nvtxw branch 4 times, most recently from fcd668d to ca6c5ba Compare October 8, 2024 23:15
@evanramos-nvidia evanramos-nvidia marked this pull request as ready for review October 9, 2024 00:05
@elliottslaughter elliottslaughter merged commit 3396c8b into StanfordLegion:master Oct 10, 2024
7 checks passed
@evanramos-nvidia
Copy link
Contributor Author

Elliott, thank you for your help in reviewing this PR. It has been valuable to get your feedback and coaching since this is the first Rust project I've worked on.

@elliottslaughter
Copy link
Contributor

Yup, and thanks for putting in the work to make all the changes. It's been a long road but I think the code is in a much better place now, with a much better foundation for building future work.

The Legion side MR got merged so this should be available to the Legate folks as soon as they update Legion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants