Implement NVTXW export #54

evanramos-nvidia · 2024-04-03T22:13:59Z

This PR contains the second iteration of our NVTXW export feature. The new code looks significantly different from before and takes into account the feedback you've given so far. We may still revise some details, particularly where the Rust code is perhaps too C-like, but overall it is ready for review. Getting the code reviewed by someone with more Rust experience will help inform how we finalize it.

I've marked this PR as a draft while we determine where we will put the crate. For review and CI purposes, I've set it up in the subdir crate-tmp, but this is temporary. If you have any feedback to offer on the crate in addition to the Legion Prof Viewer code itself, we would definitely appreciate that, since we are new to Rust and still learning.

elliottslaughter

I haven't had time to look at the NVTXW code yet, but here are my preliminary thoughts on the part of this that integrates into the prof-viewer code.

Overall this is dramatically better, thanks for doing this work. I appreciate that you work with the existing abstractions rather than working around them. There are a handful of places where I think it would be better to avoid so much hard-coding, but this is hopefully a relatively minor change from what you've got now.

There is one structural change you might consider, which I left a comment about in the source code. At the moment, you essentially read the entire data set into memory, interleave it into one massive table, and then de-interleave it as you write it back out. First of all this uses memory proportional to the size of the profile. Second, it just seems like a lot of work to do this (and probably inefficient). I think that an approach in which you match tiles with their corresponding meta tile, and then write that as a unit, would probably be more efficient. As a bonus it would avoid needing to have so many streams open on the NVTXW side. I think you can do this with fixed memory usage while still allowing a degree of parallelism by configuring the number of outstanding calls you allow.
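To make the streaming idea concrete, here is a minimal sketch of writing each tile together with its meta tile while bounding the number of outstanding fetches. It is not the prof-viewer or NVTXW API: `Tile`, `MetaTile`, `fetch_tile`, `fetch_meta_tile`, and `export_streaming` are hypothetical names, and the bounded channel from the standard library is just one way to cap the work in flight.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

// Hypothetical stand-ins for the viewer's tile types (assumption, not the real API).
#[derive(Debug, Clone, PartialEq)]
struct Tile {
    id: u64,
    items: Vec<u64>,
}

#[derive(Debug, Clone, PartialEq)]
struct MetaTile {
    id: u64,
    titles: Vec<String>,
}

// Hypothetical fetch functions; a real data source would do I/O here.
fn fetch_tile(id: u64) -> Tile {
    Tile { id, items: vec![id, id + 1] }
}

fn fetch_meta_tile(id: u64) -> MetaTile {
    MetaTile { id, titles: vec![format!("task {}", id), format!("task {}", id + 1)] }
}

/// Streams (tile, meta tile) pairs to `write` and returns how many were written.
/// At most `max_outstanding` fetched pairs sit in the channel at once, so memory
/// stays bounded (roughly `max_outstanding + 2` pairs, counting the one being
/// sent and the one being written) regardless of the profile size.
fn export_streaming<F>(tile_ids: Vec<u64>, max_outstanding: usize, mut write: F) -> usize
where
    F: FnMut(Tile, MetaTile),
{
    let (tx, rx) = sync_channel::<(Tile, MetaTile)>(max_outstanding);

    // Producer: fetch each tile with its matching meta tile as one unit.
    let producer = thread::spawn(move || {
        for id in tile_ids {
            let pair = (fetch_tile(id), fetch_meta_tile(id));
            // `send` blocks while the channel is full, which throttles fetching.
            if tx.send(pair).is_err() {
                break; // the consumer went away
            }
        }
    });

    // Consumer: write each matched pair as soon as it arrives.
    let mut written = 0;
    for (tile, meta) in rx {
        write(tile, meta);
        written += 1;
    }
    producer.join().expect("producer thread panicked");
    written
}
```

Raising `max_outstanding` trades memory for overlap between fetching and writing; the ordering of writes follows the order of `tile_ids` because there is a single producer.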

elliottslaughter

Some comments on the NVTXW code itself.

evanramos-nvidia · 2024-08-15T22:31:35Z

Updated to fix the newly introduced clippy diagnostics.

elliottslaughter · 2024-08-15T22:36:16Z

I'm on vacation but I will hit the "run" button as long as you keep pushing changes.

I'll get back to you with a review on August 26.

elliottslaughter

I reviewed the Rust NVTXW crate.

Overall my highest level comment is that you may want to think about exactly how high- or low-level you want this interface to be. The current interface is serviceable and does, strictly speaking, remove unsafe from user code. But the code that users have to write is still a bit questionable: ideally, users wouldn't have to think that hard about struct packing, CString, and raw pointers at all. So this may be fine for now, but just recognize that as written it's not a great example of best-in-class Rust API wrappers for C.
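As an illustration of what a higher-level interface could look like, the sketch below wraps a C-style open/close pair so that callers pass a `&str` and never touch `CString` or raw pointers. Everything here is an assumption for demonstration: the `raw` module is a stand-in with C-like signatures written in Rust, not the NVTXW bindings, and `Stream`, `OpenError`, `stream_open`, and `stream_close` are hypothetical names.

```rust
use std::ffi::{CString, NulError};
use std::ptr::NonNull;

// Hypothetical stand-in for a C-level API (NOT the real NVTXW bindings).
mod raw {
    use std::ffi::{c_char, CStr};
    use std::ptr;

    pub struct RawStream {
        pub name: String,
    }

    /// Returns null on failure, like many C "open" functions.
    pub unsafe fn stream_open(name: *const c_char) -> *mut RawStream {
        if name.is_null() {
            return ptr::null_mut();
        }
        let name = unsafe { CStr::from_ptr(name) }.to_string_lossy().into_owned();
        Box::into_raw(Box::new(RawStream { name }))
    }

    pub unsafe fn stream_close(stream: *mut RawStream) {
        if !stream.is_null() {
            drop(unsafe { Box::from_raw(stream) });
        }
    }
}

#[derive(Debug)]
pub enum OpenError {
    /// The name contained an interior NUL byte and cannot be a C string.
    InteriorNul(NulError),
    /// The underlying open call returned a null pointer.
    OpenFailed,
}

/// Safe owner of a raw stream handle; the handle is closed on drop.
pub struct Stream {
    ptr: NonNull<raw::RawStream>,
}

impl Stream {
    pub fn open(name: &str) -> Result<Self, OpenError> {
        let c_name = CString::new(name).map_err(OpenError::InteriorNul)?;
        // SAFETY: `c_name` is a valid NUL-terminated string that outlives the call.
        let ptr = unsafe { raw::stream_open(c_name.as_ptr()) };
        NonNull::new(ptr).map(|ptr| Stream { ptr }).ok_or(OpenError::OpenFailed)
    }

    pub fn name(&self) -> &str {
        // SAFETY: `ptr` is non-null and stays valid until `drop` runs.
        unsafe { self.ptr.as_ref().name.as_str() }
    }
}

impl Drop for Stream {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `stream_open` and is closed exactly once.
        unsafe { raw::stream_close(self.ptr.as_ptr()) }
    }
}
```

With a wrapper along these lines, user code reduces to something like `let s = Stream::open("gpu0")?;`, and the string conversion, null check, and cleanup all live in one audited place.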

I added some minor code comments to improve quality in various ways throughout the code, and suggested some more idiomatic practices.

elliottslaughter · 2024-08-27T22:18:42Z

Some other things you might think about for NVTXW, which I did not specifically address:

- Can you add tests for NVTXW? Rust has a pretty good test infrastructure, and you can write both unit and integration tests in the crate (which will both be run by cargo test). Here's the documentation on how to do this: https://doc.rust-lang.org/book/ch11-01-writing-tests.html
- Once you have tests it's considered a best practice to run your code through Miri, which is sort of like asan/ubsan for Rust. This matters because of the unsafe code in the crate; otherwise Rust's type system would make it unnecessary to do anything of the sort: https://github.com/rust-lang/miri
- If you were to deploy this in a standalone repo, you would want to copy the test infrastructure inside .github/workflows/rust.yml (or implement something like it). It has a bunch of standard CI jobs to check the build, tests, format, lints, etc.
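For reference, a unit test module in a crate might look like the sketch below. The `Interval` struct and `as_bytes` helper are hypothetical examples chosen because they contain a small `unsafe` block of the kind Miri is meant to check; they are not taken from the NVTXW crate.

```rust
/// Hypothetical payload type with a fixed C layout (assumption for illustration).
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Interval {
    pub start: u64,
    pub stop: u64,
}

/// Views an `Interval` as raw bytes, e.g. to hand to a C writer.
pub fn as_bytes(interval: &Interval) -> &[u8] {
    // SAFETY: `Interval` is `repr(C)` with two `u64` fields, so it has no
    // padding and every byte is initialized; the slice borrows `interval`.
    unsafe {
        std::slice::from_raw_parts(
            (interval as *const Interval).cast::<u8>(),
            std::mem::size_of::<Interval>(),
        )
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::mem::size_of;

    #[test]
    fn interval_has_expected_size() {
        assert_eq!(size_of::<Interval>(), 16);
    }

    #[test]
    fn bytes_match_field_encoding() {
        let interval = Interval { start: 1, stop: 2 };
        let mut expected = Vec::new();
        expected.extend_from_slice(&1u64.to_ne_bytes());
        expected.extend_from_slice(&2u64.to_ne_bytes());
        assert_eq!(as_bytes(&interval), expected.as_slice());
    }
}
```

Running `cargo test` executes both tests. With a nightly toolchain and the Miri component installed (`rustup +nightly component add miri`), `cargo +nightly miri test` runs the same tests under Miri's interpreter.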

evanramos-nvidia · 2024-08-27T23:30:02Z

Thanks for your feedback, it's really helpful both to make the code better and to build my Rust skills.

I've updated the branch to address some items, and I will continue with the rest and update it further. Right now, I am particularly interested in feedback on my revised implementation of tile matching, using a single BTreeMap where the value is a pair of Options.
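A minimal sketch of that matching scheme, assuming simplified types, might look like the following. `TileId`, `Tile`, `MetaTile`, `Arrival`, and `Matcher` are hypothetical names and not the code in this branch; the point is only to show a single `BTreeMap` whose value is a pair of `Option`s, with an entry removed as soon as both halves are present.

```rust
use std::collections::BTreeMap;

// Hypothetical simplified types (assumptions, not the prof-viewer definitions).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct TileId {
    start: u64,
    stop: u64,
}

#[derive(Debug, PartialEq)]
struct Tile(Vec<u64>);

#[derive(Debug, PartialEq)]
struct MetaTile(Vec<String>);

/// Either half of a tile pair, arriving in any order.
enum Arrival {
    Tile(TileId, Tile),
    Meta(TileId, MetaTile),
}

#[derive(Default)]
struct Matcher {
    // One entry per tile ID that is still waiting for its other half.
    pending: BTreeMap<TileId, (Option<Tile>, Option<MetaTile>)>,
}

impl Matcher {
    /// Records an arrival. Returns the complete pair once both halves are
    /// present, removing the entry so the map only holds unmatched halves.
    fn insert(&mut self, arrival: Arrival) -> Option<(TileId, Tile, MetaTile)> {
        let id = match &arrival {
            Arrival::Tile(id, _) | Arrival::Meta(id, _) => *id,
        };
        let slot = self.pending.entry(id).or_insert((None, None));
        match arrival {
            Arrival::Tile(_, tile) => slot.0 = Some(tile),
            Arrival::Meta(_, meta) => slot.1 = Some(meta),
        }
        if slot.0.is_some() && slot.1.is_some() {
            let (tile, meta) = self.pending.remove(&id)?;
            Some((id, tile?, meta?))
        } else {
            None
        }
    }
}
```

Because matched entries are removed immediately, the map's size tracks the number of tiles with only one half received rather than the size of the whole profile.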

evanramos-nvidia · 2024-08-27T23:55:55Z

The CI is now failing with

Clippy: nvtxw_bindings.rs#L3
use of unstable library feature 'offset_of'

The only thing I can think of that might have introduced this is specifying the version of bindgen in the crate's Cargo.toml.

elliottslaughter · 2024-08-28T00:00:39Z

Try reverting to bindgen 0.69? That seems to be what you used in the last successful build:

https://github.com/StanfordLegion/prof-viewer/actions/runs/10411469581/job/28835561058

evanramos-nvidia · 2024-09-11T00:30:04Z

@elliottslaughter I've fixed the build on macOS. Thanks for bringing that to my attention.

evanramos-nvidia · 2024-09-12T19:58:52Z

> Note: make sure you see https://gitlab.com/StanfordLegion/legion/-/merge_requests/1001#note_2097198025 as that may require work here

I've implemented this request.

evanramos-nvidia · 2024-09-16T23:49:49Z

@elliottslaughter

> In order to get this into the release, here's what we need to do:

Other than the discussion about the enums in the NVTXW crate, I believe I have addressed all review feedback. Once that discussion is resolved and I have your approval, I am prepared to proceed with the remaining steps for publishing the crate and updating the branches. We should be able to complete this within the week, in time for your merge window.

elliottslaughter · 2024-09-18T21:26:00Z

@evanramos-nvidia I wanted to address one point mentioned in your previous comment:

> We should be able to complete this within the week, in time for your merge window.

I posted the release timeline here on August 29:
https://legion.zulipchat.com/#narrow/stream/187787-general/topic/legion.20release.2024.2E09.2E0/near/466074958

In particular, feature freeze was September 16.

The intent of our release system is to follow a train model. The goals are (a) to run frequent releases so it matters less what specific release a feature gets into, and (b) to merge features when they're ready (rather than sprinting to get them ready by some deadline). Even though we have a cutoff date, we usually shy away from large merges toward the end of the cycle anyway.

Based on the fact that this is new code being added and we're past the feature freeze date, it would be my plan to merge this right after the release so that it's early in the next cycle. My understanding is that cuNumeric tracks the master branch so this will make it available to cuNumeric as soon as they update, so it will not delay adoption on their side.

Let me know if you have any questions.

elliottslaughter · 2024-09-26T16:48:10Z

@evanramos-nvidia are you ready to move forward with this? We can make this available to the cuNumeric folks as soon as it hits master.

evanramos-nvidia · 2024-09-30T19:21:18Z

> Split the nvtxw crate back out into its own repository and publish it (not necessarily to crates.io, can be just GitHub for now)
>
> Update this PR to reference that (either via git URL or version, depending on what you chose for the last step)

Updated to complete these steps.

evanramos-nvidia · 2024-10-10T21:37:50Z

Elliott, thank you for your help in reviewing this PR. It has been valuable to get your feedback and coaching since this is the first Rust project I've worked on.

elliottslaughter · 2024-10-11T00:10:43Z

Yup, and thanks for putting in the work to make all the changes. It's been a long road but I think the code is in a much better place now, with a much better foundation for building future work.

The Legion side MR got merged so this should be available to the Legate folks as soon as they update Legion.

elliottslaughter reviewed Apr 17, 2024

elliottslaughter reviewed Apr 17, 2024

evanramos-nvidia force-pushed the nvtxw branch from 09fe90c to fb2c6ac on June 10, 2024 19:14

evanramos-nvidia force-pushed the nvtxw branch 3 times, most recently from 278ca3e to 2e257de on August 15, 2024 22:30

elliottslaughter reviewed Aug 26, 2024

elliottslaughter reviewed Aug 26, 2024

elliottslaughter reviewed Aug 26, 2024

elliottslaughter reviewed Aug 27, 2024

evanramos-nvidia force-pushed the nvtxw branch from 2e257de to 77f22ee on August 27, 2024 23:22

evanramos-nvidia force-pushed the nvtxw branch from 77f22ee to 519e2fb on August 27, 2024 23:33

elliottslaughter reviewed Aug 27, 2024

evanramos-nvidia force-pushed the nvtxw branch from 519e2fb to 3c95db6 on August 27, 2024 23:39

evanramos-nvidia force-pushed the nvtxw branch 3 times, most recently from d6a8e73 to 6d41688 on September 5, 2024 22:42