
[metal] Use objc2-metal #5641

Open: wants to merge 6 commits into trunk

@madsmtm (Contributor) commented Apr 30, 2024

Description

Use the objc2-metal crate instead of the metal crate. This will:

  • Improve memory management and soundness.
  • Make it easier to quickly support new Metal APIs (they're either already generated for you, or are basically just an Xcode update away).
  • Likely allow reducing the usage of Arc and Mutex, as Metal objects are already reference-counted (though this depends on thread-safety details I'm not entirely sure about).
  • Likely improve performance: we use objc_retainAutoreleasedReturnValue under the hood to avoid putting objects into the autorelease pool when possible, which reduces memory pressure.
  • Make it possible to properly support tvOS, watchOS and visionOS.
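The reference-counting point above can be illustrated with a small single-threaded model: an owning pointer that retains on Clone and releases on Drop lets several owners share one object without wrapping it in an extra Arc. FakeObject and Owned are hypothetical stand-ins that only track a counter; objc2's real Retained calls into the Objective-C runtime instead, and whether it may cross threads depends on the wrapped class's thread-safety, which this sketch does not address (its Cell keeps it single-threaded).

```rust
use std::cell::Cell;
use std::ops::Deref;

/// Hypothetical stand-in for a reference-counted Objective-C object.
/// Only the retain count is modeled; no real runtime is involved.
pub struct FakeObject {
    retain_count: Cell<usize>,
}

impl FakeObject {
    /// Creates an object whose retain count is already `count`
    /// (e.g. 1, as if just returned from a `new`-family method).
    pub fn with_count(count: usize) -> Self {
        FakeObject { retain_count: Cell::new(count) }
    }

    pub fn retain_count(&self) -> usize {
        self.retain_count.get()
    }
}

/// Hypothetical owning pointer in the spirit of objc2's `Retained`:
/// each `Owned` accounts for exactly one retain; `Clone` retains, `Drop` releases.
pub struct Owned<'a> {
    obj: &'a FakeObject,
}

impl<'a> Owned<'a> {
    /// Takes over a +1 reference the caller already holds (no extra retain).
    pub fn from_retained(obj: &'a FakeObject) -> Self {
        Owned { obj }
    }
}

impl Clone for Owned<'_> {
    fn clone(&self) -> Self {
        // "retain": another owner now shares the same object.
        self.obj.retain_count.set(self.obj.retain_count.get() + 1);
        Owned { obj: self.obj }
    }
}

impl Drop for Owned<'_> {
    fn drop(&mut self) {
        // "release": a real runtime would deallocate when this reaches zero.
        self.obj.retain_count.set(self.obj.retain_count.get() - 1);
    }
}

impl Deref for Owned<'_> {
    type Target = FakeObject;
    fn deref(&self) -> &FakeObject {
        self.obj
    }
}
```

Cloning an Owned is the moral equivalent of cloning an Arc here, which is why an outer Arc can become redundant once the object's own count does the bookkeeping.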

Background

The metal crate contains bindings to the Metal framework. It uses objc to perform the message sends manually, which is quite error-prone; see gfx-rs/metal-rs#284, gfx-rs/metal-rs#319 and gfx-rs/metal-rs#209 for a few examples of unsoundness (out of many).

To solve such problems in the Rust ecosystem in general, I created a successor to objc called objc2. Most notably, it contains the smart pointer Retained and the macro msg_send_id!, which together ensure that Objective-C's memory-management rules are upheld correctly.

This is only part of the solution, though: we'd still have to write the bindings manually. To solve this, I created a tool (I'm planning to integrate it with bindgen, but that's likely a multi-year project) to generate such framework crates automatically. Since this tool is far from perfect, and never will be, I've made sure there are plenty of options for modifying each generated crate.

The modifications for objc2-metal in particular are currently just a few hundred lines of code, which is weak evidence that the generator is fairly good at this point. I'll also draw attention to the file where unsafe methods are marked safe: I plan to investigate ways to semi-automatically figure out whether something is safe, or at least to reduce the burden of doing so, but guaranteeing that this is completely sound is a hard problem, so for now it's a bit verbose.
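The memory-management rule that msg_send_id! has to encode hinges on Objective-C method families: per Clang's ARC documentation, methods in the alloc, copy, mutableCopy, new and init families hand the caller an object it already owns (+1), while other methods return one it does not own (+0). The sketch below classifies a selector by that naming convention only. It is a simplified, hypothetical model (method_family and returns_retained are made-up names), not objc2's implementation, and it ignores the objc_method_family attribute and the return-type restrictions each family imposes.

```rust
/// Objective-C method families whose methods return a +1 (retained) object.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum MethodFamily {
    Alloc,
    Copy,
    MutableCopy,
    New,
    Init,
    None,
}

/// Classifies a selector by the naming convention from Clang's ARC docs:
/// ignoring leading underscores, the first selector piece must be exactly
/// the family name, or start with it followed by a non-lowercase character.
/// (Not modeled: `objc_method_family` attributes and return-type checks.)
pub fn method_family(selector: &str) -> MethodFamily {
    let first = selector
        .split(':')
        .next()
        .unwrap_or("")
        .trim_start_matches('_');
    let starts_family = |family: &str| {
        first
            .strip_prefix(family)
            .map_or(false, |rest| !rest.starts_with(|c: char| c.is_ascii_lowercase()))
    };
    if starts_family("alloc") {
        MethodFamily::Alloc
    } else if starts_family("copy") {
        MethodFamily::Copy
    } else if starts_family("mutableCopy") {
        MethodFamily::MutableCopy
    } else if starts_family("new") {
        MethodFamily::New
    } else if starts_family("init") {
        MethodFamily::Init
    } else {
        MethodFamily::None
    }
}

/// Whether the caller receives an already-retained (+1) object, so a
/// wrapper must not retain it again (but must release it exactly once).
pub fn returns_retained(selector: &str) -> bool {
    method_family(selector) != MethodFamily::None
}
```

Under this rule, `newComputePipelineStateWithDescriptor:error:` is in the new family, while a name like `newsItem` is not, because the character after `new` is lowercase.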

Connections

gpu-allocator is also transitioning to objc2-metal in Traverse-Research/gpu-allocator#225.

gfx-rs/metal-rs#241 is an old open PR for using objc2 in metal internally instead. There currently isn't really a clear path forwards there, and it's a lot of work for less direct benefits to the ecosystem (wgpu-hal is by far the biggest user of metal). But more fundamentally IMO, it's a problem of separation of concerns; metal defines several Foundation types like NSArray, NSString, NSError and so on, and even CAMetalLayer from QuartzCore, and that's a bad idea for interoperability compared to having separate crates for each of these frameworks.

#5752 removed the link feature which is not available in these crates.

Implementation

The first commit implements the actual migration, by using a branch of objc2, with a method naming scheme that (more closely) matches metal, to make it easier to review and test what's changed.

The second commit moves to the real naming scheme that objc2 uses.

I'd strongly recommend you review these two commits separately.

Testing

Tested by using the checklist below, as well as running each example individually, and checking that they seem to work.

During the development of this I made two quite critical typos, which were luckily caught by the test suite, but there's bound to be at least one more lurking in here somewhere, so please test this thoroughly!

Checklist

  • Run cargo fmt.
  • Run cargo clippy.
  • Run cargo xtask test to run tests.
  • Add change to CHANGELOG.md. See simple instructions inside file.

madsmtm added a commit to madsmtm/objc2 that referenced this pull request May 20, 2024
This may technically be a breaking change if the user implemented these
protocols themselves on a `Mutable` class, but that'd be unsound anyhow,
so I'll consider this a correctness fix.

This is useful for wgpu, see gfx-rs/wgpu#5641,
and the hack will become unnecessary after
#563.
@madsmtm force-pushed the objc2-metal branch 3 times, most recently from bd69f0d to 943f2a7 on May 23, 2024 01:24
@madsmtm marked this pull request as ready for review May 23, 2024 01:35
@madsmtm requested a review from a team as a code owner May 23, 2024 01:35

@teoxoy (Member) left a comment

I think this is a great step forward from having to maintain our own set of bindings manually.

I do have some questions/comments:

  • We still seem to be using the msg_send! macro in a bunch of places, could we use the generated methods instead?
  • We also use autoreleasepools in a bunch of places because we had issues with leaks but as far as I understand from the docs of Retained, a lot of these (if not all) should now be unnecessary. Is this correct? I think it would make sense to remove them in this PR.
  • While I appreciate the naming being more consistent with metal's, some of the enum variants now have the enum name as their prefix which feels redundant; but not always which feels odd. Examples:
    • MTLFeatureSet's variants are not prefixed
    • MTLLanguageVersion's variants are all prefixed
    • One of MTLReadWriteTextureTier's variants is not prefixed, while the other two are.
  • Some of the function names are quite long now (ex: copyFromTexture_sourceSlice_sourceLevel_sourceOrigin_sourceSize_toBuffer_destinationOffset_destinationBytesPerRow_destinationBytesPerImage_options 😆) but I'm not sure how they can be shortened by the generator while also keeping things easy to search for on Apple's docs.
  • It would be great if all objects had docs from Apple's docs website and/or at least a link to the page on said website but no hurry, just giving my +1 for it :).
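The long names mentioned above come from turning every piece of an Objective-C selector into one underscore-separated segment. A minimal sketch of that mapping, assuming (as the quoted copyFromTexture name suggests) that the trailing colon is dropped and the remaining colons become underscores; rust_method_name and selector_pieces are hypothetical helpers, and the generator may treat some selectors (for example error-returning ones) differently.

```rust
/// Hypothetical helper: derive a Rust method name from an Objective-C
/// selector by dropping the trailing ':' and turning the remaining ':'
/// separators into '_'.
pub fn rust_method_name(selector: &str) -> String {
    selector.trim_end_matches(':').replace(':', "_")
}

/// Number of selector pieces, i.e. how many segments get joined by '_'.
pub fn selector_pieces(selector: &str) -> usize {
    selector.trim_end_matches(':').split(':').count()
}
```

Because the name is a mechanical function of the selector, it stays easy to search for in Apple's documentation; the cost is that a ten-piece selector yields a ten-segment method name.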

wgpu-hal/src/metal/device.rs (outdated, resolved)
Comment on lines +1326 to +1400
```rust
// TODO: `newComputePipelineStateWithDescriptor:error:` is not exposed on
// `MTLDevice`, is this always correct?
fn new_compute_pipeline_state_with_descriptor(
    device: &ProtocolObject<dyn MTLDevice>,
    descriptor: &MTLComputePipelineDescriptor,
) -> Result<Retained<ProtocolObject<dyn MTLComputePipelineState>>, Retained<NSError>> {
    unsafe { msg_send_id![device, newComputePipelineStateWithDescriptor: descriptor, error: _] }
}
```

Member reply:

It's odd that it doesn't exist, since newRenderPipelineStateWithDescriptor:error: does.
We could use newComputePipelineStateWithDescriptor:options:reflection:error: instead, passing MTLPipelineOptionNone for options and nil for reflection.

@crowlKats (Collaborator) commented May 23, 2024

@madsmtm

> The first commit removes the link feature which is not available in these crates. This was added by @crowlKats in #3853, but Deno doesn't actually use it from what I can tell, they instead specify this using -weak_framework, which is the correct solution to this problem IMO. (If I'm wrong about this, please say so, I could be persuaded to add a similar feature to the objc2-... crates).

So we did use it for a while, but then we were able to remove our use of it, so yes, it's not necessary anymore.

@madsmtm (Contributor Author) commented May 23, 2024

Thanks for taking a look!

> We still seem to be using the msg_send! macro in a bunch of places, could we use the generated methods instead?

I've already done this in a separate branch, but there were some issues around the semantics not being quite the same, so I wanted to keep it for a separate PR where that discussion could be more focused.

> We also use autoreleasepools in a bunch of places because we had issues with leaks but as far as I understand from the docs of Retained, a lot of these (if not all) should now be unnecessary. Is this correct? I think it would make sense to remove them in this PR.

There's a lot of nuance to this question, including the fact that the optimization that allows us to avoid an autorelease is not guaranteed, only likely (depends on the exact emitted assembly, inlining, and the phase of the moon).

What has changed though is that autorelease pools no longer have any effect on the program, other than in terms of memory usage (autoreleased objects can be reclaimed sooner) and runtime performance (pushing and popping an autorelease pool has a slight overhead).

I'd really prefer to keep it out of this PR, mostly because it'll screw with the diff even more than it already is (indentation of large function bodies will change), but also partly because I do not know why each of these are here, and I'd like to retain that memory-usage profile until deemed wise to do otherwise.
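A toy model may help with the distinction drawn above: an autorelease pool only decides when deferred releases run, so removing one changes peak memory rather than program behavior. The sketch below is hypothetical (it stores boxed Rust values in a per-thread stack rather than using the Objective-C runtime), and the names autoreleasepool and autorelease merely mirror the Objective-C concepts.

```rust
use std::any::Any;
use std::cell::RefCell;

thread_local! {
    /// Hypothetical per-thread stack of pools; each pool holds objects whose
    /// release has been deferred until the pool is popped.
    static POOLS: RefCell<Vec<Vec<Box<dyn Any>>>> = RefCell::new(Vec::new());
}

/// Model of `@autoreleasepool { ... }`: push a pool, run `f`, pop the pool.
/// (A real implementation would also pop the pool if `f` panics.)
pub fn autoreleasepool<R>(f: impl FnOnce() -> R) -> R {
    POOLS.with(|pools| pools.borrow_mut().push(Vec::new()));
    let result = f();
    // Pop first, then drop outside the borrow, so destructors that
    // autorelease again don't hit a double borrow.
    let pool = POOLS.with(|pools| pools.borrow_mut().pop());
    drop(pool);
    result
}

/// Model of `autorelease`: defer dropping `obj` to the innermost pool.
pub fn autorelease<T: Any>(obj: T) {
    POOLS.with(|pools| {
        pools
            .borrow_mut()
            .last_mut()
            .expect("autorelease called outside of any autoreleasepool")
            .push(Box::new(obj));
    });
}
```

In this model, an inner pool only makes the objects autoreleased inside it go away earlier; objects owned through a Retained-style pointer never enter the pool at all, which is why the remaining pools mostly affect the memory-usage profile.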

> While I appreciate the naming being more consistent with metal's, some of the enum variants now have the enum name as their prefix which feels redundant; but not always which feels odd. Examples:
>
>   • MTLFeatureSet's variants are not prefixed
>   • MTLLanguageVersion's variants are all prefixed
>   • one of MTLReadWriteTextureTier's variants is not prefixed while the other 2 are

Yup, that's a bug: the logic for this name translation is very simplistic and was a bit hastily thrown together. Swift has the correct rules written down, but fixing it will be a breaking change, so I'll fix it in v0.3 of the framework crates.
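A more consistent prefix-stripping rule could look roughly like this. This is a simplified sketch under stated assumptions: it computes the camelCase word prefix shared by all variant names and backs off one word at a time while any stripped name would be empty or start with a digit. It is loosely inspired by how Swift imports C enums, but it is neither Swift's exact algorithm nor objc2's header-translator code; split_words and strip_common_prefix are hypothetical names.

```rust
/// Splits an ASCII camelCase identifier into words, keeping acronyms
/// together ("MTLReadWrite" -> ["MTL", "Read", "Write"]) and starting a
/// new word at each run of digits. Assumes ASCII input.
pub fn split_words(name: &str) -> Vec<&str> {
    let bytes = name.as_bytes();
    let mut words = Vec::new();
    let mut start = 0;
    for i in 1..bytes.len() {
        let (prev, cur) = (bytes[i - 1], bytes[i]);
        let next_is_lower = bytes.get(i + 1).map_or(false, |n| n.is_ascii_lowercase());
        let boundary = (cur.is_ascii_uppercase()
            && (prev.is_ascii_lowercase()
                || prev.is_ascii_digit()
                || (prev.is_ascii_uppercase() && next_is_lower)))
            || (cur.is_ascii_digit() && !prev.is_ascii_digit());
        if boundary {
            words.push(&name[start..i]);
            start = i;
        }
    }
    words.push(&name[start..]);
    words
}

/// Removes the words shared by every variant, backing off one word at a
/// time while any result would be empty or start with a digit.
pub fn strip_common_prefix(names: &[&str]) -> Vec<String> {
    if names.is_empty() {
        return Vec::new();
    }
    let split: Vec<Vec<&str>> = names.iter().map(|n| split_words(n)).collect();
    // Count the leading words that every variant shares.
    let mut k = 0;
    while let Some(word) = split[0].get(k) {
        if split[1..].iter().all(|w| w.get(k) == Some(word)) {
            k += 1;
        } else {
            break;
        }
    }
    // Back off while stripping would leave an empty or digit-leading name.
    while k > 0
        && split.iter().any(|w| {
            w.get(k).map_or(true, |s| s.starts_with(|c: char| c.is_ascii_digit()))
        })
    {
        k -= 1;
    }
    split.iter().map(|w| w[k..].concat()).collect()
}
```

Applied to the three MTLReadWriteTextureTier variants, this sketch keeps the Tier word on all of them (TierNone, Tier1, Tier2) instead of stripping it from only one, which is the kind of consistency the review asked for.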

> Some of the function names are quite long now (ex: copyFromTexture_sourceSlice_sourceLevel_sourceOrigin_sourceSize_toBuffer_destinationOffset_destinationBytesPerRow_destinationBytesPerImage_options 😆) but I'm not sure how they can be shortened by the generator while also keeping things easy to search for on Apple's docs.

Yeah :/

Feel free to comment on madsmtm/objc2#284 if you think of a good solution (or just any solution)!

> It would be great if all objects had docs from Apple's docs website and/or at least a link to the page on said website but no hurry, just giving my +1 for it :).

There's madsmtm/objc2#309 open for it. The local Xcode documentation is stored in an undocumented format that I spent a few hours on but couldn't immediately reverse-engineer.

Linking is similarly difficult: not for class names, but for methods, whose URLs contain some ID, which is why I didn't pursue this. But I guess just linking to the class name would still be a step up; I'll try to prioritize it.

@madsmtm (Contributor Author) commented May 23, 2024

I just tried to actually benchmark this, but it's not an art that I'm familiar with, and the results seemed inconclusive (both improvements and regressions, without any clear pattern that I could discern). I'd suggest that someone more familiar with that tries to benchmark this.

@grovesNL (Collaborator)

@madsmtm For what it's worth, we used to see a huge amount of time spent in retain/release in many of the sampling profilers we tried years ago, so reference-counting overhead could be a useful data point.

@madsmtm mentioned this pull request May 29, 2024
@madsmtm changed the title from Use objc2-metal to [metal] Use objc2-metal Jun 3, 2024
@madsmtm requested a review from teoxoy June 3, 2024 13:41
To keep the diff smaller and easier to review, this uses a temporary
fork of `objc2-metal` and `objc2-quartz-core` whose methods use the
naming scheme of the `metal` crate.

One particularly difficult part of this is that the `metal` crate has
several methods where the order of the arguments is swapped relative
to the corresponding Objective-C methods.

Most perilously, this includes (since these have both an offset and an
index argument, both of which are integers):
- `set_bytes`
- `set_vertex_bytes`
- `set_fragment_bytes`
- `set_buffer`
- `set_vertex_buffer`
- `set_fragment_buffer`
- `set_threadgroup_memory_length`

But also:
- `set_vertex_texture`
- `set_fragment_texture`
- `set_sampler_state`
- `set_vertex_sampler_state`
- `set_fragment_sampler_state`
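One way to guard against the swapped-argument hazard listed above is to wrap the two integer parameters in distinct types, so that mixing them up fails to compile instead of silently binding the wrong slot. The sketch below is a hypothetical illustration of that technique; BufferIndex, ByteOffset and RecordingEncoder are not part of wgpu, metal or objc2-metal, and the encoder records calls instead of talking to Metal.

```rust
/// Hypothetical newtype for a slot in the encoder's buffer argument table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BufferIndex(pub usize);

/// Hypothetical newtype for a byte offset into a buffer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ByteOffset(pub usize);

/// Stand-in encoder that records `(offset, index)` pairs instead of
/// calling into Metal.
#[derive(Debug, Default)]
pub struct RecordingEncoder {
    pub calls: Vec<(usize, usize)>,
}

impl RecordingEncoder {
    /// Same argument order as the Objective-C `setBuffer:offset:atIndex:`
    /// (buffer argument omitted). Both are plain `usize`, so passing them
    /// in the wrong order still type-checks.
    fn raw_set_buffer_offset_at_index(&mut self, offset: usize, index: usize) {
        self.calls.push((offset, index));
    }

    /// Typed wrapper: swapping the two arguments is now a compile error.
    pub fn set_buffer(&mut self, offset: ByteOffset, index: BufferIndex) {
        self.raw_set_buffer_offset_at_index(offset.0, index.0);
    }
}
```

Calling `enc.set_buffer(BufferIndex(1), ByteOffset(256))` would be rejected by the compiler, whereas the raw version would quietly accept the swapped integers.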

@madsmtm (Contributor Author) commented Aug 25, 2024

> > We still seem to be using the msg_send! macro in a bunch of places, could we use the generated methods instead?
>
> I've already done this in a separate branch, but there were some issues around the semantics not being quite the same, so I wanted to keep it for a separate PR where that discussion could be more focused.

I've opened #6107 to fix the semantic issues, btw. Whichever of these two PRs merges first, I'll update the other to remove the remaining msg_send!.

@jimblandy (Member) commented Sep 12, 2024

This PR has been open since the spring, so I want to let folks know what's going on with it.

We (the wgpu maintainers) think objc2 is the future of Objective-C bindings for Rust, and wgpu should use it as soon as possible. The emphasis on soundness and idiomatic Rust is music to our ears. I'm really grateful to Mads for coming in and writing this PR; it's a very valuable contribution. Everything has to be reviewed, of course, but based on looking at a bit of his prior work, I'm expecting this PR to be very high quality.

Firefox has been incorporating wgpu trunk roughly weekly into our development source tree, Mozilla Central, which is the basis for Nightly Firefox releases. We (Firefox) don't import from some separate branch, because a delta between what Firefox sees and what wgpu's direct users see means a slip in collaboration and an increase in risk.

Firefox uses a supply-chain auditing tool called cargo vet, which ensures that every line of Rust source that we incorporate into Firefox from public crates has been audited by someone who understands Firefox's security constraints. (For generated code, Mozilla audits the generator, and if its output is checked in, we verify that the output is actually what the generator produces.) There's a lot of grandfathered Rust code in Firefox that hasn't been audited, but yes, for the last few years, we actually really have looked at every change to every random upstream crate, because, well, there's no responsible alternative. cargo vet has features that allow us to share the auditing load with Google and a few other trusted orgs. Other groups are free to publish their own audit lists, but incorporating those is a pretty serious policy decision that requires looking into their processes, track record, and so on.

But taking this PR means that Firefox cannot update the wgpu sources in Mozilla Central until we have audited every line of objc2-metal and its dependencies (block2, etc.). This is our only process, at the moment, for getting bug fixes in wgpu-core into Firefox, so if we take this PR, Firefox is dead in the water until that vetting is done. Obviously, we'd prefer simply to audit the objc2 crates in advance, land this PR, and then do delta audits to cover any changes between that first audit and whatever gets incorporated.

The next steps for this PR are:

  • Get it audited for cargo vet. The Firefox WebGPU team is only three full-time people. However, I've started poking around within the org to get some additional people with the necessary Objective-C and Rust background to help us with this, and things are looking promising.

  • Finish reviewing this PR. In a few days, @teoxoy is going to be on PTO for a few weeks, but I'm going to find someone else to do it. (I'm seriously behind in my review obligations already, so if I were to claim I was going to do it myself, people would just laugh.)

What I'm hoping to get across here is that, despite the delay, this PR is important to us, and although we have some very stern constraints, we're moving the process along.

@madsmtm (Contributor Author) commented Sep 12, 2024

Thank you very much for the update @jimblandy! It's really nice to see that you're taking supply chain attacks seriously, it's something we very much need to be better at in our industry!

Regarding the ability to audit/review the objc2 codebase, I recognize that I'm the person "under scrutiny" so to say, but is there anything I can do to make things easier for you?

Code-wise, I can give a quick primer:

  • The only thing running on the host machine is a small build script in objc-sys; there are no proc-macros (yet).
  • The hardest part to review is probably the objc2 crate itself, and the declarative macros therein: it consists of hacks, hacks and nothing but hacks. I apologize in advance to whoever has to look at it.
  • Most of the code in the framework crates (in wgpu's case objc2-metal, objc2-quartz-core and objc2-foundation) is autogenerated ahead of time, put in a separate git repo, and symlinked into each crate. You can verify the generated code by removing it, regenerating it with cargo run --bin header-translator (which currently requires Xcode 15.4 to be the active developer directory), and checking git status.
  • block2, objc2-encode and the manual fixes/additions in the framework crates should be comparatively easy to review.
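The regenerate-and-compare check described above can be scripted for repeated audits. A minimal sketch, assuming the generated code has already been regenerated with cargo run --bin header-translator: it runs git status --porcelain and lists any paths that changed, so an empty list means the checked-in output matches what the generator produces. The function names are hypothetical, the repository path is supplied by the caller, and the parser only handles the simple porcelain v1 layout (renames and quoted paths are not decoded).

```rust
use std::io;
use std::process::Command;

/// Runs `git status --porcelain` in `repo_dir` and returns its raw output.
pub fn git_status_porcelain(repo_dir: &str) -> io::Result<String> {
    let output = Command::new("git")
        .args(["status", "--porcelain"])
        .current_dir(repo_dir)
        .output()?;
    if !output.status.success() {
        return Err(io::Error::new(
            io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}

/// Extracts the paths from porcelain v1 output, where each line is a
/// two-character status code, a space, then the path.
pub fn changed_paths(porcelain: &str) -> Vec<&str> {
    porcelain
        .lines()
        .filter_map(|line| line.get(3..))
        .filter(|path| !path.is_empty())
        .collect()
}
```

An auditor could then assert that `changed_paths(&git_status_porcelain(dir)?)` is empty after regeneration.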

Feel free to ask if there's something here that you need help with, or if there's something I can change to make things easier to audit (now and in the future)!

On the social/people side, would it help if we had an online meeting or something? Though I guess that could be construed as a form of social engineering attack, and wouldn't really add anything trust-wise.

In any case, thanks again for the update, and I'm totally fine with waiting a few more months, or however long it takes you!

@ErichDonGubler mentioned this pull request Nov 25, 2024
Labels: None yet
Projects: Status: Todo
5 participants