Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat!: reduce size of Id and add IdRef #31

Closed
wants to merge 1 commit into from

Conversation

jalil-salame
Copy link
Contributor

@jalil-salame jalil-salame commented Aug 29, 2024

This reduces the size of Id from 24 bytes to 16 bytes on 64-bit systems, and from 12 bytes to 8 bytes in 32-bit systems.

It also adds an IdRef type that we can use instead of a &Id, this avoids a double indirection (&str vs &String). In trustfall-rustdoc-adapter we see a 7% perf improvement when using &IdRef instead of &Id.

Sadly, AFAIK, there is no way to safely coerce Id(Box<str>) to &IdRef(str) so we need to do an unsafe std::mem::transmute(&str) -> &IdRef.

zulip discussion

@jalil-salame
Copy link
Contributor Author

jalil-salame commented Aug 29, 2024

Apparently bytemuck's TransparentWrapper handles this case, so we could switch to that instead and have it execute the unsafe code.

This reduces the size of `Id` from 24 bytes to 16 bytes on 64-bit
systems, and from 12 bytes to 8 bytes in 32-bit systems.

It also adds an `IdRef` type that we can use instead of a `&Id`, this
avoids a double indirection (`&str` vs `&String`). In
`trustfall-rustdoc-adapter` we see a [7% perf improvement][1] when using
`&IdRef` instead of `&Id`.

Sadly, AFAIK, there is no way to safely coerce `Id(Box<str>)` to
`&IdRef(str)` so we need to do an unsafe
`std::mem::transmute(&str) -> &IdRef`.

[1]: obi1kenobi/trustfall-rustdoc-adapter#423 (comment)
Copy link
Member

@aDotInTheVoid aDotInTheVoid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments:

  1. This repo shouldn't be used for development of the crate source code. See here for why rust-lang/rust is the right place to submit this. (aside: if you did read that section and it wasn't clear to you, please let me know)

  2. I'm unclear what the benchmark numbers are showing: The linked PR changes the hashing algorithm, not anything to do with how ID's are represented. I'd like it to be clearer what's being measured.

  3. I'm fully in support of changing the definition of Id. One thing to note is that this makes the inner string not-accessible to users, but I'd like to move towards that. (but this will have complications as librustdoc needs to be able to construct an Id. Maybe we can add more filtering to the update.sh, so the upstream version exposes the inner string but we don't here)

  4. I'm hesitant about adding IdRef. The additional complexity is unfortunate. What are the performance numbers like if you don't include it, but do change Id to be narrower?

  5. It seems the root of this problem is that our Id's allocate. This would take much more work, but have you thaught about changing rustdoc to use integer id's instead of strings? This would probably have even better perf implications, and avoid more api complexity. However, it would be much more involved to implement. I'd be happy to land Id (but more hesitant for IdRef) changes as an interim measure if you think that'd be worthwhile.

#[doc(hidden)]
#[repr(transparent)]
#[derive(Clone, Debug, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub struct IdType<T: ?Sized>(T);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this? I think It'd be better to just duplicate all the derives, rather than using a type alias here.

pub struct Id(pub String);
pub type Id = IdType<Box<str>>;

/// A reference to an [`Id`]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But an IdRef isn't a reference to an Id, but an usized typed than can be taken as a reference to. &IdRef is a reference to an Id.

Also, this doc comment should explain the motivations for doing this.

@aDotInTheVoid
Copy link
Member

Closing as this should be sent to rust-lang/rust

@jalil-salame
Copy link
Contributor Author

Sorry, I didn't read the contributing guidelines. The comment addresses experimental changes where we switch from HashMap<&Id, _> to HashMap<&IdRef, _> instead of changing to FxHashMap<&Id, _>. I'm preparing a PR where I do add the IdRef.

The generics are from an experiment on unsized coercion, I thought I might be able to coerce Box<str> to str (because you can't build &IdRef safely otherwise). This works for [_; N] -> [_] coercion, but not Box<str> -> str sadly.

@jalil-salame jalil-salame deleted the id-ref-type branch August 30, 2024 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants