Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Clone bound from more things #72

Open
jplatte opened this issue Mar 11, 2024 · 5 comments
Open

Remove Clone bound from more things #72

jplatte opened this issue Mar 11, 2024 · 5 comments

Comments

@jplatte
Copy link
Contributor

jplatte commented Mar 11, 2024

As a follow-up to #71, I would like to remove the Clone bound from Deserialize too. However, this requires a lot of other changes, so I'd like to discuss it here first.

The solution I'm thinking of is to duplicate a large part of the current API internally to be able to add items to a container while knowing that no Clone is actually called (there might be an alternative around a #[repr(transparent)] wrapper that implements Clone for anything and always panics, but then it's really hard to ensure that that impl is never used). That is, use something like UniqueArc and build UniqueVector, UniqueRRB, UniqueNode and such on top of it. Then FromIterator can use that to drop its Clone bound, and Deserialize doesn't need the Clone bound either anymore.

Wdyt?

@jneem
Copy link
Owner

jneem commented Mar 11, 2024

There are a few things I don't quite understand. I guess the most important one is: why is it worth the effort? You can't really do much with an imbl::Vector<T> unless T is Clone, so why is it worth doubling our code size to allow for one to be constructed?

I also didn't understand how the proposed implementation would work. Presumably, UniqueVector<T> wouldn't support push_back unless T: Clone, so how do you propose to implement FromIterator<T> for UniqueVector<T>?

@jplatte
Copy link
Contributor Author

jplatte commented Mar 11, 2024

So why is it worth doubling our code size to allow for one to be constructed?

I don't know if it's worth ^^
It would allow deriving things on generic containers that contain a imbl::Container field more easily / with more "primitive" derive macros that don't support bound rewriting like serde. And who knows, maybe there are cases where being able to create Vectors or other imbl containers of non-Clone items is useful (e.g. to use them with some generic code that only needs to read from the container), even if you can no longer modify them once they're shared.

I also didn't understand how the proposed implementation would work. Presumably, UniqueVector<T> wouldn't support push_back unless T: Clone

Why do you presume that? It would have to support that.

@jneem
Copy link
Owner

jneem commented Mar 12, 2024

I don't know if it's worth ^^

Great, so let me play the role of the grumpy maintainer that wants to minimize effort (but of course feel free to push back). First of all, one easy way to drop the Clone requirement is just to make a Vector<Arc<T>>. If I'm not mistaken, this is what rpds does.

It would allow deriving things on generic containers that contain a imbl::Container field more easily / with more "primitive" derive macros that don't support bound rewriting like serde.

If supporting this in impl requires duplicating everything, I'd argue it's easier to fix those proc macros instead. Deriving conditional on bounds is useful enough that they'd probably have applications other than imbl.

And who knows, maybe there are cases where being able to create Vectors or other imbl containers of non-Clone items is useful (e.g. to use them with some generic code that only needs to read from the container).

Without an example of such code it's hard to be sure, but if it only reads from the container why bother putting an imbl::Vector there? Why not be generic over C: FromIterator<T> + IntoIterator<T>?

Why do you presume that?

I'm not sure, I think I misunderstood the proposed implementation before. If I understood it correctly now, the idea is that every time an Arc::make_mut clones an Arc in the current implementation, instead you'd just mutate the UniqueArc directly?

Anyway, having just argued against including this, let me also suggest some possible alternate implementations:

  1. Rewrite FromIterator. In principle, we can build up the tree in one pass without cloning instead of calling push_back repeatedly. We'd need know the iterator's length first (in order to know the tree structure), so we could collect into a Vec first. That's a bit of extra copying and allocation, but it's still probably more efficient than all the allocation involved in building up the tree element-by-element. It would also be a non-trivial amount of code, but still probably less than duplicating everything.
  2. Do everything with GATs, replacing Arc with a generic associated Ptr type. I don't know how feasible this is (I haven't done much with GATs in rust), but it would be useful for more than just this feature. For example, it would allow reviving the Rc version.

@jplatte
Copy link
Contributor Author

jplatte commented Mar 12, 2024

Thanks for the very detailed reply, makes a lot of sense. Having this kind of discussion is what I opened the issue for before putting too much effort into the implementation :)

I'll only respond to some bits for now that I find most interesting, and will get back to the usefulness of this feature later (in short, I think you're probably right that it doesn't pull its own weight).

Why do you presume that?

I'm not sure, I think I misunderstood the proposed implementation before. If I understood it correctly now, the idea is that every time an Arc::make_mut clones an Arc in the current implementation, instead you'd just mutate the UniqueArc directly?

Yes, exactly!

Anyway, having just argued against including this, let me also suggest some possible alternate implementations:

  1. Rewrite FromIterator. In principle, we can build up the tree in one pass without cloning instead of calling push_back repeatedly. We'd need know the iterator's length first (in order to know the tree structure), so we could collect into a Vec first. That's a bit of extra copying and allocation, but it's still probably more efficient than all the allocation involved in building up the tree element-by-element. It would also be a non-trivial amount of code, but still probably less than duplicating everything.

This is actually the kind of thing I wanted to do at first, but since I'm much more comfortable with generics and weird memory management tricks like UniqueArcthan with tree container internals, I went down the other path pretty quickly.

  1. Do everything with GATs, replacing Arc with a generic associated Ptr type. I don't know how feasible this is (I haven't done much with GATs in rust), but it would be useful for more than just this feature. For example, it would allow reviving the Rc version.

I think this should be feasible. I'm personally not terribly interested in the rc stuff as a user, but actually this could also allow using other Arc types (e.g. one from triomphe) to be used. Then again, maybe the std Arc could just be replaced with one that's more optimal for imbls use cases unconditionally? Would you be up for that in principle?

I have a little bit of experience with GATs and a lot of experience with the trait system in general, and it would be a real fun puzzle to slot this all together. Three Qs about your idea for a GATified imbl:

  • You mentioned elsewhere that you want the pool feature to work more like std's allocator API when supported again, not like im's pool feature. Does that mean it would be worth first kicking out the current fakepool + Cargo feature? That should make the newly-introduced trait(s) simpler.
  • Would you want GATs to be part of the public API, or just internal? The latter would be more boilerplate of course, having to duplicate all public function signatures and delegate to the actual implementation. On the other hand, it probably makes the crate more beginner-friendly (although recent rustdoc improvements for generic type aliases alleviate a lot of the pain of docs for highly generic APIs, c.f. https://docs.rs/ndarray/latest/ndarray/type.Array.html)
    • If the generics are public for reduced boilerplate, would you want the traits to be sealed to be able to make more changes to them without breaking the API, or not so users can plug in their own (a)rc replacement?

@jneem
Copy link
Owner

jneem commented Mar 13, 2024

I think this should be feasible. I'm personally not terribly interested in the rc stuff as a user, but actually this could also allow using other Arc types (e.g. one from triomphe) to be used. Then again, maybe the std Arc could just be replaced with one that's more optimal for imbls use cases unconditionally? Would you be up for that in principle?

In principle, sure, I don't have strong feeling either way. I think I'd probably want to see some measurable benefit (e.g. some speedup in a benchmark) to justify the added dependency.

I have a little bit of experience with GATs and a lot of experience with the trait system in general, and it would be a real fun puzzle to slot this all together. Three Qs about your idea for a GATified imbl:

* You mentioned elsewhere that you want the pool feature to work more like std's allocator API when supported again, not like im's pool feature. Does that mean it would be worth first kicking out the current fakepool + Cargo feature? That should make the newly-introduced trait(s) simpler.

Sure, if the pool feature is getting in the way, feel free to rip it out.

* Would you want GATs to be part of the public API, or just internal? The latter would be more boilerplate of course, having to duplicate all public function signatures and delegate to the actual implementation. On the other hand, it probably makes the crate more beginner-friendly (although recent rustdoc improvements for generic type aliases alleviate a lot of the pain of docs for highly generic APIs, c.f. https://docs.rs/ndarray/latest/ndarray/type.Array.html)

Good question, and without knowing what the API looks like it's hard to say. Probably it would make sense to expose the GATs for flexibility, and if they're particularly annoying to use then maybe they could go in a submodule and there could be a simplified facade in the crate root? But if it's as simple as an extra defaultable generic parameter (like Vector<T, Ptr=ArcPtr>, where ArcPtr is some type we provide that implements some GAT-having trait) then maybe just expose it directly?

  * If the generics are public for reduced boilerplate, would you want the traits to be sealed to be able to make more changes to them without breaking the API, or not so users can plug in their own (a)rc replacement?

I'm not too worried about breaking the API; it isn't exactly like we have a hard-to-keep-up-with release cadence here. But for ease-of-use I'd probably try to guess the main candidates for (a)rc replacements and provide support for them within imbl (behind feature flags if they require taking on additional dependencies).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants