add second lifetime to FromPyObject #4390

Open
wants to merge 9 commits into
base: main

Conversation

Icxolu
Contributor

@Icxolu Icxolu commented Jul 28, 2024

This adds the second lifetime to FromPyObject and removes FromPyObjectBound. I hope I adjusted all the bounds correctly. I believe the equivalent of our current implementation is to use an HRTB for the input lifetime on any generic bound. But this should only be necessary for containers that create temporary Python references during extraction. For wrapper types it should be possible to just forward the relaxed bound.

For easier trait bounds, FromPyObjectOwned is introduced. It is blanket-implemented for any FromPyObject type that does not depend on the input lifetime, and it is intended to be used as a trait bound. The idea is inspired by serde (Deserialize <=> DeserializeOwned).
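
As a rough illustration of the pattern described above, here is a minimal sketch in plain Rust, without PyO3. The names `Obj`, `Handle`, `Extract`, `ExtractOwned` and `extract_from_temporary` are hypothetical stand-ins chosen for this example, and only the input lifetime `'a` is modelled. It shows an HRTB-based marker trait with a blanket implementation, and a helper that needs the owned bound because it extracts from a temporary.

```rust
// All names here are hypothetical stand-ins, not PyO3's API.

/// Stand-in for a Python object: it simply owns some bytes.
pub struct Obj {
    pub data: Vec<u8>,
}

/// Stand-in for a borrowed handle to an object, valid for the input lifetime `'a`.
#[derive(Clone, Copy)]
pub struct Handle<'a> {
    pub obj: &'a Obj,
}

/// Extraction trait whose output may borrow from the input for `'a`.
pub trait Extract<'a>: Sized {
    fn extract(input: Handle<'a>) -> Result<Self, String>;
}

/// Implemented automatically for every type that can be extracted from an
/// input of any lifetime, i.e. whose output never borrows from the input.
pub trait ExtractOwned: for<'a> Extract<'a> {}
impl<T> ExtractOwned for T where T: for<'a> Extract<'a> {}

// Borrows from the input, so it is `Extract<'a>` but not `ExtractOwned`.
impl<'a> Extract<'a> for &'a [u8] {
    fn extract(input: Handle<'a>) -> Result<Self, String> {
        Ok(input.obj.data.as_slice())
    }
}

// Copies out of the input, so it is also `ExtractOwned`.
impl<'a> Extract<'a> for usize {
    fn extract(input: Handle<'a>) -> Result<Self, String> {
        Ok(input.obj.data.len())
    }
}

/// A container-like helper that extracts from a temporary object.
/// The owned bound guarantees the result cannot point into `temp`.
pub fn extract_from_temporary<T: ExtractOwned>(bytes: &[u8]) -> Result<T, String> {
    let temp = Obj { data: bytes.to_vec() };
    T::extract(Handle { obj: &temp })
}
```

In this sketch, calling `extract_from_temporary::<&[u8]>` is rejected at compile time, which is the kind of restriction the owned bound is meant to express.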

I tried to document different cases in the migration guide, but it can probably still be extended. Changelog entry is still missing, will write that tomorrow.

Breaking changes:

  • the second lifetime
  • the extract method without default
  • the additional lifetime bound on extract_bound to make the default work

@mejrs
Member

mejrs commented Jul 28, 2024

I propose that we add a FromPyObjectOwned trait similar to serde's DeserializeOwned. It's much easier to explain "just use this trait if the output doesn't borrow from Python" rather than "you need higher ranked trait bounds".

@Icxolu
Contributor Author

Icxolu commented Jul 29, 2024

Thanks! I think that's a great idea. It's essentially the same thing, but it's much easier to grasp by giving it a distinct name, plus we can better document it in the API docs. I will adapt this to make use of that.

@Icxolu Icxolu force-pushed the from-pyobject branch 3 times, most recently from c960124 to a1c0bed Compare July 29, 2024 18:16
Member

@davidhewitt davidhewitt left a comment

Thanks for pushing on with this! Overall it looks good to me, and I only have a few docs suggestions.

Before we definitely commit to this, I want to take a brief moment just to reflect on the choice of Borrowed over &Bound. There are at least two good technical reasons: the lifetime constraint, and performance from avoiding pointer-to-pointer. The codspeed branch does show a few slight perf improvements in the ~4% range on tuple extraction, and we have further possibilities like PyRef containing Borrowed once we do this.

That said, we deliberately kept the Borrowed smart pointer out of the way as much as possible in the original 0.21 design to try to keep new concepts down. Is it fine to increase its use? I think the upsides justify this, though we might want to increase the quality of documentation and examples on Borrowed (which I'd left quite minimal so far).

guide/src/migration.md Outdated Show resolved Hide resolved
pyo3-macros-backend/src/frompyobject.rs Outdated Show resolved Hide resolved
@@ -616,8 +616,9 @@ pub fn build_derive_from_pyobject(tokens: &DeriveInput) -> Result<TokenStream> {
     let ident = &tokens.ident;
     Ok(quote!(
         #[automatically_derived]
-        impl #trait_generics #pyo3_path::FromPyObject<#lt_param> for #ident #generics #where_clause {
-            fn extract_bound(obj: &#pyo3_path::Bound<#lt_param, #pyo3_path::PyAny>) -> #pyo3_path::PyResult<Self> {
+        impl #trait_generics #pyo3_path::FromPyObject<'_, #lt_param> for #ident #generics #where_clause {
Member

It will be an interesting question for later if we ever support the 'a lifetime in our #[derive(FromPyObject)]. Definitely not for now 😂

Contributor Author

Yeah, something to experiment with in the future, but out of scope for now for sure 😅

/// the normal `FromPyObject` trait. This trait has a blanket implementation
/// for `T: FromPyObject`.
/// Note: depending on the implementation, the lifetime of the extracted result may
/// depend on the lifetime of the `obj` or the `prepared` variable.
Member

Should we use a rather than prepared here and below? Also maybe written as 'py or 'a, as we do elsewhere?

Contributor Author

I agree, I was a bit lazy and just uncommented the docs that we temporarily removed during gil-ref migration.

How about

/// Note: depending on the implementation, the extracted result may
/// depend on the Python lifetime `'py` or the input lifetime `'a` of `obj`.

Also, the example below that seems a bit confusing. What do you think about using Cow<str> as an example, which may or may not borrow from the input ('a) depending on the Python runtime type, and maybe Bound<'py, PyString> as an example that depends on the Python lifetime? Maybe we also add an example of a collection type (maybe Vec) to introduce FromPyObjectOwned...
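
To illustrate the "may or may not borrow" behaviour of a `Cow` result, here is a small sketch in plain Rust. The `Value` enum and `extract_str` function are hypothetical stand-ins for a dynamically typed input and are not PyO3 code; the point is only that the same function returns `Cow::Borrowed` or `Cow::Owned` depending on the runtime variant.

```rust
use std::borrow::Cow;

/// Hypothetical stand-in for an object whose concrete type is only known at runtime.
pub enum Value {
    /// Text already stored as UTF-8, which can be borrowed directly.
    Utf8(String),
    /// Text stored in another encoding, which has to be re-encoded and copied.
    Utf16(Vec<u16>),
}

/// Returns a `Cow` that borrows from `value` (lifetime `'a`) when possible
/// and falls back to an owned, freshly allocated `String` otherwise.
pub fn extract_str<'a>(value: &'a Value) -> Result<Cow<'a, str>, String> {
    match value {
        Value::Utf8(text) => Ok(Cow::Borrowed(text.as_str())),
        Value::Utf16(units) => String::from_utf16(units)
            .map(Cow::Owned)
            .map_err(|err| err.to_string()),
    }
}
```

Because the borrowed case ties the result to `'a`, a type like this can only be used where the input outlives the extracted value.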

Member

I think all of that sounds great, yes please 👍

src/conversion.rs Outdated Show resolved Hide resolved
Comment on lines +309 to +553
impl<'py, T> FromPyObject<'_, 'py> for PyRef<'py, T>
where
T: PyClass,
{
fn extract_bound(obj: &Bound<'py, PyAny>) -> PyResult<Self> {
fn extract(obj: Borrowed<'_, 'py, PyAny>) -> PyResult<Self> {
obj.downcast::<T>()?.try_borrow().map_err(Into::into)
}
}

impl<'py, T> FromPyObject<'py> for PyRefMut<'py, T>
impl<'py, T> FromPyObject<'_, 'py> for PyRefMut<'py, T>
Member

And then the follow-up future PR will be what to do about PyRef. Should we split into PyRef / PyRefBorrowed, or keep the one type PyRef<'a, 'py, T> and accept some breakage? 🤔

Contributor Author

Hmm, I think if we want to provide both variants, I would prefer PyRef<'a, 'py, T> and PyRefOwned<'py, T>, since that would be more consistent with FromPyObjectOwned.

In any case we would need to find a good way to differentiate their constructors. Neither obj.borrow_borrowed() nor obj.borrow_owned() feels too appealing to me 😄

Member

Agreed, it's tough to find names. I'm somewhat tempted to wait until after 0.23 and to first see how all these trait changes we've already made play out before we change PyRef.

@Icxolu
Contributor Author

Icxolu commented Aug 3, 2024

Thanks for the review! You are right, the choice of Borrowed over &Bound is definitely something to discuss. I used it here because it was the final form of FromPyObjectBound that we landed on. Looking back, the change to Borrowed happened in #3959, which was related to the gil-refs API, so it might not be strictly necessary anymore (haven't checked). The points you mentioned seem like good justification for why it still makes sense to introduce additional complexity here.

I think the upsides justify this, though we might want to increase the quality of documentation and examples on Borrowed (which I'd left quite minimal so far).

I agree, if we put Borrowed in such a core part of the infrastructure it deserves a documentation overhaul. We might also want to take a look at the current (public) API and additionally expose some helpers. Something like to_any comes to mind, which is currently internal only.

Member

@mejrs mejrs left a comment

IMO the recently added FromPyObject documentation is too "implementation detail"-y.

I'd prefer if this documentation is kept short and concise, one paragraph max.

For example:

Depending on the python version and implementation some `FromPyObject`
 implementations may produce Rust types that point into the Python type.
For example `PyString` and `PyBytes` may convert into `str` and `[u8]` references
 that borrow from the original Python object.

Types that do not borrow from the input should use `FromPyObjectOwned` instead.

src/conversion.rs Outdated Show resolved Hide resolved
/// borrow from the input lifetime `'a`. The behavior depends on the runtime
/// type of the Python object. For a Python byte string, the existing string
/// data can be borrowed (lifetime: `'a`) into a [`Cow::Borrowed`]. For a Python
/// Unicode string, the data may have to be reencoded to UTF-8, and copied into
Member

Or not, depending on the python version the string will be interned in the unicode object.

Contributor Author

Yeah, I'm aware of that, but that felt like too much detail for me, and is not really relevant for the point I am trying to make here. That's why I tried to phrase it in a rather vague way.

@Icxolu
Contributor Author

Icxolu commented Aug 4, 2024

Thanks for the feedback! I took another pass over it and tried to make it a bit more concise. I do however think that it's worth talking about the individual lifetimes at play here, since this trait is right in the center of PyO3. I moved the paragraph about collections into the FromPyObjectOwned docs, because it works much better there with the code example, and replaced it with a smaller note pointing there for more info. I reworded my examples a bit and moved them under a # Details section; maybe this works better?

Let me know what you think about this.

Member

@davidhewitt davidhewitt left a comment

Thanks very much for keeping this branch alive, I guess let's move forward with this before it gets too painful to keep updating.

I had yet another reflection on &Bound vs Borrowed and I'm feeling reasonably comfortable that Borrowed is the correct choice.

src/conversion.rs Show resolved Hide resolved
@@ -656,8 +656,8 @@ macro_rules! tuple_conversion ({$length:expr,$(($refN:ident, $n:tt, $T:ident)),+
     }
 }

-impl<'py, $($T: FromPyObject<'py>),+> FromPyObject<'py> for ($($T,)+) {
-    fn extract_bound(obj: &Bound<'py, PyAny>) -> PyResult<Self>
+impl<'a, 'py, $($T: FromPyObject<'a, 'py>),+> FromPyObject<'a, 'py> for ($($T,)+) {
Member

@davidhewitt davidhewitt Aug 31, 2024

Just for the record and the decision to stick with Borrowed, the reason is that if we used &'a Bound here for the input argument, then tuple extraction would be forced to restrict elements to FromPyObjectOwned.

That maybe wouldn't be a terrible thing, as this is currently a special case when compared with every other container, which is forced to use FromPyObjectOwned due to their mutability.

But I think it would be a breaking change, so it's probably not a good change overall. (E.g. users would lose the ability to extract tuples containing &str, &Bound and Borrowed)

I think as a secondary effect, by using Borrowed here we might be able to change extract_argument.rs to make more use of Borrowed, which IIRC is a possible performance win.
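
One way to picture why a by-value borrowed handle lets tuple elements borrow is the following plain-Rust sketch. `Handle`, `Pair`, `get_borrowed`, `Extract` and `extract_pair` are hypothetical names for this example and do not correspond to PyO3's implementation; the assumption is that the container can hand out item handles that live as long as the container itself.

```rust
// Hypothetical stand-ins, not PyO3's API.

/// Stand-in for `Borrowed`: a copyable handle that is valid for `'a`.
#[derive(Clone, Copy)]
pub struct Handle<'a>(pub &'a str);

/// Stand-in for a two-element tuple that owns its items.
pub struct Pair {
    pub items: [String; 2],
}

impl Pair {
    /// Hands out an item as a by-value handle tied to the lifetime of `self`.
    pub fn get_borrowed(&self, index: usize) -> Handle<'_> {
        Handle(self.items[index].as_str())
    }
}

pub trait Extract<'a>: Sized {
    fn extract(input: Handle<'a>) -> Option<Self>;
}

// An element type that borrows from the input.
impl<'a> Extract<'a> for &'a str {
    fn extract(input: Handle<'a>) -> Option<Self> {
        Some(input.0)
    }
}

// An element type that does not borrow from the input.
impl<'a> Extract<'a> for u32 {
    fn extract(input: Handle<'a>) -> Option<Self> {
        input.0.parse().ok()
    }
}

/// Each item handle lives for `'a`, so the elements only need `Extract<'a>`
/// and are not restricted to types that own their data.
pub fn extract_pair<'a, A, B>(pair: &'a Pair) -> Option<(A, B)>
where
    A: Extract<'a>,
    B: Extract<'a>,
{
    let first = A::extract(pair.get_borrowed(0))?;
    let second = B::extract(pair.get_borrowed(1))?;
    Some((first, second))
}
```

If `extract` instead took a reference to a handle, `extract_pair` would have to reference the temporary returned by `get_borrowed`, and the element bound would need to be the owned variant.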

@Icxolu
Contributor Author

Icxolu commented Aug 31, 2024

I believe this now also finishes the _bound methods deprecations 🎉

@davidhewitt
Member

Fantastic, thank you so much for your help on pushing that over the line. When I get some time I'll try to write up a list of what we might have left for 0.23 (hopefully not much).

@davidhewitt davidhewitt added this pull request to the merge queue Aug 31, 2024
@davidhewitt davidhewitt removed this pull request from the merge queue due to a manual request Aug 31, 2024
@davidhewitt
Member

Ah, I just started playing with this locally and found one additional finding: having Borrowed<'a, 'py, T> as the input type means that it's impossible to implement FromPyObject<'a, 'py> for &'a Bound<'py, T> - because the input borrowed can't be cast into a &'a Bound<'py, T> for a sufficient lifetime (pointer goes in, pointer-to-pointer goes out).

Is that a problem? I think not, because that's what .downcast() is for (and it's potentially nice to separate them like this), however just worth a brief double-check that others agree.
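
The "pointer goes in, pointer-to-pointer goes out" problem can be reproduced in plain Rust. In this sketch `Handle` and `Extract` are hypothetical stand-ins rather than PyO3 types; the commented-out implementation is the analogue of extracting `&'a Bound<'py, T>` from a by-value `Borrowed<'a, 'py, T>`.

```rust
// Hypothetical stand-ins, not PyO3's API.

/// Stand-in for `Borrowed`: a single pointer, passed by value.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Handle<'a>(pub &'a u32);

pub trait Extract<'a>: Sized {
    fn extract(input: Handle<'a>) -> Self;
}

// Pointer in, pointer out: the handle is simply copied to the caller.
impl<'a> Extract<'a> for Handle<'a> {
    fn extract(input: Handle<'a>) -> Self {
        input
    }
}

// Pointer in, pointer-to-pointer out: this is rejected by the borrow checker,
// because the only `Handle` available is the by-value argument, which is
// dropped when the function returns.
//
// impl<'a> Extract<'a> for &'a Handle<'a> {
//     fn extract(input: Handle<'a>) -> Self {
//         &input
//     }
// }
```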

@davidhewitt
Member

Oh, and one more case which comes up in local testing. At the moment from_py_with extractors take &Bound<'py, PyAny>. Should we be considering changing those to Borrowed too, for consistency?

@davidhewitt
Member

Should we be considering changing those to Borrowed too, for consistency?

It feels to me like we might want to consider macro tricks such that users can use both &Bound and Borrowed.

@Icxolu
Contributor Author

Icxolu commented Sep 1, 2024

having Borrowed<'a, 'py, T> as the input type means that it's impossible to implement FromPyObject<'a, 'py> for &'a Bound<'py, T>

That's true, but as you mention below, I actually think this is a good thing, and it makes the use cases of .extract() and .downcast() more clearly distinct.

Oh, and one more case which comes up in local testing. At the moment from_py_with extractors take &Bound<'py, PyAny>. Should we be considering changing those to Borrowed too, for consistency?

Hmm, good question. I guess that is part of the question about how prominent we want Borrowed to be. Providing some flexibility via macro magic as you suggested sounds like a good compromise. I can explore that as a followup.

@davidhewitt
Member

Yes agreed, it's definitely a follow-up to make changes to expand from_py_with.

I wonder, are there any other options we have here? E.g. does it work to have a trait with both extract and extract_borrowed, maybe with a default implementation of extract_borrowed? I suspect that might not work because of the same reason we can't currently have &Bound as an output.

I am still feeling a little uneasy about the choice to commit Borrowed more directly into the face of users. I think it is probably the right choice still, as long as we improve the documentation and API on borrowed. Maybe I will quickly try to get the perf wins out of extract_argument.rs later today to add another data point.

@davidhewitt
Member

And (sorry to keep dangling ideas on here), inspired by the new IntoPyObject, I have two further perf related thoughts while we are making breaking changes here:

  • In the past we have noted that the PyResult return type forced conversion of errors to PyErr and made .extract() a performance footgun compared to .downcast(). There is still this problem for FromPyObject for Bound<T>. Should we add a type Error in the same way we are adding for IntoPyObject?
  • It is possible to call .extract() on e.g. Bound<PyString> to get a String, but this comes at the cost of a redundant Python type check because FromPyObject always works on PyAny. Could we add a type Source, and add a way to make Bound<T> directly offer extract() where U: FromPyObject<Source = T>?

Both changes would significantly complicate the trait. I wonder, is there a way that we can support a simple trait like FromPyObject and a more advanced FromPyObjectImpl, which has a blanket from FromPyObject, so users can just implement the simple trait if that's good enough?

All food for thought and I think worth discussing while we're committing to breaking changes here.
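
To make the first of the two ideas above more concrete, here is a minimal sketch of an extraction trait with an associated error type, written in plain Rust. `Extract`, `GenericError` and `extract_generic` are hypothetical names, with `GenericError` standing in for a general-purpose error such as `PyErr`; the sketch only shows the shape of the idea, not how PyO3 would implement it.

```rust
use std::convert::Infallible;
use std::fmt::Display;
use std::num::ParseIntError;

/// Hypothetical stand-in for a general-purpose error type.
#[derive(Debug, PartialEq)]
pub struct GenericError(pub String);

/// Extraction trait where each implementation picks its own error type.
pub trait Extract: Sized {
    type Error: Display;
    fn extract(input: &str) -> Result<Self, Self::Error>;
}

impl Extract for u8 {
    // A precise error that is cheap to produce and to inspect.
    type Error = ParseIntError;
    fn extract(input: &str) -> Result<Self, Self::Error> {
        input.parse()
    }
}

impl Extract for String {
    // This conversion cannot fail, and the type says so.
    type Error = Infallible;
    fn extract(input: &str) -> Result<Self, Self::Error> {
        Ok(input.to_owned())
    }
}

/// Converts to the general-purpose error only where a caller asks for it.
pub fn extract_generic<T: Extract>(input: &str) -> Result<T, GenericError> {
    T::extract(input).map_err(|err| GenericError(err.to_string()))
}
```

The design choice here is that the conversion to the expensive general error happens at the call site, so callers that only check for success do not pay for it.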

@Icxolu
Contributor Author

Icxolu commented Sep 1, 2024

And (sorry to keep dangling ideas on here), inspired by the new IntoPyObject, I have two further perf related thoughts while we are making breaking changes here:

No problem 😄

  • In the past we have noted that the PyResult return type forced conversion of errors to PyErr and made .extract() a performance footgun compared to .downcast(). There is still this problem for FromPyObject for Bound<T>. Should we add a type Error in the same way we are adding for IntoPyObject?

If we're willing to do more breaking changes here, that sounds like a good idea to me. The only downsides I can see

  • It is possible to call .extract() on e.g. Bound<PyString> to get a String, but this comes at the cost of a redundant Python type check because FromPyObject always works on PyAny. Could we add a type Source, and add a way to make Bound<T> directly offer extract() where U: FromPyObject<Source = T>?

This would be really cool indeed. I guess we would maybe somehow need to take advantage of method precedence between trait and inherent methods, to make the fall through to PyAny work...

@davidhewitt davidhewitt mentioned this pull request Sep 13, 2024
3 tasks
@Icxolu Icxolu added this to the 0.24 milestone Oct 5, 2024
@davidhewitt
Member

If we're breaking FromPyObject, I wonder if we can solve #2888 similar to how we handled bytes in IntoPyObject 🤔

@Icxolu
Contributor Author

Icxolu commented Oct 12, 2024

That would make a very nice symmetry. I think if we do it, then it should work for all containers, which would probably mean one hidden method per container (vec, smallvec, arrays), maybe these could be unified with "return position impl trait in trait" (with something like impl Iterator) once MSRV allows. The more challenging thing seems to be to restrict a default implementation in FromPyObject to FromPyObjectOwned...

@Icxolu
Contributor Author

Icxolu commented Nov 16, 2024

If we're breaking FromPyObject, I wonder if we can solve #2888 similar to how we handled bytes in IntoPyObject 🤔

I've looked into this again. I think we would (for example) want something like this

trait FromPyObject<'a, 'py>: Sized {
   // snip

    #[doc(hidden)]
    fn extract_array<const N: usize>(
        obj: Borrowed<'_, 'py, PyAny>,
        _: private::Token,
    ) -> PyResult<[Self; N]>
    where
        Self: FromPyObjectOwned<'py>,
    {
        todo!()
    }
}

which we then use inside of extract on the array implementation and override on the u8 implementation.

Unfortunately this currently does not compile. I believe this is the corresponding rustc issue: rust-lang/rust#34979, which refers to the next trait solver as the solution, and -Znext-solver does indeed make this work. Unless there is something I have overlooked, I think we have to wait with the bytes specialization until -Znext-solver stabilizes.

@davidhewitt
Member

Ah, nice research 👍. Agreed, we can make a reminder for ourselves to improve bytes specializations once MSRV uses the next solver (gonna be a while 😂).

Given the early reception of IntoPyObject seems to be that it's hard, I'm currently feeling like we would build up some goodwill by being gentler on users for a few releases so am feeling like backing down on my other proposals for advanced optimisations above. We can introduce them incrementally in the future perhaps.

This branch really deserves to be merged asap, so please forgive me the delay. I think this is now the highest thing on my to-do list, perhaps excluding a 0.23.1 patch release. I will try to play around with this branch and see how I feel about the ergonomics & perf, maybe once the kids are in bed tonight.

@Icxolu
Contributor Author

Icxolu commented Nov 16, 2024

Given the early reception of IntoPyObject seems to be that it's hard, I'm currently feeling like we would build up some goodwill by being gentler on users for a few releases so am feeling like backing down on my other proposals for advanced optimisations above. We can introduce them incrementally in the future perhaps.

I tend to agree here. We can maybe think about adding the Error type. That seems pretty low-hanging and fairly intuitive.

This branch really deserves to be merged asap, so please forgive me the delay,

Absolutely no worries :)

@davidhewitt
Member

davidhewitt commented Nov 17, 2024

One idea which struck me, what if we keep FromPyObject with just a single lifetime and &Bound as input, and we instead expose what's currently called FromPyObjectBound as FromPyObjectBorrowed?

The idea would be that here we are currently breaking the main trait to get to two lifetimes, and we are introducing FromPyObjectOwned for simpler trait bounds, which kinda makes sense because that matches serde's DeserializeOwned.

But I think, unlike with serde's DeserializeOwned, most uses of FromPyObject do not care about the second lifetime, so maybe it makes sense to keep that to a specialised trait.

This idea might have the nice boost that it's probably less breaking than adding the second lifetime to FromPyObject.

I feel like there's some detail about PyRef and second lifetimes which I might have overlooked with this idea. But it still seems interesting to think about further.

@Icxolu
Contributor Author

Icxolu commented Nov 19, 2024

Interesting thought. I haven't played around with it yet, but I opened #4720 based on this, including the changes adding the second lifetime to PyRef and PyRefMut (it took quite a bit of lifetime reviewing until the borrow checker was happy 😅).

Maybe we can use that to play around with and judge what's the best option here. My initial feeling is that the single-trait approach might have a bigger hurdle in the beginning but is easier in the long run (fewer traits, using common patterns), but I haven't tried the other approach yet.

@Icxolu
Contributor Author

Icxolu commented Nov 23, 2024

I did play around with the idea a bit. It looks like it is possible to stick the PyRef(Mut) changes from #4720 onto it without too many changes. However from a usability perspective it doesn't feel that great IMO. With the two trait approach one has to remember to implement FromPyObject if possible, because it is more general due to the blanket, but for uses in trait bounds FromPyObjectBorrowed should be used (if possible), because it includes more types. So by default one would need two different traits, depending on the context it is used in. I think there is a good chance that users will get this wrong and use a more restrictive bound than necessary (both in impls and trait bounds). I'm curious what you think about this.
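
The trade-off described above can be sketched in plain Rust as follows. `Obj`, `FromObj`, `FromObjBorrowed` and `extract` are hypothetical stand-ins for the two-trait design and are not PyO3's API. The simple trait is the one to implement by default, while the borrowed trait, which receives a blanket implementation, is the wider one to use in generic bounds.

```rust
// Hypothetical stand-ins, not PyO3's API.

pub struct Obj {
    pub text: String,
}

/// The simple trait: its output cannot borrow from `obj`.
pub trait FromObj: Sized {
    fn from_obj(obj: &Obj) -> Option<Self>;
}

/// The specialised trait: its output may borrow from the input for `'a`.
pub trait FromObjBorrowed<'a>: Sized {
    fn from_obj_borrowed(obj: &'a Obj) -> Option<Self>;
}

/// Blanket impl: every type implementing the simple trait also implements
/// the borrowed one, so the borrowed trait covers strictly more types.
impl<'a, T: FromObj> FromObjBorrowed<'a> for T {
    fn from_obj_borrowed(obj: &'a Obj) -> Option<Self> {
        T::from_obj(obj)
    }
}

// A type that owns its data implements the simple trait.
impl FromObj for usize {
    fn from_obj(obj: &Obj) -> Option<Self> {
        Some(obj.text.len())
    }
}

// A type that borrows from the input can only implement the borrowed trait.
impl<'a> FromObjBorrowed<'a> for &'a str {
    fn from_obj_borrowed(obj: &'a Obj) -> Option<Self> {
        Some(obj.text.as_str())
    }
}

/// Generic code has to remember to use the wider bound to accept both kinds of type.
pub fn extract<'a, T: FromObjBorrowed<'a>>(obj: &'a Obj) -> Option<T> {
    T::from_obj_borrowed(obj)
}
```

Had `extract` been written with a `FromObj` bound instead, it would still compile but would silently exclude `&str`, which is the kind of overly restrictive bound the comment above is concerned about.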

@davidhewitt
Member

Hmm, interesting. One nice thing about

one has to remember to implement FromPyObject if possible

is that I'd guess most users would choose to implement it anyway as it's the simpler to implement. It's quite appealing to me that the default is simple. Plus it's less breaking than adding the lifetime & using Borrowed as the input.

I also feel like it's probably quite rare to have a case where FromPyObjectBorrowed trait bound is actually critical. At worst case it probably forces an extra clone or reference counting operation if the bound is FromPyObject instead? And it's so hard to soundly get borrowed references from python operations (reading from a tuple is possibly the only way?) that probably most cases where a FromPyObjectBorrowed bound might actually help, it's probably where the caller already has the reference and could probably just use .extract() up front.

I'm quite happy to be wrong here, though my gut feeling is that if cheap-and-cheerful FromPyObject works 99% of the time and FromPyObjectBorrowed is the power tool to use for edge cases which borrow their input python objects then that might be an easier library for users to learn.
