Hierarchy of Sized traits #3729

Open: wants to merge 49 commits into base master


@davidtwco (Member) commented Nov 15, 2024

All of Rust's types are either sized, which implement the Sized trait and have a statically known size during compilation, or unsized, which do not implement the Sized trait and are assumed to have a size which can be computed at runtime. However, this dichotomy misses two categories of type: types whose size is unknown during compilation but is a runtime constant, and types whose size can never be known. Supporting the former is a prerequisite to stable scalable vector types, and supporting the latter is a prerequisite to unblocking extern types. This RFC proposes a hierarchy of Sized traits in order to support these use cases.
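For orientation, a minimal sketch of the status quo the RFC starts from, using only stable Rust (the function names are illustrative, not from the RFC): a plain type parameter carries the implicit `Sized` bound, so `size_of::<T>()` is available, while a `?Sized` parameter can only be measured at runtime through a reference with `size_of_val`.

```rust
use std::mem::{size_of, size_of_val};

/// `T` has the implicit `Sized` bound, so its size is a compile-time constant.
fn static_size<T>() -> usize {
    size_of::<T>()
}

/// `?Sized` removes the implicit bound; the size is computed from a reference,
/// using the pointer metadata (a slice length or a vtable).
fn runtime_size<T: ?Sized>(value: &T) -> usize {
    size_of_val(value)
}
```

Neither signature can describe a type whose size is a runtime constant (such as a scalable vector) or a type whose size can never be computed (such as an extern type); that gap is what the proposed hierarchy is meant to fill.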

This RFC relies on experimental, yet-to-be-RFC'd const traits, so this is blocked on that. I haven't squashed any of the previous revisions but can do so if/when this is approved. Already discussed in the 2024-11-13 t-lang design meeting with feedback incorporated.

See this comment for the most recent summary of changes to this RFC since it was opened.

Rendered

@davidtwco davidtwco added the T-lang Relevant to the language team, which will review and decide on the RFC. label Nov 15, 2024
Co-authored-by: León Orell Valerian Liehr <[email protected]>
text/3729-sized-hierarchy.md: nine outdated review threads (resolved)
@tmandry tmandry added the I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. label Nov 15, 2024
@scottmcm (Member)

One reason, IIRC, is that it's backwards from how you normally think about traits. We'd generally rather that you write the easy thing, it's minimally-constrained, and if you use something in the body that needs another trait, we'll give you an error message saying that you should add the bound.

Anywhere you'd have to think "did I opt out of those 4 other things that I need to remember to think about?" is a much worse experience. That's why auto traits in libraries might never be stable, for example.

@Aloso commented Nov 19, 2024

@davidtwco

We can agree to disagree. ?Sized is notoriously confusing for new users, and this has been at least part of the motivation for the language team's historical reluctance to add new ?Trait syntax.

If a new user sees T: ?Sized for the first time, they may be confused for a moment, then google it and find the documentation, which explains it.

If a new user sees T: ValueSized for the first time, they will not be confused because it looks familiar. They will not google it, and stay oblivious to the fact that this bound removes the default const Sized bound.

If a new user runs into an error due to a missing ?Sized bound, they see something like

help: consider relaxing the implicit `Sized` restriction
  |
2 |     type Item: ?Sized;
  |              ++++++++

I understand that this is confusing at first, but is this better?

help: consider adding a `ValueSized` bound, which relaxes the implicit `Sized` restriction
  |
2 |     type Item: ValueSized;
  |              ++++++++++++

It requires you to learn about two traits instead of one, and you still find out that Sized is a default bound and needs to be relaxed. The ?Trait syntax is not a problem, people don't struggle to learn Rust because of its syntax. Learning syntax is easy.
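For context, this is the `?Sized` relaxation that the first diagnostic suggests, in today's Rust; `Container` and `Name` are hypothetical names used only for illustration. Without `?Sized` on `Item`, `type Item = str;` would be rejected because `str` does not implement `Sized`.

```rust
/// Hypothetical trait whose associated type may be unsized.
trait Container {
    // The implicit `Sized` bound on `Item` is relaxed with `?Sized`.
    type Item: ?Sized;
    fn get(&self) -> &Self::Item;
}

/// Hypothetical implementor whose `Item` is the unsized type `str`.
struct Name(String);

impl Container for Name {
    type Item = str;
    fn get(&self) -> &str {
        &self.0
    }
}
```

Under the RFC's proposed syntax, the same relaxation would instead be spelled as a `ValueSized` bound, as in the second diagnostic.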

I'm only arguing that ?Sized is undesirable as:

  • They're more confusing than my proposed alternative
  • They don't scale very well to constness
  • They don't scale very well to hierarchies

I agree with the second point. I don't agree with the 3rd point: When I see ?Trait and Trait has a sub-trait, it is natural to assume that the sub-trait is relaxed as well. So ?const Sized means Sized, ?Sized means ValueSized, and ?ValueSized means no bounds (since there is no need for the Pointee trait). But a const ValueSized bound would have to be written as ?Sized + const ValueSized.

P.S. I just realized that ?Sized should be equivalent to const ValueSized according to this RFC, which is not as intuitive. Unless ?Trait only relaxes the trait, ?const Trait relaxes only the constness, and ?const ?Trait relaxes both. But this is pretty ugly.

@davidtwco (Member Author)

If a new user sees T: ?Sized for the first time, they may be confused for a moment, then google it and find the documentation, which explains it.

If a new user sees T: ValueSized for the first time, they will not be confused because it looks familiar. They will not google it, and stay oblivious to the fact that this bound removes the default const Sized bound.

This is conjecture, we have no reason to believe that users will only research unfamiliar syntax like ?Sized, but not unfamiliar traits like ValueSized.

Even if we suppose that your assertion holds and a user sees a parameter with a ValueSized bound and doesn't know what it is and just continues on anyway, they're likely to be able to pass whatever types they'd like to that parameter and not need to think about it. It would only be if they were writing a function, had a ValueSized-bounded parameter and tried to pass it to something like size_of that they'd run into a compilation error. That sounds like an appropriate time for a user to be introduced to that trait and need to understand it.

If a new user runs into an error due to a missing ?Sized bound, they see something like

help: consider relaxing the implicit `Sized` restriction
  |
2 |     type Item: ?Sized;
  |              ++++++++

I understand that this is confusing at first, but is this better?

help: consider adding a `ValueSized` bound, which relaxes the implicit `Sized` restriction
  |
2 |     type Item: ValueSized;
  |              ++++++++++++

These aren't significantly different. I don't believe users would find the former of these approachable and intuitive any more so than the latter.

It requires you to learn about two traits instead of one, and you still find out that Sized is a default bound and needs to be relaxed. The ?Trait syntax is not a problem, people don't struggle to learn Rust because of its syntax. Learning syntax is easy.

I agree that in learning how to relax a default Sized bound users would be introduced to new traits like ValueSized. If we went ahead with this RFC using the alternative that kept the ?Sized syntax, a user is unlikely to want a type unconstrained by all of our sizedness traits due to the limitations these have, so they'll need to add additional bounds using these new traits after using ?Sized.

I don't think it will be especially common, but a user who needs to relax Sized will be introduced to these traits regardless of whether we use ?Sized or what this RFC proposes. If users are going to be introduced to these traits anyway, then whether they opt out of the default bound with ?Sized or with what this RFC proposes is just a matter of syntax, and as you've said, syntax is easy.

Don't get me wrong, adding these traits is adding complexity to the language, but I'd argue that it is essential complexity that reflects the complexity of platforms that Rust targets, rather than incidental complexity.

@ChayimFriedman2

There is a point that I don't see discussed here: you discuss what the learning effect will be for new users, but we also need to consider experienced users. They will understand both more easily, but it'll be much easier for them to learn and remember the existing ?Trait syntax, since they already know and use it.

And a related point: introducing a different way to name what is essentially the same thing introduces inconsistency to the language.

@davidtwco (Member Author)

There is a point that I don't see discussed here: you discuss what the learning effect will be for new users, but we also need to consider experienced users. They will understand both more easily, but it'll be much easier for them to learn and remember the existing ?Trait syntax, since they already know and use it.

Yeah, that's definitely a downside of this proposal. I think it's worth it on balance, but it's definitely a downside.

And a related point: introducing a different way to name what is essentially the same thing introduces inconsistency to the language.

I think this should be okay as the proposal removes the previous approach over an edition. It won't be entirely gone, it can't be, but it's as good as we can get it.

@cramertj (Member) commented Nov 19, 2024

One other concern is the ability of reviewers to check for backwards-compatibility.

When reviewing a patch which removes a trait bound, I'd generally assume that doing so is relaxing the requirements on the type being bound-- a backwards-compatible change. However, this would be a rare example where removing the bound would be a breaking change, and adding the bound would be the backwards-compatible change. This is unintuitive to me.

Personally, I prefer the T: ?Trait syntax, which I read as "T may not be an instance of Trait." Relevant to this proposal, I'd also assume that T: ?SuperTrait means T: ?Trait, just as T: SubTrait means T: Trait.
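To make the reviewer's intuition concrete in today's syntax (the function names are hypothetical): going from `describe_v1` to `describe_v2` is written as *adding* `?Sized`, and it is backwards-compatible because every call that compiled against the first signature still compiles against the second, while unsized arguments such as `str` become accepted too.

```rust
use std::fmt::Debug;

/// Version 1: `T` carries the implicit `Sized` bound.
fn describe_v1<T: Debug>(value: &T) -> String {
    format!("{:?}", value)
}

/// Version 2: `?Sized` relaxes that bound, so it accepts strictly more callers.
fn describe_v2<T: Debug + ?Sized>(value: &T) -> String {
    format!("{:?}", value)
}
```

Under the positive-bound proposal, the same relaxation would be written by adding a `ValueSized` bound, so deleting that bound later would be the breaking direction; that is the inversion described above.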

@davidtwco (Member Author)

Personally, I prefer the T: ?Trait syntax, which I read as "T may not be an instance of Trait." Relevant to this proposal, I'd also assume that T: ?SuperTrait means T: ?Trait, just as T: SubTrait means T: Trait.

I discussed this with @traviscross too and added another alternative based on this, it actually ends up really quite clean and I think is a compelling alternative to the positive bounds proposal that the RFC has.

@kpreid (Contributor) commented Nov 19, 2024

If a new user sees T: ValueSized for the first time, they will not be confused because it looks familiar. They will not google it, and stay oblivious to the fact that this bound removes the default const Sized bound.

… the proposal removes the previous approach over an edition.

I agree with the previous comments that it would be undesirable to hide the strangeness of the weakening bound behind a lack of syntax, compared to the status quo. However, I have a suggestion for a third option, if there is going to be an edition change regardless: add a new syntax which is neither a normal bound nor a removal like ?, but a “baseline” bound that nails down where we start.

Let's say the syntax is @Trait (symbol subject to bikeshedding, but we can think of it as “begin @ this point”; it could also perhaps be a contextual keyword). What it would mean is: if no baseline bound is present, the baseline bound is implicitly chosen by the edition — in all current editions, it would be Sized. In future editions, it might be something weaker or stronger. Thus,

  • <T> is “use implicit default bounds from the current edition”.
  • <T: Sized> is “use the union of the implicit baseline and Sized”, thus usually useless as today, but if ValueSized becomes the default over an edition, it strengthens the bound to Sized.
  • <T: ValueSized> is “use the union of the implicit baseline and ValueSized”, so it is useless unless an even weaker bound is made default in a future edition.
  • <T: @Sized> is “the bound is Sized, regardless of the current edition” — it matches the 2015-2024 behavior of <T>.
  • <T: @ValueSized> is “the bound is ValueSized, regardless of the current edition”.
  • <T: @SomeOtherTrait> is an error by default; the @ bound syntax can only be used if:
    • The trait is one of the traits which have ever been an implicit bound (i.e. Sized today, perhaps ValueSized in the future).
    • (Optional, if we want to allow user-defined traits to participate) SomeOtherTrait has an @ bound as a supertrait.
  • <T: ?Sized> has the 2015-2024 meaning forevermore; edition migration should replace it with @ValueSized.
  • <T: ?SomeOtherTrait> does nothing and issues a warning, as today, even if SomeOtherTrait = ValueSized, Pointee, etc — the idea is to migrate away from the ? syntax, not to expand it.

Every type variable always has either an @ explicit baseline bound, or an edition-dependent implicit baseline bound.

The advantages of this schema are:

  • The syntax tells you that something is going on, and the documentation of the trait can tell you what exactly that is.

  • It doesn’t involve any subtraction of bounds.

  • All code that uses an @ bound is now edition-change-proof; it has picked a named baseline and has opted out of all implicit bounds. This simplifies language evolution questions to the separate choices,

    • “is there room to split off a new weaker trait to serve this need?”
    • “what should the implicit baseline bound for the next edition be?”

    and “which of these things can you ? and what does that mean and is it clear?” doesn’t need to be asked. Because the baseline is named, there’s room to add another baseline without saying “this one is the correct one; we definitely got it right this time”.

  • If @ is allowed with user-defined traits (that opt in by having their own @ supertrait bound), then <T: @Foo> is a concise way to express “this is bounded by Foo, Foo’s supertraits, and nothing else”; thus, it simplifies use-cases where one must today write <T: ?Sized + Foo> in every generic parameter.
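A sketch of the repetition that the last point refers to, in today's Rust (`Shape`, `Square`, and `total_area` are hypothetical): to accept trait objects, each generic parameter has to restate `?Sized` next to its real bound.

```rust
/// Hypothetical trait used only for illustration.
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

/// Every parameter repeats `?Sized` so that `dyn Shape` is accepted as well.
fn total_area<A: ?Sized + Shape, B: ?Sized + Shape>(a: &A, b: &B) -> f64 {
    a.area() + b.area()
}
```

Under the baseline-bound idea, and assuming `Shape` opted in with its own `@` supertrait bound, the signature would shrink to `<A: @Shape, B: @Shape>`; that syntax is only proposed and does not compile today.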

Caveat: I haven’t thought about how this interacts with const traits. Also, this is certainly adding complexity to the language; it just might be worth it to unblock extern types and thin DSTs while adding room for even more refinements to the language’s default assumptions about types.

[Update: This idea has been crossposted to https://internals.rust-lang.org/t/baseline-bounds-an-extensible-replacement-for-sized/21892 for visibility.]

@Skepfyr left a review

I'm excited to see this RFC, I lost enthusiasm for #3396 because extern types alone didn't feel motivating enough for such an invasive change, so I'm glad to see more motivation. In general I think it's sensible, although I think it skates over a bunch of the issues that #3396 was also struggling with.


Prior to the introduction of `ValueSized` and `Pointee`, `Sized`'s implicit bound
(now a `const Sized` implicit bound) could be removed using the `?Sized` syntax,
which is now equivalent to a `ValueSized` bound in non-`const fn`s and

I don't think this is true, specifically Mutex<T: ValueSized> cannot be ValueSized as conceptually it would require locking the mutex in order to call size_of_val on the wrapped type. That feels terrifying and it's currently observable that's not the case because it doesn't deadlock when you call size_of_val while holding the Mutex.

@programmerjake (Member) commented Nov 21, 2024

This reminds me of C++ classes, where you can figure out the size of the dynamic type by reading the vtable pointer; but unlike Rust, that requires dereferencing the data part of the pointer, whereas in Rust that information is passed as the pointer's metadata. So maybe we need both ValueSized and PointeeSized: for PointeeSized you can pass any old pointer to the type (since the metadata of a pointer must be valid but the data pointer doesn't need to be), but for ValueSized you have to be able to dereference the data pointer, so it would take something like &T but without the aliasing guarantees.

Contributor

I don't think this is true, specifically Mutex<T: ValueSized> cannot be ValueSized as conceptually it would require locking the mutex in order to call size_of_val on the wrapped type. That feels terrifying and it's currently observable that's not the case because it doesn't deadlock when you call size_of_val while holding the Mutex.

Could you give the full code example of what you have in mind? I want to check that what I have in mind here exactly matches what you do.


This is a tad conceptual, as it requires types that don't exist in Rust today but would exist with something like custom DSTs. Imagine this:

// Magic syntax that means that CStr isn't Sized
struct CStr(..);

impl ValueSized for CStr {
    // Not being proposed here but could exist
    // with custom DSTs.
    fn size_of_val(&self) -> usize {
        let data = self as *const Self as *const u8;
        // Find first null byte and return length
    }
}

As you can see, this type has to inspect the data behind the &self pointer in order to determine its size. (Admittedly I'm not entirely convinced we'd ever want this in Rust because it feels like a performance footgun, but it matches the semantics of ValueSized as described in the RFC.)

Now consider size_of_val(&Mutex<CStr>). The only way that call can execute is if the size_of_val implementation for Mutex acquires a lock on the inner data. This is bad, and it's observably not what happens today, because this code doesn't deadlock:

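use std::{mem::size_of_val, sync::Mutex};
fn main() {
    let mutex = Mutex::new(7);
    let _guard = mutex.lock().unwrap();
    size_of_val(&mutex); // returns immediately while the lock is held
}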

The issue is that ValueSized as described doesn't match current Rust's rules, where a type is only allowed to access the pointer metadata in order to determine its size. Changing the semantics to match would mean that CStr couldn't implement ValueSized, so size_of_val(&Mutex<CStr>) would be a compile error, as Mutex<CStr> also wouldn't implement ValueSized.
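The same observation can be made with an unsized payload that exists today. In this sketch (the function name is hypothetical), a `Mutex<u32>` is coerced to `Mutex<dyn Debug>`; while it is locked, `size_of_val` still returns immediately, because the size of `dyn Debug` comes from the vtable in the pointer metadata rather than from the guarded data.

```rust
use std::fmt::Debug;
use std::mem::size_of_val;
use std::sync::Mutex;

/// Returns the size of the whole mutex and of its payload while the lock is held.
fn sizes_while_locked() -> (usize, usize) {
    // Unsizing coercion: `Box<Mutex<u32>>` to `Box<Mutex<dyn Debug>>`.
    let mutex: Box<Mutex<dyn Debug>> = Box::new(Mutex::new(7u32));
    let guard = mutex.lock().unwrap();
    // Neither call acquires the lock: both sizes come from the vtable.
    let whole = size_of_val(&*mutex);
    let payload = size_of_val(&*guard);
    drop(guard);
    (whole, payload)
}
```

If a payload's size could only be found by reading the value, as with the hypothetical `CStr` above, computing `whole` would need the lock, which is the concern being raised.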

Member Author

I don't think this interaction with mutexes and size_of_val would be that surprising.

If you have a type that you know requires a value to compute its size then computing the size of a mutex containing it would require the mutex lock itself. If we had reason to believe it was common for generic code that always works today to start deadlocking when instantiated with a ValueSized type, then that would give me pause, but otherwise I don't think this is too bad, it's just the natural interaction of mutexes and value-sized types. That said, I do think that MetaSized is a perfectly acceptable alternative that captures everything that we want from ValueSized (to the best of my knowledge at least).

I've added an alternative that describes how MetaSized could be included in this RFC instead of ValueSized if this were an issue we wanted to avoid.

@lukas-code (Member) commented Nov 26, 2024

I don't think the mutex example highlights the importance of MetaSized enough if we ever want to allow custom code in size_of_val (such as locking a mutex).

On stable rust it is currently possible to call size_of_val on &UnsafeCell<dyn Trait>, so this must continue to compile.
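The stable behaviour referred to here can be checked directly (the function name is hypothetical): `size_of_val` on an `UnsafeCell<dyn Debug>` compiles and runs, and it cannot be reading the inner value, since there is no safe way to obtain a `&dyn Debug` to the contents from a shared reference to the cell; the size comes from the vtable.

```rust
use std::cell::UnsafeCell;
use std::fmt::Debug;
use std::mem::size_of_val;

/// Computes the size of an unsized `UnsafeCell` from the pointer metadata alone.
fn unsafe_cell_dyn_size() -> usize {
    // Unsizing coercion: `Box<UnsafeCell<u64>>` to `Box<UnsafeCell<dyn Debug>>`.
    let cell: Box<UnsafeCell<dyn Debug>> = Box::new(UnsafeCell::new(7u64));
    size_of_val(&*cell)
}
```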

If we were to define T: ValueSized as "the size of a value of type T can be determined from T and a reference to the value", then there is no way to express the (compiler-builtin) impl of ValueSized for UnsafeCell without MetaSized.

For most normal types this impl would be:

impl const ValueSized for Foo where tail(Foo): ~const ValueSized {}

However, for UnsafeCell this would be incorrect:

impl<T> const ValueSized for UnsafeCell<T> // the size of a value of UnsafeCell<T> can be known from a reference (of type &UnsafeCell<T>) ...
where T: ~const ValueSized // ... if the size of a value of T can be known from a reference (of type &T)
{}

The size of a value of type UnsafeCell<T> cannot be determined from a reference to the UnsafeCell if computing the size from &T requires running user code, because there is no way to safely obtain a reference to the inner value.

Instead, we need some other trait bound to correctly express under what circumstances we can obtain the size of a value of type UnsafeCell<T> from a reference to the value while maintaining backwards compatibility with stable Rust, and this trait bound is exactly T: MetaSized!

Here, T: MetaSized would mean "the size of a value of type T can be determined from a T and the metadata of a pointer to the value".

impl<T> const ValueSized for UnsafeCell<T> // the size of a value of UnsafeCell<T> can be known from a reference (of type &UnsafeCell<T>) ...
where T: ~const MetaSized // ... if the size of a value of T can be known from the metadata (of type `<T as ptr::Pointee>::Metadata`)
{}

Member

tbh that is why I think that this RFC needs MetaSized (which imo is a better name than my suggestion), since MetaSized is necessary to represent existing semantics, whereas ValueSized is just nice to have at some point in the future.

Member Author

I've added more to this alternative's section based on your UnsafeCell example.

If traits with a `Sized` supertrait are later made const, then their supertrait
would be made `~const Sized`.

An implicit `const ValueSized` bound is added to the `Self` type of traits. Like

This feels scary to me, while I agree it does match current behaviour, having most things have an implicit const Sized but traits having an implicit const ValueSized bound feels hard to teach (and remember). It also implies that no existing std traits could be implemented for ValueSized types and below (although maybe it's backwards compatible to relax that bound?).

Member Author

This feels scary to me, while I agree it does match current behaviour, having most things have an implicit const Sized but traits having an implicit const ValueSized bound feels hard to teach (and remember).

I'm not concerned about the difference here, parameters have default bounds today, Sized, and traits have no default supertrait (const ValueSized with this RFC). There's always been a difference here, so giving them names isn't much different than the status quo, which isn't too difficult to teach or remember.

It also implies that no existing std traits could be implemented for ValueSized types and below (although maybe it's backwards compatible to relax that bound?).

I don't think it is, the following code would break if there wasn't the implicit const ValueSized or if that were relaxed on an existing trait (such as std::io::Read):

trait Sub: std::io::Read {
    fn example() -> bool { std::mem::needs_drop::<Self>() }
}
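A runnable version of this example with hypothetical implementors added. It compiles today because `std::mem::needs_drop` accepts `T: ?Sized` and a trait's `Self` type has no implicit `Sized` bound; the second trait, `Describe` (also hypothetical), shows that an unsized `Self` such as `str` is accepted too.

```rust
use std::io::{Cursor, Empty, Read};

trait Sub: Read {
    // Compiles because `needs_drop` accepts `T: ?Sized`; relaxing the
    // implicit bound on `Self` further could break this default body.
    fn example() -> bool {
        std::mem::needs_drop::<Self>()
    }
}

impl Sub for Empty {}
impl Sub for Cursor<Vec<u8>> {}

/// Hypothetical trait with no supertraits, implemented for the unsized `str`.
trait Describe {
    fn has_drop_glue() -> bool {
        std::mem::needs_drop::<Self>()
    }
}

impl Describe for str {}
```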

Member Author

I've also added this as an example in the RFC.


I disagree, the difference between having no default supertrait and having one is massive. Not having a default supertrait means that traits naturally apply to all types and library authors don't have to remember (or be asked) to relax that bound where possible (which would be a breaking change).

Not being able to implement any std traits on these types feels like a big problem to me, it would make them a nightmare to use without Debug, Clone, PartialEq, etc. Admittedly, this is probably less of a problem for the vector types as it's less clear (to me at least) what they could implement given they might contain uninitialised bytes. I wonder if it would be possible to relax the bounds on existing traits by introducing explicit bounds on traits with default implementations.

Member Author

I disagree, the difference between having no default supertrait and having one is massive.

I wasn't intending to argue that this limitation wasn't significant, just that I didn't think it was hard to teach or remember.

Member Author

I've pushed another change related to this. I think the implicit bound on Self is necessary, but it should be possible to relax it backwards-compatibly.

text/3729-sized-hierarchy.md: resolved review thread
a `const ValueSized` bound. As the `?Trait` syntax is currently accepted for any trait
but ignored for every trait except `Sized`, `?ValueSized` and `?Pointee` bounds would
be ignored. In the next edition, any uses of `?Sized` syntax will be rewritten to
a `const ValueSized` bound. Any other uses of the `?Trait` syntax will be removed as

It would be really nice to backwards-compatibly allow Pointee bounds on associated types of existing traits, with Deref being the main example I can think of. I think it is possible to do this by adding implicit T::Assoc: const ValueSized bounds approximately everywhere in previous-edition code. I don't know how feasible that actually is, though.

Member Author

I don't think this is possible. There is a future possibility in this RFC that very briefly touches on how we could relax associated types, but I don't think adding implicit T::Assoc: const ValueSized bounds would work, because something like this compiles today...

trait Foo {
    type Bar;
}

fn qux<T: Foo>() -> usize {
    std::mem::size_of::<<T as Foo>::Bar>()
}

...and it wouldn't if we added that bound.
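A runnable form of the example above, with a hypothetical implementor `Wrapper` added. `qux` relies on `Bar` being `Sized`, since `size_of` requires it, so an implicit `const ValueSized` bound at the use site would not be enough to keep it compiling.

```rust
use std::mem::size_of;

trait Foo {
    // Implicitly `Sized` today.
    type Bar;
}

/// Hypothetical implementor used only for illustration.
struct Wrapper;

impl Foo for Wrapper {
    type Bar = u64;
}

/// Depends on the implicit `Sized` bound on `Foo::Bar`.
fn qux<T: Foo>() -> usize {
    size_of::<<T as Foo>::Bar>()
}
```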

Contributor

The idea is that, if we relaxed the bound, it'd work like this:

trait Tr {
    type Ty: ?Sized default Sized;
    //       ~~~~~~~        ~~~~~
    //  The relaxed bound   |
    //                      ^ The backward compatible default for
    //                        use-site bounds.
}

Where at the use site then, if you wrote this...

fn f<T: Tr>() {}

...it would be treated according to that default as though you had written:

fn f<T: Tr<Ty: Sized>>() {}
//        ~~~~~~~~~~~
//      Added implicitly.

This works because loosening the bound doesn't break implementors. The only reason it breaks callers is because it changes what they can assume by default without explicitly bounding the associated type. But we could conceivably separate what callers can assume by default from the minimum that implementors are allowed to implement.

Then, conceivably, over an edition, we could change the default. Use sites in older edition code would still get the old default. During the edition migration, we would elaborate the code according to that default, which would work in both editions and preserve the semantics.

I asked around about this; unfortunately there are apparently some reasons it could be difficult to implement.
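The elaborated form in this idea can already be written by hand on stable Rust, since associated type bounds in this position were stabilized in Rust 1.79. A sketch with hypothetical implementors: `Tr::Ty` is relaxed to `?Sized`, and a caller that needs `Sized` states it explicitly, as the proposed edition migration would. The `default Sized` part has no stable equivalent and is omitted.

```rust
use std::mem::size_of;

trait Tr {
    // The relaxed bound; there is no `default Sized` clause in today's Rust.
    type Ty: ?Sized;
}

/// Hypothetical implementor with a sized associated type.
struct Small;
impl Tr for Small {
    type Ty = u32;
}

/// Hypothetical implementor with an unsized associated type.
struct Text;
impl Tr for Text {
    type Ty = str;
}

/// The use-site bound `Ty: Sized`, written out explicitly.
fn f<T: Tr<Ty: Sized>>() -> usize {
    size_of::<T::Ty>()
}
```

`f::<Text>()` is rejected because `str` is not `Sized`, yet `Text` is still a valid implementor; that is the separation between what implementors may provide and what callers may assume.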


If a user of a runtime-sized type or a `Pointee` type did encounter a bound that
needed to be relaxed, this could be changed in a patch to the relevant crate without
breaking backwards compatibility as-and-when such cases are encountered.
@Skepfyr commented Nov 21, 2024

I think this is an overly rosy outlook. It may be very annoying for types that aren't const ValueSized to be locked out of implementing traits from other crates as it will make it much harder for them to be used like normal types, also we'd need to teach crate authors to write new code as permissively as possible.
Additionally it's not always going to be possible to relax these bounds without a breaking change, my personally scariest trait is serde's Serializer trait as it's impossible to relax that bound as serializers might rely on it but it prevents these new types from being first class members of the serde ecosystem.

This comment was marked as resolved.

@tmccombs commented Nov 22, 2024

Here's a more self-contained example. Consider a library with the following trait:

trait A {
  fn a<T: ?Sized>(&mut self, x: &T);
}

An impl of that trait in a different crate may call size_of_val on x, so the crate that defines A can't backwards-compatibly relax the bound on T to Pointee instead of const ValueSized.
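A sketch of such a downstream impl (`ByteCounter` is hypothetical): it relies on `size_of_val` being callable for every `T` the trait method admits.

```rust
use std::mem::size_of_val;

trait A {
    fn a<T: ?Sized>(&mut self, x: &T);
}

/// Hypothetical downstream implementor that depends on `T`'s size being computable.
struct ByteCounter {
    total: usize,
}

impl A for ByteCounter {
    fn a<T: ?Sized>(&mut self, x: &T) {
        self.total += size_of_val(x);
    }
}
```

If `A::a` were relaxed to accept `Pointee` types, this impl would stop compiling, because `size_of_val` would not be available for such a `T`.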


And you run into a similar problem with associated types. Consumers of that type might depend on being able to call size_of_val on values of the associated type, so relaxing the bound wouldn't be backwards compatible.


A problematic case of this in std is Deref.

Ideally, I think that the Target type should be relaxed to Pointee, but that wouldn't be backwards compatible, because existing code might depend on being able to call size_of_val on &<T as Deref>::Target.
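A small sketch (the function name is hypothetical) of the kind of existing generic code that would break: it calls `size_of_val` on `<P as Deref>::Target`, which compiles today because `Target` is `?Sized` and, under the RFC, would require `Target` to be at least `ValueSized`.

```rust
use std::mem::size_of_val;
use std::ops::Deref;

/// Existing-style generic code that measures whatever `P` dereferences to.
fn target_size<P: Deref>(p: &P) -> usize {
    size_of_val(&**p)
}
```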


As mentioned here, I think that it is possible to relax associated types in a backwards compatible way, as long as we do it with the edition migration required for this RFC. However, I agree with your minimal example on a situation where I don't think there's any backwards compatible solution.

Member Author

This is a good catch; I didn't realise trait methods would be an issue. I don't have a good answer for how to deal with those, but I've noted this in the RFC.

text/3729-sized-hierarchy.md: outdated review thread (resolved)

However, despite not implementing `Sized`, these are value types which should
implement `Copy` and can be returned from functions, can be variables on the
stack, etc. These types should implement `Copy` but given that `Sized` is a
Contributor

@petrochenkov petrochenkov Nov 22, 2024

How does stack allocation work for the "runtime constant" types, at binary level? Is it different from dynamic stack allocation?
Does it work for SVE specifically, or it's possible to do for other "runtime constant" types too?

I assume that the "runtime constant" value becomes available somewhere before the call to main?
Or earlier (e.g. during linking), and you need to specify the "runtime env" during the build?

Member

How does stack allocation work for the "runtime constant" types, at binary level? Is it different from dynamic stack allocation?
Does it work for SVE specifically, or it's possible to do for other "runtime constant" types too?

I believe it's just "dynamic alloca", I don't recall ever seeing anything in LLVM about "runtime constant" stack frame sizes.
(while searching for examples to add elsewhere in this comment I've found that while LLVM alloca is used, it's more "typed" than I expected, and the stack frame layout is aware of the distinction - cc @nikic, not sure how this will look once alloca Type is gone)

While it would be neat to treat such values (and those derived from them with simple arithmetic) as "relocation-time constants" and have the dynamic linker patch them into e.g. as many instruction immediates as possible, I have recently argued against the desirability of storing vector registers to memory (outside of e.g. context switching and other low-level tasks), so I'm skeptical of the value of relocation-level support.


I assume that the "runtime constant" value becomes available somewhere before the call to main?

In theory, some mechanism like ELF auxv could be used, but AIUI it's just an instruction reading a "special register":

  • ARM has mrs/msr instructions to move between general-purpose and special registers
  • RISC-V more specifically calls them "CSR"s ("Control and Status Register"s)
    • the relevant size_of value for RVV, vlenb, is a read-only CSR
    • (while RVV has other CSRs and instructions, see below why they're not relevant for size_of)

I'd compare it to using x86 cpuid to e.g. determine the SIMD extension with the largest registers (and it similarly is "not supposed to change during execution" AFAICT).
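For comparison, a minimal sketch of that x86 analogue on stable Rust: the widest SIMD register width is queried once at runtime with `is_x86_feature_detected!` (which consults cpuid) and then cached as a process-wide constant. `widest_simd_bits` and `detect` are hypothetical helpers, and the non-x86_64 fallback of `0` is a placeholder, not a real SVE or RVV query.

```rust
use std::sync::OnceLock;

/// Hypothetical helper: widest SIMD register width (in bits) supported by
/// the running CPU, detected on first use and constant afterwards.
fn widest_simd_bits() -> u32 {
    static WIDTH: OnceLock<u32> = OnceLock::new();
    *WIDTH.get_or_init(detect)
}

#[cfg(target_arch = "x86_64")]
fn detect() -> u32 {
    if is_x86_feature_detected!("avx512f") {
        512
    } else if is_x86_feature_detected!("avx") {
        256
    } else {
        // SSE2 is part of the x86_64 baseline.
        128
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn detect() -> u32 {
    // Placeholder for other architectures in this sketch.
    0
}

fn main() {
    let first = widest_simd_bits();
    // Later queries return the cached value, like a runtime constant.
    assert_eq!(first, widest_simd_bits());
    println!("widest SIMD registers: {first} bits");
}
```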

LLVM seems to abstract this concept in its @llvm.vscale.iN intrinsic, documented as:

vscale is a positive value that is constant throughout program execution, but is unknown at compile time.

  • anything more fine-grained than vscale has to go either through target-specific intrinsics or the newer predicated vector intrinsics, which take both a mask and an "EVL" ("Effective Vector Length") and in theory could (eventually) be useful for a core::simd-like abstraction

There's also Linux-specific syscalls/documentation:

  • SVE: https://docs.kernel.org/arch/arm64/sve.html
    • prctl(PR_SVE_GET_VL) as alternative to getting VL through instructions
    • can also set current VL (or defer to execve, i.e. set it for a child process)
    • not expecting changing VL on the fly to be compatible with code using SVE C intrinsics
  • RVV: https://docs.kernel.org/arch/riscv/vector.html
    • less control than SVE, as RVV only has these two lengths:
      • vlenb: the hardwired µarch implementation maximum (which is what non-const size_of would supposedly return, much to my surprise)
      • vl: the dynamic length, meant to be set by intrinsics and auto-vectorized loops on the fly (to remove the need for any scalar loops, as e.g. the last iteration of the vector loop can still work with vl as low as 1)
    • To get the availability of V in an ELF program, please read COMPAT_HWCAP_ISA_V bit of ELF_HWCAP in the auxiliary vector.

      • so there is something available through ELF auxv, but it's just extension bits (and ARM has them too)

Out of curiosity, I've also looked at the ELF auxv HWCAP for x86 and AFAICT it's cpuid(eax=1).edx which sadly only goes up to SSE2 in terms of extension bits (newer ones being in ECX), and HWCAP2 is a couple of kernel-controlled features.


Also, while looking up the LLVM-specific examples to link above, I came across SME/SVE2 "streaming mode" which is a whole new can of worms, and I hope doesn't require even stranger types.

(while searching for examples to add elsewhere in this comment I've found that while LLVM alloca is used, it's more "typed" than I expected, and the stack frame layout is aware of the distinction - cc @nikic, not sure how this will look once alloca Type is gone)

I'd expect that alloca would accept a TypeSize, which is either a fixed size or a size multiplied by vscale. I think that would cover it?
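A minimal sketch of that idea, assuming a simplified model: `TypeSize` below is a hypothetical Rust enum that loosely mirrors the concept of LLVM's `TypeSize` (a fixed size, or a known minimum multiplied by `vscale`); it is not LLVM's API. For SVE, `vscale` is the register width divided by 128 bits, so a `<vscale x 4 x i32>` value has a 16-byte minimum size.

```rust
/// Hypothetical model (not LLVM's API): a size that is either fixed at
/// compile time or a known minimum scaled by the runtime `vscale`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TypeSize {
    Fixed(u64),
    Scalable(u64), // minimum size in bytes, multiplied by vscale
}

impl TypeSize {
    /// Resolve to a concrete byte count once `vscale` is known at runtime.
    fn bytes(self, vscale: u64) -> u64 {
        match self {
            TypeSize::Fixed(n) => n,
            TypeSize::Scalable(min) => min * vscale,
        }
    }
}

fn main() {
    let sv = TypeSize::Scalable(16); // e.g. <vscale x 4 x i32>
    let fixed = TypeSize::Fixed(8);
    // With 256-bit SVE registers, vscale = 2, so the stack slot is 32 bytes.
    assert_eq!(sv.bytes(2), 32);
    assert_eq!(fixed.bytes(2), 8);
}
```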

This thread has made me much more sceptical that these types provide good motivation for this feature. It's interesting that externref style types have similar requirements: they need to be Copy, they can be passed as arguments, returned as results, used as local variables, etc, but importantly they aren't Sized (or even ValueSized)... Similarly, references to both types are weird because they aren't pointers (or at least shouldn't be in the case of vector types as I think @eddyb is suggesting).

I haven't yet come up with a productive solution though...

Member Author

Similarly, references to both types are weird because they aren't pointers (or at least shouldn't be in the case of vector types as I think @eddyb is suggesting).

As I understand it, scalable vectors can be used behind pointers, only externref style types cannot.

@davidtwco
Member Author

davidtwco commented Nov 25, 2024

For those following along or catching up, these are the notable changes to the RFC since this was posted:

  • Clarify proposed behaviour for ?Trait syntax for non-Sized, which is currently accepted
  • Stop re-using std::ptr::Pointee and make Pointee its own new marker trait to avoid backwards incompatibility
  • Clarify backwards compatibility implications of ?Sized syntax and add alternatives to removing the default bound using positive bounds which continue to use ?Sized
  • Add that relaxing existing bounds in trait methods would be backwards incompatible
  • Elaborate on necessity of implicit const ValueSized bound on Self type of traits
  • Add MetaSized alternative to ValueSized which would resolve interactions with mutexes
  • Clarify that bounds on return types can never be relaxed

And these are all the other smaller changes that don't materially impact what is being proposed:

  • Fixed some minor wording errors where supertrait/subtrait were used backwards
  • Removed HackMD's rust= syntax from codeblocks
  • Fixed references to introducing a new const Sized trait; the proposal is instead to add a const modifier to the existing Sized trait
  • Added some background/context on dynamic stack allocation
  • Use current experimental const trait syntax
  • Corrected incorrect syntax for traits
  • Listed all alternate bounds (adding ~const ValueSized to a list of bounds that it was missing from)
  • Fixed bound in description of size_of_val changes
  • Corrected description of current size_of_val and align_of_val behaviour
  • Corrected description of extern type usage in structs
  • Mention Rust for Linux's interest in extern type
  • Weaken language in the externref future possibility to make it clear this proposal would not be sufficient on its own to support these
  • Re-write Aligned future possibility so that it is clear Aligned couldn't be added to the proposed hierarchy

I've yet to respond to and/or incorporate the following comments, but will be working on those this week:

At the moment, I prefer the following alternatives to the primary proposal of the RFC, and may re-write to incorporate these as the primary proposal:

custom DSTs on top of this RFC. None of these have been considered thoroughly, and are
written here only to illustrate.

- Allow `Pointee` to be implemented manually on user types, which would replace
Member

Suggested change
- Allow `Pointee` to be implemented manually on user types, which would replace
- Allow `ptr::Pointee` to be implemented manually on user types, which would replace

Presumably this refers to the existing Pointee and not the new one from this RFC, with the idea being that one can define a custom metadata for their custom DST?

Labels
I-lang-nominated Indicates that an issue has been nominated for prioritizing at the next lang team meeting. T-lang Relevant to the language team, which will review and decide on the RFC.