-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unsafe fields #3458
base: master
Are you sure you want to change the base?
Unsafe fields #3458
Conversation
|
||
// Unsafe field initialization requires an `unsafe` block. | ||
// Safety: `unsafe_field` is odd. | ||
let mut foo = unsafe { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure I like that the entire struct expression is now inside an unsafe block (though I’m not sure what a better syntax would be). If the safety invariant requires the entire struct, this makes sense. However, if it is more specific, a larger unsafe block is too broad as it also allows unsafe code usage to initialise the safe fields, which should get their own unsafe blocks. Though perhaps this could just be linted against, e.g. don’t use unsafe expressions in a struct initialiser without putting them inside nested unsafe blocks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The alternative is to do this:
let mut foo = Foo {
safe_field: 0,
unsafe_field: unsafe { 1 },
};
That feels worse to me.
Presumably the safety invariant being promised is an invariant within the struct, even if it is not strictly required that that be the case.
Ideally we'd also have partial initialization, such that it would be possible to do
let mut foo = Foo { safe_field: 0 };
unsafe { foo.unsafe_field = 1; }
but I expect that's quite a bit farther away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that just unsafe in one field initialiser expression is insufficient as it means something very different. I do like however that it clearly shows which field the unsafety applies to.
It reminds me a bit of unsafe blocks in unsafe functions (#2585). Ideally, the fact that struct init is unsafe would not allow unsafe field init expressions. I doubt that special-casing struct inits to not propagate outer unsafe blocks would be backwards compatible, so a new lint would be the only avenue in this direction.
Some new syntax like this
let mut foo = unsafe Foo {
safe_field: 0,
unsafe unsafe_field: 1,
};
would communicate intent better but looks quite unnatural to me.
Perhaps in the future there might be explicit safe blocks to reassert a safe context within an unsafe block, so that it could be written as follows:
let mut foo = unsafe {
Foo {
safe_field: safe { /* some more complex expr here */ },
unsafe_field: safe { /* some more complex expr here */ },
}
};
which could be encouraged with clippy lints. Though for now just moving safe non-trivial struct field initialisers into variables that are initialised outside the unsafe block.
Overall, I think going with the original syntax of
let mut foo = unsafe {
Foo {
safe_field: 2,
unsafe_field: 1,
}
};
looks like a good and intuitive solution, though I still think that a lint against using unsafe expressions inside the struct expression without a nested unsafe block would still help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this?
let mut foo = unsafe Foo {
safe_field: 0,
unsafe_field: 1,
};
unsafe
would be followed immediately by a struct expression, and doing so would only indicate that unsafe fields may be initialized. It would not introduce an unsafe context.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like it - the syntax is still close enough to the unsafe block syntax (except when initialising tuple structs) that it's relatively intuitive what the unsafety applies to, but coupled strongly to the struct name so that it also makes sense why no unsafe context is introduced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's something nice about
let mut foo = unsafe Foo {
safe_field: 0,
unsafe_field: 1,
};
but to me that would go with unsafe struct Foo
, like how unsafe trait Bar
leads to unsafe impl Bar
.
Spitballing: if per-field safety is really needed, then maybe
let mut foo = Foo {
safe_field: 0,
unsafe unsafe_field: 1,
};
though maybe the answer is the same as for unsafe { unsafe_fn_call(complicated_expr) }
: if you don't want the expr in the block, use a let
.
After all, there's always field shorthand available, so you can do
let safe_field = unsafe { complex to construct };
let unsafe_field = easy and safe to construct;
unsafe { Foo { safe_field, unsafe_field } }
Perhaps the more important factor would be the workflow for the errors the programmer gets when changing a (internal and thus not semver break) field to unsafe
. Anything at the struct level wouldn't give new "hey, you need unsafe
here" if you already had another field that was unsafe.
Of course, if the safety is at the struct level (not the field level) then that doesn't come up.
Hmm, the "you added an additional requirement to something that's already in an unsafe block and there's no way to help you make sure you handed that" is a pre-existing problem that needs tooling for that everywhere, so maybe it's not something this RFC needs to think about.
If we had unsafe(aligned(a), nonzero(b)) { ... }
so that tooling could help ensure that people thought about everything declared in the safety section, then we'd just have unsafe(initialized(ptr, len)) { ... }
so that it acknowledged the discharge of the obligation for the type invariant, and the "I need 100 unsafe blocks" problem goes away.
I think the motivation could point out who would benefit from this. I assume it's library authors a making unsafe fields a "reminder to self" about upholding some invariant as opposed to say expecting a unsafe field in an API surface. I.e we still expect |
this seems closely related to |
Not necessarily. It is entirely reasonable that
Related, sure, but beyond the mention in unresolved questions, I'm not sure how it could be mentioned. Pretty much the only overlap is what's considered a "mutable access", which I didn't feel necessary to be copy-pasted. |
How does this look with functional struct update?
? Edit: or I guess it doesn't need unsafe if the source had been initialized with unsafe. |
@jsgf Great question. I don't have an immediate answer, though I believe the mechanism currently in the compiler would require the entire initializer to be unsafe. |
I love it! This is much better than getters and setters, both for library authors and users. typed-builder would have to adjust, but I would love to implement this feature there :) @idanarye (the main author) Maybe you have some input regarding the builder pattern? |
The implicit notion that only mutation is unsafe (and reading is not) seems tricky. Do you have reasoning to prove that we'll never need fields that are unsafe to read? Maybe there should be an alternative syntax proposal (like |
@mo8it I don't want to spam the comments here with a discussion about typed builder, so I've opened a discussion in my repository instead: idanarye/rust-typed-builder#103 |
Regarding the RFC itself - I think you are trying to solve a visibility issue with the safety mechanism, which is the wrong tool for the job. You gave But why? I mean, it's obvious why you don't want to give regular write access to I'm aware of the Another thing - conceptually it never makes sense to define only one field as let mut vec = Vec::new();
vec.len = 4; // UNSAFE!!! The why not this? let mut vec = Vec::from([1, 2, 3, 4]);
vec.buf = RawVec::new(); // perfectly safe apparently Yes, Whatever the semantics of unsafe fields will be - conceptually it makes more sense to put the |
## "Mutable use" in the compiler | ||
|
||
The concept of a "mutable use" [already exists][mutating use] within the compiler. This catches all | ||
situations that are relevant here, including `ptr::addr_of_mut!`, `&mut`, and direct assignment to a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One could argue that ptr::addr_of_mut!
on an unsafe
field need not be unsafe, because writes through the pointer are unsafe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this in conjunction with the examples present in the RFC.
fn make_zero(p: *mut u8) {
unsafe { *p = 0; }
}
let p = ptr::addr_of_mut!(foo.unsafe_field);
make_zero(p);
Ignoring thread-safety, which can be easily achieved with locks, no single step appears wrong. make_zero
does not do anything wrong — assigning zero to an arbitrary *mut u8
is fine. Passing a pointer to the method is naturally okay. Yet it still results in unsoundness, as foo.unsafe_field
must be odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"assigning zero to an arbitrary *mut u8
is fine" what‽ No it is not fine‽ An arbitrary *mut u8
could be null, dangling, aliased... fn make_zero
is unsound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True — I typed that far too quickly and without thinking. Regardless, it's not immediately obvious to me that ptr::addr_of_mut!
should be allowed safely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not fully convinced either way, but I fear it would just be confusing to require unsafe
for an operation that can't lead to unsoundness, especially as addr_of!(struct.field) as *mut _
would do the same thing with no unsafe
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
especially as
addr_of!(struct.field) as *mut _
would do the same thing
While you can do that, it's undefined behavior to actually mutate the resulting mut pointer. I've just confirmed this with miri.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's UB under Stacked Borrows, but MIRIFLAGS=-Zmiri-tree-borrows
accepts it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's surprising. I'm not familiar with tree borrows, admittedly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ptr::from_mut(&mut struct).wrapping_offset(offset_of!(Struct, field))
should achieve the same thing as addr_of_mut!
, and I doubt you'd want to make offset_of
unsafe for unsafe fields. And even without offset_of
, you could use ptr::from_mut(&mut struct).wrapping_offset(addr_of!(stuct.Field) - ptr::from_ref(&struct))
. Both of these should be safe in every borrowing model.
Ain't clear this proposal handles auto-traits correctly, even if some use case exists. If you need this, then define your own type which provides this. We've the inner-builder or whatever deref polymorphism pattern, which comes up far more in practice, and can simulate this of desired.
|
How could a field of a struct be unsafe to read?
Within the module it's defined in (as the field is private), it is currently safe to assign any value, despite the fact that it can lead to undefined behavior. Said another way, the current behavior is inherently unsound. Nothing in the RFC so much as hints at
I never claimed that was the case.
Inclusion of one example does not mean that everything not included is forbidden. There is simply no reason to repeat the same thing for every field. Of course
For I have real world code where this is not the case. The
I don't follow. What problems do you see with handling auto traits? The type of the field is unchanged, so there would be no impact on auto traits. |
@@ -0,0 +1,137 @@ | |||
- Feature Name: `unsafe_fields` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The appears to be missing the https://github.com/rust-lang/rfcs/blob/master/0000-template.md#rationale-and-alternatives section. Please add one, with a bunch of subsections for various "here's why I picked this way over some other way" decisions you made.
For example, that's a great place to talk about unsafe structs vs unsafe fields.
(@jhpratt I'd love for you to steal something like the text in this post for the RFC)
A huge incentive for it, to me at least, is helping avoid misunderstandings about the model. For example, a 2020 PACMPL paper contains the following statement:
Thus a huge win of this would be to avoid the (from the same paper)
Having the body of Especially combined with other accepted work like https://rust-lang.github.io/rfcs/2316-safe-unsafe-trait-methods.html we could start even doing things like clippy lints for "why is this unsafe when it doesn't do anything unsafe? Should one of the types involved be marked unsafe?" If we don't have to link to tootsie pops as often because changing things that are relied on by As another way to look at this, it's weird that when I'm writing a method on by type with a safety invariant that I can do Tidy should nag about a safety comment for the constructor too, so that I'm not disincentivized to use the other, correctly-marked- |
We do not need In other words, rust does not have unsafe types because you must enforce invariants at module boundaries anyways. Also various discussions in https://github.com/rust-lang/unsafe-code-guidelines clarify this point. A method like Anyways.. If I understand, you want this type:
It's similar to You still enforce invariants by visibility but
Why? All those Anecdotally, this type winds up being much more common:
And In fact, if you wrap the We made this a local type for visibility modifiers, so a language level construct helps here. Also conversely, if you do not require visibility modifiers then simple types like I suppose one might imagine |
```rust | ||
fn change_unsafe_field(foo: &mut Foo) { | ||
// Safety: An odd number plus two remains an odd number. | ||
unsafe { foo.unsafe_field += 2; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: include a section talking about how both of the following are reasonable types:
/// This type has a *correctness* invariant that it holds an even number,
/// but it's not something on which `unsafe` code is allowed to rely.
pub struct Even(u32);
impl Even {
pub fn new(x: u32) -> Option<Self>;
pub fn new_i_promise(x: u32) -> Self;
pub fn get_value(&self) -> u32;
}
/// # Safety
///
/// The value of this type must always be an even number.
/// (`unsafe` code is allowed to rely on that fact.)
pub unsafe struct Even(u32);
impl Even {
pub fn new(x: u32) -> Option<Self>;
pub unsafe fn new_unchecked(x: u32) -> Self;
pub fn get_value(&self) -> u32;
}
and it's up to the type author to decide which is appropriate for the expected uses of the type.
EDIT: see below; it looks like the thing I was worried about here is probably impossible for other reasons. I do thing that "safe to read; unsafe to modify" is the 99%+ case, and should certainly be the default, but
Well, the field in Or the So perhaps some nuance for |
I don't think that is correct? The unsafe operation is dereferencing the pointer returned by |
Ah, I guess an access can't actually read an So I think the |
The nomicon describes current behavior; using it as an argument against this RFC is counter to the purpose of the RFC. It's circular reasoning at best.
The invariants are "merely" there for soundness. If the invariant is violated, the result is undefined behavior. That's far more serious than you make it sound.
I have never seen anyone write code like this in practice. The standard library and my own code (in @scottmcm I'll definitely include parts of that into the RFC. Also reading that blog post now — I'd never seen it before. |
This RFC is not better because memory safety also helps when writing unsafe code. The In other words, we always convert regular code invariants into memory safety assurance, but these regular code invariants have exactly the same risks as other regular code, including their own memory safety concerns. This RFC confuses the memory safety consumed in maintaining the regular code invariant with the actually unsafe options the code requires. In the past,
Anyways.. I think this discussion belongs in https://github.com/rust-lang/unsafe-code-guidelines where at least some people think formally about the unsafe code boundary. |
This is your fundamental misunderstanding. Other code in
Frankly, it's thinking like this that led to the creation of Rust. Thread safety can be maintained if everyone is super careful, but we all know how that works out. Likewise with a million other things. Programmers can not be relied upon to do the right thing. We have to force them to do it by leveraging the compiler wherever possible.
I have no idea why you think the discussion belongs there, particularly as you're the one that initiated it here. |
Yes, but this does not make altering What happens if I've unsafe code which relies upon an invariant between the values in a slice, so your unsafe field is a We often have this "catch your tail" phenomenon in formalalisms.
No. There is a formal model from the rustbelt project about how unsafe works, which gets discussed in the unsafe code guidlines repo. We should only expand what falls under unsafe code if the formal model says so. This RFC confuses those really important formal models with mere "read this" markers. In brief, you don't have a formal conception of when unsafe types should be used. It's unlikely one exists. That makes this change a regression towards the C days. |
It actually is. If something can result in undefined behavior, it is unsafe. Where that happens is wholly irrelevant. You're using circular logic regarding the UCG, which is by definition written about Rust as it currently is. You seem to be claiming that library undefined behavior doesn't exist, which is bizarre. I'll leave it at that as there's clearly no convincing you that the mere concept of an unsafe field has merit. I won't be responding further to anything along those lines. |
Could you add another motivating example to the RFC? It seems like the |
@tgross35 I linked this earlier in the thread. Is that sufficient for you? It demonstrates that field-level safety is appropriate, as some of the fields are perfectly safe to set as they have no interaction with any other field. Yet at the same time, all fields are one logical unit that should not be split into separate structs. |
Thank you for the link, I meant specifically to add an example to the RFC document (which perhaps you planned to do anyway). Even with that example, it does seem to me that it would be more correct to nest
Niche optimization was mentioned as well as a reason for not wanting to split off types, but I think this is more applicable to all separations of logic, and not just those with safety concerns. I don't wish to derail this conversation, but the thought has crossed my mind before about how Rust could potentially use a I think that investigating something like that may be more widely useful than using niche optimization as a justification for why a single flat struct is the correct solution. (edit: I brought this up for some discussion on Zulip https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/.60.23.5Bflatten.5D.60.20struct.20field.20attribute) |
As you're replying by email, I'll un-edit my above comment and paste it here. There exist alternatives that avoid "unsafe creap" while still doing enforcement in rustc: Idea 1. Explicit rolls An unsafe field can only be written from special We'd permit In future, we could add In this, In this way, you can mark as much code as you like Idea 2. Implicit rolls If people dislike the verbosity of
You could lint against unsafe blocks having multiple inferred rolls too. |
That sounds quite similar to ideas around more explicitly discharging proof obligations, which have been floating around for a while. As I already said in my previous message: this is not a new problem. Calling more than one unsafe fn in an unsafe block happens regularly. I don't think unsafe fields make this so much worse that we have to block them on a more fine-grained unsafety tracking mechanism.
|
Alright, thanks for clarifying everything. |
I concur with Ralf and Mario that the rule-of-thumb that "any function that is declared unsafe ought to have an unsafe op in its body" is a very reasonable one. As an unsafe code author I rely on unsafe warnings to flag potentially dangerous operations for me -- and indeed to remind me where such operations live in the code when re-reading after an absence.
…On Fri, Jun 7, 2024, at 3:06 AM, Jeff Burdges wrote:
> That "linter" may as well be the programmer themselves, trying to figure out what is unsafe an where, placing SAFETY comments.
>
Alright, but adding unsafe blocks incurs some cost: You must check that only the desired `unsafe fn`s get invoked. If some type has both `Vec` and an unsafe field, then you've lost any extra warnings when calling `Vec::set_let` while accessing the unsafe field.
If the linter is the programmer, then it's more constrained to simply append `_unsafe` to the field name. This avoids "unsafe creep" where more code becomes unsafe, while exposing when unsafe fields get accessed.
An alternative that'd avoid "unsafe creap" while still doing enforcement in rustc:
An unsafe field can only be written from special `unsafe(TypeName) { .. }` blocks, but these "roll focused" unsafe blocks disallow regular unsafe operations. A regular `unsafe { .. }` block cannot directly write to unsafe fields.
We'd permit `unsafe(TypeName) fn` too, even if `TypeName` is a ZST or lacks unsafe fields, so `unsafe fn`s can be roll focused too, with their rolls again being multuly exclusive to regular unsafe blocks.
In future, we could add `unsafe(TypeName1,TypeName2,..,TypeNamen,number) { .. }` blocks that permit exactly `number` regular unsafe operations, and any number of roll focused unsafe operations for `TypeName1,TypeName2,..,TypeNamen`.
In this, `TypeName` is the identifier after `struct` or `union`, not a fully qualified type. Alternatively, rolls could be specified like `unsafe(module) { .. }` where `module` is the module where `TypeName` gets declared.
In this way, you can mark as much code as you like `unsafe` for rolls you create, without fearing that regular `unssafe fn`s get invoked more often.
—
Reply to this email directly, view it on GitHub <#3458 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AABF4ZQHAIMCJT4O3ZX5VGLZGFLXXAVCNFSM6AAAAAA2ILY666VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCNJUGIZDIMZQG4>.
You are receiving this because you commented.Message ID: ***@***.***>
|
Putting it that way does make it sound more overtly appealing. If nothing else, it should discourage people trying to diverge from the " (EDIT: Sorry to any people following via e-mail for accidentally hitting the wrong Enter key combo prematurely.) |
IIRC there's even been academic papers that count "unsafe functions that don't do anything unsafe", despite that being a mostly-meaningless metric today (as demonstrated by So agreed, big fan of an obvious "this is why it's unsafe" you can point at locally inside the |
Should implementing/deriving I think types containing unsafe fields should require |
|
@CodesInChaos There is no
Meaning the So you need to implement impl Clone for X {
fn clone(&self) -> Self {
*self
}
}
impl Copy for X {} I think copying a new value impl Clone for X {
fn clone(&self) -> Self {
// SAFETY: we determined the unsafe field is safe to copy.
unsafe { *self }
}
}
impl Copy for X {} But then elsewhere It seems the best is just prohibit |
@kennytm I don't think Implementing @idanarye Copying raw pointers is safe. It's only safety critical if you use them somewhere where they're assumed valid. Since such assumptions come with an safety critical invariant, they should be |
So…I'm conflicted. Yes, initialization is included, but it is already known that the data contained in the struct is valid. The situation that is motivation for this RFC is when a field has its own invariant and/or relies on another field in the struct. In my head, I view the Is there a plausible situation where the safety of a field would rely on external factors to a struct? If not, I'll need to figure out a way to rework the RFC to make |
We're talking about safety invariants here, not validity invariants. |
That is true. In any situation, I feel that there must be a way to implement |
I think requiring Consider |
The problem is that (last I checked) you can't |
Yes, that rule would need a special case for
Ibelieve |
There are more traits that just |
I think it's limited to The only similar issue I see are unsafe auto-traits ( edit: Thinking about it a bit more, it applies to traits which meet all three of these conditions:
|
Okay the
You can't #[derive(Clone)]
struct MyRange(std::ops::Range<u32>);
impl Copy for MyRange {}
//~^ error[E0204]: the trait `Copy` cannot be implemented for this type IMO "prohibiting" while restrictive at least has a precedent. While |
Say I have a Naturally, I should not But I can This problem I described already applied to current Rust, without unsafe fields. My point here is that this is basically the same problem as implementing |
Oh wow, I had never realized that... but it makes sense, the trait system cares about coherence which works per-crate, and there's no visibility question arising here. Is there any chance we can fix |
I think in a green field design, @kennytm Types which have unsafe invariants but don't own heap allocated data can often can be @idanarye One of the main selling points of unsafe fields is that it avoids relying on visibility for safety. It reduces the scope to unsafe code, and code that unsafe code relies on. See withoutboats's post. |
Hey @rust-lang/lang, I'd like to nominate this RFC for experimental implementation. We've had lots of recent discussion on Zulip culminating in a living design doc and a proposed project goal. At this point, we'd most benefit from upstreaming an experimental implementation that we can all play with and get a feel for. |
Speaking for myself: enthusiastic support, I'd love to see this experiment. |
Draft implementation of unsafe-fields. RFC: rust-lang/rfcs#3458 Tracking: - rust-lang#132922 r? jswrenn
Fixes #381
Rendered
Tracking: