The implementation of `fd_filestat_get` in wasi-preview1-component-adapter is unsound #8956

whitequark · 2024-07-14T20:24:08Z

The implementation uses the value of metadataHash as st_ino:

wasmtime/crates/wasi-preview1-component-adapter/src/lib.rs

Lines 950 to 951 in 225d20e

    
           dev: 1, 
        
           ino: metadata_hash.lower,

This is unsound. To quote the documentation for metadataHash:

Return a hash of the metadata associated with a filesystem object referred to by a descriptor.

This returns a hash of the last-modification timestamp and file size, and may also include the inode number, device number, birth timestamp, and other metadata fields that may change when the file is modified or replaced. It may also include a secret value chosen by the implementation and not otherwise exposed.

Implementations are encourated to provide the following properties:

If the file is not modified or replaced, the computed hash value should usually not change.

If the object is modified or replaced, the computed hash value should usually change.

The inputs to the hash should not be easily computable from the computed hash.

However, none of these is required.

Applications will commonly use st1.st_ino == st2.st_ino in order to determine if two files are the same (for example, Clang does it). My implementation of metadataHash, which always returns 0 and is compliant with the definition above, caused the combination of Clang and the wasip1 component adapter, to always treat all read files as the same file (which in practice meant the first #include caused an infinite loop).

The text was updated successfully, but these errors were encountered:

whitequark · 2024-07-14T20:28:43Z

It's unclear to me how to provide the desired semantics (st1.st_ino == st2.st_ino iff isSameObject(f1, f2)), or, in fact, any reasonable semantics that doesn't make private assumptions about the exact value of metadataHash (i.e. any sound semantics). It's pretty late though so I may be missing something obvious.

whitequark · 2024-07-14T20:39:24Z

On reflection, I think the intended behavior here is to maintain a mapping from metadataHash to "potential files" (i.e. both files corresponding to file descriptors, and files that could be opened from a directory descriptor) that have previously been stat'd or opened, and to assign each "potential file" a fresh inode value.

This would still be problematic, in a subtler way: nothing in the documentation for metadataHash implies that always returning 0 may lead to undesirable performance implications down the line. Normally, we treat quadratic behavior arising in hash tables as a security issue in cases like this! So clearly either this implementation also doesn't work, or the contract for metadataHash must be amended.

whitequark · 2024-07-14T21:12:29Z

On more reflection, since metadataHash changes if the file is modified, it can't be used to index into such a table, so we're back to fd_filestat_get can't be sound.

How was this supposed to be implemented?

alexcrichton · 2024-07-15T16:07:50Z

I can link WebAssembly/wasi-filesystem#81 as more possible background on this, but I don't have any extra info beyond that.

pchickey · 2024-07-15T17:44:10Z

Like Alex, the only understanding I have of this topic is from PR 81 on the spec. It is quite possible that both the spec and our implementation are questionable, and that you are the first person to encounter problems with it in the wild. I don't have the expertise or time to devote to really digging into what the best path forward here is right now. Dan is out on leave right now, so I'm standing in as a wasi-filesystem champion in his absence, and I will defer to your judgement for changes to the spec and/or implementation.

whitequark · 2024-07-15T17:51:07Z

Thank you. I've been feeling not the best lately, and this has directly impacted my ability to reason on really intricate topics like this one. I'll keep this issue on my radar and try to work out a way forward that works for everyone involved. I do not yet have any firm conclusions and I'll need a deeper understanding of the topic and more consultation with others before I'll have them.

Something that I noticed seems to have been an omission is the question of polyfilling the APIs. In one direction it's straightforward: wasip2 can be done on top of wasip1 easily. But it seems like nobody has drilled in detail into how to emulate wasip1 on top of wasip2, and given the state of the ecosystem (LLVM only directly knowing how to emit wasip1 binaries with wasip2 requiring a post-processing step) I think that's actually the more important direction?

@pchickey Do you think there is possibility or appetite for a sort of "sidecar" specification that builds on top of wasip2 in order to provide a sound polyfill for fd_filestat_get where if it's not available then st_dev/st_ino end up having zeroes in them? Or should I not consider that as a possible direction?

alexcrichton · 2024-07-15T20:40:06Z

Oh that's actually the most important direction for us, too, implementing p1 on top of p2. The crate you linked in the OP is the current canonical source for implementing p1 on top of p2 within wasm. There's a host-side implementation at https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasi/src/preview1.rs as well where the p1 APIs are all defined primarily in terms of their p2 counterparts.

Modifying the adapter (wasi_snapshot_preview1.wasm) is quite difficult, however, as it's written in "Rust" but that's in quotes because you can't do a lot you can do in normal Rust (e.g. use a hash map). Allocations are very carefully managed and such, so doing something fancy with std_{dev,ino} when metadata_hash is zero is probably going to be difficult.

pchickey · 2024-07-15T20:46:49Z

Agreed with Alex: implementing p1 on top of p2 is an important direction. There are some places we are choosing to do that with low fidelity (the whole rights system in p1 is a mess and we want to leave it behind as much as possible, but it ends up mattering for corner cases where wasi-libc ended up depending on it) but I think this case matters a whole lot more.

As for the incompleteness of the wasip2 support in wasi-libc, so that LLVM could emit a purely wasip2 binary, thats something thats "just" gated on someone who can work on wasi-libc having the time to do so, which is why it has stalled the last few months. We do consider that important to get done at some point, Alex has been filling in some related missing pieces recently, mostly focusing on the parts the Rust toolchain will need.

whitequark · 2024-07-15T23:44:24Z

Allocations are very carefully managed and such, so doing something fancy with std_{dev,ino} when metadata_hash is zero is probably going to be difficult.

In this case, given how p1 and p2 are defined, there must be a spec change to maintain soundness. For example there could be an additional wit interface that returns an arbitrary 128-bit unique identifier for a descriptor or a descriptor based path that is then stuffed into st_dev/st_ino.

whitequark · 2024-07-16T12:06:11Z

I think a backstop solution that prevents the most egregious consequences (e.g. Clang considering every input file as the same) is to use a monotonically increasing global counter for filling in st_dev/st_ino. I can't think of any application where this would fail as badly, though of course this is still not a reasonable behavior for stat to exhibit. It's possible though that the reason I can't think of them is because my experience is limited still.

alexcrichton · 2024-07-16T14:39:37Z

Could you perhaps hash the filename in the meantime? And pretend that each filename is its own unique dev/ino?

whitequark · 2024-07-16T14:44:38Z

What about fstat?

I guess st_dev would be the fd and st_ino would be the hash of a filename, if any, opened off that fd.

alexcrichton · 2024-07-16T14:47:22Z

Ah sorry I'm assuming that the structure you're maintaining on the host corresponding to the file would contain the hashed filename or the full filename or something like that. If all you have is the fd and nothing else I don't know what would be done.

whitequark · 2024-07-16T14:52:26Z

Oh there is a misunderstanding. What I was considering is the ability to solve this issue in the preview1 component adapter for an arbitrary host. In my particular host I have basically added fake but consistent inodes to everything, I return them as the metadata hash lower (in accordance with what the adapter currently does) and my code works fine now; it's not a blocker for me personally, I'm raising this because it is an ecosystem health issue.

alexcrichton · 2024-07-16T14:57:15Z

Ah makes sense. I would personally come to the same conclusion as you then that there's no way to actually "fix" things without changing the standard and/or docs there. Given the weak guarantees of metadata-hash and the strong guarantees of st_{dev,ino} I don't know how the adapter can work.

One possibility would be to add something which indicates whether metadata-hash is "strong", e.g. is suitable as-is for st_{dev,ino}. In such a situation the adapter could set dev/ino based on metadata-hash. If the "strong" bit is false then the hash of the filename plus whatever metadata-hash is could perhaps be used for dev/ino. That's just a guess though and is still sort of situation solving this. In any case this is probably best left for discussion on wasi-filesystem itself at this point.

whitequark · 2024-07-16T15:00:22Z

I agree, which one of us should kick off that discussion?

alexcrichton · 2024-07-16T17:40:51Z

If you wouldn't mind I'd defer to you as I fear I wouldn't have enough context on this

cfallin · 2024-07-16T18:01:17Z

Potentially bad idea, but I'm curious if it spawns other thinking: what if the inode were synthesized from an actual hash of file contents?

Basically, imagine the filesystem like a content-addressed store (e.g. git). The inode is then just the hash that addresses some content.

This certainly satisfies the "check the inode and timestamp first to see if it's the same file contents" use-case; it's no worse than what one would have to do manually to verify same file contents in the absence of an inode/timestamp, i.e., actually read the data.

It's also conceptually "minimal" in the sense that it doesn't require any properties or guarantees from the p2 API.

The main downside is that it's slow: it does not satisfy what the user expects to be an O(1)-ish metadata check. This is especially an issue since the inode number must be provided as the result of a stat operation (i.e., is not requested separately). Maybe that alone is enough to throw out this idea. But maybe not?

bjorn3 · 2024-07-16T18:03:29Z

what if the inode were synthesized from an actual hash of file contents?

That could confuse tar into thinking two files are hardlinks to each other and thus turn them into actual hard links when unpacking.

whitequark · 2024-07-16T20:16:45Z

See WebAssembly/wasi-filesystem#153.

whitequark added a commit to YoWASP/runtime-js that referenced this issue Jul 14, 2024

Work around bytecodealliance/wasmtime#8956.

8312479

whitequark mentioned this issue Jul 16, 2024

Providing a wasip1 stat API on top of wasip2 is infeasible WebAssembly/wasi-filesystem#153

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

The implementation of `fd_filestat_get` in wasi-preview1-component-adapter is unsound #8956

The implementation of `fd_filestat_get` in wasi-preview1-component-adapter is unsound #8956

whitequark commented Jul 14, 2024 •

edited

Loading

whitequark commented Jul 14, 2024

whitequark commented Jul 14, 2024 •

edited

Loading

whitequark commented Jul 14, 2024

alexcrichton commented Jul 15, 2024

pchickey commented Jul 15, 2024

whitequark commented Jul 15, 2024

alexcrichton commented Jul 15, 2024

pchickey commented Jul 15, 2024

whitequark commented Jul 15, 2024 •

edited

Loading

whitequark commented Jul 16, 2024 •

edited

Loading

alexcrichton commented Jul 16, 2024

whitequark commented Jul 16, 2024 •

edited

Loading

alexcrichton commented Jul 16, 2024

whitequark commented Jul 16, 2024 •

edited

Loading

alexcrichton commented Jul 16, 2024

whitequark commented Jul 16, 2024

alexcrichton commented Jul 16, 2024

cfallin commented Jul 16, 2024

bjorn3 commented Jul 16, 2024

whitequark commented Jul 16, 2024

The implementation of fd_filestat_get in wasi-preview1-component-adapter is unsound #8956

The implementation of fd_filestat_get in wasi-preview1-component-adapter is unsound #8956

Comments

whitequark commented Jul 14, 2024 • edited Loading

whitequark commented Jul 14, 2024

whitequark commented Jul 14, 2024 • edited Loading

whitequark commented Jul 14, 2024

alexcrichton commented Jul 15, 2024

pchickey commented Jul 15, 2024

whitequark commented Jul 15, 2024

alexcrichton commented Jul 15, 2024

pchickey commented Jul 15, 2024

whitequark commented Jul 15, 2024 • edited Loading

whitequark commented Jul 16, 2024 • edited Loading

alexcrichton commented Jul 16, 2024

whitequark commented Jul 16, 2024 • edited Loading

alexcrichton commented Jul 16, 2024

whitequark commented Jul 16, 2024 • edited Loading

alexcrichton commented Jul 16, 2024

whitequark commented Jul 16, 2024

alexcrichton commented Jul 16, 2024

cfallin commented Jul 16, 2024

bjorn3 commented Jul 16, 2024

whitequark commented Jul 16, 2024

The implementation of `fd_filestat_get` in wasi-preview1-component-adapter is unsound #8956

The implementation of `fd_filestat_get` in wasi-preview1-component-adapter is unsound #8956

whitequark commented Jul 14, 2024 •

edited

Loading

whitequark commented Jul 14, 2024 •

edited

Loading

whitequark commented Jul 15, 2024 •

edited

Loading

whitequark commented Jul 16, 2024 •

edited

Loading

whitequark commented Jul 16, 2024 •

edited

Loading

whitequark commented Jul 16, 2024 •

edited

Loading