-
Notifications
You must be signed in to change notification settings - Fork 244
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable binary encoding and printing of multi-package wit files #1609
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this! On a skim this looks reasonable enough, but there's one part that will require careful review I haven't gotten to yet. Specifically this is affecting the binary format for wasm-encoded WIT packages and I need to review to see if this has an impact on how this interacts with older versions of wasm-tools
. For example older versions need to be able to decode binaries produced by this PR and vice-versa, binaries produced here additionally need to be able to be decoded by older versions (assuming the new feature of more-than-one-package isn't used).
To perhaps help with this can you add some tests? There's no tests currently for this functionality and I think it would be good to have some in crates/wit-component/tests/interfaces/*.wit
. Additionally would it be possible to avoid changing the v1
encoding? That in theory isn't being emitted by deafult any more and may even be able to be removed, it's been a bit since I reviewed the state of that.
Ok reading over this some more I don't think that this is going to work as-is. It looks like this is basically adding a few loops in a few places, but I think that the changes required are going to be more significant than that. For example each interface/world in a package, when encoded today, is exported as the kebab-cased-name of the interface/world. Each kebab-cased export is a standalone component type which has full context for just that interface/world. With multiple packages something will need to change about this format. Otherwise it's not possible to encode two packages which contains |
cc @lukewagner for possible brainstorming on the format here. The question being that today we have:
What's the best way to represent this WIT package: package a:b1 { interface foo {} }
package a:b2 { interface foo {} } I'm tempted to say: (component
(component $a:b1 (; output of `wasm-tools component wit -t for just `package a:b1` ;) )
(component $a:b2 (; output of `wasm-tools component wit -t for just `package a:b2` ;) )
(export "a:b1" (component $a:b1))
(export "a:b2" (component $a:b2))
) This isn't currently valid though as |
Ooh yeah, really good question. Agreed that the natural first step seems to be to put each nested WIT package into its own nested component. Then to your question: maybe this is weird but: what if we just didn't export those nested WIT packages at all? That would imply that when WIT tooling was slurping up a binary WIT package, it would need to recurse through all nested components and populate its internal namespace with all What's neat about this is that it preserves the distinction between: package ns:a;
package ns:b { interface i {} }
interface i {} and package ns:a { interface i {} }
package ns:b { interface i {} } The former you could say is defining the package |
I think that could work yeah and that seems reasonable to me. For the start I think it'd just walk the root component looking for subcomponents rather than spidering the whole thing, but either probably wouldn't be too hard to implement. I'm thinking though it'd be best to only accept a conservative shape of components to keep our options open in the future to different organizations if we want. Otherwise though this WIT: package ns:a;
package ns:b { interface i {} }
interface i {} currently isn't valid where WIT text files are either |
Ah right, good point; unless and until we add |
So if I'm following correctly then for the scope of this PR, since mixed package declaration styles isn't yet supported, then in Alex's example where we have:
this would end up encoded with no exports:
Is that right? Then whenever we end up with mixed declaration styles being supported (maybe I"ll do that as a follow up or if it doesn't seem so bad I may try to include it in this PR) we could see both this and the current day encodings appear side by side in a single encoded wit file. |
Indeed! So the encoder will basically say if there's one package do what it does today but with multiple packages being encoded they're encoded in this new format. Similarly the decoding process would need to recognize the new multi-package format and decode that in addition to supporting the previous single-package format. There's still sort of a question of defaults, but I think that's reasonable to tackle at a later date. Specifically I'm wondering for example what the output of |
So I have implemented these changes locally for encoding multiple wit packages. When decoding though, I found that some modifications will be have to made here as the current tooling will try to parse the binary as a component instead of a wit package if no exports are present. I think a convention needs to be decided on how to discern whether these are nested components in a component implementation vs being used to describe a wit package? I was thinking that maybe the component names can can provide a mechanism? If not then we could go back to expecting exports for these cases, and the convention could be established in how they're exported? |
After some more thought... I'll actually probably investigate refactoring ComponentInfo into |
That's a good point! I agree though that we can probably update how metadata is tracked and extend the "is empty" check to exclude the case of "no exports but has a bunch of components which all look like WIT packages" |
Cool. I think I've essentially accomplished the thrust of what's involved in order to do to that now (locally)... As I'm tidying it up before requesting additional review, I'll be evaluating whether or not to include support for mixing package declaration styles or have that be a follow up. |
301c1a4
to
fd677d1
Compare
Hmm seems I botched something up switching branches... should be able to look into what went wrong shortly... |
fd677d1
to
27576a1
Compare
27576a1
to
3e8b0f8
Compare
I think this should be ready for review now... just investigating why it's struggling to get gcc with yum in CI/CD. |
let mut names = NameMap::new(); | ||
for pkg in self.packages { | ||
let package = &self.resolve.packages[*pkg]; | ||
if let PackageKind::Explicit = package.kind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused as to what this new PackageKind
field is and how it's processed here. I think ideally this would be removed entirely since it seems to be an artifact of the original text encoding and shouldn't affect this binary encoding here.
Instead what I might recommend is:
- If there's only a single package, encode in the old "v2 style".
- If there's multiple packages, encoded in this "explicit" style where a sub-component is created.
I'd also recommend extracing the shared code between these two branches of the if
since right now it looks like the body of the loop was copy/pasted.
Additionally I think long-term we'll want to move towards the "new v2 style" where all packages, even if it's just one, are encoded with a sub-component. In that sense I think it would make sense to have a few more knobs here like other transitions we've made with the binary format:
- An env var should be added which if specified and is 0 or 1 indicates which encoding branch to use.
- Faililng that a
const
with a doc block explaining what it's controlling.
Eventually in the future we'll invert the value of the const
to default to the new encoding. Further in the future we'll remove the env var. Further still we'll remove decoding support for the original encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After some thought, and keeping in mind that we'll eventually want to support a wit file that has both an "implicit"/"unscoped" package declaration as well as an "explicit"/"scoped" package declaration, is the number of packages actually what we want to use as the criteria? If we do things this way instead of capturing the PackageKind
... then I'm not sure how we can tell the difference between package a:b {}
and package a:b
if in the scoped case, it's the only scoped package. I think we'd end up transforming one package declaration style into the other when performing a roundtrip if this choice is made based on the number of packages that exist, which may be fine, but just wanted to make sure that we're considering this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the limit of time I would prefer to not spec/support multiple encoding formats for a WIT package in wasm. Right now we support v1/v2 and this PR is adding a third. I'd eventually like to sunset all the prior versions, however.
Given that my goal would be to use this new "v3" unconditionally. I'm proposing to distinguish based off 1-or-more packages purely for backwards compatibility. That way preexisting workflows don't break with older versions since the new format wouldn't be used by default. Once the new format is in enough supporting tooling it would then be enabled by default.
Personally I don't think it's too important to preserve package a:b { ... }
vs package a:b; ...
since they're semantically the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah ok that makes sense to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last thought that struck me...
Is there a use case where the scoped packages are used similarly to how the deps folder is used today?
In this case where we're mixing package declaration styles, the implicit package may be the one being used by guest lang tooling for knowing which world to target, and the others are there as deps. If we needed to encode in this case, then I think this logic would treat the implicit package like the deps, rather than it being treated as "higher" than the others? I guess perhaps we could rely on guest lang tooling to keep track of which package is the one that matters for targeting?
} | ||
// Finally convert the decoded package to wasm again and make sure it | ||
// matches the prior wasm. | ||
let wasm2 = wit_component::encode(Some(true), resolve, &[decoded_package])?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit surprised this test is passing because I would expect that the decoded_package
argument should be the result of decoded.packages()
now that multiple packages are supported. Could you investigate if a new test is needed to reflect that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like the assertion was wrong! accidentally setting wat2
to be wasm
instead of wasm2
when checking the diff. You're right, when fixing the assertion, the test fails as is. Will push up changes to fix the assertion and handle decoded_package
properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out when I fixed it fixed the assertion, it came out that the docs are being stored in the wat in a different order depending on if a wit is being parsed vs a binary, in such a way that it doesn't impact how the docs get printed I believe. Can fix the tests by performing a sort on the metadata fields, but still investigating how to just get encode things in the same order regardless.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that a regression from this PR? Or a preexisting issue?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a preexisting issue.
if wasm != wasm2 {
let wat2 = wasmprinter::print_bytes(&wasm)?;
assert_eq!(wat, wat2, "document did not roundtrip correctly");
}
If I understand correctly, assert_eq
is receiving two different wat
samples created from wasm
, neither represent wasm2
... Was spending some time trying to dig into this, and it looks like modifying main also has this issue of doc metadata being reordered if I'm reading the code correctly...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, in that case feel free to leave this as-is and I can investigate it after landing this
cur_package.package_metadata = _package_metadata.clone(); | ||
} else if s.name() == "component-name" { | ||
if let KnownCustom::ComponentName(reader) = s.as_known() { | ||
let _packages = register_names(reader, &mut cur_package)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit hesitant to rely on a custom section for something so significant to the structure of the final WIT, would it be possible to avoid this inference of infer via other means?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I actually had similar thoughts when implementing... But I think that the way names of components are handled today is exclusively via custom sections? Will investigate some more, but if we want something similar to the wat outlined above where the name of the component is tied to the package name, I'm not sure of another way to get the information that informs a package name out of the binary format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation and @since
/@unstable
are definitely all handled with custom sections, but to me those are semi-ok to strip at any time where the name of a package is significant enough that you'd never want it stripped. I'm mostly wondering if this can be inferred from the exported items or something like that rather than having to be listed in the name section. An alternative could be to export the components themselves from the outer component where the export name has the full package name (of sorts at least)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess there's maybe an edge case here where you could have an empty scoped package? I think we could infer things from exports so long as there are exports, and as far as I know the only case where there wouldn't be is if the package is empty... Would you be comfortable erroring on empty scoped packages then? Or yeah, we could also do the alternative and export the components themselves...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lukewagner do you have thoughts on this point specifically? The question being how is the package name recovered when multiple packages are encoded in a component and one package is possibly empty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or yeah... I think I like your idea of exporting the component, seems like the easiest way to deduce the name of the component without relying on the name section... How the PR handles it at the moment, even if we can tell it's an explicit/scoped package from the way the exports are structured, I think we still need the name section to figured out the actual name of the package that should be printed from a binary (unless there's another mechanism I'm missing). Also curious to hear @lukewagner's thoughts about if we exported the components.
df9a84d
to
955bf5b
Compare
955bf5b
to
d3d2d2c
Compare
1a0cb52
to
ed1a0ad
Compare
let mut names = NameMap::new(); | ||
for pkg in self.packages { | ||
let package = &self.resolve.packages[*pkg]; | ||
if let PackageKind::Explicit = package.kind { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the limit of time I would prefer to not spec/support multiple encoding formats for a WIT package in wasm. Right now we support v1/v2 and this PR is adding a third. I'd eventually like to sunset all the prior versions, however.
Given that my goal would be to use this new "v3" unconditionally. I'm proposing to distinguish based off 1-or-more packages purely for backwards compatibility. That way preexisting workflows don't break with older versions since the new format wouldn't be used by default. Once the new format is in enough supporting tooling it would then be enabled by default.
Personally I don't think it's too important to preserve package a:b { ... }
vs package a:b; ...
since they're semantically the same thing.
} | ||
// Finally convert the decoded package to wasm again and make sure it | ||
// matches the prior wasm. | ||
let wasm2 = wit_component::encode(Some(true), resolve, &[decoded_package])?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that a regression from this PR? Or a preexisting issue?
cur_package.package_metadata = _package_metadata.clone(); | ||
} else if s.name() == "component-name" { | ||
if let KnownCustom::ComponentName(reader) = s.as_known() { | ||
let _packages = register_names(reader, &mut cur_package)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lukewagner do you have thoughts on this point specifically? The question being how is the package name recovered when multiple packages are encoded in a component and one package is possibly empty.
name: Option<PackageName>, | ||
/// Wasmparser-defined type information learned after a component is fully | ||
/// validated. | ||
types: Option<types::Types>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be ok to only store the types at the "root" since I don't think a snapshot is needed along the way
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, at this point though I think I've been able to combine ComponentInfo
with ExplicitPackageInfo
, so I think it'll just have a value of None
when for nested packages.
I'm going to close this as the conclusion of this PR manifested in #1700 which is to instead of supporting the feature being added here instead explicitly don't support it. The encoding/printing phases will continue to work on one "primary package". |
Wit files that leverage multiple packages should be able to be encoded into binary and printed back out with the changes in this PR