Symmetrical ABI for component fusion and shared everything #386
Comments
P³S: Previously I tried avoiding all memory allocation by only returning borrowed objects and maintaining |
Ideally this could come with a future symmetric API option for C++ as well, but this is a choice internal to the bindings code and not related to the ABI.
Thanks for doing all this work and filing the idea; lots of interesting thoughts here, although I'm not sure I understand the whole picture.
Hm, I'm not sure I follow. If we make the core ABIs the exact same, we'll still need a trampoline function in-between to copy (values) and move (handles) between the distinct memories/tables of the caller and callee's core modules (b/c shared-nothing). And once we have that trampoline between the caller's and callee's core modules, is there a remaining benefit from them having the exact same core function signature? But maybe you're thinking of an optimization that merges the memories/tables of separate component together? Over the years, this idea comes up periodically b/c it seems enticing, but in general I don't think this is a valid optimization for arbitrary wasm content since there are many ways for core wasm code to subtly depend on having its own memory/table space to do whatever it likes, so this optimization is likely to introduce subtle bugs. Furthermore, as we add new |
The benefits of a symmetric ABI manifest in the shared everything case. For shared nothing the calling-into-a-component direction becomes more complex because the argument memory needs to be freed/reused by the caller after the call and the result might need multiple de-allocations instead of a single With shared everything, e.g. if the component "runtime" lives in the same address space, this picture suddenly flips, asymmetric requires complex copying code between components while symmetric enables direct calling without glue code. Please keep in mind that we use WIT to describe the boundary between native (e.g. x86-64) components. Here the symmetric ABI allowed me to significantly reduce the complexity of the overall system. If at some point we benefit from CPU architecture independence it just requires a recompilation of the already componentized source code to wasm32. If we feel the need to insulate parts from each other we just insert an IPC block at the ABI boundary, without the need to recompile either side. These cases will even benefit from regenerating the binding for the asymmetric ABI. I believe WIT (without wasm) is the best option if you need to combine any two languages with stream, future, option, or result data types. The source code compatibility to the component model unlocks the option to later easily compile for wasm's full insulation, CPU and OS independence. So it is a more extensive optimization/simplification for the transitional area (wasit2). Symmetric ABI would simplify native plugins (it provides a stable binary interface between Rust components which can directly link to each other) and many highly modular embedded use cases when performance is more important than CPU architecture portability. |
Ah, interesting, it sounds like your use case here is purely native code that is using WIT as an IDL to compose native code; is that right?
Ah ok, sorry, I had missed that in the original motivation. In that case, it seems like we can decouple the ABI used for this native-to-native linking use case from the main wasm use case. This would let you, e.g., specify a mapping directly from a WIT interface to a C |
Separating symmetric ABI as an encoding convention for native compilation is a very interesting take on this. I can imagine wit-bindgen outputting target specific code supporting both ABIs with the same code, but for exported resources this would create a large mess of conditional code, due to the significant differences in complexity. I can also see symmetric ABI used for fusing several shared-everything components into a single core module, so wasit2 (or wasip2-module) could make use of it. If you envision a core module based component model with shared everything and dynamic linking, symmetric ABI fits the requirements. But maybe adding multiple address spaces to WAMR for full component model support is less work than the solution described in my previous chapter. |
If the goal of a symmetric native ABI is to have no adapters, I expect it will already require very different bindgen than what's currently emitted for wasm (since at least some of the code in the adapters will now have to be in the bindings), so I would think of native bindgen as a wholly different mode for wit-bindgen (perhaps as a third option after "guest" and "host").
This is the transformation that I'm far more skeptical of for the reasons mentioned in the last para of my above comment. To summarize, I think shared-everything linking shouldn't be going through WIT at all -- it should use the languages' traditional facilities for dynamic linking which gives the developer better language-level integration (e.g., letting C++ share classes or exceptions across module boundaries) and achieves fuller sharing of memory. |
I understand your sentiment about fused components: we either need separate binaries (shared objects) or tight control of the symbols across the resulting binary to avoid fusing separate global variables into one, or confusing unrelated function implementations (patchelf has been mentioned to me as a potential solution). I debugged several of these errors over the past year. Typically I recommend using shared libraries together with a linker script that hides every defined symbol which is not mapped via WIT. You can also easily run into the problem of multiple components (shared objects) exporting the same type of resource. To solve this I proposed carrying a virtual function table along with the object ID, which identifies the shared object that originally created the resource. These are a lot of complications, but for now I don't see any alternative to WIT for creating asynchronous foreign function interfaces between languages (or different versions of Rust or C++) that provide high level data types. Last week I built a working prototype using symmetric ABI of a complex system deploying streams and futures (WASI 0.2 equivalent using pollable) of result, option, list and string, for now C++ only, but I already created smaller proofs of concept of a Rust/C++ combination. Sharing classes or exceptions across modules works only as long as you remain within a single language. And a C based interface is always under-specified with respect to ownership across function calls.
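To make the vtable idea above concrete, here is a minimal Rust sketch. It is not taken from the prototype: `ResourceVtable`, `TaggedHandle`, the two simulated components, and the drop counters are all hypothetical names chosen for illustration. It only shows how a handle that carries a pointer to its origin's vtable lets a drop be routed to the shared object that created the resource, even when two components use the same numeric ID for the same resource type.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical per-component vtable for one resource type.
pub struct ResourceVtable {
    /// Name of the shared object that created the resource.
    pub origin: &'static str,
    /// Destructor provided by the originating component.
    pub drop_fn: fn(usize),
}

/// Hypothetical handle: the object ID plus the vtable of its origin.
#[derive(Clone, Copy)]
pub struct TaggedHandle {
    pub id: usize,
    pub vtable: &'static ResourceVtable,
}

// Two simulated components that export the same resource type.
// The counters only exist so the routing can be observed.
pub static DROPPED_A: AtomicUsize = AtomicUsize::new(0);
pub static DROPPED_B: AtomicUsize = AtomicUsize::new(0);

fn drop_blob_a(_id: usize) {
    DROPPED_A.fetch_add(1, Ordering::SeqCst);
}

fn drop_blob_b(_id: usize) {
    DROPPED_B.fetch_add(1, Ordering::SeqCst);
}

pub static VTABLE_A: ResourceVtable = ResourceVtable {
    origin: "component_a",
    drop_fn: drop_blob_a,
};

pub static VTABLE_B: ResourceVtable = ResourceVtable {
    origin: "component_b",
    drop_fn: drop_blob_b,
};

pub fn new_blob_a(id: usize) -> TaggedHandle {
    TaggedHandle { id, vtable: &VTABLE_A }
}

pub fn new_blob_b(id: usize) -> TaggedHandle {
    TaggedHandle { id, vtable: &VTABLE_B }
}

/// Generic drop: dispatches through the vtable, so the caller does not
/// need to know which component owns the resource.
pub fn drop_handle(handle: TaggedHandle) {
    (handle.vtable.drop_fn)(handle.id)
}
```

Calling `drop_handle(new_blob_a(1))` and `drop_handle(new_blob_b(1))` reaches two different destructors although both handles share the ID `1`. The cost of this design is a wider handle (two words instead of one).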
Oh, I totally forgot about this: Symmetric ABI is a variant of guest code with small changes to the generator (only exported functions work differently). Having maintained both guest and host code generators for C++ (with moderate shared code and many special cases), I rejoice at the idea of having a single code generator for guest and host side. For native plugins, not having to distinguish between host and guest, or between import and export conventions, is a huge simplification. I worked on the above-mentioned prototype for nearly a year using the canonical ABI (and a host side connection fabric) and never completed it, but with symmetric ABI it was a matter of days.
Please excuse the large number of replies! Being able to take the same componentized source code used for this highly efficient native solution, re-apply the canonical code generator, and compile to wasm32-wasip2 to get a fully portable binary that runs with wasmtime is a constant motivation for continuing down this path. And I can move from this wasm32 component backwards, using wasm2c and a symmetrical ABI adapter to generate a binary compatible drop-in replacement for the original native component.
I should mention that this is mostly due to needing multi-threading, multiple |
Right, this is my basic assumption: shared-everything dynamic linking is a single-language (or single-ABI-compatible-language-family) affair. I'm a bit skeptical that WIT would be the right long-term solution for shared-everything cross-language linking (given that sharing memory just works a lot differently, esp. once you want to optimize) and so I'm reluctant to focus too much on this use case for WIT or add extra requirements based on it. |
I think that WIT with shared everything dynamic linking is suited for native cross language componentization, especially since it paves the road for later full wasm insulation by selecting a different compilation target, or by transparently bridging to a different process. I support your hesitation to add alien baggage to the component model only to adjust for this use case and feel that there exists a middle path as well. Over the past year I created a working prototype of asynchronous publisher/subscriber with C++ and Rust using this technology and always had easy back and forth transitions between native and wasm components in mind. And the recently added integration tests really made the sample implementation much more solid. I plan to follow your proposal to create a document describing this as a potential specialization for native plugins ... and look forward to integrating your async lifting/lowering technology this fall. |
While the asymmetry between imported and exported functions and resources is directly caused by the asymmetry between guest and host, a symmetrical ABI option would enable direct fusion of components into a single core module.
Similarly this enables combining components compiled as shared objects without a runtime doing the impedance matching. Full insulation between components is still possible using this symmetric ABI, but calling exported functions becomes less elegant.
The proposal is to reuse the ABI for imported interfaces (functions and resources) for exported interfaces as well. This means that arguments become only borrowed and return values are allocated from linear memory using `realloc`, no more `cabi_post_`. Similarly `dtor` is replaced with `[resource-drop]`, and `[resource-new]` and `[resource-rep]` are only used internally. Also the member functions always receive the ID, never the rep.

You can find a proof of concept at https://github.com/cpetig/wit-bindgen/tree/main/crates/cpp/tests/meshless_resources and a presentation at https://hackmd.io/@cpetig/rJp4l6vKC#/ ; code generation for C++ and Rust is available.
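The export convention described above can be sketched roughly as follows for the shared-everything native case. This is an illustration under assumptions, not the code that wit-bindgen generates: the WIT function `greet: func(name: string) -> string`, the `RetString` return area, and the helper `call_greet` are hypothetical, and Rust's global allocator stands in for `realloc`. The point is that the export only borrows its argument and hands ownership of the result to the caller, so no post-return hook is needed.

```rust
use std::ptr;
use std::slice;

/// Hypothetical flattened return area for a `string` result.
#[repr(C)]
pub struct RetString {
    pub ptr: *mut u8,
    pub len: usize,
}

/// Hypothetical export for `greet: func(name: string) -> string`.
/// The argument bytes are only borrowed for the duration of the call;
/// the result buffer is allocated by the callee and owned by the caller.
pub unsafe extern "C" fn greet(name_ptr: *const u8, name_len: usize, ret: *mut RetString) {
    unsafe {
        // Borrow the caller's bytes without copying or taking ownership.
        let name = slice::from_raw_parts(name_ptr, name_len);

        let mut out = Vec::with_capacity(7 + name.len());
        out.extend_from_slice(b"hello, ");
        out.extend_from_slice(name);

        // Leak the buffer into the return area; the caller frees it.
        let boxed: Box<[u8]> = out.into_boxed_slice();
        let len = boxed.len();
        let ptr = Box::into_raw(boxed) as *mut u8;
        ret.write(RetString { ptr, len });
    }
}

/// Caller side: same convention as calling an import. The caller keeps
/// ownership of `name` and takes ownership of the returned buffer.
pub fn call_greet(name: &str) -> String {
    let mut ret = RetString { ptr: ptr::null_mut(), len: 0 };
    unsafe {
        greet(name.as_ptr(), name.len(), &mut ret);
        // Reclaim the buffer with the shared allocator (assumption: both
        // sides use the same allocator, as in a shared-everything setup).
        let owned: Box<[u8]> =
            Box::from_raw(ptr::slice_from_raw_parts_mut(ret.ptr, ret.len));
        String::from_utf8(owned.into_vec()).expect("callee returned valid UTF-8")
    }
}
```

Because caller and callee follow the same convention, `call_greet` could sit in another component and link directly against `greet` without glue code. This only holds when both sides share an allocator.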
PS: This also removes the conceptual difference between guest and host implementations.
PPS: To save overhead in a shared everything environment the resource ID should become `usize` instead of `i32`, so that the mapping table between rep and ID becomes optional for shared everything between friendly components. (Similarly IDs for async objects should become `usize` wherever meaningful - no change for wasm32, but more handy on x86_64 or wasm64)
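A minimal sketch of why a pointer-sized ID makes the mapping table optional: between mutually trusting components in one address space, the handle can simply be the address of the rep. The `Counter` resource and the `counter_*` functions below are hypothetical and only illustrate the idea. Nothing here validates a handle, which is exactly the trade-off for "friendly" components.

```rust
/// Hypothetical resource representation ("rep").
pub struct Counter {
    value: u32,
}

/// With a pointer-sized ID, the handle can be the address of the rep,
/// so no rep <-> ID table is needed (assumption: trusted components
/// sharing one address space).
pub type Handle = usize;

/// Constructor: allocate the rep and hand out its address as the ID.
pub fn counter_new(start: u32) -> Handle {
    Box::into_raw(Box::new(Counter { value: start })) as Handle
}

/// Method: receives the ID and recovers the rep by a plain cast.
///
/// Safety: `handle` must come from `counter_new` and not be dropped yet.
pub unsafe fn counter_increment(handle: Handle) -> u32 {
    unsafe {
        let counter = &mut *(handle as *mut Counter);
        counter.value += 1;
        counter.value
    }
}

/// Drop: reclaims the allocation behind the ID.
///
/// Safety: `handle` must come from `counter_new` and be dropped only once.
pub unsafe fn counter_drop(handle: Handle) {
    unsafe {
        drop(Box::from_raw(handle as *mut Counter));
    }
}
```

With an `i32` ID this cast would not be possible on a 64-bit target, so a table lookup would be required on every call.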