Replies: 4 comments 1 reply
-
High-level - we care a lot about streaming compilation on the Web, and believe the LEB representation improves end-to-end performance in this situation (see below).
Mostly it's to optimise the representation of static indices in situations where the majority of indices take small values, but there may be a long tail of larger values. Less importantly, LEBs also have a standards-body advantage if we expect that an index will initially have a very small range, but that future features may expand this range. Otherwise, we'd have to "bake in" the maximum size of the index during initial feature design, which might be a political pain. I don't have perfect examples of this happening yet to hand, although the multi-memory and memory64 proposals might be imperfect examples.
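For concreteness, here's a minimal sketch of the unsigned LEB128 scheme being discussed, in Rust since that's the questioner's setting (the helper name is mine):

```rust
/// Unsigned LEB128: 7 value bits per byte, with the high bit set on every
/// byte except the last. An index below 128 therefore costs a single byte,
/// while the long tail of larger values grows one byte per 7 bits.
fn encode_uleb128(mut value: u32, out: &mut Vec<u8>) {
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80); // high bit: more bytes follow
    }
}
```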
I think this would need to be empirically validated - I'd have the opposite intuition about the examples above, especially since LEB representations can also be generically compressed.
Since Wasm is mostly compiled, this would only be optimising compilation time, which was historically already bounded by the speed of the network in a lot of Web streaming compilation cases (I've not checked this in a few years). It could be instructive to consider how an alternative discipline of fixed indices might have created different trade-offs - I'd expect a better experience for in-place interpreters and non-Web/on-disk uses of Wasm, in exchange for some increase in binary size and a correspondingly poorer experience on the Web. I'm not sure how explicitly this trade-off was considered/measured in the first days of Wasm - @titzer might have some additional context.
On the Web, end-to-end execution time is often dominated by start-up time, which includes download time, so binary size should be very important to us (although I think we've been angsting recently about the binary size characteristics of new features like GC). It could be that, empirically, we've ended up making the wrong decision in our index representation - for example if compression algorithms are relatively better for fixed-size indices than I'm expecting, or if in-place/on-disk uses of Wasm are suffering more than I'm expecting, or if Web compilation is no longer bounded by download speed and the processing-speed differential between fixed-size indices and LEBs would make an observable difference. Wearing my other hat as an academic, this could be an interesting research project!
-
There was a lot of discussion about encoding questions at the beginning of Wasm. The size effectiveness of the basic code format definitely mattered to us. For example, there were discussions about using LEB vs other variable-length integer formats, there was discussion about adding specialised get0 instructions just to save single bytes, but we also intentionally imposed a max-length restriction on LEBs so that decoders could unroll their loops.

For quite a while we even wanted a second layer in the binary format that enabled custom macro opcodes and thus the ability to reduce code size more systematically. But that was difficult to design right, and it wasn't clear that it would give sufficient benefit over simple zip compression to justify the added complexity for both producers and consumers. In the end we deferred it, but nobody has bothered to pick it up again since.

As a general observation, it is always easy to cut down on the number of instructions in the future by introducing new, specialised instructions. But if the basic format is inherently wasteful, then that is almost impossible to fix. 16-bit opcodes plus 4-byte indices would easily blow up the size of almost all code by a factor of 2 to 3, and that certainly matters.

Also, Wasm was primarily designed for streaming JIT compilation, where maximally efficient decoding matters less. On the other hand, it is not at all clear that substantially larger code would benefit interpreters either, because of adverse cache effects.
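To illustrate what that max-length restriction buys: the encoding of a u32 is capped at ceil(32/7) = 5 bytes, so a decoder's loop has a small constant bound (a sketch, not code from any real engine):

```rust
/// Decode an unsigned LEB128 u32, returning the value and bytes consumed.
/// Because the encoding of a u32 is capped at 5 bytes, this loop has a
/// constant bound and can be fully unrolled by the compiler.
fn decode_uleb128(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for i in 0..5 {
        let byte = *bytes.get(i)?;
        result |= ((byte & 0x7f) as u32) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    None // over-long encoding: invalid, thanks to the max-length rule
}
```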
-
Thanks @conrad-watt and @rossberg. (Sorry for spelling and grammar mistakes, trying to wrangle a toddler at the same time.)
This also seems like sensible future-proofing, assuming address spaces become 128-bit etc. at some point. Though at the risk of repeating the mistake of so many others ... I dare say it would not happen in my lifetime ... (Now the race is on :D )
Yup, ok, say no more.
I agree. I imagine that at the time the initial specification was published, it wasn't really possible to do much empirical testing, i.e. a chicken-and-egg problem. My intuition is based on three assertions:
To that end, I've prepared two examples:
Now looking at the assembly generated, it seems likely that 3. will not only have the lowest CPU clock count (given it has 2 instructions), but that it will fit in the cache better and be transferred from system memory into the cache faster. I haven't actually run it to verify yet, but I'll try and get some performance tests done in the next few days. (Need to unit-test those examples too.)
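As an illustrative sketch of the kind of comparison involved (my own reconstruction, not the exact examples above): the fixed-width alternative boils down to a plain load, whereas the LEB path needs a decode loop like the one shown earlier.

```rust
use std::convert::TryInto;

/// Fixed-width read of an index: on little-endian hardware this typically
/// compiles down to little more than a load -- roughly the two-instruction
/// case mentioned above.
fn read_fixed_u32(bytes: &[u8], offset: usize) -> u32 {
    u32::from_le_bytes(bytes[offset..offset + 4].try_into().unwrap())
}
```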
I can only talk to my own expectations here. I agree with this for general web browsing, but I wonder if start-up time is really that big of a factor for the types of applications that will be using WebAssembly. I.e. which would you rather have: wait a few more seconds for the download, or a laggy experience for the full duration of usage? This may also not matter if you do full-application compression, since the bundle should be smaller overall than one where just the indexes and values are compressed. I.e. every layer of compression adds entropy, so compressing indexes and values and then the whole module will be physically larger than just compressing the whole module from the start. Again, I'd rather not run my mouth without testing :) Assuming I have time over the next few weeks, I'll try and make a tool that converts wasm to "big-wasm" to see the impact on performance and compression.
I'm not an academic, but I like the topic and could be up for some collab work if you want to publish something.
This may ultimately be the crux of it. I'm admittedly looking to embed wasm into applications and hardware, which favours local performance over streaming performance, i.e. closer to Java and C#. I'll do some more testing to make an evidence-based decision about whether wasm is a sensible solution for my work. (May turn out that I'm overthinking things.)
-
Right. I had assumed that the wasm binary format was the compiled output and designed to be interpreted directly. So essentially the tool I'm writing is the JIT compiler.
-
Hi
I've been writing a WASM VM in Rust for fun. I'm trying to understand why LEB128 is used for encoding integers rather than just storing the integers directly in 4- or 8-byte blocks, as in the sketch below.
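By way of illustration (my own sketch; values and little-endian layout assumed):

```rust
fn main() {
    // The index value 5 in the two schemes:
    let leb128 = [0x05u8];          // LEB128: 1 byte, variable length
    let fixed = 5u32.to_le_bytes(); // fixed:  [0x05, 0, 0, 0], always 4 bytes
    assert_eq!(leb128[0] as u32, u32::from_le_bytes(fixed));
}
```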
I'm not sure if there is something that I am missing, but no amount of googling has provided me any real insight. To me, it seems to be needless complexity for no obvious gain.
I have similar questions around using 8-bit opcodes rather than 16-bit ones that can be tightly packed sequentially. (I.e. allow the compiler to build a jump table, rather than a more complex data structure that takes more CPU cycles to process, specifically for variable-length opcodes.)
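A sketch of the dispatch I have in mind (opcode values and operand widths are made up for illustration):

```rust
/// With fixed-size opcodes and operands, a dense `match` like this can
/// compile to a jump table, and the next pc is a compile-time constant
/// per opcode. Variable-length immediates force the decoder to compute
/// instruction lengths at run time instead.
fn next_pc(code: &[u8], pc: usize) -> usize {
    match code[pc] {
        0x00 => pc + 1, // hypothetical: nop, no operands
        0x01 => pc + 3, // hypothetical: local.get with a fixed 2-byte index
        0x02 => pc + 5, // hypothetical: i32.const with a fixed 4-byte operand
        _ => unimplemented!("remaining opcodes elided from this sketch"),
    }
}
```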
My understanding is that the purpose of WebAssembly is to maximise performance, and if this was a performance choice, I would really like to understand how / why. I've been thinking of writing an optimiser that will convert WASM binary into a hardware-optimised format, but before I do that, I really would like to understand whether I have grossly missed the point or fundamentally misunderstood something. Any feedback?