You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Later btree-based collection support was added to handle the use case where no_std environments can't seed their maps with a randomness source meaning that no_std usage was theoretically susceptible to collision attacks, where btree-based collections don't have this weakness.
The current state we then now have today is that wasmparser has its own set of collections but they are conditionally backed by hash maps or btree maps. This isn't great for wasmparser as the wrapper collections are nontrivial to have and maintain. Ideally they wouldn't exist at all!
Hash maps are considered not suitable as "only support this" due to the risk of collisions in no_std environments. The question this issue poses is could the btree collections be considered to be used 100% of the time with no options for hash maps? The main benefit of this would be that we could remove all the wrapper collections and simplify the features in the wasmparser crate here by simply always using btree-based maps. This works in no_std environments (at least those with an allocator) and others alike.
Personally I think the main decision here will come down to benchmarks. Put another way - is it expected for uses of wasmparser and large-ish wasm modules to strongly benefit from hash-based collections instead of btree-based collections? Benchmarking is starting to indicate this but the measurements there don't necessarily capture the full picture, so I wanted to write down some more questions here:
Is there a size of wasm module (probably in terms of export/import count) where hash maps perform measurably better than btree maps?
Is there a similar limit/size for components? (components are heavier on maps than core wasm)
Are there memory-size benefits to btree maps over hash maps?
To what degree does this performance affect embedders such as Wasmtime? For example not only during validation but also general operation with the wasm module/component?
In the limit I suspect that many of these questions, if not all, are basically un-answerable. They're ones worth considering but I don't think there's necessarily going to be a definitive answer one way or the other. Nevertheless I wanted to open an issue for this at least as a place of discussion for this point.
tl;dr should the usage of hash maps in
wasmparser
be entirely replaced withBTree{Map,Set}
-based equivalents?Some history here looks like:
wasmparser
was written with hash-based collections everywhereno_std
support was added this required using thehashbrown
crate for these mapsno_std
environments can't seed their maps with a randomness source meaning thatno_std
usage was theoretically susceptible to collision attacks, where btree-based collections don't have this weakness.The current state we then now have today is that
wasmparser
has its own set of collections but they are conditionally backed by hash maps or btree maps. This isn't great forwasmparser
as the wrapper collections are nontrivial to have and maintain. Ideally they wouldn't exist at all!Hash maps are considered not suitable as "only support this" due to the risk of collisions in
no_std
environments. The question this issue poses is could the btree collections be considered to be used 100% of the time with no options for hash maps? The main benefit of this would be that we could remove all the wrapper collections and simplify the features in thewasmparser
crate here by simply always using btree-based maps. This works inno_std
environments (at least those with an allocator) and others alike.Personally I think the main decision here will come down to benchmarks. Put another way - is it expected for uses of
wasmparser
and large-ish wasm modules to strongly benefit from hash-based collections instead of btree-based collections? Benchmarking is starting to indicate this but the measurements there don't necessarily capture the full picture, so I wanted to write down some more questions here:In the limit I suspect that many of these questions, if not all, are basically un-answerable. They're ones worth considering but I don't think there's necessarily going to be a definitive answer one way or the other. Nevertheless I wanted to open an issue for this at least as a place of discussion for this point.
cc @Robbepop
The text was updated successfully, but these errors were encountered: