Should hash maps be removed from `wasmparser`? #1876

alexcrichton · 2024-10-22T19:27:11Z

tl;dr should the usage of hash maps in wasmparser be entirely replaced with BTree{Map,Set}-based equivalents?

Some history here looks like:

Originally wasmparser was written with hash-based collections everywhere
When no_std support was added this required using the hashbrown crate for these maps
Later btree-based collection support was added to handle the use case where no_std environments can't seed their maps with a randomness source meaning that no_std usage was theoretically susceptible to collision attacks, where btree-based collections don't have this weakness.
Later the way features in this space were updated

The current state we then now have today is that wasmparser has its own set of collections but they are conditionally backed by hash maps or btree maps. This isn't great for wasmparser as the wrapper collections are nontrivial to have and maintain. Ideally they wouldn't exist at all!

Hash maps are considered not suitable as "only support this" due to the risk of collisions in no_std environments. The question this issue poses is could the btree collections be considered to be used 100% of the time with no options for hash maps? The main benefit of this would be that we could remove all the wrapper collections and simplify the features in the wasmparser crate here by simply always using btree-based maps. This works in no_std environments (at least those with an allocator) and others alike.

Personally I think the main decision here will come down to benchmarks. Put another way - is it expected for uses of wasmparser and large-ish wasm modules to strongly benefit from hash-based collections instead of btree-based collections? Benchmarking is starting to indicate this but the measurements there don't necessarily capture the full picture, so I wanted to write down some more questions here:

Is there a size of wasm module (probably in terms of export/import count) where hash maps perform measurably better than btree maps?
Is there a similar limit/size for components? (components are heavier on maps than core wasm)
Are there memory-size benefits to btree maps over hash maps?
To what degree does this performance affect embedders such as Wasmtime? For example not only during validation but also general operation with the wasm module/component?

In the limit I suspect that many of these questions, if not all, are basically un-answerable. They're ones worth considering but I don't think there's necessarily going to be a definitive answer one way or the other. Nevertheless I wanted to open an issue for this at least as a place of discussion for this point.

cc @Robbepop

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Should hash maps be removed from `wasmparser`? #1876

Should hash maps be removed from `wasmparser`? #1876

alexcrichton commented Oct 22, 2024

Should hash maps be removed from wasmparser? #1876

Should hash maps be removed from wasmparser? #1876

Comments

alexcrichton commented Oct 22, 2024

Should hash maps be removed from `wasmparser`? #1876

Should hash maps be removed from `wasmparser`? #1876