Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should hash maps be removed from wasmparser? #1876

Open
alexcrichton opened this issue Oct 22, 2024 · 0 comments
Open

Should hash maps be removed from wasmparser? #1876

alexcrichton opened this issue Oct 22, 2024 · 0 comments

Comments

@alexcrichton
Copy link
Member

tl;dr should the usage of hash maps in wasmparser be entirely replaced with BTree{Map,Set}-based equivalents?

Some history here looks like:

The current state we then now have today is that wasmparser has its own set of collections but they are conditionally backed by hash maps or btree maps. This isn't great for wasmparser as the wrapper collections are nontrivial to have and maintain. Ideally they wouldn't exist at all!

Hash maps are considered not suitable as "only support this" due to the risk of collisions in no_std environments. The question this issue poses is could the btree collections be considered to be used 100% of the time with no options for hash maps? The main benefit of this would be that we could remove all the wrapper collections and simplify the features in the wasmparser crate here by simply always using btree-based maps. This works in no_std environments (at least those with an allocator) and others alike.

Personally I think the main decision here will come down to benchmarks. Put another way - is it expected for uses of wasmparser and large-ish wasm modules to strongly benefit from hash-based collections instead of btree-based collections? Benchmarking is starting to indicate this but the measurements there don't necessarily capture the full picture, so I wanted to write down some more questions here:

  • Is there a size of wasm module (probably in terms of export/import count) where hash maps perform measurably better than btree maps?
  • Is there a similar limit/size for components? (components are heavier on maps than core wasm)
  • Are there memory-size benefits to btree maps over hash maps?
  • To what degree does this performance affect embedders such as Wasmtime? For example not only during validation but also general operation with the wasm module/component?

In the limit I suspect that many of these questions, if not all, are basically un-answerable. They're ones worth considering but I don't think there's necessarily going to be a definitive answer one way or the other. Nevertheless I wanted to open an issue for this at least as a place of discussion for this point.

cc @Robbepop

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant