Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added a separate method that reads a string of unlimited length for use in the custom name section #1650

Merged
merged 5 commits into from
Jul 12, 2024
Merged

Conversation

trzeciak
Copy link
Contributor

@trzeciak trzeciak commented Jul 3, 2024

Hi, I have a problem with symbol names being too long in the name section, which causes the length of string_name to exceed 100_000, causing an error to be return (if I understand it correctly).
It is generated by emscripten from C++ code, the unfortunate combination of several large templates (std stuff * nlohmann::json * rxcpp * ...) in single functions.

This causes a problem when trying to upload the debug.wasm file to sentry, see: getsentry/symbolic#848

Unfortunately, I don't know the best way to deal with this.

  • The wasm format does not impose limits on the length of the names in name section.
  • The wasm file itself is absolutely correct (it runs and works properly in the browser).

For now, I would like to propose a slight increase in the limit in this library, but I think that in the future it would be better to refactor the way the name section is parsed.

https://webassembly.github.io/spec/core/appendix/custom.html

@alexcrichton
Copy link
Member

Thanks for the PR! The purpose of these limits is to align with browser engines, although if this file works in browsers then something isn't aligned. Would you be up for investigating this? For example have the limits changed in browsers? Or are browsers basically completely ignoring the name section? One other fix would be to add a separate method that reads a string of unlimited length and use that for the name section instead.

@trzeciak
Copy link
Contributor Author

trzeciak commented Jul 8, 2024

Heh, I came here because I have already researched this issue as much as I needed/could, from this topic getsentry/symbolic#848, nevertheless, I will try to expand on the topic.

If I understand correctly, the purpose of custom sections (including the name section) [LINK] is to extend the wasm file with things that are not needed for it to work properly (we can skip all custom sections). And it seems that this is what is happening, because the wasm file with a large name section works in the browser, but does not work in wasm-tools/wasmparser → I tested it in google-chrome.

Additionally, the name section in google-chrome is parsed because wasm crash stack trace contain demangled symbols (this wouldn't be possible otherwise, but I checked it anyway, removing the name section and leave .debug sections, there are no symbols in the stacktrace) → this behavior is consistent with the wasm documentation.

As I wrote earlier, from a documentation point of view, the wasm file is correct: the documentation does not mention a section limit or a function count limit → so I think the parsing-implementation should be changed.
Name section contain:

Id Subsection
0 module name
1 function names
2 local names

And 'function names' subsection contain namemap, with vector of symbols inside. The wasmparser library try read that vector to single string. Therefore, I assume that the 100_000 limit applies to a single symbol, not to an vector of symbols. Unfortunately, I don't have any evidence for this, and finding it in the chromium/firefox/safari engine is a bit beyond my capabilities.

Reads a string of unlimited length will potentially temporarily use a large amount of RAM, and will change this limit in places other than 'function names' → Actually, my solution also has a second disadvantage 😞 📍.
Therefore, I think that a good solution would be to add something like an additional function-binary-reader that allow to go through all the names entry separately (not at once as now). Except I don't have that much energy for it xD. Another workaround would be to dynamically change this limit only when reading functionnames, this is also bad, but not that bad, but I don't know if it's even possible to do it in rust.

A local workaround I can do is to modify the name section before uploading it to sentry:

  • extract the name section from the wasm file to a separate file
  • read/parse full namesection file according to the documentation (usign python, or patched-wasmparser library)
  • trim/corrupt all vectors whose sum of names exceeds 100k
  • save and replace the name section in the wasm file
  • enjoy sentra working for the remaining 99% of functions

This PR is just a cheap attempt to handle my large files (although you can see I'm not alone with this). But having realized that it also side affects 📍, I think that it is not acceptable in this form.

@alexcrichton
Copy link
Member

Apologies if I caused a bit of confusion, but let me try to be more specific. The wasmparser crate has hardcoded limits which you're modifying here. These hardcoded limits are shared in browsers (that's Firefox, Chrome/Safari should be similar). Changing these limits are generally not that consequential here in wasmparser but we try to keep them in sync with others so all engines behave similarly when validating modules.

The specific limit you're running into here is the limit on the length of a string. This limit is baked in here and everything reading &str from bytes uses that method. RAM limits are not an issue here because the wasm module is already resident in memory and all we're doing is validating a pointer/length within this already-resident memory.

The main different appears to be that in Firefox it's using MaxStringBytes in DecodeName but not when decoding function names. This basically means that Firefox, and probably Chrome, have a hard limit on names in the import/export sections for example but not in the name section.

This is where my suggestion for "read a string of unlimited length" comes in. For example this check in wasmparser is not required for correctness or safety, it's just there to conform to what browsers are doing. That doesn't match what browsers are doing for the name section, however, so I think it's reasonable to have a method which is use by the name section which is "read a string but don't require it to be less than this statically known size".

@trzeciak
Copy link
Contributor Author

@alexcrichton I think I understand what you mean, thank you for your patience and detailed explanation!
I've corrected the code, I think you can check if that was the point.

Copy link
Member

@alexcrichton alexcrichton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@alexcrichton alexcrichton added this pull request to the merge queue Jul 12, 2024
Merged via the queue into bytecodealliance:main with commit 021389c Jul 12, 2024
27 checks passed
alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jul 12, 2024
@trzeciak trzeciak deleted the increase-string-size-slightly branch July 12, 2024 15:42
github-merge-queue bot pushed a commit that referenced this pull request Jul 12, 2024
Adding some follow-up documentation to #1650
@trzeciak trzeciak changed the title Increase MAX_WASM_STRING_SIZE slightly Added a separate method that reads a string of unlimited length for use that for the custom name section Jul 17, 2024
@trzeciak trzeciak changed the title Added a separate method that reads a string of unlimited length for use that for the custom name section Added a separate method that reads a string of unlimited length for use in the custom name section Jul 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants