Added a separate method that reads a string of unlimited length for use in the custom name section #1650

trzeciak · 2024-07-03T01:09:16Z

Hi, I have a problem with symbol names being too long in the name section, which causes the length of string_name to exceed 100_000, causing an error to be return (if I understand it correctly).
It is generated by emscripten from C++ code, the unfortunate combination of several large templates (std stuff * nlohmann::json * rxcpp * ...) in single functions.

This causes a problem when trying to upload the debug.wasm file to sentry, see: getsentry/symbolic#848

Unfortunately, I don't know the best way to deal with this.

The wasm format does not impose limits on the length of the names in name section.
The wasm file itself is absolutely correct (it runs and works properly in the browser).

For now, I would like to propose a slight increase in the limit in this library, but I think that in the future it would be better to refactor the way the name section is parsed.

https://webassembly.github.io/spec/core/appendix/custom.html

alexcrichton · 2024-07-08T16:08:11Z

Thanks for the PR! The purpose of these limits is to align with browser engines, although if this file works in browsers then something isn't aligned. Would you be up for investigating this? For example have the limits changed in browsers? Or are browsers basically completely ignoring the name section? One other fix would be to add a separate method that reads a string of unlimited length and use that for the name section instead.

trzeciak · 2024-07-08T17:27:40Z

Heh, I came here because I have already researched this issue as much as I needed/could, from this topic getsentry/symbolic#848, nevertheless, I will try to expand on the topic.

If I understand correctly, the purpose of custom sections (including the name section) [LINK] is to extend the wasm file with things that are not needed for it to work properly (we can skip all custom sections). And it seems that this is what is happening, because the wasm file with a large name section works in the browser, but does not work in wasm-tools/wasmparser → I tested it in google-chrome.

Additionally, the name section in google-chrome is parsed because wasm crash stack trace contain demangled symbols (this wouldn't be possible otherwise, but I checked it anyway, removing the name section and leave .debug sections, there are no symbols in the stacktrace) → this behavior is consistent with the wasm documentation.

As I wrote earlier, from a documentation point of view, the wasm file is correct: the documentation does not mention a section limit or a function count limit → so I think the parsing-implementation should be changed.
Name section contain:

Id	Subsection
0	module name
1	function names
2	local names

And 'function names' subsection contain namemap, with vector of symbols inside. The wasmparser library try read that vector to single string. Therefore, I assume that the 100_000 limit applies to a single symbol, not to an vector of symbols. Unfortunately, I don't have any evidence for this, and finding it in the chromium/firefox/safari engine is a bit beyond my capabilities.

Reads a string of unlimited length will potentially temporarily use a large amount of RAM, and will change this limit in places other than 'function names' → Actually, my solution also has a second disadvantage 😞 📍.
Therefore, I think that a good solution would be to add something like an additional function-binary-reader that allow to go through all the names entry separately (not at once as now). Except I don't have that much energy for it xD. Another workaround would be to dynamically change this limit only when reading functionnames, this is also bad, but not that bad, but I don't know if it's even possible to do it in rust.

A local workaround I can do is to modify the name section before uploading it to sentry:

extract the name section from the wasm file to a separate file
read/parse full namesection file according to the documentation (usign python, or patched-wasmparser library)
trim/corrupt all vectors whose sum of names exceeds 100k
save and replace the name section in the wasm file
enjoy sentra working for the remaining 99% of functions

This PR is just a cheap attempt to handle my large files (although you can see I'm not alone with this). But having realized that it also side affects 📍, I think that it is not acceptable in this form.

alexcrichton · 2024-07-08T18:48:03Z

Apologies if I caused a bit of confusion, but let me try to be more specific. The wasmparser crate has hardcoded limits which you're modifying here. These hardcoded limits are shared in browsers (that's Firefox, Chrome/Safari should be similar). Changing these limits are generally not that consequential here in wasmparser but we try to keep them in sync with others so all engines behave similarly when validating modules.

The specific limit you're running into here is the limit on the length of a string. This limit is baked in here and everything reading &str from bytes uses that method. RAM limits are not an issue here because the wasm module is already resident in memory and all we're doing is validating a pointer/length within this already-resident memory.

The main different appears to be that in Firefox it's using MaxStringBytes in DecodeName but not when decoding function names. This basically means that Firefox, and probably Chrome, have a hard limit on names in the import/export sections for example but not in the name section.

This is where my suggestion for "read a string of unlimited length" comes in. For example this check in wasmparser is not required for correctness or safety, it's just there to conform to what browsers are doing. That doesn't match what browsers are doing for the name section, however, so I think it's reasonable to have a method which is use by the name section which is "read a string but don't require it to be less than this statically known size".

This reverts commit 2f36947.

…slightly

trzeciak · 2024-07-12T12:58:22Z

@alexcrichton I think I understand what you mean, thank you for your patience and detailed explanation!
I've corrected the code, I think you can check if that was the point.

alexcrichton

Thanks!

Adding some follow-up documentation to bytecodealliance#1650

Adding some follow-up documentation to #1650

Increase MAX_WASM_STRING_SIZE slightly

2f36947

trzeciak mentioned this pull request Jul 3, 2024

sentry-cli debug-files check doesn't seem to work for wasm getsentry/symbolic#848

Closed

trzeciak added 4 commits July 12, 2024 14:21

Revert "Increase MAX_WASM_STRING_SIZE slightly"

3fc6586

This reverts commit 2f36947.

Reads a string of unlimited length during parse custom name section

673aed0

Reads a string of unlimited length during parse custom name section

10fe86e

Merge remote-tracking branch 'origin/main' into increase-string-size-…

08df8a1

…slightly

alexcrichton approved these changes Jul 12, 2024

View reviewed changes

alexcrichton added this pull request to the merge queue Jul 12, 2024

Merged via the queue into bytecodealliance:main with commit 021389c Jul 12, 2024
27 checks passed

alexcrichton added a commit to alexcrichton/wasm-tools that referenced this pull request Jul 12, 2024

Update some documentation about longer names

b84d256

Adding some follow-up documentation to bytecodealliance#1650

alexcrichton mentioned this pull request Jul 12, 2024

Update some documentation about longer names #1667

Merged

trzeciak deleted the increase-string-size-slightly branch July 12, 2024 15:42

github-merge-queue bot pushed a commit that referenced this pull request Jul 12, 2024

Update some documentation about longer names (#1667)

8078b66

Adding some follow-up documentation to #1650

trzeciak changed the title ~~Increase MAX_WASM_STRING_SIZE slightly~~ Added a separate method that reads a string of unlimited length for use that for the custom name section Jul 17, 2024

trzeciak changed the title ~~Added a separate method that reads a string of unlimited length for use that for the custom name section~~ Added a separate method that reads a string of unlimited length for use in the custom name section Jul 17, 2024

trzeciak mentioned this pull request Jul 17, 2024

Updated wasmparser dependency to 0.214.0 getsentry/symbolic#853

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added a separate method that reads a string of unlimited length for use in the custom name section #1650

Added a separate method that reads a string of unlimited length for use in the custom name section #1650

trzeciak commented Jul 3, 2024

alexcrichton commented Jul 8, 2024

trzeciak commented Jul 8, 2024

alexcrichton commented Jul 8, 2024

trzeciak commented Jul 12, 2024

alexcrichton left a comment

Added a separate method that reads a string of unlimited length for use in the custom name section #1650

Added a separate method that reads a string of unlimited length for use in the custom name section #1650

Conversation

trzeciak commented Jul 3, 2024

alexcrichton commented Jul 8, 2024

trzeciak commented Jul 8, 2024

alexcrichton commented Jul 8, 2024

trzeciak commented Jul 12, 2024

alexcrichton left a comment

Choose a reason for hiding this comment