TLDR: Terra gives anonymous functions names like $anon (junk/wasm_helloworld.t:7), which contain special characters (specifically the colon) that break the way Emscripten parses symbol names.
To reproduce:
First, with Terra built against LLVM 10+ and Emscripten installed, compile to wasm32 bitcode:
terralib.includepath = "" -- no default includes from system libc
local eminclude = os.getenv("EMSDK") .. "/upstream/emscripten/cache/sysroot/include"
local target = terralib.newtarget{Triple="wasm32"}
local cio = terralib.includec("stdio.h", {"-I", eminclude}, target)

local foo = terra(i: int32)
  cio.printf("helloworld %d!\n", i)
end

terra helloworld_main(): int32
  for i = 0, 10 do foo(i) end
  return 0
end

terralib.saveobj("helloworld.bc", {main=helloworld_main}, nil, target, false)
Now try to link with emscripten:
emcc helloworld.bc -sALLOW_MEMORY_GROWTH=1 -o helloworld.html
Traceback (most recent call last):
  File "/home/anon/emsdk/upstream/emscripten/emcc.py", line 3982, in <module>
    sys.exit(main(sys.argv))
  [...traceback skipped]
  File "/home/anon/emsdk/upstream/emscripten/tools/building.py", line 574, in parse_llvm_nm_symbols
    status = line[entry_pos + 11] # Skip address, which is always fixed-length 8 chars.
IndexError: string index out of range
Why? Emscripten gets symbol names by calling llvm-nm --print-file-name helloworld.bc and parsing each line, treating the last colon on the line as the delimiter:
# Line format: "[archive filename:]object filename: address status name"
entry_pos = line.rfind(':') # finds *last* colon
But Terra has produced this:
llvm-nm --print-file-name helloworld.bc
helloworld.bc: -------- t $anon (junk/wasm_helloworld.t:7)
helloworld.bc: -------- T main
helloworld.bc: U printf
Emscripten incorrectly splits the line helloworld.bc: -------- t $anon (junk/wasm_helloworld.t:7) because the last colon it finds is the one inside the symbol name, not the one after the file name.
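The failure can be reproduced in isolation. The sketch below applies the same two steps quoted from building.py (rfind(':') followed by indexing at entry_pos + 11) to two lines of the llvm-nm output shown above. The helper name status_of is made up for illustration and is not part of Emscripten.

```python
def status_of(line):
    """Mimic the quoted parsing step: find the last colon, then skip 11 chars."""
    entry_pos = line.rfind(':')  # finds the *last* colon in the line
    return line[entry_pos + 11]  # raises IndexError if too few chars follow


named = "helloworld.bc: -------- T main"
anon = "helloworld.bc: -------- t $anon (junk/wasm_helloworld.t:7)"

# For a plain name, the colon after the file name is the last one,
# so the status letter sits exactly 11 characters after it.
print(status_of(named))

try:
    status_of(anon)
except IndexError as exc:
    # The last colon is the one inside "wasm_helloworld.t:7", so only
    # two characters follow it and the index runs off the end.
    print("IndexError:", exc)
```

The named symbol parses as expected, while the anonymous one raises the same IndexError seen in the traceback.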
Workaround:
It's possible to avoid the issue by making sure every terra function is named, using func:setname(...) as needed.
Fix?:
Arguably this is Emscripten's fault for trying to parse human-readable tool output rather than using actual structured APIs, and for not even robustly parsing that output.
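As a rough illustration of what more tolerant parsing could look like, the sketch below anchors on the first ": " that is followed by the address and status fields, instead of on the last colon. This is only an assumption about a possible approach, not Emscripten's actual fix; the names NM_LINE and parse_nm_line are hypothetical, and the 8-character address width is taken from the comment in the quoted traceback.

```python
import re

# Assumed layout: "<file>: <8-char address or dashes> <status> <name>".
# The address is optional because undefined symbols (status U) have none.
NM_LINE = re.compile(
    r"^(?P<file>.*?): "
    r"(?:(?P<addr>[0-9A-Fa-f-]{8}) )?\s*"
    r"(?P<status>\S) (?P<name>.*)$"
)


def parse_nm_line(line):
    """Split one llvm-nm line into (file, status, name)."""
    m = NM_LINE.match(line)
    if m is None:
        raise ValueError("unrecognized llvm-nm line: %r" % line)
    return m.group("file"), m.group("status"), m.group("name")


for line in [
    "helloworld.bc: -------- t $anon (junk/wasm_helloworld.t:7)",
    "helloworld.bc: -------- T main",
    "helloworld.bc: U printf",
]:
    print(parse_nm_line(line))
```

This keeps colons in the symbol name intact, but a file name that itself contains ": " could still confuse it, which is why reading symbols through a structured API would be the more robust route.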
It might make sense, though, on the Terra side to give anonymous functions more sanitized names (i.e., without spaces, colons, or parentheses) because there are likely a number of tools that expect symbol names in bitcode to be limited to C/C++ naming rules.
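To make the idea concrete, here is a minimal sketch of such a sanitization rule. It is written in Python purely for illustration (Terra itself would do this in its own Lua/C++ code), and sanitize_symbol is a hypothetical name, not an existing Terra function.

```python
import re


def sanitize_symbol(name):
    """Map an arbitrary symbol name onto C identifier characters."""
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", name)
    # C identifiers may not start with a digit.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


print(sanitize_symbol("$anon (junk/wasm_helloworld.t:7)"))
```

A mapping like this is lossy, so two different anonymous functions could in principle end up with the same sanitized name; a real implementation would need some way to keep names unique.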
this is definitely an Emscripten bug (note that file names with colons, which are perfectly legal on linux, would trigger this bug as well!), and if terra is to be tweaked to add a workaround, it should be optional imo. i would suggest either a terralib.saveobj flag/environment variable to use hashes in the generated anonymous names, or a way to customize the format (e.g. you pass a function that takes a terra function object and returns an appropriate name)
Can we at least check with the Emscripten developers to see what their outlook is on this one? Since a workaround is available on our end, I don't think we need to rush the fix.