Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Emscripten update and standalone support #925

Conversation

jerbob92
Copy link
Contributor

Related to #924

This MR:

  • Updates Emscripten to the latest SDK version 3.1.43 (from 1.37.26)
  • Uses the default Emscripten backend (LLVM) and not the old one anymore (fastcomp)
  • Updates the libde265 version that was included in the Emscripten build to 1.0.12 (from 1.0.8)
  • Adds the following options to the build-emscripten.sh script:
    • CORES (default value: all available cores)
    • ENABLE_LIBDE265 (whether to include libde265, default = 1)
    • LIBDE265_VERSION (which version of libde265 to include, default = 1.0.12)
    • ENABLE_AOM (whether to include aom, default = 0)
    • AOM_VERSION (which version of aom to include, default = 3.6.1)
    • STANALONE (whether to build a standalone WASM that runs on non-web/serverside WASM runtimes)
    • DEBUG (whether to create a debug build, sets the Emscripten flags --profile and -g)
  • Fixes the pre.js/post.js to work with the new Emscripten version, since the Embind functions were not available yet on load in the previous implementation.


CONFIGURE_ARGS="-DENABLE_MULTITHREADING_SUPPORT=OFF -DWITH_GDK_PIXBUF=OFF -DWITH_EXAMPLES=OFF -DBUILD_SHARED_LIBS=ON -DENABLE_PLUGIN_LOADING=OFF"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added -DENABLE_PLUGIN_LOADING=OFF, I thought it would make sense.

emmake make -j${CORES}

export TOTAL_MEMORY=16777216
LIBHEIFA="libheif/libheif.a"
EXPORTED_FUNCTIONS=$($EMSDK/upstream/bin/llvm-nm $LIBHEIFA --format=just-symbols | grep "^heif_\|^de265_\|^aom_" | grep "[^:]$" | sed 's/^/_/' | paste -sd "," -)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary to make the C API's be exposed, otherwise they won't be available exported the WASM/JS code.

-s LEGACY_VM_SUPPORT=1 \
--memory-init-file 0 \
-O3 \
-s ERROR_ON_UNDEFINED_SYMBOLS=0 \
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed all the old flags here, I don't think they are neccesary.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When not building to WASM, but plain JS, --memory-init-file 0 avoids the generation of an extra file libheif.js.mem. Thus we may keep this to stay compatible to existing applications.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@farindk Ah I did not even know it was possible to only emit JS, was that what the previous version had as output?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, previously, it was a single libheif.js file. Thus, we have to still support this that other software that depends on this, continues to work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@farindk Shouldn't USE_WASM have a default value of 0 then?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. But I think in this case, I would take the risk to change the default because WASM is clearly the future and it doesn't change anything for the end-user. Just of the ones building new libheif.js files.
I am currently thinking about even doing larger changes, so it will be worth for them having a look anyways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have notified the following repo of the change: catdad-experiments/libheif-emscripten#19

What kind of other changes do you want to make? I think it might be interesting to look at thread support (for the browser it's possible).

@@ -33,7 +33,7 @@
#include <set>
#include <limits>

#if defined(__EMSCRIPTEN__)
#if defined(__EMSCRIPTEN__) && !defined(__EMSCRIPTEN_STANDALONE_WASM__)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: __EMSCRIPTEN_STANDALONE_WASM__ isn't something that Emscripten provides, I add this using CMAKE_C_FLAGS / CMAKE_CXX_FLAGS. Embind is not supported on non-web runtimes since it heavily relies on JS, so we do not want the Embind things to be injected in the standalone build.

@@ -112,7 +112,7 @@ static emscripten::val heif_js_decode_image(struct heif_image_handle* handle,
result.set("width", width);
int height = heif_image_get_height(image, heif_channel_Y);
result.set("height", height);
std::string data;
std::basic_string<unsigned char> data;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed because Embind was messing up the data transfer when using a normal std::string, however, the data transfer between strings looks very odd to me right now because I think what currently happens is:

  • WASM sends pointer to string to JS
  • JS reads binary data from pointer position
  • JS converts binary data into string
  • libheif converts string back to binary data (post.js)

I believe we can remove the string conversion, probably also speeds things up a bit. I'm currently talking to Emscripten what the best way to do that is in Embind.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converning passing the data around: do you think this (answer item 3) might work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! result.set("data", emscripten::val(emscripten::typed_memory_view(data.size(), data.data()))); seems to work perfectly! It will give you a Uint8Array directly in JS, however, I'm not completely sure who then becomes owner of the data buffer and who has to free it, would have to be looked into to prevent memory leaks.

@@ -175,40 +175,6 @@ var libheif = {
}
};

var key;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has all moved to pre.js

noInitialRun: true,
onRuntimeInitialized: function() {
// Expose enum values.
var enums = {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to move all this logic from post.js to pre.js inside the onRuntimeInitialized function. This is because these properties are not available yet in post.js since they are loaded later using embind, onRuntimeInitialized is triggered after loading embind. The interface still works the same because of this trick.

if (enums.hasOwnProperty(key.slice(1)) || key.indexOf("_heif_") !== 0) {
continue;
}
libheif[key.slice(1)] = this[key];
Copy link
Contributor Author

@jerbob92 jerbob92 Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the LLVM build, C functions start with _, we strip the _ off here to create an alias, so that the implementations can stay the same.

}

// Expose embind API.
for (key in Module) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I alias the embind API after the C API because there are two functions that have the same name as in the C API (heif_context_read_from_memory and heif_get_version), we prefer the embind function over the C API function.

@@ -33,13 +32,13 @@ if [ ! -d emsdk ]; then
fi

cd emsdk
echo "Updating SDK base to ${EMSDK_VERSION} ..."
echo "Updating SDK base to ${VERSION} ..."
Copy link
Contributor Author

@jerbob92 jerbob92 Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the EMSDK_VERSION and use VERSION now, the version that gets installed is always the same as the SDK.

@farindk farindk merged commit 3f7fad3 into strukturag:master Jul 19, 2023
24 of 30 checks passed
RELEASE_BUILD_FLAGS="-O3"

if [ "$STANDALONE" = "1" ]; then
echo "Building in standalone (non-web) build mode"
BUILD_FLAGS="-s STANDALONE_WASM=1 -s WASM=1 -o libheif.wasm --no-entry"
BUILD_FLAGS="$BUILD_FLAGS -s STANDALONE_WASM=1 -s WASM=1 -o libheif.wasm --no-entry"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@farindk I specifically did not extend BUILD_FLAGS in standalone build because if you do, it will include embind imports in the generated WASM.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll undo that. I must admit that I didn't really understand the 'standalone' option.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@farindk Standalone builds allow you to run the generated WASM outside the browser in WebAssembly runtimes such as Wasmer, WAVM, or wasmtime. Personally I'm using the standalone build in the Wazero runtime to be used in the go-libheif library, this allows for a go library that is fully contained (because the WASM is embedded in the library) and doesn't need external native libraries, it also allows the library to run fully sandboxed. The only downside is that it's slower than CGO, and multithreaded support is pretty bad right now, which causes the decoding to be around 3x as slow as CGO.

@fancycode
Copy link
Member

While this might improve things, it broke the Emscripten CI tests (already in this PR) 😢 So from a testing point of view, we don't detect if some further change breaks Emscripten.

I'll add an issue to fix the Emscripten CI.

@fancycode fancycode mentioned this pull request Oct 9, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants