Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store Compiled Modules when using OCI Wasm Images #405

Merged
merged 19 commits into from
Feb 13, 2024

Conversation

jsturtevant
Copy link
Contributor

this would replace #44

@jsturtevant jsturtevant force-pushed the precompile-on-oci-loading branch 3 times, most recently from e29594f to 4c01e11 Compare December 1, 2023 00:33
@jsturtevant
Copy link
Contributor Author

fyi @cpuguy83

@jsturtevant jsturtevant marked this pull request as ready for review December 1, 2023 00:37
@jsturtevant jsturtevant requested a review from rumpl December 1, 2023 00:38
Copy link
Member

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to add some GC labels so containerd does not collect this content.
See: https://github.com/containerd/containerd/blob/8459273f806e068e1a6bacfaf1355bbbad738d5e/docs/garbage-collection.md#garbage-collection-labels

How do we invalidate (and de-reference in containerd) pre-compiled modules?
As I understand it a pre-compiled module will not work across different versions of a runtime.

Also we likely need to annotate what runtime the content is for.

Also, when writing the content we'll need to attach a lease to it until gc labels are applied otherwise containerd will collect it.

crates/containerd-shim-wasm/src/sandbox/containerd.rs Outdated Show resolved Hide resolved
crates/containerd-shim-wasm/src/sandbox/containerd.rs Outdated Show resolved Hide resolved
crates/containerd-shim-wasm/src/sandbox/containerd.rs Outdated Show resolved Hide resolved
@jsturtevant
Copy link
Contributor Author

I think we need to add some GC labels so containerd does not collect this content. See: https://github.com/containerd/containerd/blob/8459273f806e068e1a6bacfaf1355bbbad738d5e/docs/garbage-collection.md#garbage-collection-labels

From my experimenting, the GC is looks to be a top down model. So I've put the label that retains on the image level: https://github.com/containerd/runwasi/pull/405/files#diff-74fa6f2b9f387ae0683336f3c04b7e15658dbb2c2820f74bd143894009593d1bR365-R368. If the image gets removed this will be removed as well.

log::debug!("updating content with precompile digest to avoid garbage collection");
let mut image_content = self.get_info(digest.clone())?;
image_content.labels.insert("containerd.io/gc.ref.content.precompile".to_string(),precompile_digest.clone(),);

How do we invalidate (and de-reference in containerd) pre-compiled modules? As I understand it a pre-compiled module will not work across different versions of a runtime.

Also we likely need to annotate what runtime the content is for.

I did this here, but as you point out might need to do this per version as well. That way if a runtime is upgraded it would re-compute.

https://github.com/containerd/runwasi/pull/405/files#diff-74fa6f2b9f387ae0683336f3c04b7e15658dbb2c2820f74bd143894009593d1bR331

let label = format!("runwasi.io/precompiled/{}", T::name());

For dereferencing, do you think it is enough to wait for the image to be removed, which would drop the couple versions (if there are any)? I don't think it will be too much space to have a couple versions of it sitting around vs complexity to dereference them on an ongoing basis. For example, during the upgrade of the shim I think there might be alot of edge cases.

Also, when writing the content we'll need to attach a lease to it until gc labels are applied otherwise containerd will collect it.

Ah, I was wondering about that! There did seem to be a small moment where it could be cleaned up before I put the label on the image. I will take a look at putting a lease on it.

@Mossaka
Copy link
Member

Mossaka commented Dec 4, 2023

@jsturtevant Started reviewing this PR. How do I run this to see if the modules are stored?

@jsturtevant
Copy link
Contributor Author

The shim logs have will have statements and you will also be able to verify in containerd using ctr:

Find the compiled info:

sudo ctr i ls | grep "runwasi.io"
ghcr.io/containerd/runwasi/wasi-demo-oci:latest                                                             application/vnd.oci.image.manifest.v1+json
               sha256:60fccd77070dfeb682a1ebc742e9d677fc452b30a6b99188b081c968992394ce 2.4 MiB   wasi/wasm                                                                                                                           
runwasi.io/precompiled=sha256:b36753ab5a46f26f6bedb81b8a7b489cede8fc7386f1398706782e225fd0a98e

See it in the content store:

sudo ctr content ls | grep "b36753ab5a46f26f6bedb81b8a7b489cede8fc7386f139870"
sha256:60fccd77070dfeb682a1ebc742e9d677fc452b30a6b99188b081c968992394ce 561B    2 months        containerd.io/gc.ref.content.0=sha256:a3c18cd551d54d3cfbf67acc9e8f7ef5761e76827fe7c1ae163fca0193be88b3,containerd.io/gc.ref.content.config=sha256:85b7f2b562fe8665ec9d9e6d47ab0b24e2315627f5f558d298475c4038d71e8b,containerd.io/gc.ref.content.precompile=sha256:b36753ab5a46f26f6bedb81b8a7b489cede8fc7386f1398706782e225fd0a98e
sha256:b36753ab5a46f26f6bedb81b8a7b489cede8fc7386f1398706782e225fd0a98e 626.4kB 3 days          runwasi.io/precompiled=sha256:60fccd77070dfeb682a1ebc742e9d677fc452b30a6b99188b081c968992394ce

save it to local file:

ctr content get sha256:b36753ab5a46f26f6bedb81b8a7b489cede8fc7386f1398706782e225fd0a98e > precompiled.wasm

@jsturtevant jsturtevant force-pushed the precompile-on-oci-loading branch from 4c01e11 to a4a353b Compare December 12, 2023 21:37
@jsturtevant
Copy link
Contributor Author

Also, when writing the content we'll need to attach a lease to it until gc labels are applied otherwise containerd will collect it.

Ah, I was wondering about that! There did seem to be a small moment where it could be cleaned up before I put the label on the image. I will take a look at putting a lease on it.

Added a lease during new content creation

@jsturtevant jsturtevant force-pushed the precompile-on-oci-loading branch from a4a353b to 96dfd6b Compare December 12, 2023 21:42
@jsturtevant
Copy link
Contributor Author

Another question I had was if this is something we want to turn on for all images (as the code is now) or if we want to put an annotation on the image that would enable this feature.

Copy link
Member

@Mossaka Mossaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @jsturtevant great PR! Sorry I was late for the code review.

Honestly it took me some time to grasp the interaction between containerd and the sandbox for saving / retriving precompiled content using leases. I am still not 100% sure the full picture of the interactions. Would be appreciate if you clarify the logic here (maybe document it somewhere?)

crates/containerd-shim-wasm/src/container/context.rs Outdated Show resolved Hide resolved
crates/containerd-shim-wasm/src/container/context.rs Outdated Show resolved Hide resolved
crates/containerd-shim-wasm/src/container/engine.rs Outdated Show resolved Hide resolved
crates/containerd-shim-wasm/src/sandbox/containerd.rs Outdated Show resolved Hide resolved
crates/containerd-shim-wasm/src/sandbox/containerd.rs Outdated Show resolved Hide resolved
crates/containerd-shim-wasm/src/sandbox/containerd.rs Outdated Show resolved Hide resolved
crates/containerd-shim-wasmtime/src/instance.rs Outdated Show resolved Hide resolved
crates/containerd-shim-wasmtime/src/instance.rs Outdated Show resolved Hide resolved
crates/containerd-shim-wasmtime/src/instance.rs Outdated Show resolved Hide resolved
crates/containerd-shim-wasmtime/src/instance.rs Outdated Show resolved Hide resolved
@Mossaka
Copy link
Member

Mossaka commented Jan 8, 2024

@CaptainVincent does WasmEdge Rust SDK have an API to AOT compile a Wasm module to bytes that's similar to https://github.com/bytecodealliance/wasmtime/blob/7690c500228ae005fc70ee3b6297bb4b403686d5/crates/wasmtime/src/engine.rs#L222-L223?

@jsturtevant
Copy link
Contributor Author

Another question I had was if this is something we want to turn on for all images (as the code is now) or if we want to put an annotation on the image that would enable this feature.

I've thought on this a bit more and I think it should be opt in feature so will add support for an annotation

@cpuguy83
Copy link
Member

I don't think an annotation on the image would work well.
A runtime annotation could work but...
I'm curious why do you think it should be opt in? I can see the desire to opt in for now because its kind of experimental, but then the whole stack is.

We must compile the wasm to run it at all, so we already need to store it.
Keeping it cached so the same wasm isn't compiled more than once is just an optimization.

@CaptainVincent
Copy link
Contributor

Yes. This lines of the code provides two methods about bytes to file and file to file. If needed, I can also add bytes to bytes inside wasmedge-rust-sdk or integration layer.

@CaptainVincent does WasmEdge Rust SDK have an API to AOT compile a Wasm module to bytes that's similar to https://github.com/bytecodealliance/wasmtime/blob/7690c500228ae005fc70ee3b6297bb4b403686d5/crates/wasmtime/src/engine.rs#L222-L223?

@jsturtevant
Copy link
Contributor Author

I can see the desire to opt in for now because its kind of experimental, but then the whole stack is.

We must compile the wasm to run it at all, so we already need to store it.
Keeping it cached so the same wasm isn't compiled more than once is just an optimization.

I was leaning towards it being experimental but I can keep it as.

@jsturtevant jsturtevant force-pushed the precompile-on-oci-loading branch 9 times, most recently from f58fe43 to 941e500 Compare January 19, 2024 00:45
jsturtevant and others added 6 commits February 1, 2024 18:44
Signed-off-by: James Sturtevant <[email protected]>
Co-authored-by: Vaughn Dice <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
Signed-off-by: James Sturtevant <[email protected]>
@jsturtevant jsturtevant force-pushed the precompile-on-oci-loading branch from 0d38599 to 603d448 Compare February 2, 2024 01:29
@jsturtevant
Copy link
Contributor Author

quick update, When I've added the tests it seems to have exposed a possible bug which I am debugging now. Essentially the OCI test run fine but when run with the other tests they are hanging and failing if run after an OCI test. We create a new engine each time so I am not really clear on what is causing the issue yet.

I tracked this down to the call to parallelize the compile step here. The engine should be usable across threads and each test should be creating a new "engine" so I don't know why this happens.
https://github.com/bytecodealliance/wasmtime/blob/b6a8abc6ce31cf6db7ad5edb9b2567620190ce35/crates/wasmtime/src/engine.rs#L142

@Mossaka
Copy link
Member

Mossaka commented Feb 2, 2024

quick update, When I've added the tests it seems to have exposed a possible bug which I am debugging now. Essentially the OCI test run fine but when run with the other tests they are hanging and failing if run after an OCI test. We create a new engine each time so I am not really clear on what is causing the issue yet.

I tracked this down to the call to parallelize the compile step here. The engine should be usable across threads and each test should be creating a new "engine" so I don't know why this happens. https://github.com/bytecodealliance/wasmtime/blob/b6a8abc6ce31cf6db7ad5edb9b2567620190ce35/crates/wasmtime/src/engine.rs#L142

Did we turn on the "parallel-compilation" feature of wasmtime?

@jsturtevant
Copy link
Contributor Author

Did we turn on the "parallel-compilation" feature of wasmtime?

yes we turn it that feature

"parallel-compilation",

and it looks to be avaliable on my machine as well as on CI

My next step is to make a small/simpler reproducible example

@jsturtevant
Copy link
Contributor Author

I was able to create a simple repo: https://github.com/jsturtevant/rayon-clone-issue/blob/master/src/main.rs. After adding the unit tests I am effectively running into #357 which is also called out in rayon-rs/rayon#708 (comment)

@jsturtevant
Copy link
Contributor Author

You might want to include Engine::precompile_compatibility_hash in the cache key; I added this method for precisely this purpose 🙂.

Precompiled binaries are sensitive not just to runtime version but also CPU and engine config features. It doesn't seem entirely implausible that - for example - the content store could be migrated to a new host with different CPU features and then you'd have a very gnarly performance regression to debug...

I've updated wasmtime to use this hash for the label

@jsturtevant
Copy link
Contributor Author

I was able to create a simple repo: https://github.com/jsturtevant/rayon-clone-issue/blob/master/src/main.rs. After adding the unit tests I am effectively running into #357 which is also called out in rayon-rs/rayon#708 (comment)

I've added the ability to disable the parallel compilation for the tests. We won't run into this issue as is in the shim since each shim is it's own instance but we need to address it longer term with #357

@jsturtevant jsturtevant force-pushed the precompile-on-oci-loading branch from 1d85d31 to 063242a Compare February 6, 2024 18:57
@jsturtevant
Copy link
Contributor Author

test failure was #423

@jsturtevant jsturtevant closed this Feb 6, 2024
@jsturtevant jsturtevant reopened this Feb 6, 2024
Signed-off-by: James Sturtevant <[email protected]>
@jsturtevant jsturtevant force-pushed the precompile-on-oci-loading branch from 346e064 to 15a8212 Compare February 6, 2024 20:41
@jsturtevant jsturtevant closed this Feb 6, 2024
@jsturtevant jsturtevant reopened this Feb 6, 2024
Copy link
Member

@Mossaka Mossaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Great work!

@jsturtevant
Copy link
Contributor Author

@cpuguy83 I've addressed the feedback, could you take another look? Thanks!

@cpuguy83 cpuguy83 merged commit 868a06a into containerd:main Feb 13, 2024
63 of 69 checks passed
@Mossaka
Copy link
Member

Mossaka commented Feb 14, 2024

🥳 Awesome! I am going to do a new release

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants