Add new snapshots functionality #465

amohoste · 2022-01-23T20:30:08Z

Summary

This PR adds the ability to boot multiple uVMs from a single snapshot. This makes snapshots more flexible to use and opens up the possibility of adding remote snapshot restore functionality.

Implementation Notes ⚒️

The following external dependencies have been updated to support the new snapshot functionality. Once these have been merged, the go.mod file should be updated to point to the respective ease-lab repos:

Firecracker: now allows specifying a custom devmapper snapshot device
Firecracker-containerd: adds support for network namespaces, creating a shim upon loading snapshots, diff snapshots, and specifying a custom devmapper snapshot device in Firecracker
Containerd: exposes devmapper snapshot device IDs in the stat routine

Since the new snapshotting without offloading is not yet compatible with REAP snapshots, an orchestrator interface has been added to support both the old logic with offloading and the new snapshots. To also support REAP snapshots with the new snapshotting logic in the future, we would need to be able to obtain the UPF socket path (used to handle page faults) before loading the snapshot, which is currently not possible. The two snapshotting implementations have been named "deduplicated" for the new snapshots and "regular" for the old snapshots with offloading, but these could probably use better names.
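
To illustrate the idea of one orchestrator interface with two implementations, here is a minimal sketch. Only the mode names "regular" and "deduplicated" come from this PR; the `Orchestrator` interface, its `LoadSnapshot` method, and the two struct types are hypothetical and are not the actual vHive code. The sketch assumes the behaviour described in this PR: with offloading a snapshot backs a single VM, while with the new logic it can back any number of VMs.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Orchestrator is a hypothetical interface; the real method set may differ.
type Orchestrator interface {
	// LoadSnapshot boots the VM vmID from the snapshot snapID.
	LoadSnapshot(vmID, snapID string) error
}

// regularOrchestrator models the old offloading logic:
// a snapshot can back only one VM.
type regularOrchestrator struct {
	mu    sync.Mutex
	inUse map[string]string // snapshot ID -> VM ID that owns it
}

func (o *regularOrchestrator) LoadSnapshot(vmID, snapID string) error {
	o.mu.Lock()
	defer o.mu.Unlock()
	if owner, busy := o.inUse[snapID]; busy {
		return fmt.Errorf("snapshot %s is already used by VM %s", snapID, owner)
	}
	o.inUse[snapID] = vmID
	return nil
}

// dedupOrchestrator models the new logic:
// any number of VMs can be booted from the same snapshot.
type dedupOrchestrator struct {
	mu  sync.Mutex
	vms map[string][]string // snapshot ID -> VM IDs booted from it
}

func (o *dedupOrchestrator) LoadSnapshot(vmID, snapID string) error {
	o.mu.Lock()
	defer o.mu.Unlock()
	o.vms[snapID] = append(o.vms[snapID], vmID)
	return nil
}

// NewOrchestrator picks an implementation by mode name.
func NewOrchestrator(mode string) (Orchestrator, error) {
	switch mode {
	case "regular":
		return &regularOrchestrator{inUse: map[string]string{}}, nil
	case "deduplicated":
		return &dedupOrchestrator{vms: map[string][]string{}}, nil
	default:
		return nil, errors.New("unknown snapshot mode: " + mode)
	}
}

func main() {
	for _, mode := range []string{"regular", "deduplicated"} {
		o, err := NewOrchestrator(mode)
		if err != nil {
			panic(err)
		}
		first := o.LoadSnapshot("vm-1", "snap-a")
		second := o.LoadSnapshot("vm-2", "snap-a")
		fmt.Printf("%s: first=%v second=%v\n", mode, first, second)
	}
}
```

Callers only depend on the interface, so the choice between the two modes can be made once at startup.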

The following components have been added to support the new snapshots, but are also used in the old implementation with offloading:

imageManager: manages and serializes pulling of container images, avoiding potential unnecessary network congestion
networkManager: manages a pool of in-use and potentially also ready-to-use network namespaces that come preconfigured with all the networking necessary to run a VM. Keeping a pool of ready-to-use network namespaces reduces cold-start latency by taking network initialization off the critical path of VM creation
snapshotManager: manages the available snapshots. It is provided as an interface with two implementations: one for the offloading snapshots, where a single snapshot can only be used to boot a single VM, and one for the new snapshots, where the logic has been adjusted to account for the fact that a snapshot can be used to boot as many VMs as desired
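
The pooling idea behind the networkManager can be sketched as follows. This is only an illustration under assumptions: `netConfig`, `nsPool`, and `newCounterPool` are hypothetical names, and the `create` callback stands in for the real (slow) namespace setup, which is not shown here.

```go
package main

import "fmt"

// netConfig stands in for a fully configured network namespace (hypothetical).
type netConfig struct {
	ID int
}

// nsPool keeps pre-created configs ready in a buffered channel.
type nsPool struct {
	ready chan *netConfig
	stop  chan struct{}
}

// newNSPool starts a goroutine that keeps up to size configs ready.
func newNSPool(size int, create func() *netConfig) *nsPool {
	p := &nsPool{
		ready: make(chan *netConfig, size),
		stop:  make(chan struct{}),
	}
	go func() {
		for {
			cfg := create() // slow setup happens in the background
			select {
			case p.ready <- cfg: // blocks while the pool is full
			case <-p.stop:
				return
			}
		}
	}()
	return p
}

// Get returns a ready config and waits only if the pool is drained.
func (p *nsPool) Get() *netConfig { return <-p.ready }

// Close stops the background refill goroutine.
func (p *nsPool) Close() { close(p.stop) }

// newCounterPool builds a pool whose configs carry increasing IDs.
// Only the refill goroutine calls create, so the counter needs no lock.
func newCounterPool(size int) *nsPool {
	next := 0
	return newNSPool(size, func() *netConfig {
		next++
		return &netConfig{ID: next}
	})
}

func main() {
	pool := newCounterPool(4)
	defer pool.Close()
	a, b := pool.Get(), pool.Get()
	fmt.Println("got distinct configs:", a.ID != b.ID)
}
```

With this shape, VM creation calls `Get` and pays the setup cost only when the pool has run dry.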

The following components have been added exclusively for the new snapshots:

devMapper: creates and manages device snapshots used to store container images. This component was necessary to boot multiple VMs from a single snapshot, although further optimizations are certainly possible here

ustiugov · 2022-02-09T15:14:31Z

For containerd, please create a PR like you did for the others, so it is easy to discuss what needs to be done. I reviewed the patch: you need to move the new arguments in api/services/snapshots/v1/snapshots.proto to the end of the protobuf messages, like you did in firecracker-containerd.

ustiugov

Great work, Amory! But we do need more before merging it.

I am sure that this regular/dedup code duplication is a bad SW design decision. There is a ton of replicated code, which bloats the codebase and makes maintenance much harder. Please merge them and introduce the required runtime arguments etc. The same goes for firecracker-containerd: it has to be 1 version.
Make sure that you allow the user to override the interface name, because a node may have several NICs and the user might want to drive the traffic through a particular interface (e.g., the highest-BW NIC).
We need unit and integration tests for all the added functionality. This is a must.
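
As a sketch of the interface override requested above: the selection logic could take an optional user-supplied name and otherwise fall back to the first interface that is up and not a loopback. The flag name `hostIface` and the `nic`, `chooseNIC`, and `hostNICs` helpers are made up for this example; only `net.Interfaces` and the `net.Flag*` constants come from the Go standard library.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"net"
)

// nic is a minimal view of a host network interface.
type nic struct {
	Name     string
	Up       bool
	Loopback bool
}

// chooseNIC returns the override if it names a known interface.
// Without an override it returns the first interface that is up
// and not a loopback device.
func chooseNIC(override string, nics []nic) (string, error) {
	if override != "" {
		for _, n := range nics {
			if n.Name == override {
				return n.Name, nil
			}
		}
		return "", fmt.Errorf("interface %q not found", override)
	}
	for _, n := range nics {
		if n.Up && !n.Loopback {
			return n.Name, nil
		}
	}
	return "", errors.New("no usable interface found")
}

// hostNICs lists the interfaces of the current host.
func hostNICs() ([]nic, error) {
	ifaces, err := net.Interfaces()
	if err != nil {
		return nil, err
	}
	out := make([]nic, 0, len(ifaces))
	for _, i := range ifaces {
		out = append(out, nic{
			Name:     i.Name,
			Up:       i.Flags&net.FlagUp != 0,
			Loopback: i.Flags&net.FlagLoopback != 0,
		})
	}
	return out, nil
}

func main() {
	// "hostIface" is a hypothetical flag name used only in this sketch.
	override := flag.String("hostIface", "", "host interface to route uVM traffic through")
	flag.Parse()

	nics, err := hostNICs()
	if err != nil {
		panic(err)
	}
	name, err := chooseNIC(*override, nics)
	fmt.Println(name, err)
}
```

Keeping `chooseNIC` free of system calls makes the selection rule easy to unit test with synthetic interface lists.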

cri/firecracker/coordinator.go

cri/firecracker/coordinator_test.go

cri/firecracker/service.go

ctriface/iface_test.go

go.mod

metrics/metrics.go

networking/networking.go

scripts/install_pmutools.sh

ustiugov

Thanks! LGTM

Please see my minor comments, and I look forward to seeing the unit and integration tests ASAP.

cri/firecracker/coordinator.go

cri/firecracker/service.go

ctriface/iface.go

ctriface/manual_cleanup_test.go

vhive.go

Signed-off-by: Amory Hoste <[email protected]>

ustiugov · 2022-04-25T18:34:18Z

scripts/github_runner/clean_cri_runner.sh

@@ -99,6 +99,9 @@ if [ "$SANDBOX" == "gvisor" ]; then
 fi

 if [ "$SANDBOX" == "firecracker" ]; then
+  echo Cleaning snapshots


indent seems off

ustiugov · 2022-05-20T14:27:54Z

go.mod

 	github.com/ease-lab/vhive/examples/protobuf/helloworld => ./examples/protobuf/helloworld
-	github.com/firecracker-microvm/firecracker-containerd => github.com/ease-lab/firecracker-containerd v0.0.0-20210618165033-6af02db30bc4
+	github.com/firecracker-microvm/firecracker-containerd => github.com/amohoste/firecracker-containerd v1.0.0-enhanced-snap // TODO: change to vhive


please change

ustiugov · 2022-05-20T14:33:33Z

@amohoste could you please update the PR's description, particularly the terminology? Much of this text can be moved to the doc. Also, a lot of people are interested in how vHive can support remote snaps. Can you please elaborate on that in the doc?

ustiugov

Thanks, Amory! Just need to split up the lines into smaller ones for more granular version control.

@cvetkovic could you please read the docs and see if there are any issues with the explanations? Is everything clear to you?

ustiugov · 2022-06-07T20:54:30Z

docs/fulllocal_snapshots.md

@@ -0,0 +1,25 @@
+# vHive full local snapshots
+
+When using Firecracker as the sandbox technology in vHive, two snapshotting modes are supported: a default mode and a full local mode. The default snapshot mode use an offloading based technique which leaves the shim and other resources running upon shutting down a microVM such that it can be re-used in the future. This technique has the advantage that the shim does not have to be recreated and the block and network devices of the previously stopped microVM can be reused, but limits the amount of microVMs that can be booted from a snapshot to the amount of microVMs that have been offloaded. The full local snapshot mode instead allows loading an arbitrary amount of microVMs from a single snapshot. This is done by creating a new shim and the required block and network devices upon loading a snapshot and creating an extra patch file containing the filesystem differences written by the microVM upon snapshot creation. To enable the full local snapshot functionality, vHive must be run with the `-snapshots` and `-fulllocal` flags. In addition, the full local snapshot mode can be further configured using the following flags:


please split into short lines for more granular version control

ustiugov · 2022-06-07T20:59:44Z

@amohoste could you please update CHANGELOG.md?

Signed-off-by: Amory Hoste <[email protected]>

cvetkovic · 2022-06-21T14:44:53Z

docs/fulllocal_snapshots.md

+# vHive full local snapshots
+
+When using Firecracker as the sandbox technology in vHive, two snapshotting modes are supported: a default mode and a 
+full local mode. The default snapshot mode use an offloading based technique which leaves the shim and other resources 


Explain what a shim is here.

cvetkovic · 2022-06-21T14:47:06Z

docs/fulllocal_snapshots.md

+offloaded. The full local snapshot mode instead allows loading an arbitrary amount of microVMs from a single snapshot. 
+This is done by creating a new shim and the required block and network devices upon loading a snapshot and creating an 
+extra patch file containing the filesystem differences written by the microVM upon snapshot creation. To enable the 
+full local snapshot functionality, vHive must be run with the `-snapshots` and `-fulllocal` flags. In addition, the 


fulllocal looks ugly. Maybe consider full_local.

cvetkovic · 2022-06-21T14:48:26Z

docs/fulllocal_snapshots.md

+reused, but limits the amount of microVMs that can be booted from a snapshot to the amount of microVMs that have been 
+offloaded. The full local snapshot mode instead allows loading an arbitrary amount of microVMs from a single snapshot. 
+This is done by creating a new shim and the required block and network devices upon loading a snapshot and creating an 
+extra patch file containing the filesystem differences written by the microVM upon snapshot creation. To enable the 


Motivate why patching is needed, so a user can understand the incentives for creating these two modes.

cvetkovic · 2022-06-21T14:49:28Z

docs/fulllocal_snapshots.md

+full local snapshot mode can be further configured using the following flags:
+
+- `isSparseSnaps`: store the memory file as a sparse file to make its storage size closer to the actual size of the memory utilized by the microVM, rather than the memory allocated to the microVM
+- `snapsStorageSize [capacityGiB]`: specify the amount of capacity that can be used to store snapshots


amount of capacity - rewrite
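
For context on the `isSparseSnaps` flag quoted in this hunk, the general sparse-file technique can be sketched as below. This is not the vHive implementation: `writeSparse` and `sparseDemo` are hypothetical helpers, and whether the unwritten range is actually stored as a hole depends on the filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeSparse creates a file with the given logical size and writes data
// at offset. Truncate extends the file without writing to it, so on
// filesystems that support sparse files the untouched range is a hole.
func writeSparse(path string, size int64, data []byte, offset int64) error {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	if err := f.Truncate(size); err != nil {
		f.Close()
		return err
	}
	if _, err := f.WriteAt(data, offset); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// sparseDemo writes a 64 MiB sparse file into a temporary directory
// and returns its logical size as reported by Stat.
func sparseDemo() (int64, error) {
	dir, err := os.MkdirTemp("", "sparse")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "mem_file")
	if err := writeSparse(path, 64<<20, []byte("guest page"), 4096); err != nil {
		return 0, err
	}
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

func main() {
	size, err := sparseDemo()
	fmt.Println("logical size in bytes:", size, "error:", err)
}
```

The logical size stays at the full 64 MiB, while the space actually allocated on disk can be much smaller; this is the difference between allocated and utilized memory that the flag description refers to.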

cvetkovic · 2022-06-21T14:50:27Z

docs/fulllocal_snapshots.md

+## Remote snapshots
+
+Rather than only using the snapshots available locally on a node, snapshots can also be transferred between nodes to 
+potentially accelerate cold start times and reduce memory utilization, given that proper mechanisms are in place to 


Improve, not reduce memory utilization.

ustiugov requested changes Feb 9, 2022

ustiugov self-requested a review March 8, 2022 12:21

ustiugov requested changes Mar 10, 2022

amohoste force-pushed the new_snapshots branch 9 times, most recently from 29882fb to 870cb25 Compare March 13, 2022 15:38

amohoste force-pushed the new_snapshots branch 17 times, most recently from 45bd9b3 to f6a953d Compare March 27, 2022 14:34

amohoste added 9 commits May 16, 2022 21:16

Add device mapper functionality

7687152

Signed-off-by: Amory Hoste <[email protected]>

make VM cpu and memory configurable

c47a81b

Signed-off-by: Amory Hoste <[email protected]>

add netpoolsize to tests

ea44fe2

Signed-off-by: Amory Hoste <[email protected]>

add improved snapshotting functionality

974e3c5

Signed-off-by: Amory Hoste <[email protected]>

add separate option for new snapshots

27e693c

Signed-off-by: Amory Hoste <[email protected]>

Cleanup and integrate PR comments

c78f1cb

Signed-off-by: Amory Hoste <[email protected]>

Address PR remarks

d6827fb

Signed-off-by: Amory Hoste <[email protected]>

Add tests and compatibility with offloaded snapshots

b62820a

Signed-off-by: Amory Hoste <[email protected]>

Add thinpool detection if not specified

862a3a4

Signed-off-by: Amory Hoste <[email protected]>

amohoste force-pushed the new_snapshots branch 2 times, most recently from ed85025 to f5e0aee Compare May 16, 2022 20:35

run image manager and devmapper tests on containerd runner

6c6f752

Signed-off-by: Amory Hoste <[email protected]>

amohoste force-pushed the new_snapshots branch from f5e0aee to 6c6f752 Compare May 16, 2022 20:40

Add docs on fulllocal snapshots

7bde409

Signed-off-by: Amory Hoste <[email protected]>

ustiugov mentioned this pull request May 20, 2022

Deployment of Functions in vHive failing #539

Closed

ustiugov reviewed May 20, 2022

amohoste force-pushed the new_snapshots branch from 9adb916 to 4533fe2 Compare June 6, 2022 21:29

ustiugov reviewed Jun 7, 2022

ustiugov assigned cvetkovic and unassigned cvetkovic Jun 7, 2022

ustiugov requested a review from cvetkovic June 7, 2022 20:59

ustiugov assigned amohoste Jun 7, 2022

ustiugov added the enhancement New feature or request label Jun 7, 2022

Add docs on full local snapshots

c990d06

Signed-off-by: Amory Hoste <[email protected]>

amohoste force-pushed the new_snapshots branch from 4533fe2 to c990d06 Compare June 12, 2022 16:53

cvetkovic reviewed Jun 21, 2022

aditya2803 mentioned this pull request Jul 9, 2022

Deployment of functions failing while using PR 465 #568

Closed

ustiugov mentioned this pull request Aug 7, 2022

Snapshots Issue - Cannot Serve More Than 10 Concurrent Functions #579

Closed
