local cas: achieve chunk deduplication for nydus. #1399
base: master
Conversation
Codecov Report
Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1399      +/-   ##
==========================================
- Coverage   62.70%   61.81%   -0.90%
==========================================
  Files         123      125       +2
  Lines       43248    44530    +1282
  Branches    43248    44530    +1282
==========================================
+ Hits        27120    27526     +406
- Misses      14817    15660     +843
- Partials     1311     1344      +33
@xwb1136021767 Sorry for the delayed reply. Please help add some docs about the background and usage for users.
@@ -29,6 +29,8 @@ pub struct ConfigV2 {
    pub cache: Option<CacheConfigV2>,
    /// Configuration information for RAFS filesystem.
    pub rafs: Option<RafsConfigV2>,
    /// Configuration information for image deduplication.
    pub deduplication: Option<DeduplicationConfigV2>,
Please add a smoke test and update the docs.
timing_tracer!(
    {
        tree.walk_bfs(true, &mut |n| {
            n.lock_node().dedup_chunk_for_node(
get_chunk_ofs seems not implemented with v5. Is dedup_chunk_for_node able to work with v5? If not, please use is_v6 to avoid it.
    )
)
.subcommand(
    App::new("dedup")
A smoke test is required. Please also update the test doc.
@@ -1611,4 +1750,89 @@ mod tests {
        assert_eq!(size, iovec.size());
        assert_eq!(chunk_count, iovec.len() as u32);
    }

    #[test]
    fn test_serialize_cipher_object() {
Assertions are necessary to ensure the cipher works.
storage/src/device.rs (Outdated)
println!("{}", algo);
println!("origin blob_info: {:?}", blob_info);

println!(
Please remove this debug information before merging.
This is a particularly cool feature that I think would benefit everyone who deals with multiple random images that essentially share the same stuff that's used at runtime (e.g.
@xwb1136021767 Can we also give a performance test result in the doc to give users confidence?
@@ -17,6 +17,7 @@ libc = "0.2"
log = "0.4"
lz4-sys = "1.9.4"
lz4 = "1.24.0"
rusqlite = { version = "0.29.0", features = ["bundled"] }
Please update to the latest "0.30.0".
// create blob and chunk table for nydus v5
conn.execute(
    "CREATE TABLE IF NOT EXISTS BlobInfos_V5 (
        BlobId TEXT NOT NULL PRIMARY KEY,
Should it be CHAR(64)?
    (),
)?;
conn.execute(
    "CREATE INDEX IF NOT EXISTS BlobIndex_V5 ON BlobInfos_V5(BlobId)",
BlobInfos_V5(BlobId) is a primary key; shouldn't it already be indexed?
    ChunkId INTEGER PRIMARY KEY,
    ChunkDigestValue TEXT NOT NULL,
    ChunkInfo TEXT NOT NULL,
    BlobId TEXT NOT NULL,
Should we use an auto-increment integer as the primary key for BlobInfos_V5, so we can store an integer instead of a text here?
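For illustration, the suggestion could look like the sketch below. The BlobSeq column name and the exact DDL are assumptions for this example, not the PR's actual schema: the blob table gets an auto-increment integer key, and each chunk row then stores an 8-byte integer instead of repeating the 64-character blob id text.

```rust
// Hypothetical DDL sketch of the reviewer's suggestion; names like
// BlobSeq are illustrative, not taken from the PR.
const SCHEMA_SKETCH: &str = "
CREATE TABLE IF NOT EXISTS BlobInfos_V5 (
    BlobSeq INTEGER PRIMARY KEY AUTOINCREMENT,
    BlobId  CHAR(64) NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS ChunkInfos_V5 (
    ChunkId INTEGER PRIMARY KEY,
    ChunkDigestValue TEXT NOT NULL,
    ChunkInfo TEXT NOT NULL,
    BlobSeq INTEGER NOT NULL REFERENCES BlobInfos_V5(BlobSeq)
);
";

fn main() {
    // Each chunk row now carries a small integer foreign key rather
    // than a 64-byte text blob id.
    println!("{}", SCHEMA_SKETCH);
}
```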
let cas_mgr = CasMgr::new(path).unwrap();
let vec = cas_mgr.get_all_blobs(false).unwrap();

println!("v5 blobs");
The println! lines should be removed.
This idea is warmly welcomed :)
Add the dedup option to nydus-image to support block-level deduplication between local images. A local database cas.db was introduced, which records all chunks saved locally. Then we can redirect chunks to those that already exist locally to avoid duplicate downloads. There is still some work that needs to be done, including: 1. Insert the on-demand downloaded chunk into the database to ensure that the database stores locally existing chunks. 2. Adapting the snapshotter's GC mechanism to the reference relationship of bootstrap after deduplication. Signed-off-by: Huang Jianan <[email protected]> Signed-off-by: xwb1136021767 <[email protected]>
The previous version of local cas was static dedup, which only modified the chunk information in bootstrap. There is a serious problem: it may reuse chunks that cannot be obtained by the backend of the current image, resulting in the container being unable to load the corresponding chunk data on demand during runtime. To address this issue, dynamic dedup was introduced. When nydusd initializes the blob cache, it reads the corresponding backend configuration information of the blob from the CAS database, enabling the blob cache to read chunk data from other backends. Signed-off-by: xwb1136021767 <[email protected]>
…namic dedup. The mismatch between dynamic dedup and nydus' chunk amplification can result in a larger cache size after dedup than without it, because chunk amplification can cause reused chunks to be pulled multiple times. To address this issue, a dedup_bitmap was introduced. When initializing rafs, the dedup_bitmap is generated based on the chunk information in the blob. The determination of whether a chunk in a blob is ready requires both the chunk map and the deduplication bitmap to make a joint decision. Signed-off-by: xwb1136021767 <[email protected]>
Signed-off-by: xwb1136021767 <[email protected]>
@xwb1136021767 When reviewing the PR, I have worked out some enhancements/fixes over your work. How about holding on for a while, then I will try to submit my patches to your repo?
Thank you so much for your help!
    true
}

pub fn get_enable(&self) -> bool {
It's a pub field, do we still need this? Also, it would better be is_enable.
/// Whether to enable image dedup
#[serde(default)]
pub enable: bool,
/// Work fir for image dedup
Typo: dir.
@@ -2,16 +2,34 @@
// Copyright (C) 2023 Alibaba Cloud. All rights reserved.
//
// SPDX-License-Identifier: Apache-2.0

use anyhow::{Context, Error, Result};
#![allow(unused_variables, unused_imports)]
Why do we need this?
} else {
    (old_blob_table_offset, bootstrap_end)
};
//write rs
Add space after //.
sb.set_extended_blob_table_entries(u32::try_from(blob_table.extended.entries())?);
writer.seek(SeekFrom::Start(0))?;
sb.store(writer).context("failed to store superblock")?;
//rewrite blob table
Ditto.
    .context("failed to write 0 to padding of bootstrap's end")?;
writer.flush().context("failed to flush bootstrap")?;

//prepare devtable
Ditto.
@@ -0,0 +1,166 @@
// // Copyright (C) 2022 Alibaba Cloud. All rights reserved.
// Copyright (C) 2023 Alibaba Cloud. All rights reserved.
Arg::new("output-bootstrap")
    .long("output")
    .short('O')
    .help("Bootstrap to output, default is source bootstrap add suffix .debup")
Typo: dedup.
Some(s) => PathBuf::from(s),
};

//config
Ditto.
@@ -0,0 +1,450 @@
// Copyright (C) 2022 Alibaba Cloud. All rights reserved.
Ditto.
// create blob and chunk table for nydus v5
conn.execute(
    "CREATE TABLE IF NOT EXISTS BlobInfos_V5 (
        BlobId TEXT NOT NULL PRIMARY KEY,
Better be blob_id; same below.
    conn: Connection,
}

impl CasMgr {
Can we implement trait ChunkDict for CasMgr?
I can't understand why we need to rebuild a new bootstrap, can we do runtime de-duplication by enhancing the blob cache? i.e. for example using a global (chunk_digest, chunk_info) db records.
I have also considered the solution you mentioned before. My previous concern was whether runtime deduplication would make the CAS database a performance bottleneck in IO-intensive scenarios. There are indeed problems with the current solution, and personally, I think the biggest problem is how to combine it with the GC mechanism of the image, especially in localfs or localdisk mode, because the nydus snapshotter cannot prevent the deletion operation of the image.
This may require some testing; full dedup and rebuilding the whole bootstrap (not only chunk infos?) on booting may also affect cold start performance. With either full dedup or on-demand dedup, there will be GC issues, as containerd checks whether an image is referencing a blob based on the image manifest. References:
A simplified version to implement chunk deduplication at runtime instead of at build time.
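The global (chunk_digest, chunk_info) record idea discussed above can be sketched as a digest-keyed lookup that the blob cache consults on a miss. This is a minimal in-memory stand-in with assumed names; the real store would be the persistent cas.db, not a HashMap.

```rust
use std::collections::HashMap;

// Hypothetical global record store: chunk digest -> (blob_id, chunk info).
// On a cache miss the blob cache could consult this before downloading.
struct ChunkRecords {
    by_digest: HashMap<String, (String, String)>,
}

impl ChunkRecords {
    fn new() -> Self {
        Self { by_digest: HashMap::new() }
    }

    // Remember that `blob_id` already holds a chunk with this digest.
    // The first writer wins, so later lookups stay stable.
    fn record(&mut self, digest: &str, blob_id: &str, chunk_info: &str) {
        self.by_digest
            .entry(digest.to_string())
            .or_insert_with(|| (blob_id.to_string(), chunk_info.to_string()));
    }

    // Redirect a chunk request to a blob that already has the data.
    fn lookup(&self, digest: &str) -> Option<&(String, String)> {
        self.by_digest.get(digest)
    }
}

fn main() {
    let mut records = ChunkRecords::new();
    records.record("sha256:aaaa", "blob-1", "{offset:0,len:4096}");
    records.record("sha256:aaaa", "blob-2", "{offset:128,len:4096}");
    // First writer wins, so the chunk is served from blob-1.
    let (blob, _info) = records.lookup("sha256:aaaa").unwrap();
    assert_eq!(blob, "blob-1");
    println!("dedup hit served from {}", blob);
}
```

As the thread notes, a persistent variant of this table sits on the hot IO path, which is exactly why the database-bottleneck concern below matters.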
Relevant Issue (if applicable)
If there are Issues related to this PullRequest, please list them.
Details
Original version from #956.
The previous version of local cas was static dedup, which only modified the chunk information in bootstrap. There is a serious problem: it may reuse chunks that cannot be obtained by the backend of the current image, resulting in the container being unable to load the corresponding chunk data on demand during runtime. To address this issue, dynamic dedup was introduced. When nydusd initializes the blob cache, it reads the corresponding backend configuration information of the blob from the CAS database, enabling the blob cache to read chunk data from other backends.
What's more, the mismatch between dynamic dedup and nydus' chunk amplification can result in a larger cache size after dedup than without it, because chunk amplification can cause reused chunks to be pulled multiple times. To address this issue, a dedup_bitmap was introduced. When initializing rafs, the dedup_bitmap is generated based on the chunk information in the blob. The determination of whether a chunk in a blob is ready requires both the chunk map and the deduplication bitmap to make a joint decision.
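One plausible reading of that joint decision, sketched with assumed names rather than the actual nydus API: a chunk does not need to be downloaded if either the chunk map marks it present in this blob's cache, or the dedup_bitmap marks it as redirected to another local blob, which keeps chunk amplification from re-pulling reused chunks.

```rust
// Sketch of the joint readiness decision (names are assumptions):
// `chunk_map` marks chunks already present in this blob's cache;
// `dedup_bitmap`, when dedup is enabled, marks chunks redirected to
// another local blob. Either signal means the chunk need not be pulled.
fn is_chunk_ready(chunk_map: &[bool], dedup_bitmap: Option<&[bool]>, idx: usize) -> bool {
    let cached = chunk_map.get(idx).copied().unwrap_or(false);
    let deduped = dedup_bitmap
        .and_then(|b| b.get(idx).copied())
        .unwrap_or(false);
    cached || deduped
}

fn main() {
    let chunk_map = [true, false, false];
    let dedup_bitmap = [false, true, false];
    assert!(is_chunk_ready(&chunk_map, Some(&dedup_bitmap), 0)); // in local cache
    assert!(is_chunk_ready(&chunk_map, Some(&dedup_bitmap), 1)); // deduped to another blob
    assert!(!is_chunk_ready(&chunk_map, Some(&dedup_bitmap), 2)); // must be downloaded
    println!("joint readiness sketch ok");
}
```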
Types of changes
What types of changes does your PullRequest introduce? Put an x in all the boxes that apply.

Checklist
Go over all the following points, and put an x in all the boxes that apply.