Add metadata benchmarks #1055

turan18 · 2024-01-29T18:47:38Z

Issue #, if available:

Description of changes:

Add benchmarks that measure metadata DB insertion performance. Add a helper function that generates a random TAR file (TOC) with a given number of files/entries.
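As a rough illustration of what such a helper could look like, here is a minimal sketch that builds an in-memory TAR archive with a given number of randomly named, empty regular files. The names `randomName`, `buildRandomTar`, and `countEntries` are hypothetical and are not the helpers added in this PR; the sketch uses only the Go standard library.

```go
package main

import (
	"archive/tar"
	"bytes"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"io"
)

// randomName returns a random hex string of length n (n is assumed to be even).
func randomName(n int) (string, error) {
	buf := make([]byte, n/2)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// buildRandomTar writes numEntries empty regular files with random,
// fixed-length names into an in-memory TAR archive.
func buildRandomTar(numEntries int) ([]byte, error) {
	var out bytes.Buffer
	tw := tar.NewWriter(&out)
	for i := 0; i < numEntries; i++ {
		name, err := randomName(10)
		if err != nil {
			return nil, err
		}
		hdr := &tar.Header{
			Typeflag: tar.TypeReg,
			Name:     name,
			Mode:     0o644,
			Size:     0, // empty file: only the header is written
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
	}
	// Close flushes the end-of-archive marker.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return out.Bytes(), nil
}

// countEntries reads the archive back and counts its headers.
func countEntries(data []byte) (int, error) {
	tr := tar.NewReader(bytes.NewReader(data))
	n := 0
	for {
		_, err := tr.Next()
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return 0, err
		}
		n++
	}
}

func main() {
	data, err := buildRandomTar(100)
	if err != nil {
		panic(err)
	}
	n, err := countEntries(data)
	if err != nil {
		panic(err)
	}
	fmt.Println("entries:", n)
}
```

Keeping every entry empty means the archive size depends only on the entry count, which may be convenient when the benchmark is meant to stress metadata insertion rather than file contents.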

Testing performed:

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

turan18 · 2024-02-05T17:13:22Z

util/testutil/util.go

@@ -181,3 +186,12 @@ func RandomDigest() string {
 	d := digest.FromBytes(RandomByteData(10))
 	return d.String()
 }
+
+// RandString returns a random string of length n
+func RandString(n int) string {


File names must be random because we use them to traverse the fs tree when creating the DB, so RandString does not use our seeded random. We still use a fixed size, so there shouldn't be any variance between runs.

Kern-- · 2024-02-14T00:26:25Z

I don't understand this statement.

turan18 · 2024-02-16

If we used the seeded random, all our filenames would be the same, and the only thing differentiating them would be depth level (e.g. file vs file/file vs file/file/file).

When looping through the TOC, we maintain a map of node ID to metadata entry. Each metadata entry has a map of the node's children, keyed by child name. When we add children to a node's metadata entry, any existing child with the same name gets overwritten. That never happens in practice, since you cannot have multiple child nodes/files with the same name under a single parent node/directory. With identical generated names, however, our metadata/nodes bucket would not be fully populated, since a parent could effectively have only one child.

To avoid this, we use rand so we get an actual pseudo-random string. We still use a fixed length of 10 for the filename, to ensure there isn't any variance in bbolt write performance between benchmark runs. (bbolt doesn't care about the content of a KV pair, since keys and values are just interpreted as byte slices; the length, however, does matter, since it controls how nodes/pages are split before writing to disk.)
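The overwrite behavior described in this thread can be reproduced with a small sketch. The `entry` type, `fixedName`, `randName`, and `populate` below are hypothetical stand-ins, not the project's metadata types or its RandString implementation; the sketch only assumes that children are stored in a map keyed by name.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// entry is a hypothetical stand-in for a metadata entry: it tracks a node's
// children keyed by child name.
type entry struct {
	children map[string]uint32 // child name -> child node ID
}

// fixedName always returns the same name, mimicking a generator that
// yields identical output on every call.
func fixedName(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = 'a'
	}
	return string(b)
}

// randName returns a random name of exactly n bytes drawn from alphabet.
func randName(n int) string {
	b := make([]byte, n)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	for i := range b {
		b[i] = alphabet[int(b[i])%len(alphabet)]
	}
	return string(b)
}

// populate adds count children to a fresh entry, using gen for the names,
// and returns how many distinct children survive.
func populate(count int, gen func(int) string) int {
	e := entry{children: make(map[string]uint32)}
	for id := 0; id < count; id++ {
		// A repeated key overwrites the previously stored child.
		e.children[gen(10)] = uint32(id)
	}
	return len(e.children)
}

func main() {
	fmt.Println("identical names:", populate(1000, fixedName))
	fmt.Println("random names:   ", populate(1000, randName))
}
```

With identical names the map ends up holding a single child no matter how many are added, while random names of the same fixed length keep almost all of them (collisions are possible but very unlikely at this size).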

turan18 · 2024-02-07T14:28:01Z

metadata/reader_test.go

 	}
-	defer os.Remove(f.Name())
-	db, err := bolt.Open(f.Name(), 0600, nil)
+	cwdPath, err := os.Getwd()


Note: Since we want to measure write performance to disk, we have to write to a non-tmpfs location.

sondavidb

Generally LGTM, a lot of minor changes that I just want a bit of attention on before approving. Overall the functionality looks great and I think it's a pretty cool addition to our testing suite.

metadata/reader_test.go

metadata/util_test.go

metadata/reader_test.go

Kern-- · 2024-02-14T00:26:25Z

util/testutil/util.go

@@ -181,3 +186,12 @@ func RandomDigest() string {
 	d := digest.FromBytes(RandomByteData(10))
 	return d.String()
 }
+
+// RandString returns a random string of length n
+func RandString(n int) string {


I don't understand this statement.

util/testutil/tar.go

turan18 force-pushed the add_metadata_benchmarks branch 15 times, most recently from c7bfdda to d5bba30 on February 5, 2024 16:48

turan18 marked this pull request as ready for review on February 5, 2024 16:53

turan18 requested a review from a team as a code owner on February 5, 2024 16:53

turan18 commented Feb 5, 2024

turan18 force-pushed the add_metadata_benchmarks branch from d5bba30 to df9ec63 on February 7, 2024 14:26

turan18 commented Feb 7, 2024

turan18 force-pushed the add_metadata_benchmarks branch from df9ec63 to 8f8eb12 on February 7, 2024 14:36

sondavidb reviewed Feb 8, 2024

turan18 force-pushed the add_metadata_benchmarks branch 2 times, most recently from f2d02b9 to 692b984 on February 13, 2024 21:36

Kern-- reviewed Feb 14, 2024

turan18 force-pushed the add_metadata_benchmarks branch from 692b984 to 86bf1fa on February 14, 2024 02:00

Add metadata reader benchmarks

9ffd6e7

Add benchmarks functions that benchmark sequential and concurrent writes to the underlying metadata db. Signed-off-by: Yasin Turan <[email protected]>

turan18 force-pushed the add_metadata_benchmarks branch from 86bf1fa to 9ffd6e7 on February 16, 2024 19:50

sondavidb approved these changes Feb 16, 2024
