Add metadata benchmarks #1055
c7bfdda to d5bba30
@@ -181,3 +186,12 @@ func RandomDigest() string {
	d := digest.FromBytes(RandomByteData(10))
	return d.String()
}

// RandString returns a random string of length n
func RandString(n int) string {
File names must be random, since we use them to traverse the fs tree when creating the DB, so RandString does not use our seeded random. The names still have a fixed length, so there shouldn't be any variance between runs.
I don't understand this statement.
If we used the seeded random, all our filenames would be the same, and the only thing differentiating them would be their depth level (e.g. file vs. file/file vs. file/file/file).
When looping through the TOC, we maintain a map of node ID to metadata entry. Each metadata entry has a map of the node's children, keyed by child name. When we add children to a node's metadata entry, any existing child with the same name gets overwritten. That never happens in practice, since you cannot have multiple child nodes/files with the same name under a single parent node/directory. With identical names, however, our metadata/nodes bucket would not be fully populated, since a parent could effectively have only one child.
To avoid this, we use rand so we get an actual pseudo-random string. We still use a fixed length of 10 for the filename to ensure there isn't any variance in bbolt write performance between benchmark runs. (bbolt doesn't care about the content of a KV pair, since keys and values are just interpreted as byte slices; the length, however, does matter, since it controls how nodes/pages are split before writing to disk.)
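To make the overwrite problem concrete, here is a minimal sketch, not the code from this PR. It assumes the seeded helper rebuilds its generator from the same fixed seed on every call, which is what would make every name identical. The names seededName, randomName, and countChildren are hypothetical, and the package-level math/rand functions stand in for whatever rand source RandString actually uses.

```go
package main

import (
	"fmt"
	"math/rand"
)

const letters = "abcdefghijklmnopqrstuvwxyz"

// fill builds a string of length n using the supplied index picker.
func fill(n int, pick func(int) int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letters[pick(len(letters))]
	}
	return string(b)
}

// seededName mimics a hypothetical helper that recreates its generator from
// the same fixed seed on every call, so every call returns the same string.
func seededName(n int) string {
	r := rand.New(rand.NewSource(42))
	return fill(n, r.Intn)
}

// randomName draws from the shared package-level source, so successive calls
// keep advancing one stream and return different fixed-length strings.
func randomName(n int) string {
	return fill(n, rand.Intn)
}

// countChildren adds count children to a single parent's name-keyed child map
// and reports how many distinct children remain afterwards.
func countChildren(count int, name func(int) string) int {
	children := make(map[string]int) // child name -> node ID
	for id := 0; id < count; id++ {
		children[name(10)] = id // a repeated name overwrites the earlier child
	}
	return len(children)
}

func main() {
	fmt.Println(countChildren(100, seededName)) // prints 1
	fmt.Println(countChildren(100, randomName)) // close to 100; collisions are very unlikely
}
```

Both helpers return names of the same length, so the key size seen by the store is identical in either case; only the number of surviving children differs.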
d5bba30 to df9ec63
}
defer os.Remove(f.Name())
db, err := bolt.Open(f.Name(), 0600, nil)
cwdPath, err := os.Getwd()
Note: Since we want to measure write performance to disk, we have to write to a non-tmpfs location.
df9ec63 to 8f8eb12
Generally LGTM, a lot of minor changes that I just want a bit of attention on before approving. Overall the functionality looks great and I think it's a pretty cool addition to our testing suite.
f2d02b9 to 692b984
692b984 to 86bf1fa
Add benchmark functions that measure sequential and concurrent writes to the underlying metadata DB.
Signed-off-by: Yasin Turan <[email protected]>
86bf1fa to 9ffd6e7
Issue #, if available:
Description of changes:
Add benchmark tests that measure metadata DB insertion performance. Add a helper function that generates a random TAR file (TOC) with a given number of files/entries.
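For illustration only, a helper of this kind could look roughly like the sketch below. It is not the helper added in this PR: buildTar, countEntries, and randName are hypothetical names, the archive is kept in memory, and every entry is a regular file with a fixed 10-character random name and a small fixed payload.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
	"math/rand"
	"os"
)

const letters = "abcdefghijklmnopqrstuvwxyz"

// randName returns a pseudo-random lowercase name of length n.
func randName(n int) string {
	b := make([]byte, n)
	for i := range b {
		b[i] = letters[rand.Intn(len(letters))]
	}
	return string(b)
}

// buildTar writes numEntries regular files with fixed-length random names
// into an in-memory tar archive and returns the archive bytes.
func buildTar(numEntries int) ([]byte, error) {
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	payload := []byte("benchmark")
	for i := 0; i < numEntries; i++ {
		hdr := &tar.Header{
			Name:     randName(10),
			Typeflag: tar.TypeReg,
			Mode:     0644,
			Size:     int64(len(payload)),
		}
		if err := tw.WriteHeader(hdr); err != nil {
			return nil, err
		}
		if _, err := tw.Write(payload); err != nil {
			return nil, err
		}
	}
	// Close flushes the trailing padding that terminates the archive.
	if err := tw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// countEntries reads the archive back and counts its headers.
func countEntries(data []byte) (int, error) {
	tr := tar.NewReader(bytes.NewReader(data))
	n := 0
	for {
		_, err := tr.Next()
		if err == io.EOF {
			return n, nil
		}
		if err != nil {
			return n, err
		}
		n++
	}
}

func main() {
	data, err := buildTar(1000)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	n, err := countEntries(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("entries:", n)
}
```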
Testing performed:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.