-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GCP implementation of dedup storage #199
Conversation
9eb025c
to
8414578
Compare
8414578
to
fc8ca10
Compare
(Rebased onto |
m := make([]*spanner.MutationGroup, 0, len(entries)) | ||
for _, e := range entries { | ||
m = append(m, &spanner.MutationGroup{ | ||
Mutations: []*spanner.Mutation{spanner.Insert("IDSeq", []string{"id", "h", "idx"}, []interface{}{0, e.LeafID, int64(e.Idx)})}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this check whether it's overriding a smaller index?
I didn't specify this on the interface, but I did this for bbolt. I don't think it's a strong requirement, but if it's not too expensive (it might be?), might be worth having it? How about we put it for consistency (between implementation, and knowing that deduplication will always converge on the smaller index), and remove it later if it hurts performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that can happen as this deliberately uses an Insert
mutation which will fail if the key already exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There could be two frontends that receive the same entry. Frontend A would get idx 1, the request on frontend B would get idx 2. But frontend B could put it in the deduplication database before frontend A, so the deduplication index will remain 2 forever.
It's not the end of the world, and if it hurts performance, I don't think it would be worth it, but I got from our recent conversation that it would be nice for the log to converge on the smallest possible index for a given leaf.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah - ok, how about if we merge this as is for now, and I'll send another PR with the "update if lower" algo? That way we've got a baseline + patch for ease of diff/rollback/etc.
This PR adds a first cut of a GCP Spanner based dedup storage for SCTFE.
The creation of the Spanner DB is expected to be handled externally. I imagine we will probably eventually have some reference terraform/terragrunt configs for the SCTFE on GCP, so this could be done there for example.
Towards #88