Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GCP implementation of dedup storage #199

Merged
merged 1 commit into from
Sep 9, 2024

Conversation

AlCutter
Copy link
Collaborator

@AlCutter AlCutter commented Sep 3, 2024

This PR adds a first cut of a GCP Spanner based dedup storage for SCTFE.

The creation of the Spanner DB is expected to be handled externally. I imagine we will probably eventually have some reference terraform/terragrunt configs for the SCTFE on GCP, so this could be done there for example.

Towards #88

@AlCutter AlCutter requested a review from phbnf September 3, 2024 17:40
@AlCutter AlCutter force-pushed the gcp_dedup_for_sctfe branch 2 times, most recently from 9eb025c to 8414578 Compare September 4, 2024 10:19
@AlCutter
Copy link
Collaborator Author

AlCutter commented Sep 4, 2024

(Rebased onto main now that #188 is in.)

m := make([]*spanner.MutationGroup, 0, len(entries))
for _, e := range entries {
m = append(m, &spanner.MutationGroup{
Mutations: []*spanner.Mutation{spanner.Insert("IDSeq", []string{"id", "h", "idx"}, []interface{}{0, e.LeafID, int64(e.Idx)})},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this check whether it's overriding a smaller index?
I didn't specify this on the interface, but I did this for bbolt. I don't think it's a strong requirement, but if it's not too expensive (it might be?), might be worth having it? How about we put it for consistency (between implementation, and knowing that deduplication will always converge on the smaller index), and remove it later if it hurts performance?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that can happen as this deliberately uses an Insert mutation which will fail if the key already exists.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There could be two frontends that receive the same entry. Frontend A would get idx 1, the request on frontend B would get idx 2. But frontend B could put it in the deduplication database before frontend A, so the deduplication index will remain 2 forever.
It's not the end of the world, and if it hurts performance, I don't think it would be worth it, but I got from our recent conversation that it would be nice for the log to converge on the smallest possible index for a given leaf.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, yeah - ok, how about if we merge this as is for now, and I'll send another PR with the "update if lower" algo? That way we've got a baseline + patch for ease of diff/rollback/etc.

@AlCutter AlCutter merged commit b56842e into transparency-dev:main Sep 9, 2024
12 checks passed
@AlCutter AlCutter deleted the gcp_dedup_for_sctfe branch September 9, 2024 09:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants