Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENG-2572: VectorClockIds are a tuple. Editing changesets are gone #3988

Merged
merged 10 commits into from
Jul 11, 2024

Conversation

zacharyhamm
Copy link
Contributor

@zacharyhamm zacharyhamm commented Jun 17, 2024

This change mitigates the unbounded growth of vector clocks by using the change set id + the actor id as the vector clock id. Ephemeral "editing change sets" are gone, although system services get a randomly generated actor id for the duration of their DalContext. It also prunes more vector clocks, now including write clocks, on rebase, ensuring our vector clock entries remain small.

WorkspaceSnapshot method signatures have been refactored to take the vector clock id directly, instead of assuming the change set and vector clock id are the same.

All snapshots for change sets that are not applied will be migrated on SDF boot, so deploying this to prod may cause some downtime.

@zacharyhamm
Copy link
Contributor Author

bors try

si-bors-ng bot added a commit that referenced this pull request Jun 17, 2024
@si-bors-ng
Copy link
Contributor

si-bors-ng bot commented Jun 17, 2024

try

Build failed:

@zacharyhamm
Copy link
Contributor Author

bors try

si-bors-ng bot added a commit that referenced this pull request Jun 17, 2024
@si-bors-ng
Copy link
Contributor

si-bors-ng bot commented Jun 17, 2024

try

Build failed:

@zacharyhamm zacharyhamm force-pushed the zack/ENG-2572-vector-clock-is-a-tuple branch from 9f6710a to 70b8a3b Compare June 17, 2024 14:40
@stack72
Copy link
Contributor

stack72 commented Jun 17, 2024

bors try

si-bors-ng bot added a commit that referenced this pull request Jun 17, 2024
@si-bors-ng
Copy link
Contributor

si-bors-ng bot commented Jun 17, 2024

try

Build failed:

@zacharyhamm zacharyhamm force-pushed the zack/ENG-2572-vector-clock-is-a-tuple branch 4 times, most recently from b7e532a to 5f8fe8b Compare June 20, 2024 17:47
@zacharyhamm zacharyhamm changed the title wip: it compiles, and some tests pass wip: tests pass Jun 20, 2024
@zacharyhamm
Copy link
Contributor Author

bors try

si-bors-ng bot added a commit that referenced this pull request Jun 20, 2024
@zacharyhamm zacharyhamm force-pushed the zack/ENG-2572-vector-clock-is-a-tuple branch from 5f8fe8b to 2794b81 Compare June 20, 2024 17:55
@si-bors-ng
Copy link
Contributor

si-bors-ng bot commented Jun 20, 2024

try

Build succeeded:

@zacharyhamm zacharyhamm force-pushed the zack/ENG-2572-vector-clock-is-a-tuple branch from 2794b81 to 055406c Compare June 20, 2024 18:09
@zacharyhamm zacharyhamm changed the title wip: tests pass ENG-2572: VectorClockIds are a tuple. Editing changesets are gone Jun 20, 2024
@zacharyhamm zacharyhamm force-pushed the zack/ENG-2572-vector-clock-is-a-tuple branch 3 times, most recently from 5059dfd to 41a8c60 Compare June 26, 2024 13:59
@zacharyhamm zacharyhamm force-pushed the zack/ENG-2572-vector-clock-is-a-tuple branch 5 times, most recently from 4345b56 to 4f21b25 Compare July 9, 2024 16:38
Copy link

netlify bot commented Jul 10, 2024

Deploy Preview for systeminit-tools ready!

Name Link
🔨 Latest commit bfc952c
🔍 Latest deploy log https://app.netlify.com/sites/systeminit-tools/deploys/668eb8044e884700088352c1
😎 Deploy Preview https://deploy-preview-3988--systeminit-tools.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@zacharyhamm zacharyhamm force-pushed the zack/ENG-2572-vector-clock-is-a-tuple branch from bfc952c to a21e3b1 Compare July 11, 2024 16:07
zacharyhamm and others added 7 commits July 11, 2024 11:09
Previously, vector clock ids were the same as change set ids. And, we
generated a "editing change set" anytime we mutated the graph. This
changeset was ephemeral, and not connected to a real "change set" in the
system. In addition, for conflict detection to work correctly, the node
write clocks have to store every vector clock write that has *ever*
happened to them. This meant these clocks would grow indefinitely, since
they have to store every ephemeral "editing change set" in the node,
forever.

This change transforms the vector clock id into a tuple of the real
ChangeSetId and the UserPk/ActorId of the current user. In the context
of system actors, like Pinga and the Rebaser, the WorkspacePk is used in
place of the UserPk and removes editing change sets entirely. Now the
bound on the vector clock write clocks in node weights is the number of
change sets and users in the system, which will grow much more slowly
than the editing change sets.

This is a breaking change, since it changes both Node and Edge weight
data strucutres. Migration must be in place before this can be deployed.
Whenever we don't have HistoryActor, generate an actor id that lasts for
the current DalContext and use that for the vector clock id's actor id.
But, when the rebaser writes out the final snapshot, use the workspace
pk for the actor id.

Co-Authored-By: Jacob Helwig <[email protected]>
On SDF boot, attempt to automatically migrate all snapshots for a
deployment, beginning with the builtin workspace's snapshot. Follows the
"based_on_change_set_id" paths, treating the snapshots as a dependency
graph, so that shared clock ids are migrated to the new clock ids
correctly.

Once this code is deployed, SDF will panic if it encounters a 'legacy'
snapshot.

Co-Authored-By: Jacob Helwig <[email protected]>
@zacharyhamm zacharyhamm force-pushed the zack/ENG-2572-vector-clock-is-a-tuple branch from a21e3b1 to 6bd2768 Compare July 11, 2024 16:09
zacharyhamm and others added 2 commits July 11, 2024 11:34
If a table's structure changes, cached query plans against that table
need to be invalidated, or postgresql will return an error. This change
prevents that error after migrating the database in a production system
running pb_bouncer, which holds on to connections and reuses them even
if our services are restarted. We could avoid needing to discard plans
by selecting exactly the columns we need instead of SELECT * (unless
the column type changes!)

This issue never hit us before because we haven't changed table
structures much since adding pg_bouncer to the stack.

Co-Authored-By: Jacob Helwig <[email protected]>
@zacharyhamm zacharyhamm marked this pull request as ready for review July 11, 2024 18:13
@zacharyhamm zacharyhamm added this pull request to the merge queue Jul 11, 2024
Merged via the queue into main with commit 9ac1a8a Jul 11, 2024
8 checks passed
@zacharyhamm zacharyhamm deleted the zack/ENG-2572-vector-clock-is-a-tuple branch July 11, 2024 19:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants