epoc: cleanup #531

joemfb · 2023-10-04T04:05:52Z

This PR cleans up the new epoch system from #459, fixing some small bugs, plugging leaks, and simplifying the interface to it.

It still needs a final round of crash recovery testing (killing the process at every stage of the initialization/migration, confirming that subsequent restarts proceed as they should).

Resolves #530.

matthew-levan · 2023-10-04T21:09:20Z

Spun up a fakezod with this, and probably unrelated, but getting a lot of:

http: fail (13, 504): I/O error (body; content-length)
http: fail (15, 504): I/O error (body; content-length)

matthew-levan

Thanks for catching my errors and cleaning up the initialization and migration control flows. Happy to see those c3_unlink return code checks fixed, the removal of u3_Host.eve_d, a well-named and well purposed u3_disk_kindly function, and the reduction of exposure of disk functions in the rest of the codebase (like pier.c, for example). Approved.
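For readers following along, a return-code check around file removal might look roughly like the sketch below. It uses plain POSIX `unlink` rather than the `c3_unlink` wrapper, and both the helper name `remove_if_present` and the choice to treat a missing file as success are assumptions made for illustration, not what the PR does.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: remove a file, treating "already gone" as success.
** Returns 0 on success, -1 on any other failure.
*/
int
remove_if_present(const char* path)
{
  if ( 0 == unlink(path) ) {
    return 0;
  }

  /* assumption: a missing file is not an error for cleanup purposes */
  if ( ENOENT == errno ) {
    return 0;
  }

  fprintf(stderr, "unlink %s: %s\n", path, strerror(errno));
  return -1;
}
```

The point is only that `unlink` reports failure as -1 with `errno` set, so a caller that compares against the wrong value can silently ignore real errors.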

pkg/vere/disk.c

joemfb · 2023-10-06T20:55:10Z

These last few commits reorder epoc creation and add fsyncs for atomicity (i.e., if the epoch version file exists, we know the snapshot was fully copied). They also change how the latest epoch is loaded: if it was not completely initialized, it is deleted and loading falls back to the prior epoch.
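As a rough illustration of the ordering described above (data first, marker last, with syncs in between), here is a minimal POSIX sketch. The function names and the file names `snapshot.bin` and `version.txt` are hypothetical and are not taken from disk.c; the real code copies snapshot images rather than writing a buffer.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write a buffer to a file and fsync it before closing.
** Sketch only: does not retry on EINTR.
*/
int
write_synced(const char* path, const void* buf, size_t len)
{
  int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
  if ( fd < 0 ) return -1;

  const char* ptr = buf;
  while ( len > 0 ) {
    ssize_t ret = write(fd, ptr, len);
    if ( ret <= 0 ) { close(fd); return -1; }
    ptr += ret;
    len -= (size_t)ret;
  }

  if ( 0 != fsync(fd) ) { close(fd); return -1; }
  return close(fd);
}

/* fsync a directory so newly created entries are durable */
int
sync_dir(const char* dir)
{
  int fd = open(dir, O_RDONLY);
  if ( fd < 0 ) return -1;
  int ret = fsync(fd);
  close(fd);
  return ret;
}

/* Hypothetical epoch init: snapshot first, version marker last */
int
epoch_init_sketch(const char* dir, const void* snap, size_t len,
                  const char* ver)
{
  char path[4096];

  if ( 0 != mkdir(dir, 0755) ) return -1;

  snprintf(path, sizeof(path), "%s/snapshot.bin", dir);
  if ( 0 != write_synced(path, snap, len) ) return -1;
  if ( 0 != sync_dir(dir) ) return -1;

  /* the marker is only written once the snapshot is durable */
  snprintf(path, sizeof(path), "%s/version.txt", dir);
  if ( 0 != write_synced(path, ver, strlen(ver)) ) return -1;
  return sync_dir(dir);
}

/* An epoch counts as complete iff its version marker exists */
int
epoch_is_complete(const char* dir)
{
  char path[4096];
  snprintf(path, sizeof(path), "%s/version.txt", dir);
  return 0 == access(path, F_OK);
}
```

If the process dies anywhere before the final marker write, `epoch_is_complete` returns false and the directory can be treated as a partial epoch.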

More granular error handling on load is possible (for instance, copying metadata from the prior epoch if it's the only thing that's missing, which would be more efficient), but these changes handle the base case of all errors where it's clear that we can recover.
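The fallback on load can be sketched as a selection over a sorted list of epoch directories. This is an assumption-laden illustration: `epoch_pick`, `epoch_has_version`, and the marker name `version.txt` are hypothetical, and the actual deletion of the bad epoch is left to the caller.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical completeness check: the version marker file exists */
int
epoch_has_version(const char* dir)
{
  char path[4096];
  snprintf(path, sizeof(path), "%s/version.txt", dir);
  return 0 == access(path, F_OK);
}

/* Choose which epoch to load from dirs sorted oldest to newest.
** Returns the chosen index, or -1 if nothing usable was found.
** Sets *bad to 1 when the latest epoch was incomplete and skipped,
** so the caller knows to delete it.
*/
int
epoch_pick(const char* const* dirs, int len, int* bad)
{
  *bad = 0;

  if ( len <= 0 ) return -1;

  if ( epoch_has_version(dirs[len - 1]) ) {
    return len - 1;
  }

  /* latest epoch is incomplete: fall back exactly one step */
  *bad = 1;

  if ( (len >= 2) && epoch_has_version(dirs[len - 2]) ) {
    return len - 2;
  }

  return -1;
}
```

Falling back only one step matches the "base case" framing: an interrupted rollover leaves at most the newest epoch incomplete, so an incomplete prior epoch is reported as unrecoverable rather than searched past.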

matthew-levan · 2023-10-07T00:43:12Z

Awesome, will review first thing Monday.


matthew-levan

More good stuff. I booted a fresh fakezod with this build. I was then able to use it, and to crash ./urbit roll zod before it could complete. At that point, the second epoch, 0i98, contained only an incomplete copy of the [north|south].bin files. I then tested how both ./urbit roll zod and ./urbit zod behave afterwards, and both were able to recover gracefully (with helpful messages too).

Approved.

P.S. For my edification, why did you implement try_init as a goto?

joemfb · 2023-10-11T22:49:00Z

@matthew-levan thanks for the review. it could've been a loop, but there's only one case that continues around a second time, and that case only retries once. that's kind of a strange pattern of repetition, and it seemed better to be explicit.
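A minimal sketch of that retry-once shape, for anyone curious. Only the label name `try_init` comes from the discussion; the result enum, callbacks, and demo context are hypothetical stand-ins and do not mirror the real disk.c code.

```c
/* hypothetical outcomes of one initialization attempt */
typedef enum { init_ok, init_retry, init_fail } init_res;

typedef init_res (*init_fn)(void* ctx);
typedef int      (*repair_fn)(void* ctx);

/* Try to initialize; in the single recoverable case, repair and
** go around exactly one more time. Returns 0 on success, -1 otherwise.
*/
int
init_with_one_retry(init_fn try_f, repair_fn fix_f, void* ctx)
{
  int tried = 0;

try_init:
  switch ( try_f(ctx) ) {
    case init_ok:   return 0;
    case init_fail: return -1;

    case init_retry: {
      /* only one case loops, and it loops at most once */
      if ( tried || (0 != fix_f(ctx)) ) return -1;
      tried = 1;
      goto try_init;
    }
  }

  return -1;
}

/* demo callbacks (hypothetical), used only to exercise the sketch */
typedef struct { int attempts; int can_fix; int repaired; } demo_ctx;

init_res
demo_try(void* ptr)
{
  demo_ctx* ctx = ptr;
  ctx->attempts++;
  return ctx->repaired ? init_ok : init_retry;
}

int
demo_fix(void* ptr)
{
  demo_ctx* ctx = ptr;
  ctx->repaired = ctx->can_fix;
  return 0;
}
```

A `for` or `while` loop would express the same thing, but it would suggest general iteration where there is really one exceptional second pass; the label makes that single jump visible.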

This PR resolves a bug that I introduced in #531. I added stronger validation during event log initialization by checking for the presence of metadata, but that was allocating on the loom (in the case of 32-bit or greater ships and/or lives). There are various chicken-and-egg problems with initializing old event logs for replay, so it seems best to just remove this whole operation from the loom.

joemfb added 13 commits October 2, 2023 13:48

u3: fix warnings

d27bfc1

mars: move replay no-op printf

fa94907

vere: fix log initialization, remove spurious "migration" on boot

bf04ea6

vere: fixes epoch runtime-version handling

fbf2492

vere: check epoch version on init

1c208c7

vere: centralizes auto migration/rollover after replay

2abe6e4

vere: removes auto-migration from u3_disk_init()

529874a

vere: moves auto migration/rollover details inside disk.c

222735f

vere: explicitly migrate if needed before chop and roll

6250339

u3: ensure snapshot file descriptors are closed in u3e_stop()

7f84b73

vere: automatically recover from pre-epoc downgrade

0051160

vere: clean up disk migration error handling

b889f77

vere: fix up lmdb lifecycle, plugging leaks

b1cf277

joemfb requested a review from a team as a code owner October 4, 2023 04:05

joemfb changed the title ~~Jb/epoc cleanup~~ epoc: cleanup Oct 4, 2023

joemfb mentioned this pull request Oct 4, 2023

Gracefully handle cases where urbit roll didn't complete successfully #530

Closed

matthew-levan previously approved these changes Oct 6, 2023


vere: s/desk/disk/g

b082b4f

joemfb dismissed matthew-levan’s stale review via b082b4f October 6, 2023 14:34

joemfb added 4 commits October 6, 2023 14:27

u3: updates _ce_image_sync to return status

f31afc7

u3: updates u3e_backup to fsync destination images

55c0e82

vere: reorders u3_disk_epoc_init and adds syncs for atomicity

fc7efea

vere: factors sorted epoc list out of chop subcommand

b9e39ba

joemfb force-pushed the jb/epoc-cleanup branch from ca70966 to c7302ed Compare October 6, 2023 21:23

vere: refactors epoc loading, adds fallback to previous when bad

d7ea982

joemfb force-pushed the jb/epoc-cleanup branch from c7302ed to d7ea982 Compare October 7, 2023 17:11

matthew-levan approved these changes Oct 9, 2023


pkova merged commit 9bdc1af into develop Oct 11, 2023
5 checks passed

pkova deleted the jb/epoc-cleanup branch October 11, 2023 15:06

joemfb mentioned this pull request Nov 7, 2023

vere: refactors event log metadata reading to avoid the loom #547

Merged
