
Store backfill_processes status for protection against interruptions #29

Merged
12 commits merged into base-org:master from ss/finish-old-backfills on Jun 21, 2024

Conversation

bitwiseguy
Contributor

@bitwiseguy bitwiseguy commented Jun 17, 2024

Description

There is currently a failure mode where, if the blob-archiver is restarted before a backfill process completes, a gap is left in the stored blob slots that will go undetected unless an external tool (e.g. the blob-validator) is running to catch the missing slots.

This PR introduces a backfill_processes file/object that is kept in persistent storage (either s3 or a file). That object records any ongoing backfill processes so that they can be resumed during the blob-archiver startup routine. This reduces the dependence on external tools.

This change is meant to be backward compatible with existing blob-archivers.

Additional context

Changes made:

  • added new storage.WriteBackfillProcesses method
  • added new storage.ReadBackfillProcesses method
  • renamed storage.Write --> storage.WriteBlobs to distinguish from the new storage.WriteBackfillProcesses
  • renamed storage.Read --> storage.ReadBlobs to distinguish from the new storage.ReadBackfillProcesses
  • added archiver test: TestArchiver_BackfillFinishOldProcess
  • during storage initialization, if a backfill_processes file/object does not exist, it will create an empty one
  • updated the archiver.backfillBlobs method to use the new storage methods and loop through all new and old backfill processes, completing them one by one. When all processes are complete, the backfill_processes file in storage should contain just an empty mapping
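The resume flow described in the list above can be sketched with an in-memory map standing in for the s3/file backend; all names and the slot-walking logic here are illustrative assumptions, not the PR's code.

```go
package main

import "fmt"

// Process records a backfill's start slot and its current (lowest archived)
// slot, which counts down toward the target. Hypothetical shape.
type Process struct {
	StartSlot   uint64
	CurrentSlot uint64
}

// store stands in for the persisted backfill_processes object.
var store = map[string]Process{}

// writeBackfillProcesses persists the mapping (a real implementation would
// write JSON to s3 or a file).
func writeBackfillProcesses(p map[string]Process) { store = p }

func readBackfillProcesses() map[string]Process { return store }

// backfillBlobs resumes every persisted process, persisting progress after
// each slot so an interruption at any point is recoverable.
func backfillBlobs(target uint64) {
	processes := readBackfillProcesses()
	for root, p := range processes {
		for slot := p.CurrentSlot; slot > target; slot-- {
			p.CurrentSlot = slot - 1 // archive blobs for this slot, then record it
			processes[root] = p
			writeBackfillProcesses(processes)
		}
		delete(processes, root) // finished: drop the process from the mapping
		writeBackfillProcesses(processes)
	}
}

func main() {
	// Simulate a backfill of slots 100 -> 90 that was interrupted at slot 95.
	store["0xabc"] = Process{StartSlot: 100, CurrentSlot: 95}
	backfillBlobs(90)
	fmt.Println(len(store)) // empty mapping once all processes complete
}
```

The key design point is that the process is registered in storage before any blobs are written and only deleted after the last slot, so a crash anywhere in between leaves a resumable record.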

Open questions

  • Should the new backfill_processes functionality be hidden behind a feature flag and remain off-by-default?
  • Does it matter that the order in which backfill_processes are completed is not based on slot number (i.e. neither the highest nor the lowest slot number determines which process runs first)? Instead, the processes are keyed by block-header hash, so map iteration order determines the order of processing.
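On the second question: Go map iteration order is deliberately unspecified, so processes keyed by block root will be resumed in an arbitrary order across runs. If deterministic, slot-based ordering were ever wanted, the keys could be sorted first; a hypothetical sketch (not code from this PR):

```go
package main

import (
	"fmt"
	"sort"
)

// orderedRoots returns the process keys sorted so the process with the
// highest slot is completed first. The map values here are just slot numbers
// for illustration.
func orderedRoots(processes map[string]uint64) []string {
	roots := make([]string, 0, len(processes))
	for root := range processes {
		roots = append(roots, root)
	}
	sort.Slice(roots, func(i, j int) bool {
		return processes[roots[i]] > processes[roots[j]]
	})
	return roots
}

func main() {
	fmt.Println(orderedRoots(map[string]uint64{"0xaa": 100, "0xbb": 300, "0xcc": 200}))
	// → [0xbb 0xcc 0xaa]
}
```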

@danyalprout danyalprout self-requested a review June 17, 2024 13:56
common/storage/s3.go — review thread (outdated, resolved)
archiver/service/archiver.go — review thread (outdated, resolved)
@danyalprout danyalprout merged commit 988d545 into base-org:master Jun 21, 2024
4 checks passed
@bitwiseguy bitwiseguy deleted the ss/finish-old-backfills branch June 21, 2024 01:56
3 participants