Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement an early-decrypting check #5171

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
39 changes: 24 additions & 15 deletions src/borg/archive.py
Original file line number Diff line number Diff line change
Expand Up @@ -1386,13 +1386,9 @@ def __next__(self):

class ArchiveChecker:

def __init__(self):
self.error_found = False
self.possibly_superseded = set()

def check(self, repository, repair=False, archive=None, first=0, last=0, sort_by='', glob=None,
def __init__(self, repository, repair=False, archive=None, first=0, last=0, sort_by='', glob=None,
verify_data=False, save_space=False):
"""Perform a set of checks on 'repository'
"""A checker of repostiory archives
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo


:param repair: enable repair mode, write updated or corrected data into repository
:param archive: only check this archive
Expand All @@ -1401,32 +1397,45 @@ def check(self, repository, repair=False, archive=None, first=0, last=0, sort_by
:param verify_data: integrity verification of data referenced by archives
:param save_space: Repository.commit(save_space)
"""
logger.info('Starting archive consistency check...')
self.check_all = archive is None and not any((first, last, glob))
self.error_found = False
self.verify_data = verify_data
self.possibly_superseded = set()
self.repair = repair
self.repository = repository
self.archive = archive
self.first = first
self.last = last
self.glob = glob
self.check_all = archive is None and not any((first, last, glob))
self.sort_by = sort_by
self.save_space = save_space
self.init_chunks()
self.key = self.identify_key(repository)
Comment on lines 1412 to +1413
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guess these 2 need checking.

do these have any unwanted consequences when run before the repo check has happened?

logger.info('Initialised archive consistency checker...')

def check(self):
"""Perform a set of checks on 'repository'"""
logger.info('Starting archive consistency check...')
if not self.chunks:
logger.error('Repository contains no apparent data at all, cannot continue check/repair.')
return False
self.key = self.identify_key(repository)
if verify_data:
self.verify_data()
if self.verify_data:
self.do_verify_data()
if Manifest.MANIFEST_ID not in self.chunks:
logger.error("Repository manifest not found!")
self.error_found = True
self.manifest = self.rebuild_manifest()
else:
try:
self.manifest, _ = Manifest.load(repository, (Manifest.Operation.CHECK,), key=self.key)
self.manifest, _ = Manifest.load(self.repository, (Manifest.Operation.CHECK,), key=self.key)
except IntegrityErrorBase as exc:
logger.error('Repository manifest is corrupted: %s', exc)
self.error_found = True
del self.chunks[Manifest.MANIFEST_ID]
self.manifest = self.rebuild_manifest()
self.rebuild_refcounts(archive=archive, first=first, last=last, sort_by=sort_by, glob=glob)
self.rebuild_refcounts(archive=self.archive, first=self.first, last=self.last, sort_by=self.sort_by, glob=self.glob)
self.orphan_chunks_check()
self.finish(save_space=save_space)
self.finish(save_space=self.save_space)
if self.error_found:
logger.error('Archive consistency check complete, problems found.')
else:
Expand Down Expand Up @@ -1459,7 +1468,7 @@ def identify_key(self, repository):
cdata = repository.get(some_chunkid)
return key_factory(repository, cdata)

def verify_data(self):
def do_verify_data(self):
Comment on lines -1462 to +1471
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you rename this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as we now have a boolean flag we need to save to self:

def __init__(self, ..., verify_data=False):
    ...
    self.verify_data = verify_data

logger.info('Starting cryptographic data integrity verification...')
chunks_count_index = len(self.chunks)
chunks_count_segments = 0
Expand Down
18 changes: 12 additions & 6 deletions src/borg/archiver.py
Original file line number Diff line number Diff line change
Expand Up @@ -327,15 +327,21 @@ def do_check(self, args, repository):
# also, there is no max_duration support in the archives check code anyway.
self.print_error("--repository-only is required for --max-duration support.")
return EXIT_ERROR
checker = None
if args.prefix is not None:
args.glob_archives = args.prefix + '*'
if not args.repo_only:
try:
checker = ArchiveChecker(repository, repair=args.repair, archive=args.location.archive, first=args.first,
last=args.last, sort_by=args.sort_by or 'ts', glob=args.glob_archives,
verify_data=args.verify_data, save_space=args.save_space)
except Exception as exc:
self.print_error(str(exc))
return EXIT_WARNING
if not args.archives_only:
if not repository.check(repair=args.repair, save_space=args.save_space, max_duration=args.max_duration):
Comment on lines +335 to 342
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the potential fundamental problem with first accessing the repo inside ArchiveChecker.__init__ before doing repository.check is that we do not know whether we are working on valid data.

The repo check / repair makes sure that the fundamental data is ok (in memory and in repair mode also on disk).
The archives check / repair builds upon that (and assumes that lower level structures are ok).

So, guess there needs to be pretty much review before touching repo data before the repo has been checked.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, and that's what I feared. Is there some way to verify only the chunks that contain the metadata needed to decrypt in ArchiveChecker.__init__(), so we know that the crypto & manifest & whatever other metadata is kosher, but don't spend 5 hours doing all of repo/data/*

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, a "miniature version" of the borg repo check, that only makes sure that 1 chunk is ok (which could be the manifest (id 0) or any other chunk). And then detect crypto based on that chunk.

that might work, needs checking...

return EXIT_WARNING
if args.prefix is not None:
args.glob_archives = args.prefix + '*'
if not args.repo_only and not ArchiveChecker().check(
repository, repair=args.repair, archive=args.location.archive,
first=args.first, last=args.last, sort_by=args.sort_by or 'ts', glob=args.glob_archives,
verify_data=args.verify_data, save_space=args.save_space):
if not args.repo_only and not checker.check():
return EXIT_WARNING
return EXIT_SUCCESS

Expand Down