My original use case (old README.md)
This page contains the original use-case description, which was moved out of README.md as the project's scope and features expanded.
I use rdup to create backup archives, and syncthing to replicate my backups across a lot of systems. Some of them are more trusted (desktops with disk-level encryption, firewalls, stored in a vault, etc.), and some not so much (semi-personal laptops, phones, etc.).
As my backups tend to contain a lot of shared data (even backups taken on different systems), it makes perfect sense to deduplicate them.
However, I don't want the physical or remote compromise of one of my hosts to give access to the data inside all my backups from all my systems. Existing deduplication software like ddar or zbackup provides encryption, but only symmetric (zbackup issue, ddar issue), which means you have to share the same key across all your hosts, and one compromised system gives access to all your backup data.
To fill the missing piece in my master backup plan, I've decided to write it myself using my beloved Rust programming language.
Over time, the project has grown, with improvements and features to support other use cases.
rdedup works very much like zbackup and other deduplication software, with a little twist:
- Thanks to public key cryptography, a secure passphrase is required only when restoring data; adding and deduplicating new data does not need it.
- Everything is synchronization friendly. Dropbox, Syncthing and similar should work fine for data synchronization.
When storing data, rdedup will split it into smaller pieces - chunks - using a rolling sum, and store each chunk under a unique id (sha256 digest) in a special format directory: the repo. Then the whole backup will be described as an index: a list of digests.
The index will be stored internally just like the data itself. Applied recursively, this reduces each backup to one unique digest, which is saved under the given name.
When restoring data, rdedup will read the index, then restore the data, reading each chunk listed in it.
Thanks to the rolling sum chunking scheme, when frequently saving similar data, a lot of common chunks will be reused, saving space.
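The save and restore flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not rdedup's actual code or on-disk format: the rolling sum is a plain sum over a fixed window rather than bup's algorithm, `WINDOW` and `MASK` are made-up values, the repo is an in-memory dict with hypothetical `chunk`, `index` and `names` tables, and encryption is left out entirely.

```python
import hashlib
import random

WINDOW = 64            # rolling-sum window and minimum chunk size (assumed value)
MASK = (1 << 8) - 1    # cut when the low 8 bits of the sum are all ones (assumed value)
DIGEST_LEN = 32        # length of a raw sha256 digest in bytes


def split_chunks(data: bytes) -> list:
    """Cut data wherever the sum of the last WINDOW bytes matches MASK."""
    chunks, start, rolling = [], 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= WINDOW:
            rolling -= data[i - WINDOW]      # drop the byte leaving the window
        if i + 1 - start >= WINDOW and (rolling & MASK) == MASK:
            chunks.append(data[start:i + 1])
            start, rolling = i + 1, 0
    if start < len(data) or not chunks:
        chunks.append(data[start:])          # trailing (or empty) chunk
    return chunks


def store(table: dict, blob: bytes) -> list:
    """Chunk blob, keep each unseen chunk under its sha256 digest, return the digests."""
    digests = []
    for chunk in split_chunks(blob):
        digest = hashlib.sha256(chunk).digest()
        table.setdefault(digest, chunk)      # a known chunk is not stored again
        digests.append(digest)
    return digests


def save(repo: dict, name: str, data: bytes) -> None:
    """Store data, then store its index the same way until one digest remains."""
    digests = store(repo["chunk"], data)
    while True:
        digests = store(repo["index"], b"".join(digests))
        if len(digests) == 1:
            break
    repo["names"][name] = digests[0]


def load(repo: dict, name: str) -> bytes:
    """Walk the index levels down from the named digest, then join the data chunks."""
    digests = [repo["names"][name]]
    # Sketch shortcut: index and data chunks live in separate tables to tell them apart.
    while digests[0] in repo["index"]:
        blob = b"".join(repo["index"][d] for d in digests)
        digests = [blob[i:i + DIGEST_LEN] for i in range(0, len(blob), DIGEST_LEN)]
    return b"".join(repo["chunk"][d] for d in digests)


repo = {"chunk": {}, "index": {}, "names": {}}
rng = random.Random(0)
monday = bytes(rng.getrandbits(8) for _ in range(20000))
tuesday = monday[:10000] + b"a small edit" + monday[10000:]

save(repo, "monday", monday)
first = len(repo["chunk"])
save(repo, "tuesday", tuesday)
added = len(repo["chunk"]) - first
print(f"{first} chunks for the first backup, {added} new chunks for the second")
```

Saving the second, slightly edited input should add only a few new chunks, because chunk boundaries depend on nearby bytes rather than on absolute offsets, so the chunker falls back in step shortly after the edit.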
What makes rdedup unique is that every time a new repo directory is created, a pair of keys (public and secret) is generated. The public key is saved in the storage directory in plain text, while the secret key is encrypted with a key derived from a passphrase.
Every time rdedup saves a new chunk file, its data is encrypted using the public key so it can only be decrypted using the corresponding secret key. This way new data can always be added, with full deduplication, while only restoring data requires providing the passphrase to unlock the secret key.
Nice little detail: rdedup supports removing old names and no longer needed chunks (garbage collection) without a passphrase. Only the data chunks are encrypted, making operations like garbage collection safe even on untrusted machines.
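Why garbage collection needs no passphrase can be shown with a small mark-and-sweep sketch. It is a hedged illustration, not rdedup's implementation: the in-memory `repo` layout, the `make_backup` helper and the placeholder bytes standing in for encrypted chunk data are all assumptions. The point is that the collector only follows digests through the unencrypted index and never opens a data chunk.

```python
import hashlib

DIGEST_LEN = 32  # length of a raw sha256 digest in bytes


def make_backup(repo: dict, name: str, pieces: list) -> None:
    """Hypothetical helper: register pieces as opaque chunks plus a one-level index."""
    digests = []
    for piece in pieces:
        digest = hashlib.sha256(piece).digest()
        # Stand-in for the encrypted chunk file; the collector never reads it.
        repo["chunk"][digest] = b"<sealed bytes>"
        digests.append(digest)
    index = b"".join(digests)
    index_digest = hashlib.sha256(index).digest()
    repo["index"][index_digest] = index
    repo["names"][name] = index_digest


def collect_garbage(repo: dict) -> int:
    """Mark everything reachable from the names, then delete the rest."""
    live_index, live_chunks = set(), set()
    for root in repo["names"].values():
        digests = [root]
        while digests[0] in repo["index"]:       # descend through index levels
            live_index.update(digests)
            blob = b"".join(repo["index"][d] for d in digests)
            digests = [blob[i:i + DIGEST_LEN] for i in range(0, len(blob), DIGEST_LEN)]
        live_chunks.update(digests)              # bottom level: data chunk digests
    removed = 0
    for table, live in ((repo["index"], live_index), (repo["chunk"], live_chunks)):
        for digest in list(table):
            if digest not in live:
                del table[digest]
                removed += 1
    return removed


repo = {"chunk": {}, "index": {}, "names": {}}
make_backup(repo, "old", [b"a", b"b", b"c"])
make_backup(repo, "new", [b"b", b"c", b"d"])

del repo["names"]["old"]          # forget the old backup's name
removed = collect_garbage(repo)   # drops the old index and the chunk only it used
```

In this example the chunks shared with the surviving backup stay in place; only the old index and the one chunk that nothing references any more are removed.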
- bup's method of splitting files into chunks is used
- sha256 sum of chunk data is used as digest id
- libsodium's sealed boxes are used for encryption/decryption:
  - ephemeral keys are used for sealing
  - chunk digest is used as nonce
- secret key is encrypted using libsodium crypto secretbox, with a random nonce and a key derived from the passphrase using password hashing and a random salt
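The last point, deriving a key from the passphrase with a random salt, can be illustrated with Python's standard library. This is a stand-in sketch only: it uses PBKDF2 from `hashlib` instead of libsodium's password hashing, it stops before the secretbox encryption step, and `SALT_LEN`, `ITERATIONS` and `derive_key` are hypothetical names and values chosen for the example.

```python
import hashlib
import os

SALT_LEN = 16         # assumed salt size for this sketch
KEY_LEN = 32          # libsodium secretbox keys are 32 bytes long
ITERATIONS = 100_000  # assumed work factor for the stand-in KDF


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Stretch a passphrase into a fixed-size key (PBKDF2 as a stand-in KDF)."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, ITERATIONS, dklen=KEY_LEN
    )


# The salt is random and not secret: it is kept next to the encrypted secret key
# so the same key can be derived again when the passphrase is entered.
salt = os.urandom(SALT_LEN)
key = derive_key("correct horse battery staple", salt)
```

Because the salt is random per repo, the same passphrase yields unrelated keys in different repos, and the deliberately slow derivation makes guessing passphrases more expensive.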