My original use case (old README.md)

Dawid Ciężarkiewicz edited this page Jul 29, 2017 · 1 revision

This page contains the original use-case description, which was moved out of README.md as the project's scope and features expanded.

My use case

I use rdup to create backup archives, and syncthing to replicate my backups across many systems. Some of them are more trusted (desktops with disk-level encryption, firewalls, stored in a vault, etc.), and some not so much (semi-personal laptops, phones, etc.).

As my backups tend to contain a lot of shared data (even backups taken on different systems), it makes perfect sense to deduplicate them.

However, I don't want the physical or remote compromise of one of my hosts to give access to the data inside all my backups from all my systems. Existing deduplication software like ddar or zbackup provides encryption, but only symmetric (zbackup issue, ddar issue), which means you have to share the same key across all your hosts, and one compromised system gives access to all your backup data.

To fill the missing piece in my master backup plan, I've decided to write it myself using my beloved Rust programming language.

Over time the project has grown, gaining improvements and features that support other use cases.

How it works

rdedup works very much like zbackup and other deduplication software with a little twist:

  • Thanks to public key cryptography, a secure passphrase is required only when restoring data; adding and deduplicating new data does not need it.
  • Everything is synchronization friendly. Dropbox, Syncthing and similar should work fine for data synchronization.

When storing data, rdedup splits it into smaller pieces (chunks) using a rolling sum, and stores each chunk under a unique id (its sha256 digest) in a specially formatted directory: the repo. The whole backup is then described by an index: a list of digests.
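The chunk-and-store step can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not rdedup's actual code: the window size, the cut mask, the plain additive rolling sum, and the in-memory `repo` dictionary are all made up for the example (the real chunker follows bup, and the real repo is a directory on disk).

```python
import hashlib

WINDOW = 16          # bytes covered by the rolling sum (assumed value)
MASK = (1 << 6) - 1  # cut when the low 6 bits of the sum are all ones

def split_chunks(data: bytes):
    """Yield chunks whose boundaries depend only on the last WINDOW bytes."""
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]  # drop the byte leaving the window
        if (rolling & MASK) == MASK:
            yield data[start:i + 1]
            start = i + 1
    if start < len(data):
        yield data[start:]               # trailing chunk without a cut point

def store(data: bytes, repo: dict) -> list:
    """Store each chunk under its sha256 digest; return the index."""
    index = []
    for chunk in split_chunks(data):
        digest = hashlib.sha256(chunk).hexdigest()
        repo.setdefault(digest, chunk)   # an already known chunk costs nothing
        index.append(digest)
    return index
```

Because a cut point depends only on the bytes inside the window, inserting a few bytes at the start of a file moves the early boundaries but leaves the later chunks, and therefore their digests, unchanged.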

The index is stored internally just like the data itself. Applied recursively, this reduces each backup to one unique digest, which is saved under the given name.

When restoring data, rdedup will read the index, then restore the data, reading each chunk listed in it.
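A rough sketch of how a backup can be folded down to one digest and then restored. The assumptions here are mine and not taken from rdedup: fixed-size chunks stand in for rolling-sum chunking, the repo is a dictionary, and the number of index levels (`depth`) is remembered next to the digest so that restore knows when it has reached data chunks.

```python
import hashlib

CHUNK = 64    # fixed chunk size, a stand-in for rolling-sum chunking
DIGEST = 32   # length of a raw sha256 digest in bytes

def put(blob: bytes, repo: dict) -> bytes:
    """Store a blob under its sha256 digest and return the digest."""
    digest = hashlib.sha256(blob).digest()
    repo.setdefault(digest, blob)
    return digest

def save(data: bytes, repo: dict):
    """Store data; return (digest, depth) describing the whole backup."""
    pieces = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)] or [b""]
    digests = [put(p, repo) for p in pieces]
    depth = 0
    # The index (concatenated digests) is chunked and stored like data,
    # again and again, until a single digest is left.
    while len(digests) > 1:
        index = b"".join(digests)
        digests = [put(index[i:i + CHUNK], repo)
                   for i in range(0, len(index), CHUNK)]
        depth += 1
    return digests[0], depth

def restore(digest: bytes, depth: int, repo: dict) -> bytes:
    """Read the index (if any), then every chunk listed in it."""
    blob = repo[digest]
    if depth == 0:
        return blob                      # a data chunk
    children = [blob[i:i + DIGEST] for i in range(0, len(blob), DIGEST)]
    return b"".join(restore(c, depth - 1, repo) for c in children)
```

Saving under a name then amounts to something like `names["monday"] = save(data, repo)`, where `names` is a hypothetical mapping from backup names to their root digest.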

Thanks to the rolling sum chunking scheme, when similar data is saved frequently, many chunks are shared and reused, saving space.

What makes rdedup unique is that every time a new repo directory is created, a pair of keys (public and secret) is generated. The public key is saved in the storage directory in plain text, while the secret key is encrypted with a key derived from a passphrase.

Every time rdedup saves a new chunk file, its data is encrypted using the public key, so it can only be decrypted using the corresponding secret key. This way new data can always be added, with full deduplication, while only restoring data requires providing the passphrase to unlock the secret key.

A nice little detail: rdedup supports removing old names and no-longer-needed chunks (garbage collection) without the passphrase. Only the data chunks are encrypted, which makes operations like garbage collection safe even on untrusted machines.
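To illustrate why no passphrase is needed, here is a minimal mark-and-sweep sketch. The layout is assumed for the example only: `names` maps each backup name to a `(digest, depth)` pair, `repo` maps raw sha256 digests to blobs, and index blobs are plain concatenated digests. Data chunks are never opened, so it does not matter that they are encrypted.

```python
DIGEST = 32  # length of a raw sha256 digest in bytes

def mark(digest: bytes, depth: int, repo: dict, seen: set) -> None:
    """Record everything reachable from one root digest."""
    if (digest, depth) in seen:
        return
    seen.add((digest, depth))
    if depth == 0:
        return                       # data chunk: stays sealed, never read
    blob = repo[digest]              # index blob: a readable list of digests
    for i in range(0, len(blob), DIGEST):
        mark(blob[i:i + DIGEST], depth - 1, repo, seen)

def collect_garbage(names: dict, repo: dict) -> int:
    """Delete every blob no name refers to; return how many were removed."""
    seen = set()
    for digest, depth in names.values():
        mark(digest, depth, repo, seen)
    live = {digest for digest, _ in seen}
    garbage = [digest for digest in repo if digest not in live]
    for digest in garbage:
        del repo[digest]
    return len(garbage)
```

Removing a name is just deleting its entry from `names`; the chunks only it referenced are reclaimed on the next `collect_garbage` run.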

Technical Details

  • bup's method of splitting files into chunks is used
  • the sha256 sum of the chunk data is used as the digest id
  • libsodium's sealed boxes are used for encryption/decryption:
    • ephemeral keys are used for sealing
    • chunk digest is used as nonce
  • the secret key is encrypted using libsodium's crypto secretbox with a random nonce and a key derived from the passphrase using password hashing and a random salt
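The passphrase-to-key step from the last point can be sketched with Python's standard library. This is a stand-in, not what rdedup does internally: `hashlib.scrypt` replaces libsodium's password hashing, the cost parameters and salt length are assumed values, and the secretbox encryption itself is left out because it needs libsodium bindings.

```python
import hashlib
import os

SALT_LEN = 16  # assumed salt length for this sketch
KEY_LEN = 32   # secretbox keys are 32 bytes long

def new_salt() -> bytes:
    """Random salt, stored in plain text next to the encrypted secret key."""
    return os.urandom(SALT_LEN)

def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a symmetric key from the passphrase with a slow password hash."""
    return hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=salt,
        n=2**14, r=8, p=1,   # cost parameters, assumed for illustration
        dklen=KEY_LEN,
    )
```

The same passphrase and salt always give the same key, so only the salt and the nonce need to be kept alongside the encrypted secret key; the passphrase itself is never stored.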