rdedup v2.0.0 Release Notes

I'm happy to announce rdedup v2.0.0!

What's new

No single milestone or feature is responsible for the new major release. Mostly it follows the mantra "release early, release often": it has been more than a year since rdedup v1.0.0 was released.

rdedup v1.0.0 was mostly focused on my personal use case:

- being written in Rust (yay!)
- public key cryptography
- synchronization over Dropbox/Syncthing

Since v1.0.0, rdedup has attracted a user base and, over time, has improved considerably:

rdedup store performance has been greatly improved, to the point where I'd like to think of it as the ripgrep of dedup[*]:
- store pipeline is zero-copy and extremely multi-threaded
- new faster algorithms are implemented:
  - the default CDC ("chunking") algorithm is now FastCDC; FastCDC is state of the art, and rdedup is one of the first (maybe even the only, at the time of writing) Open Source data deduplication tools to implement it
  - blake2s is now the default hashing algorithm
  - zstd is now the default compression algorithm
- almost all parts of rdedup are now configurable with many algorithms to choose from
- testing has been improved, particularly with end-to-end tests, giving greater confidence in rdedup reliability
- a -t flag has been introduced to time different parts of the pipeline, to help find performance bottlenecks
- an asynchronous IO architecture has been added, in preparation for over-the-network backends
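For readers unfamiliar with content-defined chunking, the following is a minimal, self-contained sketch of FastCDC-style chunking in Rust. It is not rdedup's code or API: the function names (`gear_table`, `next_cut`, `chunk`), the size parameters, and the gear table (derived here with splitmix64 instead of FastCDC's fixed random table) are hypothetical. It does show FastCDC's core ideas: a rolling "gear" fingerprint, skipping the first `MIN_SIZE` bytes of each chunk, and normalized chunking with a stricter mask before the average size and a looser one after it. As a simplification, it tests the top bits of the fingerprint rather than FastCDC's spread-out masks.

```rust
// Hypothetical parameters for this sketch; rdedup's actual defaults may differ.
const MIN_SIZE: usize = 2 * 1024;
const AVG_SIZE: usize = 8 * 1024;
const MAX_SIZE: usize = 64 * 1024;

/// Build a 256-entry gear table of pseudo-random u64 values.
/// FastCDC uses a fixed random table; here it is derived with splitmix64
/// so the sketch is self-contained and deterministic (an assumption).
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Length of the next chunk, measured from the start of `data`.
fn next_cut(data: &[u8], gear: &[u64; 256]) -> usize {
    let len = data.len();
    if len <= MIN_SIZE {
        return len; // the tail is too short to split further
    }
    let end = len.min(MAX_SIZE);
    let normal = AVG_SIZE.min(end);
    // Normalized chunking: more mask bits (harder to match) before AVG_SIZE,
    // fewer mask bits (easier to match) after it. log2(8 KiB) = 13.
    let bits = AVG_SIZE.trailing_zeros();
    let mask_strict: u64 = u64::MAX << (64 - (bits + 2));
    let mask_loose: u64 = u64::MAX << (64 - (bits - 2));
    let mut fp: u64 = 0;
    let mut i = MIN_SIZE; // bytes before MIN_SIZE are never hashed
    while i < normal {
        // Gear hash: shift left and add a per-byte random value.
        fp = (fp << 1).wrapping_add(gear[data[i] as usize]);
        if fp & mask_strict == 0 {
            return i + 1;
        }
        i += 1;
    }
    while i < end {
        fp = (fp << 1).wrapping_add(gear[data[i] as usize]);
        if fp & mask_loose == 0 {
            return i + 1;
        }
        i += 1;
    }
    end // no content-defined boundary found: force a cut at MAX_SIZE
}

/// Split `data` into content-defined chunks (borrowed slices, no copying).
fn chunk(data: &[u8]) -> Vec<&[u8]> {
    let gear = gear_table();
    let mut chunks = Vec::new();
    let mut rest = data;
    while !rest.is_empty() {
        let (head, tail) = rest.split_at(next_cut(rest, &gear));
        chunks.push(head);
        rest = tail;
    }
    chunks
}

fn main() {
    // Deterministic pseudo-random input from a xorshift generator.
    let mut x: u64 = 0x2545_F491_4F6C_DD1D;
    let data: Vec<u8> = (0..1 << 20)
        .map(|_| {
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            (x >> 24) as u8
        })
        .collect();
    let chunks = chunk(&data);
    println!("{} bytes -> {} chunks", data.len(), chunks.len());
}
```

Because cut points depend only on local content, inserting bytes near the start of a file typically changes only the first few chunks; later boundaries tend to resynchronize, so the remaining chunks hash identically and are stored only once.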

[*] Take with a grain of salt. Use https://github.com/gilbertchen/benchmarking to draw your own conclusions.

Project status

I'd like to thank all the users who have provided feedback and, most of all, all the contributors: knowing that people are using rdedup really helps my motivation.

Having said that, rdedup is still mostly a one-man, spare-time project, and should be treated as such. Since v1.0.0 there have been no reports of data loss or corruption, but it's hard to tell whether that's due to rdedup's reliability or just its small user base. :)

I'm very aware of the project's pain points:

- The current GC model is not very scalable and may be too slow for datasets of a terabyte or more. A new GC approach is on the roadmap for v3.0.0 and will feature incremental, scalable and efficient GC without compromising anything.
- Network backends are still not implemented.

The codebase is not as neat as it could be, and testing is not as comprehensive as it should be for a "production ready" product.

I am planning to continue development toward rdedup v3.0.0 in the master branch. v3 will have a different repository format to enable more efficient GC and other features. I'll continue to add fixes and smaller-scope enhancements to v2, which now lives in the 2.0.0 branch.

Call for participation

If rdedup seems like an interesting project to you, feel free to reach out! I'd be happy to mentor and help.
