rdedup v2.0.0 Release Notes
I'm happy to announce rdedup v2.0.0!
No single milestone or feature is responsible for the new major release. Mostly it follows the mantra "release early, release often". It's been more than a year since rdedup v1.0.0 was released.
rdedup v1.0.0 was mostly focused on my personal use case:
- being written in Rust (yay!)
- public key cryptography
- synchronization over Dropbox/Syncthing
Since v1.0.0, rdedup has attracted a user base and, with time, improved considerably:
- rdedup store performance has been greatly improved, to the point where I'd like to think of it as the ripgrep of dedup[*]:
  - the store pipeline is zero-copy and extremely multi-threaded
  - new, faster algorithms are implemented:
    - the default CDC ("chunking") algorithm is now FastCDC; FastCDC is state of the art, and rdedup is one of the first (maybe even the only, at the time of writing) Open Source data deduplication tools to have it
    - blake2s is now the default hashing algorithm
    - zstd is the default for compression
- almost all parts of rdedup are now configurable, with many algorithms to choose from
- testing has been improved, particularly with end-to-end tests, giving greater confidence in rdedup's reliability
- the -t flag has been introduced to help with timing different parts of the pipeline and finding performance bottlenecks
- an asynchronous IO architecture has been added, in preparation for over-the-network backends
[*] Take with a grain of salt. Use https://github.com/gilbertchen/benchmarking to draw your own conclusions.
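To illustrate what a CDC ("chunking") algorithm does, here is a minimal sketch in Rust. It is not rdedup's code and it is not FastCDC itself: it uses a plain gear rolling hash, the building block FastCDC is based on, without FastCDC's normalized chunking. The names gear_table and chunk, the splitmix64-seeded table, and the min/max/mask parameters are all assumptions made for illustration.

```rust
/// Build a 256-entry "gear" table of pseudo-random 64-bit values.
/// Hypothetical helper: a fixed-seed splitmix64 generator keeps the
/// table (and therefore the chunk boundaries) deterministic.
fn gear_table() -> [u64; 256] {
    let mut table = [0u64; 256];
    let mut state: u64 = 0x9E37_79B9_7F4A_7C15;
    for entry in table.iter_mut() {
        state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        *entry = z ^ (z >> 31);
    }
    table
}

/// Split `data` into content-defined chunks (illustrative, not rdedup's API).
///
/// A boundary is placed after a byte when the chunk has reached `min` bytes
/// and the rolling hash has all `mask` bits clear, or when the chunk has
/// reached `max` bytes. Any leftover bytes form a final, possibly short chunk.
fn chunk<'a>(data: &'a [u8], min: usize, max: usize, mask: u64) -> Vec<&'a [u8]> {
    let gear = gear_table();
    let mut chunks = Vec::new();
    let mut start = 0;
    let mut hash: u64 = 0;

    for (i, &byte) in data.iter().enumerate() {
        // Gear hash: shift left by one and add the table value for this byte.
        // Older bytes are shifted out of the 64-bit word, so the hash depends
        // only on the most recent 64 bytes of the current chunk.
        hash = (hash << 1).wrapping_add(gear[byte as usize]);

        let len = i + 1 - start;
        if len >= max || (len >= min && hash & mask == 0) {
            chunks.push(&data[start..=i]);
            start = i + 1;
            hash = 0; // start the next chunk with a fresh hash
        }
    }

    if start < data.len() {
        chunks.push(&data[start..]);
    }
    chunks
}
```

For example, chunk(&data, 256, 4096, (1 << 10) - 1) never produces a chunk longer than 4096 bytes, and only the final chunk can be shorter than 256. With a 10-bit mask, and assuming the hash bits behave uniformly, each byte past the minimum has about a 1 in 1024 chance of ending the chunk. Because boundaries depend on content rather than on offsets, an insertion early in a stream tends to move only nearby boundaries, so later chunks can still be deduplicated.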
I'd like to thank all the users who provided feedback and, most of all, all the contributors: it really helps my motivation to know that there are people using rdedup.
Having said that, rdedup is still mostly a one-man, spare-time project and should be treated as such. Since v1.0.0 there have been no reports of data loss or corruption, but it's hard to tell whether that's because of rdedup's reliability or just the small user base. :)
I'm very aware of the project's pain points:
- The current GC model is not very scalable and may be too slow for datasets of a TB or more. A new rdedup GC approach is on the roadmap for v3.0.0 and will feature incremental, scalable and efficient GC without compromising anything.
- Network backends are still not implemented.
The codebase is not as neat as it could be, and testing is not as comprehensive as it should be for a "production ready" product.
I am planning to continue development toward rdedup v3.0.0 in the master branch. v3 will have a different repository format, to enable more efficient GC and other features. I'll continue to add fixes and smaller-scope enhancements to v2, now living in the 2.0.0 branch.
If you think rdedup seems like an interesting project, feel free to reach out! I'd be happy to mentor and help.