interesting compression algorithms + policy #1633
Comments
Lepton JPEG compression
|
zstd compression
|
brotli compression |
I think zstd could be very compelling for something like borg. Admittedly, I don't fully understand the patent issue around it, though. There are already a couple of Python packages bundling zstd. Can't those be used? |
@jungle-boogie for distribution (in the debian / fedora / ... sense) it matters whether zstd is available as package there (we can't just pull it via pip and compile it). I agree that zstd looks interesting technically. |
zstd indeed looks nice but FB doesn't seem to want to move at all regarding the patents issue. So far they refuse to tell even whether they actually hold any patents (this has been an issue with other projects from them as well, where the same LICENSE+PATENTS is used; however in some cases (iirc react) it has been seen that they have patent [applications]). To me it looks a lot like their intention is to create a "mutually assured destruction" scenario for patents around their software. However, I think it is quite clear that there won't be any widespread adoption (network protocols, file formats) of zstd unless this issue is clarified. Given that another entity in FB tries to push zstd as the standard compression of the future (tm) we'll just have to wait and see which part of FB prevails here. So in summary: Inclusion in Borg doesn't really depend on us, it depends on what other players in the ecosystem and FB will do with it. |
I respectfully disagree with the previous comment:
So zstd is not perfect, but it is very clear about it: every user may decide whether that grant is OK for them and use or not use the software. Borg (like most open source software that does not use a license covering patents, such as GPLv3) just ignores any potential issue, and its users just have to hope for the best. |
Zstd is already included in most important Linux/*BSD distributions: Yann Collet's (zstd author) answer regarding patents:
|
moved from suggestion by @infectormp in #45: alternative to zlib from Apple |
I haven't read through all of the tickets yet, and I'm sure this has already been answered somewhere, but it sounds like there are various challenges posed by the approach taken to compression in borg. Why is it not better to have borg do all of its work without compression and then call out to an external compressor? This would be similar to tar's |
@sjuxax forking an external program is expensive, esp. if you have small data sizes. borg compresses chunks (~2MB for big files, or as small as the file is for smaller files). |
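To illustrate the point about chunk-wise compression, here is a minimal sketch that compresses each chunk independently inside the process, with no external program involved. It assumes `zlib` from the Python standard library as a stand-in compressor; the helper names `compress_chunks` and `decompress_chunks` are hypothetical and are not borg's actual code.

```python
import zlib


def compress_chunks(chunks, level=6):
    """Compress every chunk on its own, in-process (no fork/exec per chunk)."""
    return [zlib.compress(chunk, level) for chunk in chunks]


def decompress_chunks(blobs):
    """Reverse compress_chunks: each blob decompresses independently."""
    return [zlib.decompress(blob) for blob in blobs]


# Chunks can be large (a slice of a big file) or tiny (a whole small file).
chunks = [b"a" * 100_000, b"hello world" * 10, b""]
blobs = compress_chunks(chunks)
restored = decompress_chunks(blobs)
```

Because every chunk is a separate library call, the per-chunk overhead is a function call rather than a process start, which matters when there are many small chunks.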
Added
Which is a reason why e.g. things like dropbox/lepton do not work. |
zstd is coming around nicely; schedule for inclusion in 1.2? How do we add a compression algorithm? Mandatory feature flag when it's used? |
@enkore I'd put it into 1.3. We can do it earlier in case we get bored. Yes, I guess it needs a mandatory flag for reading. And the manifest had better not be compressed (either not at all, or not with a new algorithm). |
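One way to picture why a reader must know about a new algorithm is a per-chunk algorithm tag. The sketch below is only an illustration under stated assumptions: the one-byte IDs, the `CODECS` table, and the `pack`/`unpack` helpers are hypothetical and do not reflect borg's real on-disk format or its feature-flag mechanism. `zlib` and `lzma` from the standard library stand in for "old" and "new" algorithms.

```python
import lzma
import zlib

# Hypothetical one-byte algorithm IDs (assumption, not borg's real values).
ID_NONE, ID_ZLIB, ID_LZMA = b"\x00", b"\x01", b"\x02"

# Map each ID to a (compress, decompress) pair; ID_NONE passes data through.
CODECS = {
    ID_NONE: (bytes, bytes),
    ID_ZLIB: (zlib.compress, zlib.decompress),
    ID_LZMA: (lzma.compress, lzma.decompress),
}


def pack(algo_id: bytes, data: bytes) -> bytes:
    """Prefix the compressed payload with the ID of the algorithm used."""
    compress, _ = CODECS[algo_id]
    return algo_id + compress(data)


def unpack(blob: bytes) -> bytes:
    """Dispatch on the ID; refuse data written with an unknown algorithm."""
    algo_id, payload = blob[:1], blob[1:]
    if algo_id not in CODECS:
        raise ValueError(f"unknown compression ID {algo_id!r}; newer client required")
    _, decompress = CODECS[algo_id]
    return decompress(payload)


# The manifest is stored without compression so any client can read it.
manifest_blob = pack(ID_NONE, b"manifest")
chunk_blob = pack(ID_LZMA, b"x" * 1000)
```

An older client that lacks the new entry in its table fails cleanly in `unpack` instead of misinterpreting the payload, while the uncompressed manifest stays readable.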
By way of facebook/zstd#801, zstd is now licensed under a standard BSD license. |
Put it into 1.3, so we have it on the radar. Maybe we can also do it earlier, let's see. |
BTW, FYI, zstd is now added to the Linux kernel 4.14 for btrfs. |
zstd support in borg was released with 1.1.4. |
Thanks! |
@henfri yeah, I also noticed that - it was there before we actually implemented that. |
I hope this is the right place to post this: https://quixdb.github.io/squash-benchmark/ seems to point to some candidates, such as Density or Pithy |
Why do you think so? lz4 is quite good for high speed, while zstd covers a wide range of compression ratios with good speed. We don't need to add anything that is only marginally better. |
|
I guess that is because borg uses it as a library, so the called library function does not know anything about I/O conditions. |
What exactly is expected from this issue in terms of documentation? I'm ready to do this one. |
Guess it is enough to have this here on the issue tracker. |
@ThomasWaldmann I guess it could be in the development.rst file for people not looking for it in the issue tracker. But that's just my point of view. |
If you like, add a pointer to this issue to the docs, but do not duplicate all the stuff from top post to the docs. As this might need per-algorithm discussion, that is better done here and not possible in the docs. |
Add some documentation for new compression algorithm, see #1633 [1.1]
Agree on JPEG-XL being an interesting consideration |
Cython binding: https://github.com/olokelo/jxlpy/ I had a quick look, but there doesn't seem to be a simple compress/decompress api for content of jpeg files. Is it even possible? What we want for borg is bit-identical reconstruction of file content. |
@ThomasWaldmann it's absolutely possible, they call it "lossless transcode" and it decodes back to the original file byte for byte (identical checksums). According to the jxlpy feature list, there's no support for it in the Python bindings yet. A chance here to contribute to two projects at once? For my backups I run Shall we open a separate issue for JXL? I'd be happy to at least test it on my side. |
@alexandervlpl Let's discuss in #8092 first. |
There are often requests to add some new and great compression algorithm X.
This ticket is to collect such ideas and also to specify the adoption policy for X:
To keep this ticket from becoming unmanageable over time, developers will actively edit and also delete comments here after processing them.