Add example of streaming only one file in the archive #80

Merged: 3 commits, merged Sep 22, 2024
README.md: 50 changes (33 additions & 17 deletions)
@@ -5,12 +5,12 @@
[![codecov](https://codecov.io/gh/JuliaIO/ZipArchives.jl/branch/main/graph/badge.svg?token=K3J0T9BZ42)](https://codecov.io/gh/JuliaIO/ZipArchives.jl)
[![Aqua QA](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)

-Read and write Zip archives in julia.
+Read and write Zip archives in Julia.

Like Tar.jl, it is designed to use the Zip format to share data between
multiple computers, not to backup a directory and preserve all local filesystem metadata.

-All public functions are exported, non exported functions and struct fields are internal.
+All public functions are exported. Non-exported functions and struct fields are internal.

See [test/test_simple-usage.jl](https://github.com/JuliaIO/ZipArchives.jl/blob/main/test/test_simple-usage.jl) for more examples.

@@ -20,24 +20,24 @@ See [test/test_simple-usage.jl](https://github.com/JuliaIO/ZipArchives.jl/blob/m
An archive contains a list of named entries.
These entries represent archived files or empty directories.
Internally there is no file system like tree structure; however,
-the entry name may have "/"s to represented a relative path.
+the entry name may have "/"s to represent a relative path.

-At the end of the archive there is a "central directory" of all entry names, sizes,
+At the end of the archive, there is a "central directory" of all entry names, sizes,
and other metadata.

The central directory gets parsed first when reading an archive.

-The central directory makes it fast to read just one random entry out of a very large archive.
+The central directory makes reading just one random entry out of a large archive fast.

When writing it is important to close the writer so the central directory gets written out.

More details on the file format can be found at https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
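A minimal in-memory sketch of the layout described above. It assumes `ZipWriter(io)` leaves a caller-supplied `IOBuffer` open when the writer is closed, and that no archive comment is written, so the end-of-central-directory record (signature `PK\x05\x06`) occupies the last 22 bytes. The entry names and contents are hypothetical.

```julia
using ZipArchives: ZipWriter, ZipReader, zip_writefile, zip_names, zip_readentry

io = IOBuffer()
w = ZipWriter(io)
# "/" in an entry name only represents a relative path; no directory entry is required.
zip_writefile(w, "data/a.txt", codeunits("first entry"))
zip_writefile(w, "b.txt", codeunits("second entry"))
# Closing the writer appends the central directory after the entry data.
close(w)
bytes = take!(io)

# Assumption: no archive comment, so the end-of-central-directory record
# is the last 22 bytes and begins with the signature "PK\x05\x06".
eocd_signature = bytes[end-21:end-18]
println(eocd_signature == codeunits("PK\x05\x06"))

# Reading starts from the central directory, which lists every entry name.
r = ZipReader(bytes)
println(zip_names(r))
println(zip_readentry(r, "data/a.txt", String))
```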

### Reading Zip archives

-Archives can be read from any `AbstractVector{UInt8}` containing the data of a zip archive.
+Archives can be read from any `AbstractVector{UInt8}` containing the data of a Zip archive.

-For example if you download this repo as a ".zip" from github https://github.com/JuliaIO/ZipArchives.jl/archive/refs/heads/main.zip you can read this README in julia.
+For example, if you download this repo as a ".zip" from GitHub https://github.com/JuliaIO/ZipArchives.jl/archive/refs/heads/main.zip you can read this README in julia.

```julia
using ZipArchives: ZipReader, zip_names, zip_readentry
@@ -64,9 +64,9 @@ using Test: @test_throws
filename = tempname()
```
Open a new zip file with `ZipWriter`
-If a file already exists at filename, it will be replaced.
+If a file already exists at `filename`, it will be replaced.
Using the do syntax ensures the file will be closed.
-Otherwise make sure to close the ZipWriter to finish writing the file.
+Otherwise, make sure to close the ZipWriter to finish writing the file.

```julia
ZipWriter(filename) do w
@@ -87,6 +87,22 @@ ZipWriter(filename) do w
end
```
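A sketch of the non-`do` form, where closing is the caller's job. The `try`/`finally` makes sure `close` runs, and therefore the central directory gets written, even if an error is thrown while entries are being added. It also shows writing one entry incrementally with `zip_newfile`, `write`, and `zip_commitfile`; the entry names are hypothetical.

```julia
using ZipArchives: ZipWriter, ZipReader, zip_newfile, zip_commitfile, zip_writefile, zip_readentry

filename = tempname()
w = ZipWriter(filename)  # replaces any existing file at `filename`
try
    # Write one entry in pieces: open it, write to the writer like any IO, then commit.
    zip_newfile(w, "logs/run.txt")
    write(w, "line 1\n")
    write(w, "line 2\n")
    zip_commitfile(w)
    # Or write a whole entry in a single call.
    zip_writefile(w, "notes.txt", codeunits("done"))
finally
    # Without this close, the central directory is never written.
    close(w)
end

r = ZipReader(read(filename))
println(zip_readentry(r, "logs/run.txt", String))
```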

+### Streaming one entry in a large archive file
+If your archive is in a file, `mmap` can be used to treat the file as a `Vector{UInt8}`.
+
+An entry in the archive can be opened as an `IO` stream using `zip_openentry`.
+
+```julia
+using ZipArchives: ZipReader, zip_openentry
+using Downloads: download
+using Mmap: mmap
+zip_file_path = download("https://github.com/JuliaIO/ZipArchives.jl/archive/refs/heads/main.zip");
+archive = ZipReader(mmap(open(zip_file_path)))
+readme_n_lines = zip_openentry(archive, "ZipArchives.jl-main/README.md") do io
+    countlines(io)
+end
+```
+
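The download example needs network access. A self-contained sketch of the same pattern, assuming a small archive written to a temporary file can stand in for a large one: the CSV entry is consumed line by line from the `IO` that `zip_openentry` provides, and the other entry is never read. The file and entry names are hypothetical.

```julia
using ZipArchives: ZipWriter, ZipReader, zip_writefile, zip_openentry
using Mmap: mmap

# Hypothetical stand-in for a large archive on disk.
path = tempname()
w = ZipWriter(path)
zip_writefile(w, "big/table.csv", codeunits("id,value\n1,10\n2,20\n3,30\n"))
zip_writefile(w, "other.txt", codeunits("never read below"))
close(w)

# mmap exposes the file as a Vector{UInt8}; the OS loads pages on demand.
archive = ZipReader(mmap(path))

# Stream one entry: skip the header row, then sum the second column line by line.
total = zip_openentry(archive, "big/table.csv") do io
    sum(parse(Int, split(line, ',')[2]) for line in Iterators.drop(eachline(io), 1))
end
println(total)
```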
### Supported Compression Methods

| Compression Method | Reading | Writing |
@@ -100,7 +116,7 @@ end
1. Cannot directly extract all files in an archive and write those files to disk.
1. Ignores time stamps.
1. Cannot write an archive fully in streaming mode. See https://github.com/madler/zipflow if you need this functionality.
-1. Encryption and decryption not supported.
+1. Encryption and decryption are not supported.
1. Multi disk archives not supported.
1. Cannot recover data from a corrupted archive. Especially if the end of the archive is corrupted.

@@ -115,12 +131,12 @@ It is just a wrapper of p7zip, and must be run as an external program.

ZipFile is very similar to ZipArchives at a high level.

-Currently ZipArchives has the following benefits over ZipFile:
+Currently, ZipArchives has the following benefits over ZipFile:
1. Full ZIP64 support: archives larger than 4GB can be written.
2. UTF-8 file name support: entry names correctly mark that they are UTF-8.
-3. Safe multi threaded reading of different entries in a single archive.
+3. Safe multi-threaded reading of different entries in a single archive.
4. Files can be marked as executable. Permissions are handled like in https://github.com/JuliaIO/Tar.jl#permissions
-5. By default when writing an archive, entry names are checked to avoid some common issues if the archive would be extracted on windows.
+5. By default when writing an archive, entry names are checked to avoid some common issues if the archive is extracted on Windows.
6. Ability to append to an existing zip archive, in an `IO` or in a file on disk.
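A sketch of the multi-threaded reading from item 3, assuming `zip_readentry` accepts an integer entry index and that entries are indexed in the order they were written. Each iteration writes only its own slot of `sizes`, so the loop needs no lock. Start Julia with `--threads=auto` (or `-t N`) to get more than one thread; with a single thread the loop simply runs serially. The entry names are hypothetical.

```julia
using ZipArchives: ZipWriter, ZipReader, zip_writefile, zip_nentries, zip_readentry

# Build a small archive in memory with entries of sizes 1 through 8 bytes.
io = IOBuffer()
w = ZipWriter(io)
for i in 1:8
    zip_writefile(w, "part$(i).txt", codeunits(repeat("x", i)))
end
close(w)
r = ZipReader(take!(io))

# Several threads read different entries from the same ZipReader.
sizes = zeros(Int, zip_nentries(r))
Threads.@threads for i in 1:zip_nentries(r)
    sizes[i] = length(zip_readentry(r, i))
end
println(sizes)
```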

ZipArchives currently has the following limitations compared to ZipFile:
@@ -132,11 +148,11 @@ ZipArchives currently has the following limitations compared to ZipFile:



-## Is there a unzip function for a whole archive?
+## Is there an unzip function for a whole archive?
This package cannot unzip a whole archive to disk with a single function.

-This is quite complicated to do in a cross platform manner that also handles all potential errors or malicious zip archives in a safe way.
+This is quite complicated to do in a cross-platform manner that also handles all potential errors or malicious Zip archives safely.

-So this could be done in a separate package that depends on this package. Or using existing well tested C libraries such as `p7zip_jll`
+So this could be done in a separate package that depends on this package. Or using existing well-tested C libraries such as `p7zip_jll`

-I am happy to add other high level functions for creating zip archives to this package.
+I am happy to add other high-level functions for creating zip archives to this package.