Because ZIP files may be appended to, only files specified in the central directory at the end of the file are valid. Scanning a ZIP file for local file headers is invalid (except in the case of corrupted archives), as the central directory may declare that some files have been deleted and other files have been updated.
Maybe consider adding a note about deleted and updated files to the list here:
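As a rough illustration of what "only files specified in the central directory are valid" implies for a reader, here is a minimal sketch (plain std Rust, not this crate's code) of locating the end-of-central-directory record by scanning backwards from the end of the file:

```rust
/// Minimal sketch (not this crate's code): locate the end-of-central-directory
/// (EOCD) record by scanning backwards from the end of the file for its
/// signature, and read how many entries the central directory declares.
/// Only those entries are part of the archive.
fn find_eocd(bytes: &[u8]) -> Option<(usize, u16)> {
    // The EOCD is 22 bytes plus a trailing comment of up to 65535 bytes,
    // so it must start somewhere in the last 22 + 65535 bytes.
    let earliest = bytes.len().saturating_sub(22 + 65_535);
    let latest = bytes.len().checked_sub(22)?;
    (earliest..=latest)
        .rev()
        .find(|&i| bytes[i..].starts_with(b"PK\x05\x06"))
        .map(|i| {
            // Total number of central directory entries: 2 bytes at offset 10.
            let total_entries = u16::from_le_bytes([bytes[i + 10], bytes[i + 11]]);
            (i, total_entries)
        })
}

fn main() {
    // Smallest valid archive: a lone EOCD record declaring zero entries.
    let empty_zip = [&b"PK\x05\x06"[..], &[0u8; 18][..]].concat();
    assert_eq!(find_eocd(&empty_zip), Some((0, 0)));
    println!("EOCD at offset 0, 0 entries");
}
```

A real parser would also validate the candidate record (for example, that the comment length matches the remaining bytes), since a false signature can occur inside the archive comment.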
Do you happen to know how a file would be updated or deleted in this way, such that its LFH becomes invalid? I could just be blind, but I can't find anything in the spec that strictly supports this.
Instead, this is a requirement of the spec:
Each "local file header" MUST be accompanied by a corresponding "central directory header" record within the central directory section of the ZIP file.
which means you can't just delete a file by removing the CDR but leaving the actual LFH/data present.
One other thing that occurred to me was that you can't use Stored when storing an inner ZIP file, because we'll start matching on that inner ZIP file's signatures. Will add that now.
4.3.2 Each file placed into a ZIP file MUST be preceded by a "local
file header" record for that file. Each "local file header" MUST be
accompanied by a corresponding "central directory header" record within
the central directory section of the ZIP file.
I think one way to interpret that statement is:

- All files included in the ZIP must have a local file header.
- Each of these local file headers must be pointed to by the central directory.

So the local file headers of every included file must have a corresponding record in the central directory. It doesn't necessarily say that every local file header must be pointed to by the central directory.
...the central directory may declare that some files have been deleted and other files have been updated.
For example, we may start with a ZIP file that contains files A, B and C. File B is then deleted and C updated. This may be achieved by just appending a new file C to the end of the original ZIP file and adding a new central directory that only lists file A and the new file C.
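The scenario above can be shown end to end with a hand-rolled sketch (plain std Rust, nothing from this crate; CRC-32s are zeroed because only the structure is inspected): build A, B, C, append a new C plus a central directory listing only A and the new C, then compare what signature-scanning sees with what the central directory declares.

```rust
use std::collections::HashMap;

/// Minimal local file header + data for a Stored (uncompressed) entry.
fn lfh(name: &str, data: &[u8]) -> Vec<u8> {
    let mut v = Vec::new();
    v.extend_from_slice(b"PK\x03\x04"); // local file header signature
    v.extend_from_slice(&[20, 0, 0, 0, 0, 0, 0, 0, 0, 0]); // version, flags, method=0 (Stored), time, date
    v.extend_from_slice(&0u32.to_le_bytes()); // CRC-32 (zeroed for this sketch)
    v.extend_from_slice(&(data.len() as u32).to_le_bytes()); // compressed size
    v.extend_from_slice(&(data.len() as u32).to_le_bytes()); // uncompressed size
    v.extend_from_slice(&(name.len() as u16).to_le_bytes()); // file name length
    v.extend_from_slice(&0u16.to_le_bytes()); // extra field length
    v.extend_from_slice(name.as_bytes());
    v.extend_from_slice(data);
    v
}

/// Minimal central directory header pointing at an LFH offset.
fn cdh(name: &str, data_len: u32, lfh_offset: u32) -> Vec<u8> {
    let mut v = Vec::new();
    v.extend_from_slice(b"PK\x01\x02"); // central directory header signature
    v.extend_from_slice(&[20, 0, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0]); // versions, flags, method, time, date
    v.extend_from_slice(&0u32.to_le_bytes()); // CRC-32
    v.extend_from_slice(&data_len.to_le_bytes()); // compressed size
    v.extend_from_slice(&data_len.to_le_bytes()); // uncompressed size
    v.extend_from_slice(&(name.len() as u16).to_le_bytes()); // file name length
    v.extend_from_slice(&[0; 8]); // extra, comment, disk start, internal attrs
    v.extend_from_slice(&0u32.to_le_bytes()); // external attrs
    v.extend_from_slice(&lfh_offset.to_le_bytes()); // offset of the entry's LFH
    v.extend_from_slice(name.as_bytes());
    v
}

/// Build A + B + C, then append an updated C and a new central directory
/// listing only A and the new C: B is "deleted", the old C is shadowed.
fn build_appended_zip() -> Vec<u8> {
    let mut zip = Vec::new();
    let mut offsets = HashMap::new();
    for (name, data) in [("A", b"aaaa".as_slice()), ("B", b"bbbb"), ("C", b"old!")] {
        offsets.insert(name, zip.len() as u32);
        zip.extend(lfh(name, data));
    }
    let new_c = zip.len() as u32; // the updated C is simply appended
    zip.extend(lfh("C", b"new!"));
    let cd_offset = zip.len() as u32;
    let mut cd = cdh("A", 4, offsets["A"]);
    cd.extend(cdh("C", 4, new_c));
    let cd_size = cd.len() as u32;
    zip.extend(cd);
    zip.extend_from_slice(b"PK\x05\x06"); // end of central directory record
    zip.extend_from_slice(&[0; 4]); // disk numbers
    zip.extend_from_slice(&2u16.to_le_bytes()); // entries on this disk
    zip.extend_from_slice(&2u16.to_le_bytes()); // total entries: only A and the new C
    zip.extend_from_slice(&cd_size.to_le_bytes());
    zip.extend_from_slice(&cd_offset.to_le_bytes());
    zip.extend_from_slice(&0u16.to_le_bytes()); // comment length
    zip
}

fn main() {
    let zip = build_appended_zip();
    let lfh_count = zip.windows(4).filter(|w| *w == b"PK\x03\x04").count();
    let cd_count = zip.windows(4).filter(|w| *w == b"PK\x01\x02").count();
    println!("LFHs by scanning: {lfh_count}, central directory entries: {cd_count}");
}
```

Scanning finds four local file headers, but the central directory lists only two entries, so the archive validly contains just A and the updated C.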
The link above also provides some rationale for this feature: it was originally useful with floppy disks, when ZIP was first designed. There are also several append-only storage systems, so I'd expect this approach is still used in those cases.
I think Stored inner ZIP files should be a solvable problem. Anything contained in the outer ZIP file will be preceded by a local file header that lists the (possibly compressed) length of the contained file. If you ignore all data for the next compressed_size bytes, you can avoid matching on an inner ZIP file.
(I haven't looked at the code for the streaming implementation in this crate yet so this may not be easily applicable to the current implementation)
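That skip could look roughly like this (a sketch over std::io with assumed names, not this crate's streaming API). One caveat worth noting for non-seekable streams: when general-purpose flag bit 3 is set, the sizes in the local file header are zero and deferred to a trailing data descriptor, so this simple skip doesn't apply.

```rust
use std::io::{self, Read};

/// Sketch (not this crate's API): with the reader positioned at a local
/// file header, parse the fixed fields and skip the name, extra field and
/// (possibly compressed) data without inspecting them, so an inner ZIP's
/// signatures are never matched. Returns the number of bytes skipped.
fn skip_entry<R: Read>(r: &mut R) -> io::Result<u64> {
    let mut fixed = [0u8; 30]; // signature + fixed-size LFH fields
    r.read_exact(&mut fixed)?;
    assert_eq!(&fixed[0..4], b"PK\x03\x04", "not a local file header");
    let flags = u16::from_le_bytes([fixed[6], fixed[7]]);
    // Caveat: if general-purpose bit 3 is set, the sizes here are zero and
    // deferred to a data descriptor after the data, so this skip won't work.
    assert_eq!(flags & 0x0008, 0, "sizes deferred to data descriptor");
    let compressed_size = u32::from_le_bytes(fixed[18..22].try_into().unwrap()) as u64;
    let name_len = u16::from_le_bytes([fixed[26], fixed[27]]) as u64;
    let extra_len = u16::from_le_bytes([fixed[28], fixed[29]]) as u64;
    let to_skip = name_len + extra_len + compressed_size;
    io::copy(&mut r.take(to_skip), &mut io::sink())?; // discard without matching
    Ok(to_skip)
}

fn main() -> io::Result<()> {
    // A Stored entry whose data itself begins with a ZIP signature.
    let inner = b"PK\x03\x04 pretend inner zip";
    let mut buf = Vec::new();
    buf.extend_from_slice(b"PK\x03\x04");
    buf.extend_from_slice(&[20, 0, 0, 0, 0, 0, 0, 0, 0, 0]); // version, flags, method=Stored, time, date
    buf.extend_from_slice(&0u32.to_le_bytes()); // CRC-32 (irrelevant here)
    buf.extend_from_slice(&(inner.len() as u32).to_le_bytes()); // compressed size
    buf.extend_from_slice(&(inner.len() as u32).to_le_bytes()); // uncompressed size
    buf.extend_from_slice(&5u16.to_le_bytes()); // name length ("inner")
    buf.extend_from_slice(&0u16.to_le_bytes()); // extra length
    buf.extend_from_slice(b"inner");
    buf.extend_from_slice(inner);
    let mut reader = io::Cursor::new(buf);
    let skipped = skip_entry(&mut reader)?;
    println!("skipped {skipped} bytes; inner signature never inspected");
    Ok(())
}
```

The key design point is that the skip trusts compressed_size rather than searching for the next signature, so data that happens to contain "PK\x03\x04" is never misinterpreted as a header.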
It may be a good idea to add a few more notes to your list of caveats about decompressing from a non-seekable stream.
As in the Wikipedia quote above, maybe consider adding a note about deleted and updated files to the list here:
rs-async-zip/src/read/stream.rs, lines 17 to 28 in 6bca65b