Possible optimization? #12
Comments
I think this is a duplicate of mxmlnkn/ratarmount#105. Pragzip is not yet used for zip files. Your case should work after it has been integrated.
IMHO it is related but not the same. In the case of a .zip file, there is no need to decompress anything at all upon mount...
But didn't you say you were mounting a ".zip archive"? For .tar.gz, I don't see any way around inflating the whole file, because the metadata for each file can be anywhere inside the TAR. That is why I have to go over it once to collect all file names; not inflating the whole file would mean that some file names would be missing in the mount point. I simply skip over the file contents during metadata gathering, but gzip does not allow skipping data: seeking in a gzip stream requires an index, which has to be built first. I could try to start decoding in the middle of a gzip file, but it would never be guaranteed to work, and I wouldn't even know which decompressed offset I am currently at. To determine that, I need to have decoded all the data before it.
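To illustrate the metadata pass described above, here is a minimal sketch using only Python's standard `tarfile` module. It is not ratarmount's actual indexing code; the helper names `build_demo_tar_gz` and `collect_index` are made up for this example. It shows that listing the members of a .tar.gz means walking the whole decompressed stream from header to header.

```python
import io
import tarfile


def build_demo_tar_gz() -> bytes:
    """Create a tiny .tar.gz in memory so the example is self-contained."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for name, payload in [("a.txt", b"alpha"), ("dir/b.bin", b"\x00" * 2000)]:
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            archive.addfile(info, io.BytesIO(payload))
    return buffer.getvalue()


def collect_index(tar_gz: bytes) -> dict:
    """Map each member name to (offset of its data in the TAR, size).

    Iterating the archive visits every header in order. Because the stream
    is gzip-compressed, reaching a later header requires decompressing
    everything that precedes it, including the file contents.
    """
    index = {}
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as archive:
        for member in archive:
            index[member.name] = (member.offset_data, member.size)
    return index


index = collect_index(build_demo_tar_gz())
for name, (offset, size) in index.items():
    print(name, offset, size)
```

The offsets refer to the decompressed TAR stream, which is why they are only useful together with some way of seeking inside the gzip data.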
That is correct but in your original post you were talking about seeking to the end of zip members ...
I was not able to reproduce the behavior you observed. I tried:
base64 /dev/urandom | head -c $(( 8 * 1024 * 1024 * 1024 )) > large
zip large.zip large
ratarmount large.zip mounted
python3 -c 'import io; file=open("mounted/10k-1MiB-files.tar", "rb"); file.seek(0, io.SEEK_END); print(file.tell())'
Getting the file size like this completes in 19 ms. This indicates that it does not actually decompress the whole member. It simply seeks to the end and returns the size without any decompression. I'm closing this for now. Please provide a bash script that reproduces the issue. It should probably also be filed as an issue in the ratarmount repository, not here in the pragzip repository. Are you by chance using
Are you sure this is the script you've used for testing?
Yeah, sorry, I wanted to make the script more generic and reproducible by changing it:
base64 /dev/urandom | head -c $(( 8 * 1024 * 1024 * 1024 )) > large
zip large.zip large
ratarmount large.zip mounted
python3 -c 'import io; file=open("mounted/large", "rb"); file.seek(0, io.SEEK_END); print(file.tell())'
The result is the same. It takes ~20 ms.
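For context on why the size can be reported without decompression: the ZIP format records each member's uncompressed size in its metadata. The sketch below uses only the standard `zipfile` module on an in-memory archive. It says nothing about how ratarmount handles this internally, and the helper name `member_size` is invented for the example.

```python
import io
import zipfile


def member_size(zip_bytes: bytes, name: str) -> int:
    """Return the uncompressed size of a member from the ZIP metadata.

    Opening the archive only parses the central directory; the member's
    compressed data is not inflated by getinfo().
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return archive.getinfo(name).file_size


# Build a small deflate-compressed archive in memory.
payload = b"ratarmount " * 10_000
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("large", payload)

size = member_size(buffer.getvalue(), "large")
print(size)
```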
I've stumbled on the following scenario:
I'm mounting a .zip archive with ratarmount;
The archive contains 10 .tar files, each 10G in size;
I have a 3rd-party antivirus program which scans the mount point and which is EXTREMELY slow (relative to others).
So I've analyzed its behavior with strace. It seems that it tries to determine the file size using the following (or similar) code:
Of course, the first fseek causes ratarmount to fully decompress the member of the .zip file, which takes a LOT of time. So I wonder: is it possible to make ratarmount postpone the actual seek until a read or write operation? Could the seek(... SEEK_END) call just set a virtual offset to the value retrieved from the associated struct stat?
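The deferred seek proposed here could look roughly like the following sketch. It is purely illustrative: `LazySeekReader` is a hypothetical class, not part of ratarmount or pragzip, and it assumes the total size is already known (for example from the archive metadata). `seek` only updates a virtual offset; the underlying stream is moved when `read` is actually called.

```python
import io


class LazySeekReader:
    """Hypothetical wrapper that postpones seeks until data is read."""

    def __init__(self, raw, size):
        self._raw = raw          # underlying (possibly expensive) stream
        self._size = size        # known total size, e.g. from metadata
        self._position = 0       # virtual offset seen by the caller
        self.real_seeks = 0      # counts seeks forwarded to the raw stream

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            target = offset
        elif whence == io.SEEK_CUR:
            target = self._position + offset
        elif whence == io.SEEK_END:
            target = self._size + offset
        else:
            raise ValueError(f"unsupported whence: {whence}")
        if target < 0:
            raise ValueError("negative seek position")
        self._position = target  # no work on the raw stream yet
        return target

    def tell(self):
        return self._position

    def read(self, count=-1):
        # Only now move the raw stream, and only if it is out of sync.
        if self._raw.tell() != self._position:
            self._raw.seek(self._position)
            self.real_seeks += 1
        data = self._raw.read(count)
        self._position += len(data)
        return data


reader = LazySeekReader(io.BytesIO(b"0123456789"), size=10)
reader.seek(0, io.SEEK_END)      # the size-probing pattern
probed_size = reader.tell()
reader.seek(0)
seeks_after_probe = reader.real_seeks
reader.seek(4)
chunk = reader.read(3)
```

With this pattern, probing the size by seeking to the end and back never touches the raw stream; the first forwarded seek happens only when data is requested from a position the raw stream is not already at.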