
Decompressing large .7z files (> 4GiB) causes Python to raise MemoryError exception #32

Open
ijacquez opened this issue Oct 17, 2015 · 9 comments

Comments

@ijacquez

Decompressing large .7z files (> 4GiB) causes Python to raise a MemoryError exception:

for name in self.archive.getnames():
    out_filename = os.path.join(path, name)
    out_dir = os.path.dirname(out_filename)
    if not os.path.exists(out_dir):
        os.makedirs(out_dir)
    with open(out_filename, 'wb') as out_file:
        # read() returns the whole decompressed member at once, which is
        # what raises MemoryError for members larger than 4 GiB
        out_file.write(self.archive.getmember(name).read())
@victor3rc

I managed to read 7z files in chunks.

@fancycode if you have any interest in this let me know, I can wrap it up in a method and do a pull request. It could potentially solve this issue.
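A minimal sketch of the chunked-decompression idea (not the actual patch): feed the compressed data to pylzma.decompressobj() a piece at a time instead of materializing everything with read(). It assumes a standalone LZMA stream on disk rather than a full .7z container, so all of py7zlib's archive parsing is left out; the function name, chunk size and the zlib-style flush() call are illustrative assumptions.

import pylzma

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; size is arbitrary

def decompress_in_chunks(src_path, dst_path):
    # Streaming decompression: only one compressed chunk and its decompressed
    # output are held in memory at a time, instead of the whole member.
    obj = pylzma.decompressobj()
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(obj.decompress(chunk))
        # Assumed zlib-style end-of-stream handling; the final chunks are
        # exactly where the errors discussed further down seem to occur.
        dst.write(obj.flush())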

@ijacquez
Author

@victor3rc, what were the results with files exceeding 4GiB?

@fancycode
Owner

@victor3rc sure, pull requests are always welcome!

@remyroy

remyroy commented Feb 15, 2016

@victor3rc I'm highly interested in that code which reads 7z files in chunks. It makes little sense for the ArchiveFile class to only have a single read method that reads the whole file into memory.

@victor3rc

@remyroy I agree.

@ijacquez I've been doing some tests with a 50+ GB file and it reads it in chunks fine.

I'll try to wrap it in a method this week guys.

@victor3rc

Hey @remyroy @ijacquez, just an update: I managed to read in chunks, but I was getting some errors when calling pylzma.decompressobj.decompress(chunk), specifically on the final chunks at the end of the file.

A temporary solution I have found is to use subprocess to call 7z and decompress the archive locally. I then read whatever was decompressed in chunks.
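A rough sketch of that kind of workaround, assuming the 7z binary is installed and on PATH; the archive name, output directory and file name below are placeholders:

import subprocess

# 'x' extracts with full paths, -o<dir> sets the output directory, -y answers
# all prompts with yes.
subprocess.check_call(['7z', 'x', 'archive.7z', '-oextracted', '-y'])

# Read the extracted file in fixed-size chunks so it never has to fit in
# memory all at once.
total = 0
with open('extracted/large_file.bin', 'rb') as f:
    while True:
        chunk = f.read(1024 * 1024)  # 1 MiB at a time; size is arbitrary
        if not chunk:
            break
        total += len(chunk)  # stand-in for whatever actually consumes the data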

@ijacquez
Author

Do you have an idea as to what is causing that? Is it your changes? Are the chunks too big?

@victor3rc

No idea, sorry. I didn't have time to look into the pylzma.decompressobj.decompress functionality, which is where the error was happening. It wouldn't be the size of the chunks; that method is used to read the entire file anyway.

@igkins

igkins commented Jun 5, 2019

@victor3rc, could you post your chunk-reading code, even if it didn't fully work?
