Fails with out-of-memory for a very-very large pdf file #125

Open · LudeeD opened this issue Jul 1, 2019 · 7 comments

LudeeD commented Jul 1, 2019

I have a PDF file that is 1.3 GB in size (it's a master's thesis, which is why I am not attaching it here).
Okular can handle it pretty well, but it crashes Adobe.
pdfsizeopt also crashes on it, with a memory error:

info: This is pdfsizeopt ZIP rUNKNOWN size=69734.
info: prepending to PATH: /home/ludee/Programs/pdfsizeopt/pdfsizeopt_libexec
info: loading PDF from: /home/ludee/Desktop/Dissertação_Ana_Antunes_201405897.pdf
info: loaded PDF of 1322590721 bytes
info: separated to 2269032 objs + xref + trailer
Traceback (most recent call last):
  File "/proc/self/exe/runpy.py", line 162, in _run_module_as_main
  File "/proc/self/exe/runpy.py", line 72, in _run_code
  File "./pdfsizeopt.single/__main__.py", line 1, in <module>
  File "./pdfsizeopt.single/m.py", line 6, in <module>
  File "./pdfsizeopt.single/pdfsizeopt/main.py", line 5622, in main
  File "./pdfsizeopt.single/pdfsizeopt/main.py", line 2664, in Load
  File "./pdfsizeopt.single/pdfsizeopt/main.py", line 689, in __init__
  File "./pdfsizeopt.single/pdfsizeopt/main.py", line 942, in Get
  File "./pdfsizeopt.single/pdfsizeopt/main.py", line 1217, in ParseDict
  File "./pdfsizeopt.single/pdfsizeopt/main.py", line 1148, in ParseSimpleValue
MemoryError

zvezdochiot commented Jul 1, 2019

@LudeeD said: "I have a PDF file that is 1.3 GB in size"

More information, please:

pdfinfo /home/ludee/Desktop/Dissertação_Ana_Antunes_201405897.pdf

And see #119

LudeeD commented Jul 1, 2019

More info:

Title:          
Subject:        
Keywords:       
Author:         
Creator:        LaTeX with hyperref
Producer:       pdfTeX-1.40.19
CreationDate:   Sun Jun 30 21:11:45 2019 WEST
ModDate:        Sun Jun 30 21:11:45 2019 WEST
Tagged:         no
UserProperties: no
Suspects:       no
Form:           none
JavaScript:     no
Pages:          308
Encrypted:      no
Page size:      595.276 x 841.89 pts (A4)
Page rot:       0
File size:      1322590721 bytes
Optimized:      no
PDF version:    1.5

Following the instructions in #119, cpdf also failed with an out-of-memory error:

Initial file size is 1322590721 bytes
Beginning squeeze: 2269033 objects
Fatal error: out of memory.

zvezdochiot commented Jul 1, 2019

@LudeeD said: "Pages: 308, File size: 1322590721 bytes"

1322590721 / 308 ≈ 4294126 bytes/page. Hmm, that is big!

You can convert /FlateDecode image streams (roughly PNG-style compression) to /DCTDecode (JPEG) using Ghostscript:

ps2pdf /home/ludee/Desktop/Dissertação_Ana_Antunes_201405897.pdf /home/ludee/Desktop/Dissertação_Ana_Antunes_201405897.gs.pdf
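
If plain ps2pdf does not recompress the images aggressively enough, the conversion can be requested explicitly through pdfwrite's distiller parameters. A sketch (these are standard Ghostscript flags; the 150 dpi target is only an example value):

# force JPEG recompression and downsample color/gray images to 150 dpi
gs -o Dissertação_Ana_Antunes_201405897.gs.pdf -sDEVICE=pdfwrite \
   -dAutoFilterColorImages=false -dAutoFilterGrayImages=false \
   -dColorImageFilter=/DCTEncode -dGrayImageFilter=/DCTEncode \
   -dDownsampleColorImages=true -dColorImageResolution=150 \
   Dissertação_Ana_Antunes_201405897.pdf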

LudeeD commented Jul 1, 2019

After running it for 3 hours I gave up on this.
I rebuilt the PDF with compressed versions of the images, and now it is a more reasonable size.

Feel free to close this issue if handling > 1 GB files is not really a priority.

Thanks for the help

rbrito commented Jul 2, 2019 via email

zvezdochiot commented Jul 2, 2019

@rbrito said: "It sure sounds interesting and I would like to have a look at it."

Use pdftk to process the file in parts.
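
For example, something like this splits the 308-page file into two halves (a sketch; the file names are placeholders):

pdftk big.pdf cat 1-154 output part1.pdf
pdftk big.pdf cat 155-end output part2.pdf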

pts changed the title from "Fails with very very large pdf file" to "Fails with out-of-memory for a very-very large pdf file" on Feb 23, 2023

pts commented Feb 23, 2023

pdfsizeopt indeed uses a lot of memory for large PDF files, because it keeps the parsed version of the entire PDF file in memory. It also keeps multiple versions of compressed image data in memory for the current image being optimized.

Throwing more memory at it should make it work. Unfortunately, there is no easy way to estimate the total memory required for a given input file.

In the meantime, splitting the PDF file on some page boundary (with pdftk or qpdf), running pdfsizeopt on the split PDF files individually, and joining the results may work for some PDFs.
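
For a 308-page file like this one, the round trip could look like the sketch below (the file names and the halfway split point are placeholders):

# split into two halves
qpdf --empty --pages big.pdf 1-154 -- part1.pdf
qpdf --empty --pages big.pdf 155-z -- part2.pdf
# optimize each half separately
pdfsizeopt part1.pdf part1.opt.pdf
pdfsizeopt part2.pdf part2.opt.pdf
# join the optimized halves
qpdf --empty --pages part1.opt.pdf part2.opt.pdf -- big.opt.pdf

Note that resources shared across the parts (fonts, images) get duplicated by the split, so the joined result is not guaranteed to be smaller.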

I'm keeping this issue open as a reminder to add memory optimizations.
