Converted PDF have broken pages in Acrobat Pro DC #111

Open
galaxy001 opened this issue Mar 19, 2019 · 9 comments
@galaxy001

galaxy001 commented Mar 19, 2019

I removed the broken XMP item and got the result with ../pdfsizeopt.single nnGm.pdf nnGmo2.pdf 2>nnGmo2.log.

nnGmo2.pdf can be viewed with the macOS Preview app, but in Acrobat Pro DC 2018 some pages are broken, such as xx, 68, 69, 70, and 226-230 (as labeled).

The PDF structure view also breaks after page 21.

[Screenshot]

nnGmo2.log
nnGmo2.pdf

@galaxy001 changed the title from "Convert PDF have broken pages in Acrobat Pro DC" to "Converted PDF have broken pages in Acrobat Pro DC" on Mar 19, 2019
@pts
Owner

pts commented Mar 19, 2019

Thank you for reporting this in detail! This may indicate multiple bugs in pdfsizeopt.

Unfortunately I don't have a license for Acrobat Pro DC 2018, so I can't reproduce the problem. The indicated pages of nnGmo2.pdf work for me in Google Chrome and Evince.

Do you have more detailed error messages from Acrobat Pro DC 2018?

You may want to diagnose this further. First, try pdfsizeopt.single --use-pngout=no, to make the image processing faster. Then try pdfsizeopt.single --use-pngout=no --do-optimize-fonts=no. Does this fix all the problems you were encountering?

FYI, the full list of useful flags to try is: pdfsizeopt.single --do-optimize-images=no --do-optimize-fonts=no --do-optimize-objs=no --do-optimize-streams=no --do-decompress-most-streams=yes --do-generate-xref-stream=no --do-generate-object-stream=no
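One way to narrow this down is to apply the flags one at a time and check each output in Acrobat. The following is only a minimal sketch of that idea: the flag names are the ones listed in this comment, while the function name print_trial_commands and the .trialN.pdf output naming are made up for illustration. It prints the commands rather than running them.

```sh
#!/bin/sh
# Sketch: one pdfsizeopt.single command per flag, so that each optimization
# can be switched off in isolation. Flag names come from the list above;
# the output file naming (.trialN.pdf) is an assumption for illustration.

FLAGS="--do-optimize-images=no
--do-optimize-fonts=no
--do-optimize-objs=no
--do-optimize-streams=no
--do-decompress-most-streams=yes
--do-generate-xref-stream=no
--do-generate-object-stream=no"

# print_trial_commands INPUT.pdf
# Prints the commands instead of running them (dry run).
print_trial_commands() {
  input=$1
  base=${input%.pdf}
  n=0
  for flag in $FLAGS; do
    n=$((n + 1))
    echo "pdfsizeopt.single --use-pngout=no $flag $input $base.trial$n.pdf"
  done
}
```

For example, print_trial_commands nnGm.pdf | sh would run all seven trials. If one trial output opens cleanly in Acrobat while the others do not, the flag used for it points at the optimization step worth looking into.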

@galaxy001
Author

galaxy001 commented Mar 20, 2019

Even the last one, pdfsizeopt.single --do-optimize-images=no --do-optimize-fonts=no --do-optimize-objs=no --do-optimize-streams=no --do-decompress-most-streams=yes --do-generate-xref-stream=no --do-generate-object-stream=no, does not produce a file that works in Acrobat.

However, after converting the file to QDF with qpdf, it works.

But after processing the QDF file with the last command above, it breaks again.

Could you use the free trial license to give it a try?
https://acrobat.adobe.com/us/en/free-trial-download.html

I don't know how to get detailed error messages from Acrobat Pro DC 2018 yet.

@galaxy001
Author

I got a fix with 'qpdf':

qpdf --decode-level=none --normalize-content=y fo.pdf for.pdf
qpdf --decode-level=none for.pdf forr.pdf

$ ls -1s fo.pdf for.pdf forr.pdf
 9984 fo.pdf
18440 for.pdf
10248 forr.pdf
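The two qpdf passes above could be wrapped in a small helper so that the larger intermediate file does not stay around. This is just a sketch: the function name qpdf_repair is made up, and only the qpdf options already shown in this comment are used.

```sh
# qpdf_repair INPUT.pdf OUTPUT.pdf
# Runs the two qpdf passes from above, keeping the intermediate
# (normalized, larger) PDF in a temporary file that is removed afterwards.
# Note: qpdf exits with status 3 on warnings; this sketch treats any
# nonzero status of the first pass as a failure and skips the second pass.
qpdf_repair() {
  tmp=$(mktemp "${TMPDIR:-/tmp}/qpdf_repair.XXXXXX") || return 1
  qpdf --decode-level=none --normalize-content=y "$1" "$tmp" &&
    qpdf --decode-level=none "$tmp" "$2"
  status=$?
  rm -f "$tmp"
  return "$status"
}
```

Usage would be qpdf_repair fo.pdf forr.pdf.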

@pts
Owner

pts commented Mar 20, 2019

Can you please convert fo.pdf, for.pdf and forr.pdf with pdfsizeopt, upload all 3*2 files to this issue, and declare which work in Acrobat and which don't? It would be awesome to have such short example PDF files which don't work in Acrobat.

Having uploaded the files here you may also want to report the bug to Adobe, and wait for analysis and comments of the Adobe engineers. Currently (without a meaningful error message from Adobe Acrobat) it's not obvious whether pdfsizeopt or Adobe Acrobat has the bug.

I'm developing pdfsizeopt on Linux. In order to use Adobe Acrobat, I'd have to give my credit card details to Adobe (that's the smaller issue), and I'd have to either buy a Mac or install Windows on one of my existing computers (or in a VM). I'm willing to do this only if I'm compensated for the licenses and the work in advance.

@galaxy001
Author

man ls

     -s      Display the number of file system blocks actually used by each file, in units of 512 bytes, where
             partial units are rounded up to the next integer value.  If the output is to a terminal, a total
             sum for all the file sizes is output on a line before the listing.  The environment variable
             BLOCKSIZE overrides the unit size of 512 bytes.
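Given that unit, the block counts from the earlier ls -1s listing can be turned into approximate byte sizes. A minimal sketch, assuming the default 512-byte unit (that is, BLOCKSIZE is not set); the helper name blocks_to_bytes is made up:

```sh
# blocks_to_bytes BLOCKS [BLOCKSIZE]
# Converts an `ls -s` block count to bytes; BLOCKSIZE defaults to 512.
# The result is allocated disk space, so it only approximates the file size.
blocks_to_bytes() {
  echo $(( $1 * ${2:-512} ))
}

# Block counts copied from the ls -1s output earlier in this thread.
for entry in 9984:fo.pdf 18440:for.pdf 10248:forr.pdf; do
  blocks=${entry%%:*}
  name=${entry#*:}
  echo "$name: $(blocks_to_bytes "$blocks") bytes"
done
# prints:
# fo.pdf: 5111808 bytes
# for.pdf: 9441280 bytes
# forr.pdf: 5246976 bytes
```

So forr.pdf occupies about 2.6% more disk blocks than fo.pdf.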

I tried extracting some pages. With only a few pages, everything works, so I use this file set:

pdfsizeopt --do-keep-font-optionals=yes --do-regenerate-all-fonts=no --do-double-check-type1c-output=yes --do-ignore-generation-numbers=no --do-optimize-objs=no --use-multivalent=yes ex.pdf exo.pdf
qpdf --decode-level=none --normalize-content=y exo.pdf exor.pdf
qpdf --decode-level=none exor.pdf exorr.pdf

ex.pdf
exo.pdf
exor.pdf
exorr.pdf

Acrobat shows that page 12 is missing in exo.pdf.
[Screenshot: Acrobat]

@babinslava

babinslava commented Oct 18, 2019

I have the same issue. Thank you @galaxy001 for the help: qpdf --decode-level=none --normalize-content=y fixes the file and doesn't even increase the file size.

@pts The error message in Adobe Acrobat Pro is: Expected a dict object.

@pts
Owner

pts commented Dec 12, 2019

Thank you for reporting this. I'd love to debug and fix this, but unfortunately I don't have a copy of Acrobat Pro DC. The error message Expected a dict object. is helpful, but it is not specific enough: it could take hours or days to debug by trial and error. Any contributions are welcome.

@pts added the bug label on Feb 23, 2023
@pts
Owner

pts commented Feb 23, 2023

Nevertheless, it's still worth investigating what difference qpdf --decode-level=none --normalize-content=y makes to the PDF file. Maybe pdfsizeopt itself could do it.
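One possible way to look at that difference is to rewrite both the broken file and the qpdf-fixed file in qpdf's QDF mode, which lays the objects out as editable text, and then diff the results. This is only a sketch: the function name qdf_diff is made up, and the -a option of diff (treat input as text) is assumed to be available, as it is in GNU and BSD diff. QDF mode may renumber objects, so the diff can be noisy.

```sh
# qdf_diff A.pdf B.pdf
# Rewrites both files with `qpdf --qdf --object-streams=disable` so that
# objects become plain text, then prints a unified diff of the two results.
# Returns diff's exit status (0: identical, 1: different), or qpdf's
# nonzero status if a conversion fails or reports warnings.
qdf_diff() {
  dir=$(mktemp -d "${TMPDIR:-/tmp}/qdf_diff.XXXXXX") || return 1
  qpdf --qdf --object-streams=disable "$1" "$dir/a.qdf" &&
    qpdf --qdf --object-streams=disable "$2" "$dir/b.qdf" &&
    diff -a -u "$dir/a.qdf" "$dir/b.qdf"
  status=$?
  rm -rf "$dir"
  return "$status"
}
```

For the files attached earlier in this thread, that would be qdf_diff exo.pdf exorr.pdf | less. Since Acrobat reports page 12 of exo.pdf as missing, the objects belonging to that page are probably the first place to look in the output.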

@Ndolam

Ndolam commented Feb 23, 2023

Hi Péter,

I keep a copy of the last Adobe Reader for Linux (9.5.5) around. For your information, that program also doesn't show page 12, and when I scroll down that far, acroread prints the error message "There was a problem reading this document (14).".

Also, my version of pdftk can't process this exo.pdf file.

To any Linux users reading this: acroread needs quite a number of 32-bit libraries installed on your system, and I have been told this version has some security bugs, so use it at your own risk.

4 participants