
Add optimization of /CalRGB and /CalGray images #105

Open
rbrito opened this issue Dec 10, 2018 · 3 comments

@rbrito

rbrito commented Dec 10, 2018

Hi, @pts.

Perhaps you consider this to be the same issue as with issue #102, perhaps not.

I had a file containing only bilevel images that were Flate-compressed, each with the following dictionary before the actual stream:

<<
/ColorSpace [/CalRGB 
<<
/Gamma [2.2 2.2 2.2]
/WhitePoint [0.95043 1 1.09]
/Matrix [0.41239 0.21264 0.01933 0.35758 0.71517 0.11919 0.18045 0.07218 0.9504]
>>]
/Height 3093
/Subtype /Image
/Filter /FlateDecode
/DecodeParms 
<<
/Columns 2216
/Colors 3
/Predictor 15
/BitsPerComponent 8
>>
/Width 2216
/BitsPerComponent 8
/Length 341433
>>

When running pdfsizeopt, it didn't try to touch those images. I'm attaching a page from this document here.

I'm also attaching a page that I produced with a crude method: extracting the image with pdfimages, wrapping it with img2pdf, and then compressing the result with pdfsizeopt. The difference in size is amazing: from 342 kB to 42 kB, i.e. only about 12% of the original size.

The files are visually identical (as far as diffpdf is concerned), but this method has the huge drawback of throwing away any scanned text and it only works if all the pages are scans.

Thanks,

Rogério Brito.

p-010.pdf
p-010.pso.pdf
b.pdf
b.pso.pdf

@pts
Owner

pts commented Dec 11, 2018

Thank you for reporting this!

The /CalRGB colorspace is not supported by pdfsizeopt. This code explicitly skips unsupported colorspaces:

      if not re.match(r'(?:/Device(?:RGB|Gray)\Z|\[[\0\t\n\r\f ]*'
                      r'/Indexed[\0\t\n\r\f ]*'
                      r'/Device(?:RGB|Gray)[\0\t\n\r\f (<\[/])', colorspace):
        continue
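
To see how this check treats the colorspace from the report, here is a small standalone sketch. The regular expression is copied from the snippet above; the helper name `is_supported` and the sample values are only illustrative and are not part of pdfsizeopt.

```python
import re

# Same pattern as in the snippet above, compiled once.
SUPPORTED_COLORSPACE_RE = re.compile(
    r'(?:/Device(?:RGB|Gray)\Z|\[[\0\t\n\r\f ]*'
    r'/Indexed[\0\t\n\r\f ]*'
    r'/Device(?:RGB|Gray)[\0\t\n\r\f (<\[/])')


def is_supported(colorspace):
  """Returns True if the check above would let the image through."""
  return SUPPORTED_COLORSPACE_RE.match(colorspace) is not None


# Hypothetical sample values; the /CalRGB one mirrors the reported image.
samples = [
    '/DeviceRGB',
    '/DeviceGray',
    '[/Indexed /DeviceRGB 1 <000000ffffff>]',
    '[/CalRGB <</Gamma [2.2 2.2 2.2] /WhitePoint [0.95043 1 1.09]>>]',
]
for colorspace in samples:
  print('%-5s %s' % (is_supported(colorspace), colorspace))
```

Only the first three samples pass, so an image with the `[/CalRGB ...]` colorspace reaches the `continue` and is left untouched.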

Adding support would be possible, but not trivial. Since there is no simple conversion between /CalRGB and /DeviceGray (etc.), all image optimizers which change the colorspace have to be disabled for such images.

An alternative to the above is converting from /CalRGB to /DeviceRGB before optimizing the image. Preferably we'd need a printing expert's opinion about the print quality degradation when converting from /CalRGB to /DeviceRGB. (The fact that diffpdf doesn't show any diffs can be misleading: the color differences may be more subtle, not representable in 8 bits.)
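
For context on why the two colorspaces are not interchangeable: as far as I understand the PDF specification, /CalRGB component values are mapped to CIE XYZ through the /Gamma exponents and the /Matrix entries, whereas /DeviceRGB values are interpreted by the output device. A minimal sketch of that mapping, using the parameters from the reported image (the function name `calrgb_to_xyz` is made up for this illustration):

```python
def calrgb_to_xyz(a, b, c, gamma, matrix):
  """Maps CalRGB components (each in 0..1) to CIE XYZ.

  Assumes the PDF convention that matrix is
  [X_A Y_A Z_A X_B Y_B Z_B X_C Y_C Z_C] and gamma is [G_R G_G G_B].
  """
  g_r, g_g, g_b = gamma
  x_a, y_a, z_a, x_b, y_b, z_b, x_c, y_c, z_c = matrix
  # Apply the per-component gamma first, then the linear matrix.
  a_lin, b_lin, c_lin = a ** g_r, b ** g_g, c ** g_b
  return (x_a * a_lin + x_b * b_lin + x_c * c_lin,
          y_a * a_lin + y_b * b_lin + y_c * c_lin,
          z_a * a_lin + z_b * b_lin + z_c * c_lin)


# Parameters taken from the image dictionary in the report.
GAMMA = [2.2, 2.2, 2.2]
MATRIX = [0.41239, 0.21264, 0.01933,
          0.35758, 0.71517, 0.11919,
          0.18045, 0.07218, 0.9504]

print(calrgb_to_xyz(1, 1, 1, GAMMA, MATRIX))  # Full white.
print(calrgb_to_xyz(0.5, 0.5, 0.5, GAMMA, MATRIX))  # Mid gray.
```

For full white the result is close to the /WhitePoint of the image, which is a useful sanity check. How far a given viewer or printer deviates from this when the same bytes are labelled /DeviceRGB depends on the device, which is the open question here.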

pts changed the title from "Missed optimization with certain images" to "Add optimization of /CalRGB images" on Dec 11, 2018
@pts
Owner

pts commented Dec 11, 2018

I can confirm that just changing the /ColorSpace value to /DeviceRGB in p-010.pdf makes the output of pdfsizeopt much smaller (info: generated 42726 bytes (12%)). However, this change is not safe, because it can also affect the visual appearance of the image, and by design pdfsizeopt doesn't change the visual appearance.

Nevertheless we could enable such unsafe changes with a command-line flag.

@pts
Owner

pts commented Dec 14, 2018

Good news: it is possible to add support for these color spaces to pdfsizeopt while keeping the existing image optimizers (sam2p, jbig2, pngout etc.), in a safe way, without introducing visible changes:

  • [/CalGray ...]
  • [/CalRGB ...]
  • [/Indexed [/CalGray ...] ...]
  • [/Indexed [/CalRGB ...] ...]

The trick is to pretend that these are /DeviceGray or /DeviceRGB (or the /Indexed variants of those) while the image optimizers are running, and keep the original (*Cal*) /ColorSpace value in the PDF object along with the optimized image data. The only problem is the conversion to [/CalGray ...] from [/CalRGB ...] (when color components within a pixel have the same values), because there is no color formula mapping between the two. The workaround for this is emitting [/Indexed [/CalRGB ...] ...] instead of [/CalGray ...].
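
A rough sketch of what the two steps could look like on colorspace strings. This is not pdfsizeopt code: the helper names `device_stand_in` and `gray_as_indexed_calrgb` are hypothetical, and the sketch assumes that no `[` or `]` appears inside PDF strings or comments within the colorspace value.

```python
import re

_WS = r'[\0\t\n\r\f ]*'
_CAL_RE = re.compile(r'\[' + _WS + r'/Cal(RGB|Gray)(?![A-Za-z0-9])')
_INDEXED_RE = re.compile(r'\[' + _WS + r'/Indexed' + _WS)


def _array_end(data, start):
  """Returns the index just past the ']' matching the '[' at data[start]."""
  depth = 0
  for i in range(start, len(data)):
    if data[i] == '[':
      depth += 1
    elif data[i] == ']':
      depth -= 1
      if depth == 0:
        return i + 1
  raise ValueError('Unbalanced array in colorspace.')


def device_stand_in(colorspace):
  """Returns the Device colorspace to show to the optimizers, or None."""
  match = _CAL_RE.match(colorspace)
  if match:  # [/CalRGB ...] or [/CalGray ...]
    return '/Device' + match.group(1)
  match = _INDEXED_RE.match(colorspace)
  if match:  # [/Indexed [/Cal... ...] hival palette]
    base_start = match.end()
    base_match = _CAL_RE.match(colorspace, base_start)
    if base_match:
      base_end = _array_end(colorspace, base_start)
      return (colorspace[:base_start] + '/Device' + base_match.group(1) +
              colorspace[base_end:])
  return None  # Not a Cal colorspace; nothing to pretend.


def gray_as_indexed_calrgb(calrgb, bits_per_component):
  """Wraps an optimizer's gray output in [/Indexed [/CalRGB ...] ...].

  Each gray level v becomes the palette entry (v, v, v) in the original
  /CalRGB space, so no /CalRGB to /CalGray conversion is needed.
  """
  if bits_per_component not in (1, 2, 4, 8):
    raise ValueError('Unsupported bits per component.')
  hival = (1 << bits_per_component) - 1
  step = 255 // hival  # 255, 85, 17 or 1: always exact.
  palette = ''.join(('%02x' % (i * step)) * 3 for i in range(hival + 1))
  return '[/Indexed %s %d <%s>]' % (calrgb, hival, palette)


original = '[/CalRGB <</Gamma [2.2 2.2 2.2] /WhitePoint [0.95043 1 1.09]>>]'
print(device_stand_in(original))
print(gray_as_indexed_calrgb(original, 1))
```

The optimizers would only ever see the stand-in value; afterwards the original /ColorSpace value is written back, or the /Indexed wrapper is used if the optimizer reduced the image to grayscale or bilevel.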

Keeping this issue open to track the implementation of this feature.

pts changed the title from "Add optimization of /CalRGB images" to "Add optimization of /CalRGB and /CalGray images" on Dec 14, 2018