Add optimization of /CalRGB and /CalGray images #105

rbrito · 2018-12-10T22:13:22Z

Perhaps you consider this to be the same issue as with issue #102, perhaps not.

I had a file that contained only bilevel images. They were stored Flate-compressed, and each stream was preceded by the following dictionary:

<<
/ColorSpace [/CalRGB 
<<
/Gamma [2.2 2.2 2.2]
/WhitePoint [0.95043 1 1.09]
/Matrix [0.41239 0.21264 0.01933 0.35758 0.71517 0.11919 0.18045 0.07218 0.9504]
>>]
/Height 3093
/Subtype /Image
/Filter /FlateDecode
/DecodeParms 
<<
/Columns 2216
/Colors 3
/Predictor 15
/BitsPerComponent 8
>>
/Width 2216
/BitsPerComponent 8
/Length 341433
>>

When running pdfsizeopt, it didn't try to touch those images. I'm attaching a page from this document here.

I'm also attaching a page that I produced with a crude method: extracting the image with pdfimages, wrapping it with img2pdf, and then compressing the result with pdfsizeopt. The difference in size is striking: from 342 kB to 42 kB, that is, only about 12% of the original size.

The files are visually identical (as far as diffpdf is concerned), but this method has the huge drawback of throwing away any scanned text and it only works if all the pages are scans.

Thanks,

Rogério Brito.

p-010.pdf
p-010.pso.pdf
b.pdf
b.pso.pdf

pts · 2018-12-11T10:24:42Z

Thank you for reporting this!

The /CalRGB colorspace is not supported by pdfsizeopt. This code explicitly skips unsupported colorspaces:

      if not re.match(r'(?:/Device(?:RGB|Gray)\Z|\[[\0\t\n\r\f ]*'
                      r'/Indexed[\0\t\n\r\f ]*'
                      r'/Device(?:RGB|Gray)[\0\t\n\r\f (<\[/])', colorspace):
        continue
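To see why the images in p-010.pdf are skipped, the check above can be tried on a few colorspace strings. This is only a sketch: the regular expression is copied from the snippet above, while the helper name `is_supported_colorspace` and the sample strings are made up for illustration and are not part of pdfsizeopt.

```python
import re

# Same pattern as in the snippet above, compiled once for reuse.
SUPPORTED_COLORSPACE_RE = re.compile(
    r'(?:/Device(?:RGB|Gray)\Z|\[[\0\t\n\r\f ]*'
    r'/Indexed[\0\t\n\r\f ]*'
    r'/Device(?:RGB|Gray)[\0\t\n\r\f (<\[/])')


def is_supported_colorspace(colorspace):
    """Hypothetical helper: True if the check above would not skip the image."""
    return SUPPORTED_COLORSPACE_RE.match(colorspace) is not None


# Illustrative /ColorSpace values, serialized as PDF source strings.
samples = [
    '/DeviceRGB',
    '/DeviceGray',
    '[/Indexed /DeviceRGB 1 <000000ffffff>]',
    '[/CalRGB <</Gamma [2.2 2.2 2.2] /WhitePoint [0.95043 1 1.09]>>]',
]
for colorspace in samples:
    verdict = 'optimized' if is_supported_colorspace(colorspace) else 'skipped'
    print('%-8s %s' % (verdict, colorspace))
```

Only the last sample, the /CalRGB array, fails to match, which is the case reported in this issue.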

Adding support would be possible, but not trivial. Since there is no simple conversion between /CalRGB and /DeviceGray (etc.), all image optimizers which change the colorspace have to be disabled for such images.

An alternative to the above is converting from /CalRGB to /DeviceRGB before optimizing the image. Preferably we'd need a printing expert's opinion about the print quality degradation when converting from /CalRGB to /DeviceRGB. (The fact that diffpdf doesn't show any diffs can be misleading: the color differences may be more subtle, not representable in 8 bits.)

pts · 2018-12-11T10:32:30Z

I can confirm that just changing the /ColorSpace value to /DeviceRGB in p-010.pdf makes the output of pdfsizeopt much smaller (info: generated 42726 bytes (12%)). However, this change is not safe, because it can also affect the visual appearance of the image, and by design pdfsizeopt doesn't change the visual appearance.

Nevertheless we could enable such unsafe changes with a command-line flag.

pts · 2018-12-14T15:17:19Z

Good news: it is possible to add support for these color spaces to pdfsizeopt while keeping the existing image optimizers (sam2p, jbig2, pngout etc.), in a safe way, without introducing visible changes:

[/CalGray ...]
[/CalRGB ...]
[/Indexed [/CalGray ...] ...]
[/Indexed [/CalRGB ...] ...]

The trick is to pretend that these are /DeviceGray or /DeviceRGB (or the /Indexed variants of those) while the image optimizers are running, and keep the original (*Cal*) /ColorSpace value in the PDF object along with the optimized image data. The only problem is the conversion from [/CalRGB ...] to [/CalGray ...] (when the color components within each pixel have the same value), because there is no color formula mapping one to the other. The workaround for this is emitting [/Indexed [/CalRGB ...] ...] instead of [/CalGray ...].
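The substitution could look roughly like the sketch below. It is not the pdfsizeopt implementation: the function names `split_cal_colorspace` and `restore_cal_colorspace` are hypothetical, and the regular expression assumes that the *Cal* dictionary contains no nested `<<...>>` dictionary and that the token being restored does not also occur inside the palette string.

```python
import re

# PDF whitespace, as in the colorspace check quoted earlier in this thread.
_WS = r'[\0\t\n\r\f ]*'
# Assumption: the Cal dictionary has no nested <<...>> dictionaries.
_CAL_RE = re.compile(
    r'\[' + _WS + r'/Cal(Gray|RGB)' + _WS + r'<<.*?>>' + _WS + r'\]',
    re.DOTALL)


def split_cal_colorspace(colorspace):
    """Returns (device_colorspace, device_name, cal_array), or None.

    device_colorspace is what the image optimizers would be shown;
    cal_array is the original [/Cal* <<...>>] array to put back later.
    """
    match = _CAL_RE.search(colorspace)
    if match is None:
        return None  # Not a Cal colorspace: nothing to pretend.
    device_name = '/Device' + match.group(1)
    device_colorspace = (
        colorspace[:match.start()] + device_name + colorspace[match.end():])
    return device_colorspace, device_name, match.group(0)


def restore_cal_colorspace(optimized, device_name, cal_array):
    """Puts the original Cal array back into the optimizer's colorspace."""
    if optimized.count(device_name) != 1:
        # For example, the optimizer turned RGB into /DeviceGray; there is
        # no safe mapping back, so the caller has to avoid or reject this.
        raise ValueError('cannot restore %s in %r' % (device_name, optimized))
    return optimized.replace(device_name, cal_array)
```

With this, `[/CalRGB <<...>>]` is shown to the optimizers as `/DeviceRGB`, and if an optimizer returns `[/Indexed /DeviceRGB 1 <000000ffffff>]`, restoring yields `[/Indexed [/CalRGB <<...>>] 1 <000000ffffff>]`. An optimizer result of `/DeviceGray` for a /CalRGB image is rejected here, which corresponds to the problematic conversion described above.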

Keeping this issue open to track the implementation of this feature.

pts added the enhancement label Dec 11, 2018

pts changed the title ~~Missed optimization with certain images~~ Add optimization of /CalRGB images Dec 11, 2018

pts changed the title ~~Add optimization of /CalRGB images~~ Add optimization of /CalRGB and /CalGray images Dec 14, 2018
