NullPointerException using PdfTextExtractor #1210

gtoison · 2024-08-19T08:30:07Z

Hello, I'm running into an NPE when using PdfTextExtractor on a file produced by a third party. The code worked for a while, but it seems the third party has changed something and I'm now getting the NPE.

java.lang.NullPointerException: Cannot invoke "com.lowagie.text.pdf.PdfDictionary.getAsDict(com.lowagie.text.pdf.PdfName)" because "resources" is null
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler$SetTextFont.invoke(PdfContentStreamHandler.java:599)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.lambda$invokeOperator$0(PdfContentStreamHandler.java:204)
	at java.base/java.util.Optional.ifPresent(Optional.java:178)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.invokeOperator(PdfContentStreamHandler.java:204)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler$Do.processContent(PdfContentStreamHandler.java:989)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler$Do.invoke(PdfContentStreamHandler.java:976)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.lambda$invokeOperator$0(PdfContentStreamHandler.java:204)
	at java.base/java.util.Optional.ifPresent(Optional.java:178)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.invokeOperator(PdfContentStreamHandler.java:204)
	at com.lowagie.text.pdf.parser.PdfTextExtractor.processContent(PdfTextExtractor.java:218)
	at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:199)
	at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:177)

I'm getting the error with version 2.0.2.

The problem seems to be on this line, because resources2 is null:

OpenPDF/openpdf/src/main/java/com/lowagie/text/pdf/parser/PdfContentStreamHandler.java, line 968 in 00afd24:

PdfDictionary resources2 = stream.getAsDict(PdfName.RESOURCES);
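The trace shows the failure inside Do.processContent, i.e. while processing an XObject invoked with the Do operator, so the stream being processed apparently has no /Resources entry. A null-safe alternative could look like the sketch below. It assumes the common convention of reusing the enclosing content stream's resources when a form XObject omits its own /Resources (older PDF files rely on this). PDF dictionaries are modelled here as plain `Map<String, Object>` objects, and `resourcesForForm` is a hypothetical helper, not part of the OpenPDF API:

```java
import java.util.HashMap;
import java.util.Map;

final class FormResourcesFallback {

    /**
     * Hypothetical helper: picks the resources to use for a form XObject.
     * Dictionaries are modelled as Map<String, Object> (not OpenPDF types).
     * If the form stream has its own /Resources dictionary, use it;
     * otherwise fall back to the enclosing resources instead of passing null on.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> resourcesForForm(Map<String, Object> formStreamDict,
                                                Map<String, Object> enclosingResources) {
        Object own = formStreamDict.get("Resources");
        if (own instanceof Map) {
            return (Map<String, Object>) own;
        }
        return enclosingResources;
    }

    public static void main(String[] args) {
        Map<String, Object> pageResources = new HashMap<>();
        pageResources.put("Font", new HashMap<String, Object>());

        // A form XObject that omits /Resources entirely.
        Map<String, Object> formWithoutResources = new HashMap<>();
        formWithoutResources.put("Subtype", "Form");

        Map<String, Object> used = resourcesForForm(formWithoutResources, pageResources);
        System.out.println(used == pageResources); // true
    }
}
```

With this approach, a font lookup such as the one in SetTextFont would see the page-level /Font entries rather than a null dictionary.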

The error seems similar to #650.

Please let me know if you need further information to help troubleshooting this, thanks in advance!


andreasrosdal · 2024-08-19T20:22:38Z

Hello, can you please share a PDF file where this problem occurs? This will make it easier to make a fix.

The issue you are encountering is related to the resources dictionary sometimes being null. This typically happens if the page does not contain a resources dictionary directly. However, the resources dictionary might be inherited from the parent pages (for example, from a "Pages" dictionary).
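The inheritance described above could be sketched as a walk up the /Parent chain of the page tree, returning the first /Resources dictionary found. This is a minimal illustration only: dictionaries are modelled as `Map<String, Object>`, `findResources` is a hypothetical helper rather than an OpenPDF method, and the depth limit of 64 is an arbitrary guard against malformed, cyclic /Parent links:

```java
import java.util.HashMap;
import java.util.Map;

final class InheritedResources {

    /**
     * Hypothetical helper: starting at a page node, returns the first /Resources
     * dictionary found on the node or one of its /Parent ancestors, or null if
     * none has one. Dictionaries are modelled as Map<String, Object>.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> findResources(Map<String, Object> node) {
        int depth = 0;
        // Arbitrary bound so a cyclic /Parent chain cannot loop forever.
        while (node != null && depth++ < 64) {
            Object res = node.get("Resources");
            if (res instanceof Map) {
                return (Map<String, Object>) res;
            }
            Object parent = node.get("Parent");
            node = (parent instanceof Map) ? (Map<String, Object>) parent : null;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> rootResources = new HashMap<>();
        Map<String, Object> pagesRoot = new HashMap<>();
        pagesRoot.put("Type", "Pages");
        pagesRoot.put("Resources", rootResources);

        Map<String, Object> page = new HashMap<>();
        page.put("Type", "Page");
        page.put("Parent", pagesRoot); // no /Resources on the page itself

        System.out.println(findResources(page) == rootResources); // true
    }
}
```

Returning null when nothing is found keeps the decision about a fallback (or an empty dictionary) with the caller.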

Pull requests welcome!

gtoison · 2024-08-21T10:11:37Z

Thank you for the answer. Unfortunately, the document contains confidential information, so I can't share it here.
I tried making a fix based on your suggestion to look for a "Pages" dictionary, but ran into a problem: Eclipse won't open the project because the Maven module "openpdf" has the same name as the project "OpenPDF". My connectivity isn't good where I am right now; I'll try with IntelliJ.

gtoison · 2024-08-21T13:07:37Z

It no longer seems to crash with this change: 6b51521
That might be a misguided fix because I don't quite know what the code is supposed to do :)

gtoison added the bug label Aug 19, 2024
