NullPointerException using PdfTextExtractor #1210

Open
gtoison opened this issue Aug 19, 2024 · 3 comments

gtoison commented Aug 19, 2024

Hello, I'm running into an NPE when using PdfTextExtractor with a file produced by a third party. The code has worked for a while, but it seems that the third party has updated something and I'm now getting the NPE.

java.lang.NullPointerException: Cannot invoke "com.lowagie.text.pdf.PdfDictionary.getAsDict(com.lowagie.text.pdf.PdfName)" because "resources" is null
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler$SetTextFont.invoke(PdfContentStreamHandler.java:599)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.lambda$invokeOperator$0(PdfContentStreamHandler.java:204)
	at java.base/java.util.Optional.ifPresent(Optional.java:178)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.invokeOperator(PdfContentStreamHandler.java:204)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler$Do.processContent(PdfContentStreamHandler.java:989)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler$Do.invoke(PdfContentStreamHandler.java:976)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.lambda$invokeOperator$0(PdfContentStreamHandler.java:204)
	at java.base/java.util.Optional.ifPresent(Optional.java:178)
	at com.lowagie.text.pdf.parser.PdfContentStreamHandler.invokeOperator(PdfContentStreamHandler.java:204)
	at com.lowagie.text.pdf.parser.PdfTextExtractor.processContent(PdfTextExtractor.java:218)
	at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:199)
	at com.lowagie.text.pdf.parser.PdfTextExtractor.getTextFromPage(PdfTextExtractor.java:177)

I'm getting the error with version 2.0.2

The problem seems to be on this line, because resources2 is null:

PdfDictionary resources2 = stream.getAsDict(PdfName.RESOURCES);
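
One possible guard, as a minimal sketch only: the PDF specification lists a form XObject's Resources entry as optional, so when the stream has none, the handler could fall back to the resources already in effect. This assumes the line above sits in the form XObject handling, as the `Do` frames in the trace suggest. The class and method names below are hypothetical, plain Java maps stand in for `PdfDictionary`, and this is not OpenPDF's actual code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: Map<String, Object> stands in for a PDF dictionary.
final class ResourceFallback {

    /**
     * Returns the XObject's own resources when present, otherwise the
     * resources on top of the stack (null if the stack is empty too).
     */
    static Map<String, Object> effectiveResources(Map<String, Object> own,
                                                  Deque<Map<String, Object>> stack) {
        return own != null ? own : stack.peek();
    }

    public static void main(String[] args) {
        Map<String, Object> pageResources = new HashMap<>();
        pageResources.put("Font", "F1");

        Deque<Map<String, Object>> stack = new ArrayDeque<>();
        stack.push(pageResources);

        // The XObject has no Resources entry, so the page's are used.
        System.out.println(effectiveResources(null, stack) == pageResources); // prints true
    }
}
```

Callers would still need a null check for the case where nothing is in effect at all.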

The error seems similar to #650

Please let me know if you need further information to help troubleshoot this. Thanks in advance!

gtoison added the bug label Aug 19, 2024
@andreasrosdal
Contributor

Hello, can you please share a PDF file where this problem occurs? This will make it easier to make a fix.

The issue you are encountering is related to the resources dictionary sometimes being null. This typically happens if the page does not contain a resources dictionary directly. However, the resources dictionary might be inherited from the parent pages (for example, from a "Pages" dictionary).
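
To illustrate the inheritance idea, here is a minimal sketch of walking up the page tree until a Resources entry is found. It uses plain Java maps as a stand-in for PDF dictionaries; the class name, the method name, and the depth limit of 64 are assumptions made for illustration and are not part of OpenPDF's API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: Map<String, Object> stands in for a PDF dictionary.
final class InheritedResources {

    /**
     * Walks the Parent chain from a page node and returns the first
     * Resources dictionary found, or null if none exists.
     */
    @SuppressWarnings("unchecked")
    static Map<String, Object> findResources(Map<String, Object> node) {
        int depth = 0;
        while (node != null && depth++ < 64) { // guard against cycles in malformed files
            Object resources = node.get("Resources");
            if (resources instanceof Map) {
                return (Map<String, Object>) resources;
            }
            Object parent = node.get("Parent");
            node = (parent instanceof Map) ? (Map<String, Object>) parent : null;
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, Object> resources = new HashMap<>();
        resources.put("Font", "F1");

        Map<String, Object> pages = new HashMap<>();
        pages.put("Type", "Pages");
        pages.put("Resources", resources);

        Map<String, Object> page = new HashMap<>(); // no Resources of its own
        page.put("Type", "Page");
        page.put("Parent", pages);

        System.out.println(findResources(page) == resources); // prints true
    }
}
```

A real fix would do the equivalent lookup on the library's own dictionary type and still handle the case where no ancestor carries a Resources entry.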

Pull requests welcome!

@gtoison
Author

gtoison commented Aug 21, 2024

Thank you for the answer. The document contains confidential information, so unfortunately I can't share it here.
I tried making a fix based on your suggestion to look for a "Pages" dictionary, but Eclipse won't open the project because the Maven module "openpdf" has the same name as the project "OpenPDF". I don't have good connectivity where I am right now, so I'll try with IntelliJ.

@gtoison
Author

gtoison commented Aug 21, 2024

It no longer seems to crash with this change: 6b51521
That might be a misguided fix because I don't quite know what the code is supposed to do :)
