HtmlParser cannot recognise base64-encoded images #1230

Open
polentino opened this issue Oct 31, 2024 · 0 comments

Describe the bug

I am trying to generate a PDF document from an HTML file that contains base64-encoded images.

The markup used to embed those images is taken straight from the Wikipedia page on the "data URI scheme":

<img alt="" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" style="width:36pt;height:36pt" />

However, the PDF generation process throws the following exception:

com.lowagie.text.ExceptionConverter: /Users/dc/projects/onboarding-api/data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4/8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg== (No such file or directory)

This seems to happen because ElementFactory.getImage internally calls Image.getInstance(String filename) without first inspecting its attributes map, so the value of the url property (here a data URI) is mistaken for a real file path and resolved against the working directory.
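To illustrate the kind of check that appears to be missing, here is a minimal sketch using only the JDK. The class name DataUriDecoder and its decode method are hypothetical and are not part of OpenPDF; the idea is that a src value starting with "data:" and carrying a ";base64," marker would be decoded to bytes instead of being treated as a filename. The resulting bytes could then presumably be passed to the Image.getInstance(byte[]) overload, though I have not verified how that would fit into ElementFactory.

```java
import java.util.Base64;
import java.util.Optional;

// Hypothetical helper, not part of OpenPDF: detects base64 data URIs.
public class DataUriDecoder {

    private static final String BASE64_MARKER = ";base64,";

    /**
     * Returns the decoded payload if src is a base64 data URI,
     * or an empty Optional if it should be handled as a path or URL.
     */
    public static Optional<byte[]> decode(String src) {
        // Case-insensitive check for the "data:" scheme prefix.
        if (src == null || !src.regionMatches(true, 0, "data:", 0, 5)) {
            return Optional.empty();
        }
        int marker = src.indexOf(BASE64_MARKER);
        if (marker < 0) {
            return Optional.empty(); // non-base64 data URIs are out of scope here
        }
        String payload = src.substring(marker + BASE64_MARKER.length());
        try {
            return Optional.of(Base64.getDecoder().decode(payload));
        } catch (IllegalArgumentException malformed) {
            return Optional.empty(); // payload is not valid base64
        }
    }

    public static void main(String[] args) {
        String src = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4"
                + "//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==";
        System.out.println(decode(src).map(b -> b.length + " bytes").orElse("not a data URI"));
        System.out.println(decode("images/logo.png").isPresent());
    }
}
```

A plain path such as images/logo.png yields an empty Optional, so the existing file-based code path would stay untouched for ordinary images.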

To Reproduce

  1. Sample Code

A slight adaptation of the ParseTableHtml example in this repo:

import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;

import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.html.HtmlParser;
import com.lowagie.text.pdf.PdfWriter;
import org.xml.sax.InputSource;

public class ParseBase64ImageHtml {

    /**
     * Generates a PDF from an HTML page that contains a base64-encoded image
     *
     * @param args no arguments needed here
     */
    public static void main(String[] args) {
        // inner double quotes are escaped so that the literal compiles
        String htmlContent = "<html><head><title></title></head><body>"
                + "<img alt=\"\" src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==\""
                + " style=\"width:36pt;height:36pt\" /></body></html>";

        // step 1: creation of a document-object
        try (Document document = new Document()) {
            PdfWriter.getInstance(document, Files.newOutputStream(Paths.get("parseBase64Image.pdf")));
            // step 2: we open the document
            document.open();
            // step 3: parsing the HTML document to convert it to PDF
            HtmlParser.parse(document, new InputSource(new StringReader(htmlContent)));
        } catch (DocumentException | IOException de) {
            System.err.println(de.getMessage());
        }
    }
}

Expected behavior

The conversion process should succeed and generate a PDF document containing a small red dot, instead of throwing an exception.
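Until data URIs are handled by the parser, one possible workaround might be to pre-process the HTML so that each embedded image is written to a temporary file and the src attribute points at that file instead. The sketch below uses only the JDK; the class DataUriInliner and its externalizeImages method are hypothetical names, it only matches double-quoted src attributes, and I have not tested the rewritten HTML against HtmlParser.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-processing step, not part of OpenPDF.
public class DataUriInliner {

    // Matches src="data:image/<type>;base64,<payload>" with double quotes only.
    private static final Pattern DATA_SRC = Pattern.compile(
            "src=\"data:image/([a-zA-Z0-9]+);base64,([A-Za-z0-9+/=]+)\"");

    /**
     * Writes every base64-embedded image to a temp file inside dir and
     * returns the HTML with each data URI replaced by the file's absolute path.
     * Throws IllegalArgumentException if a payload is not valid base64.
     */
    public static String externalizeImages(String html, Path dir) throws IOException {
        Matcher m = DATA_SRC.matcher(html);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            byte[] bytes = Base64.getDecoder().decode(m.group(2));
            // Use the image subtype (e.g. "png") as the file extension.
            Path file = Files.createTempFile(dir, "img-", "." + m.group(1));
            Files.write(file, bytes);
            out.append(html, last, m.start())
               .append("src=\"").append(file.toAbsolutePath()).append('"');
            last = m.end();
        }
        return out.append(html.substring(last)).toString();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("openpdf-images");
        String html = "<img alt=\"\" src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAF"
                + "CAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==\" />";
        System.out.println(externalizeImages(html, dir));
    }
}
```

The returned string could then be wrapped in a StringReader and passed to HtmlParser.parse as in the sample code; the temporary files would need to be cleaned up by the caller afterwards.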

System

  • OS: macOS Sonoma 14.6.1 (23G93)
  • Used font: system default
  • OpenPDF version: 2.0.3

Your real name

Diego Casella

@polentino polentino added the bug label Oct 31, 2024