HtmlParser cannot recognise base64-encoded images #1230

polentino · 2024-10-31T13:57:02Z

Describe the bug

I am trying to generate a PDF document from an HTML file that contains some base64-encoded images.

The markup used to embed those images is taken straight from the Wikipedia page about the "data URI scheme":

<img alt="" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==" style="width:36pt;height:36pt" />

However, the PDF generation process throws the following exception:

com.lowagie.text.ExceptionConverter: /Users/dc/projects/onboarding-api/data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4/8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg== (No such file or directory)

This seems to happen because ElementFactory.getImage internally calls Image.getInstance(String filename) without first inspecting the value in its attributes map, so the data URI stored in the url property is mistaken for a real file path.
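To illustrate the kind of check that appears to be missing, here is a minimal sketch using only the JDK. The class and method names (DataUriSketch, isBase64DataUri, decode) are hypothetical and not part of OpenPDF; the sketch only shows how a src value could be recognised as a base64 data URI and turned into raw bytes before it is ever treated as a file name.

```java
import java.util.Base64;

/** Hypothetical helper, not part of OpenPDF: recognises and decodes base64 data URIs. */
final class DataUriSketch {

    /** Returns true when src looks like "data:<mediatype>;base64,<payload>". */
    static boolean isBase64DataUri(String src) {
        if (src == null || !src.startsWith("data:")) {
            return false;
        }
        int comma = src.indexOf(',');
        // The header is everything before the first comma and must end with ";base64".
        return comma > 0 && src.substring(0, comma).endsWith(";base64");
    }

    /** Decodes the payload after the first comma; rejects anything that is not a base64 data URI. */
    static byte[] decode(String src) {
        if (!isBase64DataUri(src)) {
            throw new IllegalArgumentException("Not a base64 data URI");
        }
        return Base64.getDecoder().decode(src.substring(src.indexOf(',') + 1));
    }

    public static void main(String[] args) {
        String src = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAA"
                + "HElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==";
        byte[] png = decode(src);
        // Assumption: in OpenPDF these bytes could then be handed to Image.getInstance(byte[])
        // instead of passing the src string to Image.getInstance(String).
        System.out.println("data URI: " + isBase64DataUri(src) + ", decoded bytes: " + png.length);
    }
}
```

A regular path such as images/logo.png fails the isBase64DataUri check, so it could still go through the existing file-based code path.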

To Reproduce

Sample Code

Slight adaptation from the ParseTableHtml example in this repo:

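import com.lowagie.text.Document;
import com.lowagie.text.DocumentException;
import com.lowagie.text.html.HtmlParser;
import com.lowagie.text.pdf.PdfWriter;
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.xml.sax.InputSource;

public class ParseBase64ImageHtml {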

    /**
     * Generates a PDF from an HTML page that contains a base64-encoded image
     *
     * @param args no arguments needed here
     */
    public static void main(String[] args) {
        String htmlContent = "<html><head><title></title></head><body><img alt=\"\" src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==\" style=\"width:36pt;height:36pt\" /></body></html>";

        // step 1: creation of a document-object
        try (Document document = new Document()) {
            PdfWriter.getInstance(document, Files.newOutputStream(Paths.get("parseBase64Image.pdf")));
            // step 2: we open the document
            document.open();
            // step 3: parsing the HTML document to convert it to PDF
            HtmlParser.parse(document, new InputSource(new StringReader(htmlContent)));
        } catch (DocumentException | IOException de) {
            System.err.println(de.getMessage());
        }
    }
}
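Until the parser handles data URIs, one possible workaround is to pre-process the HTML before passing it to HtmlParser.parse. The sketch below is JDK-only: it writes each embedded image to a temporary file and rewrites src to point at that file. The names DataUriToFileSketch and externalizeImages are hypothetical. It assumes that HtmlParser can load an image from an absolute file path in src, which the "No such file or directory" message suggests but which I have not verified. It also only matches double-quoted src attributes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-processing step, not part of OpenPDF. */
final class DataUriToFileSketch {

    // Matches src="data:image/<subtype>;base64,<payload>" (double quotes only).
    private static final Pattern DATA_SRC =
            Pattern.compile("src=\"data:image/([a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=]+)\"");

    /** Writes every embedded image to a temp file and points src at that file instead. */
    static String externalizeImages(String html) {
        Matcher m = DATA_SRC.matcher(html);
        StringBuilder out = new StringBuilder();
        try {
            while (m.find()) {
                byte[] bytes = Base64.getDecoder().decode(m.group(2));
                // The image subtype (e.g. "png") becomes the file extension.
                Path file = Files.createTempFile("embedded-", "." + m.group(1));
                Files.write(file, bytes);
                file.toFile().deleteOnExit();
                String replacement = "src=\"" + file.toAbsolutePath() + "\"";
                // quoteReplacement keeps backslashes and '$' in the path literal.
                m.appendReplacement(out, Matcher.quoteReplacement(replacement));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<img alt=\"\" src=\"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAF"
                + "CAYAAACNbyblAAAAHElEQVQI12P4//8/w38GIAXDIBKE0DHxgljNBAAO9TXL0Y4OHwAAAABJRU5ErkJggg==\""
                + " style=\"width:36pt;height:36pt\" />";
        System.out.println(externalizeImages(html));
    }
}
```

In the sample above, this would mean passing externalizeImages(htmlContent) to the StringReader instead of htmlContent. The temporary files are only removed when the JVM exits, so a long-running service would need its own clean-up.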

Expected behavior

The conversion process should succeed and generate a PDF document with a small red dot, instead of throwing an exception.

System

OS: macOS Sonoma 14.6.1 (23G93)
Used font: system default
OpenPDF version: 2.0.3

Your real name

Diego Casella

polentino added the bug label Oct 31, 2024
