Warning: Dom\HTMLDocument::createFromString(): tree error unexpected-token-in-initial-mode in Entity #16807

Closed
askonomm opened this issue Nov 15, 2024 · 4 comments

Comments

askonomm commented Nov 15, 2024

Description

The following code:

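<?php

// Parse an HTML snippet that has no doctype; the class and the constant both live in the Dom namespace.
$dom = Dom\HTMLDocument::createFromString('<p>hello</p>', Dom\HTML_NO_DEFAULT_NS);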

Resulted in this output:

Warning: Dom\HTMLDocument::createFromString(): tree error unexpected-token-in-initial-mode in Entity

But I expected this output instead:

I expected it to work normally, given that the input is a perfectly valid piece of HTML. The HTML parsing itself does not break, so it seems to work, but for some reason it emits a warning.

PHP Version

PHP 8.4.0 RC1

Operating System

Debian 12

devnexen (Member) commented Nov 15, 2024

Hmm, I do not know whether that is a bug, i.e. whether we are supposed to pass a full, valid HTML document. I do not have the full picture, though, so cc @nielsdos.

nielsdos (Member) commented

The doctype (DTD) is missing, so the document is actually not standards-compliant.
You can silence this by also passing LIBXML_NOERROR in the options (combined using bitwise OR).
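As a minimal sketch of that suggestion, assuming PHP 8.4 with the DOM extension enabled, the two flags can be combined like this. The snippet and variable name are taken from the report; the final `echo` is only there to show that parsing still produces a document.

```php
<?php

// Combine the namespace flag with LIBXML_NOERROR using bitwise OR,
// so the missing-doctype parse error is not reported as a warning.
$dom = Dom\HTMLDocument::createFromString(
    '<p>hello</p>',
    Dom\HTML_NO_DEFAULT_NS | LIBXML_NOERROR
);

// Serialize the parsed document to confirm that parsing succeeded.
echo $dom->saveHtml(), PHP_EOL;
```

This only hides the report; the input is still parsed as a full document rather than as a fragment.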

nielsdos closed this as not planned Nov 15, 2024
askonomm (Author) commented

So then partial HTML documents are not supported? There are use cases where you'd want to recursively parse parts of a document, e.g. template partials come to mind as a very common use case. I suppose I can indeed suppress the warning, but to me the warning indicates that I can't rely on something that is considered wrong by design, as that would mean the warning can turn into a fatal error at some point in the future. Either that or I just sort of "hack it" and programmatically add a DTD to an HTML partial if it's not present, then remove it before merging it into the larger document.

nielsdos (Member) commented

> So then partial HTML documents are not supported? There are use cases where you'd want to recursively parse parts of a document, e.g. template partials come to mind as a very common use case.

Yes, they are, via $innerHTML, but not via Dom\HTMLDocument or DOMDocument, as those expect full documents.
Note that "partial HTML" does not actually exist as such: how a fragment is parsed depends on the context in which it is used.
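A rough sketch of the $innerHTML route, assuming PHP 8.4: a complete host document supplies the parsing context, and the partial is assigned to an element inside it. The host markup and the `slot` id are made up for illustration and do not come from the thread.

```php
<?php

// A complete, doctype-bearing document provides the context for the fragment.
$dom = Dom\HTMLDocument::createFromString(
    '<!DOCTYPE html><html><head><title>t</title></head>'
    . '<body><div id="slot"></div></body></html>'
);

// Hypothetical placeholder element that will receive the partial.
$slot = $dom->getElementById('slot');

// Assigning to innerHTML parses the string as a fragment in the context of the <div>.
$slot->innerHTML = '<p>hello</p>';

// Serialize only the placeholder element and its new children.
echo $dom->saveHtml($slot), PHP_EOL;
```

Because the fragment is parsed relative to its container, the same string could produce a different tree under a different parent element, for example inside a `<table>`.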

> as that would mean the warning can turn into a fatal error at some point in the future.

It won't be a fatal error or exception because suppressing parse errors is something people want to do legitimately. You'll find that a lot of web pages actually violate at least one of the parsing/tokenization rules that HTML defines.

> Either that or I just sort of "hack it" and programmatically add a DTD to an HTML partial if it's not present, then remove it before merging it into the larger document.

Note that the parser implicitly adds html and body tags, so you'll have to get rid of those too or use the LIBXML_HTML_NOIMPLIED option. But again, using $innerHTML would be the better option.
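For comparison, here is a hedged sketch of the LIBXML_HTML_NOIMPLIED variant, assuming PHP 8.4. Pairing it with LIBXML_NOERROR is an assumption made here so that the missing doctype is not reported; the exact shape of the resulting tree is not shown, since it depends on the parser.

```php
<?php

// Ask the parser not to add implied html/body elements, and suppress
// the parse-error warning caused by the missing doctype.
$dom = Dom\HTMLDocument::createFromString(
    '<p>hello</p>',
    LIBXML_HTML_NOIMPLIED | LIBXML_NOERROR
);

// Serialize whatever tree the parser produced for inspection.
echo $dom->saveHtml(), PHP_EOL;
```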
