-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KSES: Add test cases for "bogus comment" syntax not currently detected. #6833
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -988,6 +988,9 @@ function wp_kses_split( $content, $allowed_html, $allowed_protocols ) { | |
(<!--.*?(-->|$)) # - Normative HTML comments. | ||
| | ||
</[^a-zA-Z][^>]*> # - Closing tags with invalid tag names. | ||
| | ||
<![^>]*> # - Invalid markup declaration nodes. Not all invalid nodes | ||
# are matched so as to avoid breaking legacy behaviors. | ||
) | ||
| | ||
(<[^>]*(>|$)|>) # Tag-like spans of text. | ||
|
@@ -1114,22 +1117,30 @@ function wp_kses_split2( $content, $allowed_html, $allowed_protocols ) { | |
} | ||
|
||
/* | ||
* When a closing tag appears with a name that isn't a valid tag name, | ||
* it must be interpreted as an HTML comment. It extends until the | ||
* first `>` character after the initial opening `</`. | ||
* When certain invalid syntax constructs appear, the HTML parser | ||
* shifts into what's called the "bogus comment state." This is a | ||
* plaintext state that consumes everything until the nearest `>` | ||
* and then transforms the entire span into an HTML comment. | ||
* | ||
* Preserve these comments and do not treat them like tags. | ||
* | ||
* @see https://html.spec.whatwg.org/#bogus-comment-state | ||
*/ | ||
if ( 1 === preg_match( '~^</[^a-zA-Z][^>]*>$~', $content ) ) { | ||
$content = substr( $content, 2, -1 ); | ||
$transformed = null; | ||
if ( 1 === preg_match( '~^(?:</[^a-zA-Z][^>]*>|<![a-z][^>]*>)$~', $content ) ) { | ||
/** | ||
* Since the pattern matches `</…>` and also `<!…>`, this will | ||
* preserve the type of the cleaned-up token in the output. | ||
*/ | ||
$opener = $content[1]; | ||
$content = substr( $content, 2, -1 ); | ||
|
||
while ( $transformed !== $content ) { | ||
$transformed = wp_kses( $content, $allowed_html, $allowed_protocols ); | ||
$content = $transformed; | ||
} | ||
do { | ||
$prev = $content; | ||
$content = wp_kses( $content, $allowed_html, $allowed_protocols ); | ||
} while ( $prev !== $content ); | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is there a test case that fails in trunk and now passes in the PR as a result of this fixed loop? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. no. I think it's because we don't have a test case showing how one form of sanitization adds a new thing needing sanitization. given that this is copying the existing behavior I don't want to touch that, and I'm not even sure what the scenario would be where it would need to trigger this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For anyone else looking at this, this is just copying https://github.com/WordPress/wordpress-develop/pull/6833/files#diff-13e853d5729978a0e7936543d65ad1a3e120b994702c9559bf99c3ad857f5066R1153, but had to be rewritten to make the linter happy. |
||
|
||
return "</{$transformed}>"; | ||
// Recombine the modified inner content with the original token structure. | ||
return "<{$opener}{$content}>"; | ||
} | ||
|
||
/* | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make the same [a-z] check here?