Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KSES: Add test cases for "bogus comment" syntax not currently detected. #6833

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 22 additions & 11 deletions src/wp-includes/kses.php
Original file line number Diff line number Diff line change
Expand Up @@ -988,6 +988,9 @@ function wp_kses_split( $content, $allowed_html, $allowed_protocols ) {
(<!--.*?(-->|$)) # - Normative HTML comments.
|
</[^a-zA-Z][^>]*> # - Closing tags with invalid tag names.
|
<![^>]*> # - Invalid markup declaration nodes. Not all invalid nodes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we make the same [a-z] check here?

# are matched so as to avoid breaking legacy behaviors.
)
|
(<[^>]*(>|$)|>) # Tag-like spans of text.
Expand Down Expand Up @@ -1114,22 +1117,30 @@ function wp_kses_split2( $content, $allowed_html, $allowed_protocols ) {
}

/*
* When a closing tag appears with a name that isn't a valid tag name,
* it must be interpreted as an HTML comment. It extends until the
* first `>` character after the initial opening `</`.
* When certain invalid syntax constructs appear, the HTML parser
* shifts into what's called the "bogus comment state." This is a
* plaintext state that consumes everything until the nearest `>`
* and then transforms the entire span into an HTML comment.
*
* Preserve these comments and do not treat them like tags.
*
* @see https://html.spec.whatwg.org/#bogus-comment-state
*/
if ( 1 === preg_match( '~^</[^a-zA-Z][^>]*>$~', $content ) ) {
$content = substr( $content, 2, -1 );
$transformed = null;
if ( 1 === preg_match( '~^(?:</[^a-zA-Z][^>]*>|<![a-z][^>]*>)$~', $content ) ) {
/**
* Since the pattern matches `</…>` and also `<!…>`, this will
* preserve the type of the cleaned-up token in the output.
*/
$opener = $content[1];
$content = substr( $content, 2, -1 );

while ( $transformed !== $content ) {
$transformed = wp_kses( $content, $allowed_html, $allowed_protocols );
$content = $transformed;
}
do {
$prev = $content;
$content = wp_kses( $content, $allowed_html, $allowed_protocols );
} while ( $prev !== $content );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test case that fails in trunk and now passes in the PR as a result of this fixed loop?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no. I think it's because we don't have a test case showing how one form of sanitization adds a new thing needing sanitization. given that this is copying the existing behavior I don't want to touch that, and I'm not even sure what the scenario would be where it would need to trigger this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For anyone else looking at this, this is just copying https://github.com/WordPress/wordpress-develop/pull/6833/files#diff-13e853d5729978a0e7936543d65ad1a3e120b994702c9559bf99c3ad857f5066R1153, but had to be rewritten to make the linter happy.


return "</{$transformed}>";
// Recombine the modified inner content with the original token structure.
return "<{$opener}{$content}>";
}

/*
Expand Down
5 changes: 4 additions & 1 deletion tests/phpunit/tests/kses.php
Original file line number Diff line number Diff line change
Expand Up @@ -1936,11 +1936,13 @@ public function filter_wp_kses_object_added_in_html_filter( $tags, $context ) {
*
* @ticket 61009
*
* @dataProvider data_html_containing_various_kinds_of_html_comments
*
* @param string $html_comment HTML containing a comment; must not be a valid comment
* but must be syntax which a browser interprets as a comment.
* @param string $expected_output How `wp_kses()` ought to transform the comment.
*/
public function wp_kses_preserves_html_comments( $html_comment, $expected_output ) {
public function test_wp_kses_preserves_html_comments( $html_comment, $expected_output ) {
$this->assertSame(
$expected_output,
wp_kses( $html_comment, array() ),
Expand All @@ -1957,6 +1959,7 @@ public static function data_html_containing_various_kinds_of_html_comments() {
return array(
'Normative HTML comment' => array( 'before<!-- this is a comment -->after', 'before<!-- this is a comment -->after' ),
'Closing tag with invalid tag name' => array( 'before<//not a tag>after', 'before<//not a tag>after' ),
'Incorrectly opened comment (Markup declaration)' => array( 'before<!also not a tag>after', 'before<!also not a tag>after' ),
);
}

Expand Down
Loading