Tag Processor: throw when supplied unacceptable attribute names. #44431

Merged
merged 7 commits into from
Sep 30, 2022
45 changes: 45 additions & 0 deletions lib/experimental/html/class-wp-html-tag-processor.php
@@ -942,12 +942,57 @@ public function get_tag() {
*
* @param string $name The attribute name to target.
* @param string|boolean $value The new attribute value.
* @throws Exception When WP_DEBUG is true and the attribute name is invalid.
*/
public function set_attribute( $name, $value ) {
if ( null === $this->tag_name_starts_at ) {
return;
}

/*
* Verify that the attribute name is allowable. In WP_DEBUG
* environments we want to crash quickly to alert developers
* of typos and issues; but in production we don't want to
* interrupt a normal page view, so we'll silently avoid
* updating the attribute in those cases.
*
* Of note, we're disallowing more characters than are strictly
* forbidden in HTML5. This is to prevent additional security
* risks deeper in the WordPress and plugin stack. Specifically
* we also reject the less-than (<) and ampersand (&) characters.
*
* The use of a PCRE match allows us to look for specific Unicode
* code points without writing a UTF-8 decoder. Whereas scanning
* for one-byte characters is trivial (with `strcspn`), scanning
* for the longer byte sequences would be more complicated, and
* this shouldn't be in the hot path for execution so we can
* compromise on the efficiency at this point.
*
* @see https://html.spec.whatwg.org/#attributes-2
*/
if ( preg_match(
'~[' .
// Syntax-like characters.
'"\'>&</ =' .
// Control characters.
'\x{00}-\x{1F}' .
// HTML noncharacters.
'\x{FDD0}-\x{FDEF}' .
'\x{FFFE}\x{FFFF}\x{1FFFE}\x{1FFFF}\x{2FFFE}\x{2FFFF}\x{3FFFE}\x{3FFFF}' .
'\x{4FFFE}\x{4FFFF}\x{5FFFE}\x{5FFFF}\x{6FFFE}\x{6FFFF}\x{7FFFE}\x{7FFFF}' .
'\x{8FFFE}\x{8FFFF}\x{9FFFE}\x{9FFFF}\x{AFFFE}\x{AFFFF}\x{BFFFE}\x{BFFFF}' .
'\x{CFFFE}\x{CFFFF}\x{DFFFE}\x{DFFFF}\x{EFFFE}\x{EFFFF}\x{FFFFE}\x{FFFFF}' .
'\x{10FFFE}\x{10FFFF}' .
']~Ssu',
$name
) ) {
if ( defined( 'WP_DEBUG' ) && WP_DEBUG ) {
throw new Exception( 'Invalid attribute name' );
}

return;
Contributor

return new WP_Error perhaps?

Member

I don't think we should return an object as devs won't check the return value anyway.

The special case in the debug mode looks like a reasonable approach. I'm curious whether there is any standardized way of logging those issues.

Maybe we could call _doing_it_wrong. We use that in several places when registering blocks or similar stuff for the block editor, for example:

https://github.com/WordPress/wordpress-develop/blob/2b1febd20d77898eb81439a688ea5597da00172a/src/wp-includes/class-wp-block-type-registry.php#L55L62
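
For comparison, a minimal sketch of what the `_doing_it_wrong` variant might look like. The helper name `report_invalid_attribute_name` and the `'0.0.0'` version string are hypothetical placeholders, `_doing_it_wrong` is stubbed only so the snippet runs outside WordPress, and the character class is reduced to the syntax-like characters from this patch.

```php
<?php
// Stub so the sketch runs outside WordPress; core provides the real function.
if ( ! function_exists( '_doing_it_wrong' ) ) {
	function _doing_it_wrong( $function_name, $message, $version ) {
		trigger_error( "$function_name was called incorrectly. $message", E_USER_NOTICE );
	}
}

/**
 * Hypothetical helper: reports an invalid attribute name instead of throwing.
 *
 * @param string $name Attribute name to check.
 * @return bool Whether the name was accepted.
 */
function report_invalid_attribute_name( $name ) {
	// Simplified check: only the syntax-like characters, not the full pattern.
	if ( 1 === preg_match( '~["\'>&</ =]~', $name ) ) {
		// '0.0.0' is a placeholder version, not a real release number.
		_doing_it_wrong( 'WP_HTML_Tag_Processor::set_attribute', 'Invalid attribute name.', '0.0.0' );
		return false;
	}
	return true;
}
```

With this shape the caller gets a notice rather than an exception, and rendering continues either way.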

Member Author

this is the big point about this patch that I'm unsure of. my only goal is to quickly bail if someone submits the wrong attribute, as I don't think it's likely people will send user-input attribute names here. if they do do that then I don't want it to crash the full render.

I'll ask around and see if anyone else has experience with this kind of failure.

Contributor

@azaozz Sep 26, 2022

> my only goal is to quickly bail if someone submits the wrong attribute

Yes, think this is the best approach here. If a plugin allows user input for attribute names (very unlikely as usually these are not random), it will have to make sure the input is valid, or the attribute will be skipped.
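
The behavior discussed here can be sketched as follows. This is an illustrative stand-alone function, not the code in the patch: `sketch_set_attribute` and its `$debug` parameter are hypothetical (the patch reads the `WP_DEBUG` constant), and the character class is reduced to the syntax-like and C0 control characters.

```php
<?php
/**
 * Hypothetical stand-in for set_attribute(): returns the updated attribute
 * list, skipping invalid names silently unless $debug is true.
 *
 * @param array  $attributes Existing name => value pairs.
 * @param string $name       Attribute name to set.
 * @param string $value      Attribute value.
 * @param bool   $debug      Stands in for WP_DEBUG in this sketch.
 * @return array Updated attributes.
 * @throws Exception When $debug is true and the name is invalid.
 */
function sketch_set_attribute( array $attributes, $name, $value, $debug = false ) {
	// Reduced pattern: syntax-like characters plus C0 controls only.
	if ( preg_match( '~["\'>&</ =\x{00}-\x{1F}]~Su', $name ) ) {
		if ( $debug ) {
			throw new Exception( 'Invalid attribute name' );
		}
		return $attributes; // Production: skip the update, keep rendering.
	}
	$attributes[ $name ] = $value;
	return $attributes;
}

// Production-like call: the invalid name is ignored.
$attrs = sketch_set_attribute( array(), 'aria label', 'x' );

// Debug-like call: the same name throws.
try {
	sketch_set_attribute( array(), 'aria label', 'x', true );
} catch ( Exception $e ) {
	echo $e->getMessage(), "\n";
}
```

A plugin that passes user input as an attribute name would therefore need to validate it first, or the attribute is simply skipped.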

Contributor

@azaozz Sep 26, 2022

> I'm curious whether there is any standardized way of logging those issues.

I don't think there is one, unfortunately. It has been discussed a few times afaik, but no decision on how to implement it was reached. I think it is time for this to be added to core. Perhaps another constant, WP_DEV_MODE or similar, and then we could be much bolder about throwing errors and exceptions (with backtrace?) and writing to logs.

Using WP_DEBUG seems proper here imho. _doing_it_wrong doesn't seem as good because it is targeted more at developers who try to use some function/method improperly (when there are better ways).

Member

@azaozz, great feedback! Thank you so much for clarifying where given options fit best.

}

/*
* > The values "true" and "false" are not allowed on boolean attributes.
* > To represent a false value, the attribute has to be omitted altogether.
145 changes: 145 additions & 0 deletions phpunit/html/WP_HTML_Tag_Processor_Isolated_Test.php
@@ -0,0 +1,145 @@
<?php
/**
* Unit tests covering WP_HTML_Tag_Processor functionality.
*
* @package WordPress
* @subpackage HTML
*/

if ( ! function_exists( 'esc_attr' ) ) {
function esc_attr( $s ) {
return str_replace( '"', '&quot;', $s );
}
}

if ( ! class_exists( 'WP_UnitTestCase' ) ) {
abstract class WP_UnitTestCase extends \PHPUnit\Framework\TestCase {}
}

require_once __DIR__ . '/../../lib/experimental/html/index.php';

/**
* Runs tests in isolated PHP process for verifying behaviors
* that depend on the `WP_DEBUG` constant value, if set.
*
* @group html
*
* @coversDefaultClass WP_HTML_Tag_Processor
*/
class WP_HTML_Tag_Processor_Isolated_Test extends WP_UnitTestCase {
// phpcs:disable WordPress.NamingConventions.ValidVariableName.PropertyNotSnakeCase
protected $runTestInSeparateProcess = true;

/**
* Attribute names with invalid characters should be rejected.
*
* When WP_DEBUG is set we want to throw an error to alert a
* developer that they are sending invalid attribute names.
*
* @dataProvider data_invalid_attribute_names
* @covers set_attribute
*/
public function test_set_attribute_throw_when_given_invalid_attribute_names_in_debug_mode( $attribute_name ) {
define( 'WP_DEBUG', true );
$p = new WP_HTML_Tag_Processor( '<span></span>' );

$this->expectException( Exception::class );

$p->next_tag();
$p->set_attribute( $attribute_name, 'test' );

$this->assertEquals( '<span></span>', (string) $p );
}

/**
* Attribute names with invalid characters should be rejected.
*
* When WP_DEBUG isn't set we want to quietly fail to set the
* invalid attribute to avoid breaking the HTML and to do so
* without breaking the entire page.
*
* @dataProvider data_invalid_attribute_names
* @covers set_attribute
*/
public function test_set_attribute_silently_fails_when_given_invalid_attribute_names_outside_of_debug_mode( $attribute_name ) {
$p = new WP_HTML_Tag_Processor( '<span></span>' );

$p->next_tag();
$p->set_attribute( $attribute_name, 'test' );

$this->assertEquals( '<span></span>', (string) $p );
}

/**
* Data provider with invalid HTML attribute names.
*
* @return array {
* @type string $attribute_name Text considered invalid for HTML attribute names.
* }
*/
public function data_invalid_attribute_names() {
return array(
'controls_null' => array( "i\x00d" ),
'controls_newline' => array( "\nbroken-expectations" ),
'space' => array( 'aria label' ),
'double-quote' => array( '"id"' ),
'single-quote' => array( "'id'" ),
'greater-than' => array( 'sneaky>script' ),
'solidus' => array( 'data/test-id' ),
'equals' => array( 'checked=checked' ),
'noncharacters_1' => array( html_entity_decode( 'anything&#xFDD0;' ) ),
'noncharacters_2' => array( html_entity_decode( 'te&#xFFFF;st' ) ),
'noncharacters_3' => array( html_entity_decode( 'te&#x2FFFE;st' ) ),
'noncharacters_4' => array( html_entity_decode( 'te&#xDFFFF;st' ) ),
'noncharacters_5' => array( html_entity_decode( '&#x10FFFE;' ) ),
'wp_no_lt' => array( 'id<script' ),
'wp_no_amp' => array( 'class&lt;script' ),
);
}

/**
* Attribute names with only valid characters should not be rejected.
*
* > Attributes have a name and a value. Attribute names must
* > consist of one or more characters other than controls,
* > U+0020 SPACE, U+0022 ("), U+0027 ('), U+003E (>),
* > U+002F (/), U+003D (=), and noncharacters.
*
* @see https://html.spec.whatwg.org/#attributes-2
*
* @dataProvider data_valid_attribute_names
* @covers set_attribute
*/
public function test_set_attribute_does_not_reject_valid_attribute_names( $attribute_name ) {
define( 'WP_DEBUG', true );
$p = new WP_HTML_Tag_Processor( '<span></span>' );

$p->next_tag();
$p->set_attribute( $attribute_name, 'test' );

$this->assertEquals( "<span $attribute_name=\"test\"></span>", (string) $p );
}

/**
* Data provider with valid HTML attribute names.
*
* @return array {
* @type string $attribute_name Text considered valid for HTML attribute names.
* }
*/
public function data_valid_attribute_names() {
return array(
'ascii_letters' => array( 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ' ),
'ascii_numbers' => array( '0123456789' ),
'symbols' => array( '!@#$%^*()[]{};:\\||,.?`~£§±' ),
'emoji' => array( '❌' ),
'utf8_diacritics' => array( 'ÁÄÂÀÃÅČÇĆĎÉĚËÈÊẼĔȆĞÍÌÎÏİŇÑÓÖÒÔÕØŘŔŠŞŤÚŮÜÙÛÝŸŽáäâàãåčçćďéěëèêẽĕȇğíìîïıňñóöòôõøðřŕšşťúůüùûýÿžþÞĐđßÆa' ),
'hebrew_accents' => array( html_entity_decode( '&#x059D;a' ) ),
// See https://arxiv.org/abs/2111.00169.
'rtl_magic' => array( html_entity_decode( '&#x2067;&#x2066;abc&#x2069;&#x2066;def&#x2069;&#x2069;' ) ),
// Only a single unicode "noncharacter" should be rejected. Specific byte segments used in the "noncharacter" sequence are valid.
'noncharacter_segments' => array( "\xFF\xFE" ),
);
}

}
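
A note on the `noncharacter_segments` case: it appears to rely on how PCRE's `u` modifier treats invalid UTF-8 rather than on the character class itself. A short sketch, using a reduced pattern (an assumption for brevity), of why the byte string `"\xFF\xFE"` is not rejected:

```php
<?php
// "\xFF\xFE" is not valid UTF-8, so with the `u` modifier preg_match()
// fails instead of matching and returns false.
$result = preg_match( '~[\x{FFFE}\x{FFFF}]~u', "\xFF\xFE" );

var_dump( $result );                                   // bool(false)
var_dump( PREG_BAD_UTF8_ERROR === preg_last_error() ); // bool(true)

// The actual noncharacter U+FFFE, encoded as UTF-8, does match.
$matched = preg_match( '~[\x{FFFE}\x{FFFF}]~u', "\xEF\xBF\xBE" );
var_dump( $matched );                                  // int(1)
```

Because `false` is falsy, an `if ( preg_match( ... ) )` guard like the one in `set_attribute` treats such input as acceptable.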