Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent saveHTML from munging amp-mustache curly braces via URL-encoding #906

Merged
merged 1 commit into from
Jan 26, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 56 additions & 0 deletions includes/utils/class-amp-dom-utils.php
Original file line number Diff line number Diff line change
Expand Up @@ -59,6 +59,22 @@ public static function get_dom( $document ) {
// @todo In the future consider an AMP_DOMDocument subclass that does this automatically. See <https://github.com/Automattic/amp-wp/pull/895/files#r163825513>.
$document = self::convert_amp_bind_attributes( $document );

/*
* Prevent amp-mustache syntax from getting URL-encoded in attributes when saveHTML is done.
* While this is applying to the entire document, it only really matters inside of <template>
* elements, since URL-encoding of curly braces in href attributes would not normally matter.
* But when this is done inside of a <template> then it breaks Mustache. Since Mustache
* is logic-less and curly braces are not unsafe for HTML, we can do a global replacement.
* The replacement is done on the entire HTML document instead of just inside of the <template>
* elements since it is faster and wouldn't change the outcome.
*/
$placeholders = self::get_mustache_tag_placeholders();
$document = str_replace(
array_keys( $placeholders ),
array_values( $placeholders ),
$document
);

/*
* Wrap in dummy tags, since XML needs one parent node.
* It also makes it easier to loop through nodes.
Expand Down Expand Up @@ -98,6 +114,38 @@ public static function get_amp_bind_placeholder_prefix() {
return $attribute_prefix;
}

/**
* Get amp-mustache tag/placeholder mappings.
*
* @since 0.7
* @see \wpdb::placeholder_escape()
*
* @return array Mapping of mustache tag token to its placeholder.
*/
private static function get_mustache_tag_placeholders() {
static $placeholders;
if ( ! isset( $placeholders ) ) {
$salt = wp_rand();

// Note: The order of these tokens is important, as it determines the order of the order of the replacements.
$tokens = array(
'{{{',
'}}}',
'{{#',
'{{^',
'{{/',
'{{/',
'{{',
'}}',
);
$placeholders = array();
foreach ( $tokens as $token ) {
$placeholders[ $token ] = '_amp_mustache_' . md5( $salt . $token );
}
}
return $placeholders;
}

/**
* Replace AMP binding attributes with something that libxml can parse (as HTML5 data-* attributes).
*
Expand Down Expand Up @@ -285,6 +333,14 @@ public static function get_content_from_dom_node( $dom, $node ) {

$html = self::restore_amp_bind_attributes( $html );

// Restore amp-mustache placeholders which were replaced to prevent URL-encoded corruption by saveHTML.
Copy link
Collaborator

@ThierryA ThierryA Jan 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps something like the prototype replacement below would have been sufficient:

$html = preg_replace_callback(
	"#href=[\"|'](%7B.*%7D)?[\"|']+#",
	function( $matches ) {
		return urldecode( $matches[0] );
	},
	$html
);

If that works for you, it can be updated in another PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't want to do replacements on encodings that were original. And I wanted to limit the replacements to braces.

$placeholders = self::get_mustache_tag_placeholders();
$html = str_replace(
array_values( $placeholders ),
array_keys( $placeholders ),
$html
);

/*
* Travis w/PHP 7.1 generates <br></br> and <hr></hr> vs. <br/> and <hr/>, respectively.
* Travis w/PHP 7.x generates <source ...></source> vs. <source ... />. Etc.
Expand Down
2 changes: 1 addition & 1 deletion tests/test-tag-and-attribute-sanitizer.php
Original file line number Diff line number Diff line change
Expand Up @@ -317,7 +317,7 @@ public function get_data() {
),

'attribute_value_valid' => array(
'<template type="amp-mustache">Template Data</template>',
'<template type="amp-mustache">Hello {{world}}! <a href="{{user_url}}" title="{{user_name}}">Homepage</a> User content: {{{some_formatting}}} A guy with Mustache: :-{). {{#returning}}Welcome back!{{/returning}} {{^returning}}Welcome for the first time!{{/returning}} </template>',
null,
array( 'amp-mustache' ),
),
Expand Down