Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The [meta charset] tag is duplicated if it's not parsed correctly #4179

Closed
pierlon opened this issue Jan 25, 2020 · 0 comments · Fixed by #4181
Closed

The [meta charset] tag is duplicated if it's not parsed correctly #4179

pierlon opened this issue Jan 25, 2020 · 0 comments · Fixed by #4181
Assignees
Labels
Bug Something isn't working Validation
Milestone

Comments

@pierlon
Copy link
Contributor

pierlon commented Jan 25, 2020

Bug Description

While investigating the validation errors in #4060, it was found that the postmag theme was throwing the DUPLICATE_UNIQUE_TAG validation error for the [meta charset] tag, although that was not the case after inspecting theme's source code. The rendered meta tag by the theme is:

<meta charset = "UTF-8" >

Upon further investigation within the plugin, it was found that the [meta charset] tag was incorrectly parsed and considered not present within the HTML source, and so another one was added.

Both AMP_Theme_Support::prepare_response() and \Amp\AmpWP\Dom\Document::loadHTML() fail to correctly parse the tag, and a duplicate one will be added by either function.

Within AMP_Theme_Support::prepare_response(), this can be traced to the following regex failing to find the tag:

if ( ! preg_match( '#<meta[^>]+charset=#i', substr( $response, 0, 1024 ) ) ) {

The regex pattern is not considering any white-space that may be found between the attribute and the equal sign, and because of this another [meta charset] will be added.

If for some other reason the meta tag is not added there, the call to Document::from_html() will duplicate the tag. This can be traced back to where it adds the required AMP encoding charset via the [meta charset] tag:

amp-wp/src/Dom/Document.php

Lines 353 to 356 in 22727d1

// Add the required utf-8 meta charset tag.
$charset = $this->createElement( 'meta' );
$charset->setAttribute( 'charset', self::AMP_ENCODING );
$this->head->insertBefore( $charset, $this->head->firstChild );

This is done as it is expected that the original encoding is parsed and removed from the HTML source, but the regex pattern used to parse the charset attribute and its value also fails to consider white-space:

const HTML_FIND_TAG_WITH_ATTRIBUTE_PATTERN = '/<%1$s [^>]*?\s*%2$s=[^>]*?>[^<]*(?:<\/%1$s>)?/i';

leading to both the original and required [meta charset] tags being found in the HTML source.

Expected Behaviour

Only one [meta charset] tag should be seen in the HTML source.

Steps to reproduce

  1. Add white-space between the charset and the equal sign in the [meta charset] tag, for example:
<meta charset ="UTF-8" >
  1. Validate a post with the above tag included in the HTML

  2. The following AMP validation error should be seen for the post:

image

Screenshots

Additional context

  • WordPress version:
  • Plugin version:
  • Gutenberg plugin version (if applicable):
  • AMP plugin template mode:
  • PHP version:
  • OS:
  • Browser: [e.g. chrome, safari]
  • Device: [e.g. iPhone6]

Do not alter or remove anything below. The following sections will be managed by moderators only.

Acceptance criteria

  • The [meta charset] tag should not be duplicated if white-space is found between the charset attribute and the equal sign.

Implementation brief

The regex patterns to parse the HTML source for the [meta charset] tag should be adapted to consider white-space that may be found between the charset attribute and the equal sign.

QA testing instructions

Demo

Changelog entry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something isn't working Validation
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants