Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strip whitespace from data: URLs in stylesheets to prevent AMP validation error #1089

Closed
jonhendershot opened this issue Apr 20, 2018 · 33 comments
Assignees
Milestone

Comments

@jonhendershot
Copy link

jonhendershot commented Apr 20, 2018

Validator reports CSS syntax error in tag 'style amp-custom' - bad url.

This is still throwing errors on Google Search Console. See ticket #604.

Worth noting that we're on v0.7 of the plugin to fix sanitization issues with the carousel, as per your recommendation in ticket #1076

screen shot 2018-04-20 at 1 40 12 pm

@westonruter
Copy link
Member

@jonhendershot I can confirm this is an error v0.6. However, it is fixed when I check out the 0.7 branch due to the fix merged in #1004. Note that this was not part of 0.7-beta1. Are you trying with current build of the 0.7 branch (with instructions in #1076 (comment))? Note that we're planning to package a 0.7-rc1 release on Monday.

@westonruter
Copy link
Member

I can see that it was not fixed in 0.7-beta1, but it is fixed in the current state of the 0.7 branch.

@jonhendershot
Copy link
Author

Thanks for such a quick reply @westonruter - I do believe we're packaged with the current build via the instructions given in #1076 . We would've built from it earlier this week.

@westonruter
Copy link
Member

@jonhendershot please share the URL for the page in question. Also, try running the output through https://validator.ampproject.org/ since it could be that the Google Search Console is reporting based on a cached version.

@jonhendershot
Copy link
Author

@westonruter We've got quite a few instances of this error on the site currently. Just checked it again on https://validator.ampproject.org/ and confirmed seeing this issue.

https://www.wellandgood.com/good-looks/millie-bobby-brown-red-carpet-sneaker-look/amp/

@westonruter
Copy link
Member

@jonhendershot I'm seeing:

<meta name="generator" content="AMP Plugin v0.7-beta1" />

Would you please do another build off of the 0.7 branch to make sure it is on the latest? Note that once you do this you should see the generator as:

<meta name="generator" content="AMP Plugin v0.7-beta1-d65578d-20180420T185722Z" />

@westonruter
Copy link
Member

@jonhendershot it would be great to get confirmation on this ASAP as we're planning to release v0.7-RC1 on Monday.

@westonruter
Copy link
Member

@jonhendershot we're going to proceed with the 0.7-RC1 release shortly.

@jonhendershot
Copy link
Author

jonhendershot commented Apr 23, 2018

hi @westonruter -- Sorry for the delay. We've updated the plugin and are still seeing the same issues. We've staged the updated plugin in the following environment:
https://amp-slides-well-good.pantheonsite.io/good-looks/millie-bobby-brown-red-carpet-sneaker-look/amp/

Confirmed that we're seeing the following meta tag in our head:
<meta name="generator" content="AMP Plugin v0.7-beta1-d65578d-20180420T213729Z" />

@westonruter
Copy link
Member

@jonhendershot thanks, I was able to replicate the issue this time. The issue is that there seems to be a space in the original data: URL in your content:

background:url(data:image/png; base64,ivborw0kggoaa...

If I copy your original markup and remove this extra space then the result properly validates. Can you try that?

Either way, this should be fixed, but it's not as bad an issue as #604 was, since it completely corrupted the data: URL.

@jonhendershot
Copy link
Author

@westonruter gotcha - thanks! I'm not totally clear, though -- it seems that that string is something generated by the plugin. I can't seem to find it in our codebase anywhere so I'm assuming it's being built somewhere. Should I be digging deeper to find a solve, or is this something on your side?

@westonruter
Copy link
Member

@jonhendershot The style rule is coming from an inline style. Specifically, it is coming from this block of HTML:

<blockquote class="instagram-media" style="background: #FFF; border: 0; border-radius: 3px; box-shadow: 0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width: 658px; padding: 0; width: calc(100% - 2px);" data-instgrm-permalink="https://www.instagram.com/p/BePtv5Ug43B/" data-instgrm-version="8">
<div style="padding: 8px;">
<div style="background: #F8F8F8; line-height: 0; margin-top: 40px; padding: 62.5% 0; text-align: center; width: 100%;">
<div style="background: url(data:image/png; base64,ivborw0kggoaaaansuheugaaacwaaaascamaaaapwqozaaaabgdbtueaalgpc/xhbqaaaafzukdcak7ohokaaaamuexurczmzpf399fx1+bm5mzy9amaaadisurbvdjlvzxbesmgces5/p8/t9furvcrmu73jwlzosgsiizurcjo/ad+eqjjb4hv8bft+idpqocx1wjosbfhh2xssxeiyn3uli/6mnree07uiwjev8ueowds88ly97kqytlijkktuybbruayvh5wohixmpi5we58ek028czwyuqdlkpg1bkb4nnm+veanfhqn1k4+gpt6ugqcvu2h2ovuif/gwufyy8owepdyzsa3avcqpvovvzzz2vtnn2wu8qzvjddeto90gsy9mvlqtgysy231mxry6i2ggqjrty0l8fxcxfcbbhwrsyyaaaaaelftksuqmcc); display: block; height: 44px; margin: 0 auto -44px; position: relative; top: -22px; width: 44px;"></div>
</div>
<p style="color: #c9c8cd; font-family: Arial,sans-serif; font-size: 14px; line-height: 17px; margin-bottom: 0; margin-top: 8px; overflow: hidden; padding: 8px 0 7px; text-align: center; text-overflow: ellipsis; white-space: nowrap;"><a style="color: #c9c8cd; font-family: Arial,sans-serif; font-size: 14px; font-style: normal; font-weight: normal; line-height: 17px; text-decoration: none;" href="https://www.instagram.com/p/BePtv5Ug43B/" target="_blank" rel="noopener">A post shared by Millie Bobby Brown (@milliebobbybrown)</a> on <time style="font-family: Arial,sans-serif; font-size: 14px; line-height: 17px;" datetime="2018-01-22T08:46:26+00:00">Jan 22, 2018 at 12:46am PST</time>
</div>
</blockquote>

Is this coming from an Instagram embed? Please share the underlying post_content for this embed so we can reproduce.

@westonruter
Copy link
Member

I can see actually that this is the Instagram embed code they supply. Here's another embed code snippet just copied from a post:

<blockquote class="instagram-media" data-instgrm-captioned data-instgrm-permalink="https://www.instagram.com/p/Bh7jRUkBIzB/" data-instgrm-version="8" style=" background:#FFF; border:0; border-radius:3px; box-shadow:0 0 1px 0 rgba(0,0,0,0.5),0 1px 10px 0 rgba(0,0,0,0.15); margin: 1px; max-width:658px; padding:0; width:99.375%; width:-webkit-calc(100% - 2px); width:calc(100% - 2px);"><div style="padding:8px;"> <div style=" background:#F8F8F8; line-height:0; margin-top:40px; padding:50.0% 0; text-align:center; width:100%;"> <div style=" background:url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAACwAAAAsCAMAAAApWqozAAAABGdBTUEAALGPC/xhBQAAAAFzUkdCAK7OHOkAAAAMUExURczMzPf399fX1+bm5mzY9AMAAADiSURBVDjLvZXbEsMgCES5/P8/t9FuRVCRmU73JWlzosgSIIZURCjo/ad+EQJJB4Hv8BFt+IDpQoCx1wjOSBFhh2XssxEIYn3ulI/6MNReE07UIWJEv8UEOWDS88LY97kqyTliJKKtuYBbruAyVh5wOHiXmpi5we58Ek028czwyuQdLKPG1Bkb4NnM+VeAnfHqn1k4+GPT6uGQcvu2h2OVuIf/gWUFyy8OWEpdyZSa3aVCqpVoVvzZZ2VTnn2wU8qzVjDDetO90GSy9mVLqtgYSy231MxrY6I2gGqjrTY0L8fxCxfCBbhWrsYYAAAAAElFTkSuQmCC); display:block; height:44px; margin:0 auto -44px; position:relative; top:-22px; width:44px;"></div></div> <p style=" margin:8px 0 0 0; padding:0 4px;"> <a href="https://www.instagram.com/p/Bh7jRUkBIzB/" style=" color:#000; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px; text-decoration:none; word-wrap:break-word;" target="_blank">Introducing the new “Psycho Kaler”! Kale, Calabrian Chilies, Cream, Mozzarella, Garlic, and Ricotta. #pizzaweekiseveryweek</a></p> <p style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; line-height:17px; margin-bottom:0; margin-top:8px; overflow:hidden; padding:8px 0 7px; text-align:center; text-overflow:ellipsis; white-space:nowrap;">A post shared by <a href="https://www.instagram.com/fillmorepdx/" style=" color:#c9c8cd; font-family:Arial,sans-serif; font-size:14px; font-style:normal; font-weight:normal; line-height:17px;" target="_blank"> [email protected]</a> (@fillmorepdx) on <time style=" font-family:Arial,sans-serif; font-size:14px; line-height:17px;" datetime="2018-04-23T22:55:39+00:00">Apr 23, 2018 at 3:55pm PDT</time></p></div></blockquote> <script async defer src="//www.instagram.com/embed.js"></script>

Notice here that there is no space in the background image data: URL. It appears that your code was modified to add the space.

@jonhendershot
Copy link
Author

@westonruter yah good call -- thanks for digging through that for me!! Looks like that code is being added via the TinyMCE content editor, so it'll be on content management and/or a content filter to protect against it. Thank you again for such quick replies this week!

@westonruter
Copy link
Member

I'll re-open this because it's something we can fix in CSS sanitizer to prevent extra spaces from being added in the first place.

@westonruter westonruter reopened this Apr 24, 2018
@westonruter westonruter added this to the v1.0 milestone Apr 24, 2018
@westonruter westonruter changed the title CSS syntax error in tag 'style amp-custom' - bad url. Strip whitespace from data: URLs in stylesheets to prevent CSS syntax error in tag 'style amp-custom' - bad url. Apr 24, 2018
@westonruter westonruter changed the title Strip whitespace from data: URLs in stylesheets to prevent CSS syntax error in tag 'style amp-custom' - bad url. Strip whitespace from data: URLs in stylesheets to prevent AMP validation error Apr 24, 2018
@westonruter
Copy link
Member

To fix the issue, this is where I would work:

https://github.com/Automattic/amp-wp/blob/1e2cd22f1bfe8ab9d833c901e02111120f52fe72/includes/sanitizers/class-amp-style-sanitizer.php#L749-L758

The real_path_urls method should be renamed to normalize_urls, and then instead of just skipping data: URLs, it should remove all whitespace before continuing.

@jonhendershot
Copy link
Author

ah, wonderful! Thanks @westonruter -- will keep an eye on this

@amedina amedina self-assigned this Apr 24, 2018
@jonhendershot
Copy link
Author

@westonruter just wanted to follow up on this and see if it has been addressed yet. Thanks!

@westonruter
Copy link
Member

@jonhendershot no it has not.

@jonhendershot
Copy link
Author

@westonruter ok cool - any estimate on when we can expect to see this patched?

@westonruter
Copy link
Member

It will be part of the 1.0 release which would be coming this summer. I suggest implementing an alternative with a the_content filter in the mean time.

@Mte90
Copy link

Mte90 commented May 18, 2018

I fixed with this code:

add_filter( 'the_content', 'fix_amp_space' );

function fix_amp_space( $content ) {
    if ( is_amp_endpoint() ) {
        return str_replace('data:image/png; base64,', 'data:image/png;base64,', $content);
    }
    return $content;
}

But I don't understand why in 0.8 this fix cannot be implemented because there are a lot of people reporting this problem.

@westonruter
Copy link
Member

The fix here using the_content is more of a workaround than a proper fix (as the fix here won't apply to CSS in external stylesheets). Only with 1.0 do we have an actual CSS Parser which can be used to extract the data: URLs from stylesheets. In 0.7 the only option is to do some string replacement like this. But it's a pretty brute-force fix.

@westonruter
Copy link
Member

@amedina Instead of removing the spaces from the data: URL it may be better to wrap the content in quotes. This would also fix the issue of using SVG in data: URLs per: https://wordpress.org/support/topic/background-gradients-invalid/

@mehigh
Copy link
Contributor

mehigh commented May 23, 2018

Wrapping the dataUri with quotes is indeed the better approach here.

@westonruter
Copy link
Member

@mehigh why? URLs don't allow whitespace so just removing whitespace seems to have the same effect, and it also has the added benefit of saving at least 3 bytes.

@mehigh
Copy link
Contributor

mehigh commented May 23, 2018

I googled. What I knew was not up to date.. this was actually something I picked up from the CSS validator years back, but it seems this was reported as a bug which got fixed .. 3 years ago, since quotes are optional for data uris as well. I've always used quotes myself though.
w3c/css-validator#42 (comment)

TL;DR: dropping the quotes for data embeds is OK.

@davisshaver
Copy link
Contributor

Chiming in after getting AMP regressions.

Since removing the whitespace works but is a little brute force, could we wrap in the single quote to support legacy embeds (like Instagram)? I wouldn't call it better but at least it would fix the errors until 1.0 version is ready.

@westonruter
Copy link
Member

@davisshaver Regressions as in this wasn't a problem in 0.6?

If the issue is specifically Instagram embeds, then the proper solution for that would be in #1103. It seems WordPress has a tendency to add spaces inside the data: URL and so this might be actually an issue with TinyMCE. In the general case, the fix is to target the data: URLs as in #1164 to remove the spaces. This targeting of URLs is most reliable when we have an actual CSS parser, which we do as of 1.0-alpha. In the mean time, I suggest in the mean time using a the_content filter on your own site to strip out the invalid spaces as a temporary workaround, like in #1089 (comment)

@davisshaver
Copy link
Contributor

Nice! Thanks @westonruter. By regression I just meant that formerly-validating AMP pages were no longer validating. It looks like @amedina's patch will fix this though. 🚀

@westonruter
Copy link
Member

formerly-validating AMP pages were no longer validating

@davisshaver right, but are they no longer validating because of a change in the plugin or a change on your site?

@davisshaver
Copy link
Contributor

@westonruter Sorry I really don't know... I think the change in plugin caused it, as I don't believe anything related to inline images changed on the site; the content affected was all archival (Feb 2018 or earlier); and I don't recall seeing this error at any point previously. BUT I am unable to verify that the pages were at any time indexed in AMP, as analytics data shows no hits to the pages, and I didn't keep clean enough records of plugin versions to put this into context of v0.6/v0.7. Sorry I can't be more help.

@kienstra
Copy link
Contributor

kienstra commented Jun 5, 2018

Steps To Test

Hi @csossi,
Could you please test this?

  1. Create a new post on the Native AMP staging site
  2. Copy this Instagram URL into the post editor: https://www.instagram.com/p/BjnK5eGlil4
  3. This should render as an Instagram post
  4. Visit the front-end of the post
  5. Expected: There are no AMP validation errors, when using the Chrome AMP extension, or the AMP Validator

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants