-
Notifications
You must be signed in to change notification settings - Fork 384
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add AMP Optimizer library #4019
Conversation
@westonruter We already discussed having not only a PSR-2 package for the optimizer and a WPCS package for the plugin, but also a shared package with common code. The first optimizer I'm working on ( @pierlon Already wrote a PHP version of most of this, so writing it again for the Optimizer doesn't make sense. The Optimizer also shouldn't have the plugin package as a dependency. The clean solution is to extract it out into this third shared package (naming to be discussed). However, this would mean for now (while we have everything in one repo) that I'm adding a third namespace root already via this PR (which is already adding the second Should we go for "clean separation" for now and later deal with the many subfolders either through separate repos or lerna, or do you want me to just copy that shared code into the Optimizer for now (which might lead to inconsistencies that can result in bugs when moving over to a shared library). |
8835c9a
to
3b31720
Compare
The Go optimizer (powering the Google Cache) and the NodeJS optimizer handle amp-stories differently. The NodeJS optimizer behaves the way the Go version states its future intent: https://github.com/ampproject/amppackager/blob/c9993b8ac4d17d1f05d3a1289956dadf3f9c370a/transformer/transformers/serversiderendering.go#L149 With NodeJS, the presence of an I'd suggest going with the NodeJS version in this case. @westonruter, @sebastianbenz thoughts? |
Having clean separation with different directories for the various packages makes sense to me. At a future point we can split them into separate repos (for those which aren't left in place) when we make the packages available. It will be easier now to keep everything in one repo while we iterate. |
@schlessera re amp-story: the Go version is correct. I've filed ampproject/amp-toolbox#567 for Optimizer. |
Alright. I'll go with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't finished my review but I wanted to submit what I've seen so far.
src/Dom/Document.php
Outdated
/** | ||
* Restore the emoji AMP symbol (⚡) from its pure text placeholder. | ||
* | ||
* @param string $html HTML string to restore the AMP emoji symbol in. | ||
* @return string Adapted HTML string. | ||
*/ | ||
private function restore_amp_emoji_attribute( $html ) { | ||
return preg_replace( '/(<html [^>]*?)' . preg_quote( self::EMOJI_AMP_ATTRIBUTE, '/' ) . '="([^"]*)"/i', '\1⚡\2', $html, 1 ); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is restore_amp_emoji_attribute
needed? We should let the post-processor be adding the amp
attribute, as it is currently doing. Additionally, in the case for when there is excessive CSS on a site in the Standard template mode, we serve an AMP document without the amp
attribute. So that should be retained.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to build this so that we can use the language-independent testsuite provided at https://github.com/ampproject/amp-toolbox/tree/master/packages/optimizer/spec/transformers/valid
We can of course either ignore these tests, or modify them on-the-fly to adapt them to our specific requirements. But I thought it might be worthwhile to stick as closely to the testable spec as possible within the PHP libraries...?
src/Dom/Document.php
Outdated
/** | ||
* Covert the emoji AMP symbol (⚡) into pure text. | ||
* | ||
* The emoji symbol gets stripped by DOMDocument::loadHTML(). | ||
* | ||
* @param string $source Source HTML string to convert the emoji AMP symbol in. | ||
* @return string Adapted source HTML string. | ||
*/ | ||
private function convert_amp_emoji_attribute( $source ) { | ||
return preg_replace( '/(<html [^>]*?)⚡([^\s^>]*)/i', '\1' . self::EMOJI_AMP_ATTRIBUTE . '="\2"', $source, 1 ); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think convert_amp_emoji_attribute
should just remove the attribute altogether instead of replacing it with self::EMOJI_AMP_ATTRIBUTE
. We will add the amp
attribute during post-processing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As stated in #4019 (comment), the goal is to be able to use the language-independent test suite.
const ATTRIBUTES_STRING = 'Cannot remove boilerplate as either heights, media or sizes attribute is set: '; | ||
const RENDER_DELAYING_SCRIPT_STRING = 'Cannot remove boilerplate because the document contains a render-delaying extension: '; | ||
const AMP_AUDIO_STRING = 'Cannot remove boilerplate because the document contains an extension that needs to know the dimensions of the browser: '; | ||
const UNSUPPORTED_LAYOUT_STRING = 'Cannot remove boilerplate because of an unsupported layout: '; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These messages should perhaps be translatable, as we'd want to show them in the validated URL screen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're not in WordPress here, so we need to discuss how to actually implement translatable strings. We can use regular gettext, but we might need a way to properly map everything.
const INVALID_INPUT_WIDTH = 'Cannot perform serverside rendering, invalid input width: '; | ||
const INVALID_INPUT_HEIGHT = 'Cannot perform serverside rendering, invalid input height: '; | ||
const UNSUPPORTED_LAYOUT = 'Cannot perform serverside rendering, unsupported layout: '; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be translatable because we'd want to show them on the validated URL screen.
optimizer/Configuration.php
Outdated
/** | ||
* Check whether the configuration has a given setting. | ||
* | ||
* @param string $key Configuration key to look for. | ||
* @return bool Whether the requested configuration key was found or not. | ||
*/ | ||
public function has($key) | ||
{ | ||
return array_key_exists($key, $this->configuration); | ||
} | ||
|
||
/** | ||
* Get the value for a given key from the configuration. | ||
* | ||
* @param string $key Configuration key to get the value for. | ||
* @return mixed Configuration value for the requested key. | ||
* @throws UnknownConfigurationKey If the key was not found. | ||
*/ | ||
public function get($key) | ||
{ | ||
if (! array_key_exists($key, $this->configuration)) { | ||
throw UnknownConfigurationKey::fromKey($key); | ||
} | ||
|
||
return $this->configuration[$key]; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this class implement ArrayAccess
or Traversable
or the like?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ArrayAccess
would mean we make the configuration mutable, or at leat add an offsetSet()
that throws an exception. I don't really see what the benefit would be.
For Traversable
, I can see us implement that if we have an actual use case where we need to iterate over the configuration. Right now, this is not needed, so adding it just for the sake of it seems premature.
private function hasAncestorWithTag(DOMElement $element, $tagName) | ||
{ | ||
$parent = $element->parentNode; | ||
while ($parent !== null) { | ||
if ($parent instanceof DOMElement && $parent->tagName === $tagName) { | ||
return true; | ||
} | ||
$parent = $parent->parentNode; | ||
} | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this method is going to be called a lot, perhaps it should be optimized better? When collecting the amp_elements
can we omit the ones that are inside of template
elements? Perhaps something to add to the Document
class? For example:
case 'templateless_amp_elements':
$this->templateless_amp_elements = $this->xpath->query( ".//*[ starts-with( name(), 'amp-' ) and not( ancestor::template/ancestor::* = current() ) ]", $this->body );
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can benchmark and optimize this in a subsequent PR.
// If textnode is not JSON parsable, then not used. | ||
$experiment = json_decode($json, /*$assoc*/ true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the value is null
? Should this not actually use json_last_error()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the value is null
, we'll get null
as a result, which is still a valid reason to skip.
} | ||
|
||
// If JSON is empty, then not used. | ||
if (count($experiment) === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this not also check if it is a countable?
if (count($experiment) === 0) { | |
if (!is_array($experiment) || count($experiment) === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, just checking for empty($experiment)
should cover all cases correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is an experiment valid if it is just a scalar like 0
? I don't think so...
This commit moves code around so that all code to be externalized lands in `lib/<package>`. These are meant to be separate roots, so they have their own Composer file and testing setup (a simple one for now). These "external" packages are then pulled in via Composer using `"path"` repositories with relative URIs. To work on an individual package: ``` cd lib/<package> composer install composer test ```
…/amp-wp into add/958-amp-optimizer
/** | ||
* Always ensure we have a <style amp-boilerplate> tag. | ||
*/ | ||
protected function ensure_boilerplate_is_present() { | ||
$style = $this->dom->xpath->query( './style[ @amp-boilerplate ]', $this->dom->head )->item( 0 ); | ||
|
||
if ( ! $style ) { | ||
$style = $this->dom->createElement( Tag::STYLE ); | ||
$style->setAttribute( Attribute::AMP_BOILERPLATE, '' ); | ||
$style->appendChild( $this->dom->createTextNode( amp_get_boilerplate_stylesheets()[0] ) ); | ||
} else { | ||
$style->parentNode->removeChild( $style ); // So we can move it. | ||
} | ||
|
||
$this->dom->head->appendChild( $style ); | ||
|
||
$noscript = $this->dom->xpath->query( './noscript[ style[ @amp-boilerplate ] ]', $this->dom->head )->item( 0 ); | ||
|
||
if ( ! $noscript ) { | ||
$noscript = $this->dom->createElement( Tag::NOSCRIPT ); | ||
$style = $this->dom->createElement( Tag::STYLE ); | ||
$style->setAttribute( Attribute::AMP_BOILERPLATE, '' ); | ||
$style->appendChild( $this->dom->createTextNode( amp_get_boilerplate_stylesheets()[1] ) ); | ||
$noscript->appendChild( $style ); | ||
} else { | ||
$noscript->parentNode->removeChild( $noscript ); // So we can move it. | ||
} | ||
|
||
$this->dom->head->appendChild( $noscript ); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't really make sense to be part of the Meta sanitizer, if “Meta” is referring to meta
tags, but we can move it into a separate sanitizer later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or rather, maybe this sanitizer should be renamed to be something like AMP_Ensure_Required_Markup_Sanitizer
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Phew! 😅
Sorry for leaving you so much work left undone, I didn't get nearly as much time yesterday to work on this as expected. |
You've done enough 😄 |
Summary
Adds an AMP Optimizer library to generate Optimized (aka Transformed) AMP pages from regular AMP pages.
This PR does the following:
amp/common
package (pulled-in via a Composerpath
repository for now)amp/optimizer
package (pulled-in via apath
repository for now)AmpBoilerplate
AmpRuntimeCss
ReorderHead
ServerSideRendering
TransformedIdentifier
amp/optimizer
library to theprepare_response()
to optimize our Amp outputAmpSchemaOrgMetadata
to test and demonstrate extensibilityamp/common
package where appropriateFixes #958
Fixes #3282
Depends on:
Checklist