Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AMP Optimizer library #4019

Merged
merged 153 commits into from
Mar 7, 2020
Merged

Add AMP Optimizer library #4019

merged 153 commits into from
Mar 7, 2020

Conversation

schlessera
Copy link
Collaborator

@schlessera schlessera commented Jan 2, 2020

Summary

Adds an AMP Optimizer library to generate Optimized (aka Transformed) AMP pages from regular AMP pages.

This PR does the following:

  • extracts common code out into an amp/common package (pulled-in via a Composer path repository for now)
  • adds a new optimizer transformation engine in an amp/optimizer package (pulled-in via a path repository for now)
  • adds the following transformers that were suggested as a first step in the Amp SSR for WP discovery document:
    • AmpBoilerplate
    • AmpRuntimeCss
    • ReorderHead
    • ServerSideRendering
    • TransformedIdentifier
  • hooks up the amp/optimizer library to the prepare_response() to optimize our Amp output
  • adds a WP-specific transformer AmpSchemaOrgMetadata to test and demonstrate extensibility
  • adapts all plugin code to use the new amp/common package where appropriate
  • adds direct unit tests for all new code
  • adds "spec" tests that pull in the NodeJs test suite and run it against our PHP implementation
  • adds a basic Travis setup to run the tests for both new packages

Fixes #958
Fixes #3282

Depends on:

Checklist

  • My pull request is addressing an open issue (please create one otherwise).
  • My code is tested and passes existing tests.
  • My code follows the Engineering Guidelines (updates are often made to the guidelines, check it out periodically).

@googlebot googlebot added the cla: yes Signed the Google CLA label Jan 2, 2020
@schlessera
Copy link
Collaborator Author

@westonruter We already discussed having not only a PSR-2 package for the optimizer and a WPCS package for the plugin, but also a shared package with common code.

The first optimizer I'm working on (ServerSideRendering) already has lots of this shared code in the form of the parseLayout functionality, the CssLengthclass, etc...

@pierlon Already wrote a PHP version of most of this, so writing it again for the Optimizer doesn't make sense. The Optimizer also shouldn't have the plugin package as a dependency.

The clean solution is to extract it out into this third shared package (naming to be discussed). However, this would mean for now (while we have everything in one repo) that I'm adding a third namespace root already via this PR (which is already adding the second optimizer one).

Should we go for "clean separation" for now and later deal with the many subfolders either through separate repos or lerna, or do you want me to just copy that shared code into the Optimizer for now (which might lead to inconsistencies that can result in bugs when moving over to a shared library).

@schlessera
Copy link
Collaborator Author

The Go optimizer (powering the Google Cache) and the NodeJS optimizer handle amp-stories differently. The NodeJS optimizer behaves the way the Go version states its future intent: https://github.com/ampproject/amppackager/blob/c9993b8ac4d17d1f05d3a1289956dadf3f9c370a/transformer/transformers/serversiderendering.go#L149

With NodeJS, the presence of an <amp-story> does not hinder removal of the boilerplate. This is the case with the Go version, but the todo states this shouldn't be the case and needs to be changed.

I'd suggest going with the NodeJS version in this case.

@westonruter, @sebastianbenz thoughts?

@westonruter
Copy link
Member

Should we go for "clean separation" for now and later deal with the many subfolders either through separate repos or lerna

Having clean separation with different directories for the various packages makes sense to me. At a future point we can split them into separate repos (for those which aren't left in place) when we make the packages available. It will be easier now to keep everything in one repo while we iterate.

@sebastianbenz
Copy link

@schlessera re amp-story: the Go version is correct. I've filed ampproject/amp-toolbox#567 for Optimizer.

@schlessera
Copy link
Collaborator Author

Having clean separation with different directories for the various packages makes sense to me.

Alright. I'll go with common for now, and am happy to accept any better name later down the road.

src/Dom/Document.php Outdated Show resolved Hide resolved
Copy link
Member

@westonruter westonruter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't finished my review but I wanted to submit what I've seen so far.

Comment on lines 993 to 1001
/**
* Restore the emoji AMP symbol (⚡) from its pure text placeholder.
*
* @param string $html HTML string to restore the AMP emoji symbol in.
* @return string Adapted HTML string.
*/
private function restore_amp_emoji_attribute( $html ) {
return preg_replace( '/(<html [^>]*?)' . preg_quote( self::EMOJI_AMP_ATTRIBUTE, '/' ) . '="([^"]*)"/i', '\1⚡\2', $html, 1 );
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is restore_amp_emoji_attribute needed? We should let the post-processor be adding the amp attribute, as it is currently doing. Additionally, in the case for when there is excessive CSS on a site in the Standard template mode, we serve an AMP document without the amp attribute. So that should be retained.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to build this so that we can use the language-independent testsuite provided at https://github.com/ampproject/amp-toolbox/tree/master/packages/optimizer/spec/transformers/valid

We can of course either ignore these tests, or modify them on-the-fly to adapt them to our specific requirements. But I thought it might be worthwhile to stick as closely to the testable spec as possible within the PHP libraries...?

phpcs.xml Outdated Show resolved Hide resolved
src/Dom/Document.php Outdated Show resolved Hide resolved
Comment on lines 981 to 991
/**
* Covert the emoji AMP symbol (⚡) into pure text.
*
* The emoji symbol gets stripped by DOMDocument::loadHTML().
*
* @param string $source Source HTML string to convert the emoji AMP symbol in.
* @return string Adapted source HTML string.
*/
private function convert_amp_emoji_attribute( $source ) {
return preg_replace( '/(<html [^>]*?)⚡([^\s^>]*)/i', '\1' . self::EMOJI_AMP_ATTRIBUTE . '="\2"', $source, 1 );
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think convert_amp_emoji_attribute should just remove the attribute altogether instead of replacing it with self::EMOJI_AMP_ATTRIBUTE. We will add the amp attribute during post-processing.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As stated in #4019 (comment), the goal is to be able to use the language-independent test suite.

optimizer/Transformer/ServerSideRendering.php Outdated Show resolved Hide resolved
optimizer/Error/CannotRemoveBoilerplate.php Outdated Show resolved Hide resolved
optimizer/Error/ErrorProperties.php Outdated Show resolved Hide resolved
Comment on lines 20 to 23
const ATTRIBUTES_STRING = 'Cannot remove boilerplate as either heights, media or sizes attribute is set: ';
const RENDER_DELAYING_SCRIPT_STRING = 'Cannot remove boilerplate because the document contains a render-delaying extension: ';
const AMP_AUDIO_STRING = 'Cannot remove boilerplate because the document contains an extension that needs to know the dimensions of the browser: ';
const UNSUPPORTED_LAYOUT_STRING = 'Cannot remove boilerplate because of an unsupported layout: ';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These messages should perhaps be translatable, as we'd want to show them in the validated URL screen.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're not in WordPress here, so we need to discuss how to actually implement translatable strings. We can use regular gettext, but we might need a way to properly map everything.

Comment on lines 20 to 22
const INVALID_INPUT_WIDTH = 'Cannot perform serverside rendering, invalid input width: ';
const INVALID_INPUT_HEIGHT = 'Cannot perform serverside rendering, invalid input height: ';
const UNSUPPORTED_LAYOUT = 'Cannot perform serverside rendering, unsupported layout: ';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be translatable because we'd want to show them on the validated URL screen.

Comment on lines 96 to 121
/**
* Check whether the configuration has a given setting.
*
* @param string $key Configuration key to look for.
* @return bool Whether the requested configuration key was found or not.
*/
public function has($key)
{
return array_key_exists($key, $this->configuration);
}

/**
* Get the value for a given key from the configuration.
*
* @param string $key Configuration key to get the value for.
* @return mixed Configuration value for the requested key.
* @throws UnknownConfigurationKey If the key was not found.
*/
public function get($key)
{
if (! array_key_exists($key, $this->configuration)) {
throw UnknownConfigurationKey::fromKey($key);
}

return $this->configuration[$key];
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this class implement ArrayAccess or Traversable or the like?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayAccess would mean we make the configuration mutable, or at leat add an offsetSet() that throws an exception. I don't really see what the benefit would be.

For Traversable, I can see us implement that if we have an actual use case where we need to iterate over the configuration. Right now, this is not needed, so adding it just for the sake of it seems premature.

optimizer/Layout.php Outdated Show resolved Hide resolved
optimizer/Layout.php Outdated Show resolved Hide resolved
Comment on lines 478 to 488
private function hasAncestorWithTag(DOMElement $element, $tagName)
{
$parent = $element->parentNode;
while ($parent !== null) {
if ($parent instanceof DOMElement && $parent->tagName === $tagName) {
return true;
}
$parent = $parent->parentNode;
}
return false;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this method is going to be called a lot, perhaps it should be optimized better? When collecting the amp_elements can we omit the ones that are inside of template elements? Perhaps something to add to the Document class? For example:

case 'templateless_amp_elements':
    $this->templateless_amp_elements = $this->xpath->query( ".//*[ starts-with( name(), 'amp-' ) and not( ancestor::template/ancestor::* = current() ) ]", $this->body );

Props https://stackoverflow.com/a/32272565

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it turns out, current() was not yet part of XPath 1.0, which libXML only supports.

image

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can benchmark and optimize this in a subsequent PR.

Comment on lines 532 to 533
// If textnode is not JSON parsable, then not used.
$experiment = json_decode($json, /*$assoc*/ true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the value is null? Should this not actually use json_last_error()?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the value is null, we'll get null as a result, which is still a valid reason to skip.

}

// If JSON is empty, then not used.
if (count($experiment) === 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not also check if it is a countable?

Suggested change
if (count($experiment) === 0) {
if (!is_array($experiment) || count($experiment) === 0) {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, just checking for empty($experiment) should cover all cases correctly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, is an experiment valid if it is just a scalar like 0? I don't think so...

This commit moves code around so that all code to be externalized lands in `lib/<package>`.

These are meant to be separate roots, so they have their own Composer file and testing setup (a simple one for now).

These "external" packages are then pulled in via Composer using `"path"` repositories with relative URIs.

To work on an individual package:
```
cd lib/<package>
composer install
composer test
```
Comment on lines 153 to 182
/**
* Always ensure we have a <style amp-boilerplate> tag.
*/
protected function ensure_boilerplate_is_present() {
$style = $this->dom->xpath->query( './style[ @amp-boilerplate ]', $this->dom->head )->item( 0 );

if ( ! $style ) {
$style = $this->dom->createElement( Tag::STYLE );
$style->setAttribute( Attribute::AMP_BOILERPLATE, '' );
$style->appendChild( $this->dom->createTextNode( amp_get_boilerplate_stylesheets()[0] ) );
} else {
$style->parentNode->removeChild( $style ); // So we can move it.
}

$this->dom->head->appendChild( $style );

$noscript = $this->dom->xpath->query( './noscript[ style[ @amp-boilerplate ] ]', $this->dom->head )->item( 0 );

if ( ! $noscript ) {
$noscript = $this->dom->createElement( Tag::NOSCRIPT );
$style = $this->dom->createElement( Tag::STYLE );
$style->setAttribute( Attribute::AMP_BOILERPLATE, '' );
$style->appendChild( $this->dom->createTextNode( amp_get_boilerplate_stylesheets()[1] ) );
$noscript->appendChild( $style );
} else {
$noscript->parentNode->removeChild( $noscript ); // So we can move it.
}

$this->dom->head->appendChild( $noscript );
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't really make sense to be part of the Meta sanitizer, if “Meta” is referring to meta tags, but we can move it into a separate sanitizer later.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or rather, maybe this sanitizer should be renamed to be something like AMP_Ensure_Required_Markup_Sanitizer.

Copy link
Member

@westonruter westonruter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phew! 😅

@westonruter westonruter merged commit b6f86b6 into develop Mar 7, 2020
@westonruter westonruter deleted the add/958-amp-optimizer branch March 7, 2020 05:46
@schlessera
Copy link
Collaborator Author

Sorry for leaving you so much work left undone, I didn't get nearly as much time yesterday to work on this as expected.

@westonruter
Copy link
Member

You've done enough 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
7 participants