Rewrite pug-code-gen as a babel AST generator #3019

Open · wants to merge 24 commits into master

Conversation

@jeromew (Contributor) commented Jun 1, 2018

cf #2708

Please consider this a first working draft. It passes all tests but may appear a bit rough in places.

The main difficulties were:

  • parsing expressions and arguments
  • mixing of buffered & unbuffered code with dangling blocks

I do not know the babel tools very well, so I was mostly in "I need to get this working" mode. Maybe you will see how some parts can be simplified.

In this version I have not tried to re-use the already parsed ASTs. I do not know where we stand on this.

You may wonder why I used generators in some interim wrappers; these can be moved to properties.
Please keep in mind that my objective was to make it easy to inherit from the code-gen to create then-pug.
You can see how easy it is to implement the then-pug version of the code-gen here: https://github.com/jeromew/pug/blob/master/packages/then-pug/lib/pug-code-gen.js

])
},
ast_return: function() {
return [t.emptyStatement(), t.returnStatement(t.identifier('pug_html'))];
Member

This shouldn't need the emptyStatement. The ; was there originally because I wasn't confident that the code for the template would have ended with a ; or a newline. Since it's all a babel AST, we can drop it now.
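A sketch of the simplified version (assuming the rest of the method stays unchanged):

ast_return: function() {
  return [t.returnStatement(t.identifier('pug_html'))];
},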

Contributor Author

Yes, I was trying to reach source parity when I developed this; the point was to have all the tests pass.

@ForbesLindesay (Member) left a comment

This is an awesome start. One thing I'd like to try to do in this refactor, though: instead of each "visitor" pushing into this.ast, let's try to have them return an array of AST nodes. That way we won't need to mess around with all the replaceAST things, which I think add a lot of complication.

This would make the "buffer" and "bufferExpression" methods a little harder to write. I think the answer might be to remove all the optimisation from those two methods. Instead, we could run a separate "optimisation pass" on the array of statements returned by the visitors. This optimisation pass could swap 'some' + 'string' for 'somestring' and swap x = x + 'foo'; x = x + v; for x = x + 'foo' + v;. That optimiser would be a separate module, which would greatly simplify pug-code-gen.
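For illustration, a minimal sketch of the literal-folding half of such a pass, using babel-types (the function name and shape are illustrative, not taken from this PR):

var t = require('babel-types');

// Fold 'some' + 'string' into 'somestring' when both operands are string
// literals; a real pass would walk every statement returned by the visitors
// and also merge consecutive `pug_html = pug_html + ...` assignments.
function foldStringConcat(node) {
  if (
    t.isBinaryExpression(node, { operator: '+' }) &&
    t.isStringLiteral(node.left) &&
    t.isStringLiteral(node.right)
  ) {
    return t.stringLiteral(node.left.value + node.right.value);
  }
  return node;
}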

As for re-using already parsed ASTs, let's leave that out of this initial change. Once we are happy that this is stable, we can start thinking about that optimisation.

lit.extra = { rawValue: lit.value, raw: stringify(lit.value) };
return lit;
},
wrapCallExpression: function(node) {
Member

Is this intended to act as a plugin hook at some point?

Contributor Author

Sort of. This is one of the main inheritance points for then-pug:

Compiler.prototype.wrapCallExpression = function(node) {
  return t.yieldExpression(node, true);
}
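In plain pug the base implementation is presumably just a pass-through (an assumption, shown only for contrast with the then-pug override above):

// assumed default: leave the call expression untouched
Compiler.prototype.wrapCallExpression = function(node) {
  return node;
};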

@jeromew (Contributor Author) commented Jun 4, 2018

How do you want this to proceed? I agree that simplifying the buffer concatenations should be extracted further down the chain into an optimisation pass. Maybe we can find other optimizations too. I am not totally sure how to do that elegantly though (I am not familiar enough with babel).

As you said, it could be better to have these kinds of AST-returning visitors. I don't like the in-place AST replacing either; it came out of the way the code was organised before. I have no idea whether such a rewrite would lead to a dead end or not.

Do you want me to commit additional modifications on this branch (so that we do not lose the successive versions), or to squash the babel branch after a bit more work?

@ForbesLindesay (Member)

I think we should proceed in this pull request for now. I don't want to merge it into master until it's almost ready to go live.

@jeromew (Contributor Author) commented Jun 6, 2018

@ForbesLindesay I refactored the code to have visitors return ASTs (instead of the replaceAST dance). The whole thing seems to be 2x slower (probably because of the many concat calls I used, but also because removing the buffer optimizations leads to longer ASTs).

I did not try to improve this with push or other methods. What do you think? Maybe concat is the wrong idea here.

Note: maybe this plugin https://github.com/laysent/babel-plugin-transform-string-join is a good starting point for what we want to achieve with the buffer optimization. It is not exactly what we need, but the idea looks close.

@jeromew (Contributor Author) commented Jun 7, 2018

@ForbesLindesay I was able to add a babel plugin (my first babel plugin, so I hope it is correct) to bring the pug_html = pug_html + ... compaction back. This speeds things up a bit, though it is still slower than the original code-gen. This is probably due to the many AST nodes we build compared to plain string source concatenation, but there may be some optimization I am missing.
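Roughly, the idea of the compaction plugin (a sketch only; the plugin shape and the checks are illustrative, not the exact code committed on this branch):

// merges consecutive `pug_html = pug_html + <expr>` statements into one
module.exports = function(babel) {
  var t = babel.types;

  function isBufferPush(node) {
    return t.isExpressionStatement(node) &&
      t.isAssignmentExpression(node.expression, { operator: '=' }) &&
      t.isIdentifier(node.expression.left, { name: 'pug_html' }) &&
      t.isBinaryExpression(node.expression.right, { operator: '+' }) &&
      t.isIdentifier(node.expression.right.left, { name: 'pug_html' });
  }

  return {
    visitor: {
      ExpressionStatement: function(path) {
        if (!isBufferPush(path.node)) return;
        var next = path.getSibling(path.key + 1);
        // fold each following `pug_html = pug_html + <expr>` into this one
        while (next.node && isBufferPush(next.node)) {
          path.node.expression.right = t.binaryExpression(
            '+',
            path.node.expression.right,
            next.node.expression.right.right
          );
          next.remove();
          next = path.getSibling(path.key + 1);
        }
      }
    }
  };
};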

I'll wait for your review to see where this can go.

@ForbesLindesay (Member)

Thanks for doing all this. I would definitely expect it to be a little slower. Have you benchmarked the compile times separately from the render times? With the babel plugin optimisation, I would hope to see render times remain pretty much exactly the same. I'm not sure how much of a slowdown is acceptable for the code-gen.

@jeromew (Contributor Author) commented Jun 7, 2018

No, I did not measure compile vs render timing separately. Is there a script that already does this?
I expect the compiled code to be the same, so yes, render time should stay in the same ballpark as before.
Regarding compile time, I remember you said somewhere that the AST for expressions is already parsed earlier in the chain. Is that AST already passed along? We could avoid parsing it twice and probably get a speed boost.

@ForbesLindesay (Member)

Yes, they are parsed earlier. Anywhere you see assertExpression or isExpression in https://github.com/pugjs/pug/blob/master/packages/pug-lexer/index.js it is using babel to parse the expression, but we're not currently capturing the result. You could try updating that, which would definitely help.

I can't remember where we have any benchmarks; I'm sure some have been written over the years, but I've been super bad at keeping track of them.

@jeromew (Contributor Author) commented Jun 7, 2018

Regarding compile time, faster is always better for big code bases. Also, as far as I am concerned, I use a sort of JIT mixin compiler during development with then-pug on the first run of pages, so I am interested in it being fast.

Do you mind if I replace is-expression (which uses acorn) with babylon's parseExpression method?

@jeromew (Contributor Author) commented Jun 7, 2018

@ForbesLindesay I started on the lexer --> code-gen short path for the AST. Tell me what you think about it.

Should pug-code-gen expect the AST, or should it keep both the AST path and the old code path? In the examples I committed I kept both code paths (which may be more plugin-friendly, in case a plugin wants to generate its own non-AST tokens).

@ForbesLindesay (Member)

> Do you mind if I replace is-expression (which uses acorn) with babylon's parseExpression method?

That would be great, please go ahead.

@ForbesLindesay (Member)

I think we should allow for a plugin that still uses strings.

@ForbesLindesay (Member) left a comment

Looking at the code, I think you have a good approach. I'd be very interested to know how much difference it makes :)

},

assertExpression: function (exp, noThrow) {
  //this verifies that a JavaScript expression is valid
  try {
-   this.callLexerFunction('isExpression', exp);
-   return true;
+   return this.callLexerFunction('parseExpression', exp);
  } catch (ex) {
Member

We may need to update the error handling code. Specifically the comment saying:

> not coming from acorn

is definitely out of date now.

Contributor Author

I removed the message. The rest of the code is working.

if (!code.buffer) this.buf.push('{');
this.visit(code.block, code);
if (!code.buffer) this.buf.push('}');
var val = code.val.trim();
Member

I need to think some more about how we handle unbuffered code blocks; I suspect this may be a big part of where parsing time is spent (we'd need data to confirm, though).

Contributor Author

It was a year ago, but I remember having difficulties with an unbuffered code block as a child of another unbuffered code block, combined with unmatched braces:

- if (true) {
  - a = 3
- } else {
- }

Not implementing this was faster, from what I remember, but it currently works, although it is untested in pugjs.
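To illustrate the difficulty: none of those code lines parses on its own, so one option (a sketch, not necessarily what this branch does) is to accumulate the raw pieces and only parse once the dangling blocks are closed:

var babylon = require('babylon');

// the pieces produced while visiting the nodes of the example above
var pieces = ['if (true) {', 'a = 3', '} else {', '}'];

// parsing any piece alone would throw; parsing the joined text succeeds,
// and ast.program.body[0] is a single IfStatement that can be spliced
// into the generated template function
var ast = babylon.parse(pieces.join('\n'));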

@jeromew (Contributor Author) commented Jun 8, 2018

I did a small benchmark using this template:

mixin a()
  div b
div
  div hello
+a()
+a()

original pugjs
compile x 785 ops/sec ±19.97% (76 runs sampled)
compile debug x 870 ops/sec ±2.32% (78 runs sampled)
render precompiled x 2,992,551 ops/sec ±2.27% (71 runs sampled)

AST pugjs
compile x 58.21 ops/sec ±8.89% (60 runs sampled)
compile debug x 54.01 ops/sec ±5.03% (54 runs sampled)
render precompiled x 2,966,531 ops/sec ±2.50% (70 runs sampled)

So we have the same numbers for the rendering of a precompiled template (this is normal, since the generated code is supposed to be the same),

but an easy 10x slowdown on the compile operation for this template :-(
I don't know what to do with this.
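For reference, the numbers above can be produced with a script along these lines (a sketch; the pug checkout path and the exact options behind "compile" vs "compile debug" are assumptions):

var Benchmark = require('benchmark');
var fs = require('fs');
var pug = require('./pug-babel/packages/pug'); // or ./pug-master/packages/pug

var src = fs.readFileSync('./template.pug', 'utf8');
var precompiled = pug.compile(src);

new Benchmark.Suite()
  .add('compile', function() {
    pug.compile(src, { compileDebug: false });
  })
  .add('compile debug', function() {
    pug.compile(src, { compileDebug: true });
  })
  .add('render precompiled', function() {
    precompiled({});
  })
  .on('cycle', function(event) {
    console.log(String(event.target));
  })
  .run({ 'async': true });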

@ForbesLindesay (Member)

Hmm, that compile time performance is disappointing, but not necessarily super surprising. I wonder how this scales. If you were to use a very big template, would it still be 10x slower (which would be a problem), or is it more of a constant overhead (which may be manageable).

The other big thing we should probably try to answer is, which stage is slow? Is it all the extra parsing of JS expressions? Is it the somewhat complex handling of unbuffered JS Code? Is it the AST transform optimisations to pre-concatenate the strings? Is it the part where babel actually generates JavaScript code?

If we can figure out where most of the time is spent, we can look at ways to improve that aspect (or failing that, cache it intelligently).

@jeromew (Contributor Author) commented Jun 15, 2018

Yes, I agree that we have to look at the moving parts to understand the timing differences:

  • acorn vs babel for expression checking/parsing in lexer
  • with vs babel-plugin-transform-with in code-gen
  • babel-plugin for buf = buf + ... compaction
  • AST to source generation
  • and other things

Overall, looking at https://travis-ci.org/pugjs/pug/builds, it looks to be bounded by a 2x slowdown for the whole test suite, which might be acceptable.

What do we gain with the AST-first approach?

  • source-maps may not be too far
  • avoid source->AST parsing when different outputs are needed
  • transpiling of AST to other languages
  • easy to inherit in order to create then-pug
  • ..

I'll spend some time on it in the following days.

@TimothyGu, as I understand it you have good knowledge of optimisation issues and benchmarks in pugjs, so I would appreciate any advice on this. The whole point of the babel branch was to rewrite pug-code-gen in an AST-first way, but we now need to see what the next steps could be and whether the win/loss ratio is sufficient to justify such a refactor.

@jeromew (Contributor Author) commented Jun 15, 2018

Regarding the lexer, speed should have improved after replacing is-expression with babylon's parseExpression:

babylon.parseExpression x 15,762 ops/sec ±21.90% (73 runs sampled)
is-expression x 10,371 ops/sec ±17.09% (79 runs sampled)
Fastest is babylon.parseExpression

I will test the lexer to confirm this.

var Benchmark = require('benchmark');
var suite = new Benchmark.Suite;

var babylon = require('babylon');
var isExpression = require('is-expression');

var expr = [`a`, `a+b`, 'true', '2+b', '(true||a)&&b']

suite
.add('babylon.parseExpression', function() {
  expr.forEach(function(e) {
    babylon.parseExpression(e)
  });
})
.add('is-expression', function() {
  expr.forEach(function(e) {
    isExpression(e)
  });
})
.on('cycle', function(event) {
  console.log(String(event.target));
})
.on('complete', function() {
  console.log('Fastest is ' + this.filter('fastest').map('name'));
})
// run async
.run({ 'async': true });

@jeromew (Contributor Author) commented Jun 15, 2018

The lexer indeed seems a little faster on the babel branch

lexer on master branch x 1,265 ops/sec ±10.00% (76 runs sampled)
lexer on babel branch x 1,335 ops/sec ±6.72% (79 runs sampled)
Fastest is lexer on babel branch,lexer on master branch

The difference is not very large, probably because the template used does not have many expressions, so the time spent in isExpression or parseExpression does not have a huge impact.

Nevertheless, babylon.parseExpression seems faster, and the babel branch version now has a cached full AST for the expressions.

var Benchmark = require('benchmark');
var suite = new Benchmark.Suite;

var puga = require('./pug-master/packages/pug-lexer');
var pugb = require('./pug-babel/packages/pug-lexer');

var tpl = `
mixin a()
  div b
div
  div hello
+a()
+a()
if a+2>7
  div #{var1}
else
  div #{var2}
`

suite
.add('lexer on master branch', function() {
  puga(tpl)
})
.add('lexer on babel branch', function() {
  pugb(tpl)
})
.on('cycle', function(event) {
  console.log(String(event.target));
})
.on('complete', function() {
  console.log('Fastest is ' + this.filter('fastest').map('name'));
})
// run async
.run({ 'async': true });

@jeromew (Contributor Author) commented Jun 15, 2018

The parser, which is nearly identical in the master and babel branches, does not show a noticeable difference.

parser on master branch x 1,087 ops/sec ±8.22% (76 runs sampled)
parser on babel branch x 1,116 ops/sec ±5.68% (78 runs sampled)
Fastest is parser on babel branch

var Benchmark = require('benchmark');
var suite = new Benchmark.Suite;

var master_lexer = require('./pug-master/packages/pug-lexer');
var master_parser = require('./pug-master/packages/pug-parser');

var babel_lexer = require('./pug-babel/packages/pug-lexer');
var babel_parser = require('./pug-babel/packages/pug-parser');
var tpl = `
mixin a()
  div b
div
  div hello
+a()
+a()
if a+2>7
  div #{var1}
else
  div #{var2}
`

suite
.add('parser on master branch', function() {
  master_parser(master_lexer(tpl))
})
.add('parser on babel branch', function() {
  babel_parser(babel_lexer(tpl))
})
.on('cycle', function(event) {
  console.log(String(event.target));
})
.on('complete', function() {
  console.log('Fastest is ' + this.filter('fastest').map('name'));
})
.run({ 'async': true });

@jeromew (Contributor Author) commented Jun 15, 2018

I tried to compare the with module and the babel-plugin-transform-with solution.

Knowing that:

  • with takes source text as input, so it needs to parse it every time
  • babel.transformFromAst is configured not to build the code string

the test results end up well in favour of with. @ForbesLindesay, maybe you have some insight into why that is and how we could improve things. Is the babel plugin doing more work? It seems hard to believe that it is slower: it does not have to output code and it starts with a pre-parsed AST, while with has to parse the source on every cycle.

addWith x 5,938 ops/sec ±24.56% (70 runs sampled)
babel-plugin-transform-with x 2,626 ops/sec ±2.40% (78 runs sampled)
Fastest is addWith

var Benchmark = require('benchmark');
var suite = new Benchmark.Suite;

var addWith = require('with');
var babylon = require('babylon');
var babel = require('babel-core');
var babelPluginTransformWith = require('babel-plugin-transform-with');

var src = `
if (var1 > 2) {
  pug_escape(var2);
}
`
var globals = ['pug_escape'];
var tpl = "// @with exclude: "+ globals.join(",") +"\n{\nlocals || {};\n"+src+"}"
var ast = babylon.parse(tpl);

suite
.add('addWith', function() {
  var str = addWith('locals', src, ['pug_escape']);
})
.add('babel-plugin-transform-with', function() {
  var str = babel.transformFromAst(ast, null, {
      code: false,
      plugins: [
        babelPluginTransformWith
      ]
    });
})
.on('cycle', function(event) {
  console.log(String(event.target));
})
.on('complete', function() {
  console.log('Fastest is ' + this.filter('fastest').map('name'));
})
.run({ 'async': true });

@TimothyGu (Member)

@jeromew babel-traverse (which Babel plugins use) is pretty heavyweight with its Path API and thus usually much slower.

@jeromew (Contributor Author) commented Jun 18, 2018

@ForbesLindesay I committed a modification to the way I was building the AST for babel-plugin-transform-with, which I had identified as a performance hog (I guess the babel-template module was doing more than I thought).

On my machine, I went from
PASS packages/pug/test/run.test.js (53.223s)
to
PASS packages/pug/test/run.test.js (21.788s)

when the master branch is at
PASS packages/pug/test/run.test.js (7.143s)

so this is a big win.
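For illustration, the kind of change involved (a hypothetical example, not the actual commit): babel-template parses its source only once, but every call still clones and traverses the template AST to substitute placeholders, whereas assembling the nodes directly with babel-types skips that work.

var template = require('babel-template');
var t = require('babel-types');

// babel-template: convenient, but each call clones and traverses the
// template AST to fill in the placeholders
var buildPush = template('pug_html = pug_html + VALUE;');
var viaTemplate = buildPush({ VALUE: t.stringLiteral('<div>') });

// babel-types: the same statement assembled directly, with no traversal
var direct = t.expressionStatement(
  t.assignmentExpression(
    '=',
    t.identifier('pug_html'),
    t.binaryExpression('+', t.identifier('pug_html'), t.stringLiteral('<div>'))
  )
);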


Now I benchmarked a simple div to find additional bottlenecks. The babel branch now seems faster than the master branch except for one thing that makes the performance drop, the AST transformation:

var w = babel.transformFromAst(file, null, {
      code: false,
      plugins: plugins
    });

Basically, for a simple div template I get:

With both plugins activated (compaction & addWith):

compile master x 2,915 ops/sec ±8.35% (75 runs sampled)
compile babel x 268 ops/sec ±9.95% (66 runs sampled)

With the transformation applied but no plugins at all (so plugin execution itself does not seem to have a huge impact):

compile master x 3,001 ops/sec ±9.66% (78 runs sampled)
compile babel x 394 ops/sec ±6.18% (69 runs sampled)

With no transformation applied (transformFromAst is not called):

compile master x 2,937 ops/sec ±9.89% (76 runs sampled)
compile babel x 1,503 ops/sec ±6.70% (81 runs sampled)

With no transformation applied and code generation removed:

compile master x 2,992 ops/sec ±10.20% (76 runs sampled)
compile babel x 3,026 ops/sec ±3.46% (80 runs sampled)

Do you know of a real-world pug application on GitHub that I could use as a benchmark for real-world usage?

I'll dig a bit more, but any ideas are welcome!

@ForbesLindesay (Member)

https://github.com/esdiscuss/esdiscuss.org is a real-world application using pug. That improvement is fantastic. I think you're correct that the babel-template module is pretty slow. I wonder if we could write our own AST transformation without using babel itself and get things running faster, given that the transformFromAst function is such a bottleneck. This may now be fast enough to consider merging, though. The improvement is huge.

@jeromew (Contributor Author) commented Jun 18, 2018

I am trying to understand why transformFromAst with zero plugins is so heavyweight. The AST is probably being traversed even when no plugin is specified.

Manually modifying the AST for the compaction should be relatively easy. The addWith case is a bit more work, I suppose. Do you think it would make sense to modify the with module to output an AST?

@jeromew (Contributor Author) commented Jun 18, 2018

transformFromAst seems to invoke a default block-hoist-plugin, which may be the reason it slows things down even without any user plugins listed.

The cost of transformFromAst is significant, but growing the list of invoked plugins does not have as much impact. I feel like there will always be a transformFromAst down the line, because people will want to apply one plugin or another (isn't that the reason an AST-first approach would be nice?).

Babel 7 may be worth a try to see what happens; maybe there are performance improvements that would have an impact here.

@jeromew (Contributor Author) commented Jun 18, 2018

@ForbesLindesay on my machine

compile all esdiscuss.org/views templates with master branch x 2.58 ops/sec ±7.94% (11 runs sampled)
compile all esdiscuss.org/views templates with babel branch x 0.24 ops/sec ±4.43% (5 runs sampled)
Fastest is compile all esdiscuss.org/views templates with master branch

If I leave the transformation, but without any plugin, I get

compile all esdiscuss.org/views templates with master branch x 2.59 ops/sec ±7.38% (11 runs sampled)
compile all esdiscuss.org/views templates with babel branch x 0.65 ops/sec ±5.67% (6 runs sampled)

If I remove the transformation (no call to transformFromAst), I get

compile all esdiscuss.org/views templates with master branch x 2.60 ops/sec ±8.38% (11 runs sampled)
compile all esdiscuss.org/views templates with babel branch x 1.65 ops/sec ±7.17% (9 runs sampled)
Fastest is compile all esdiscuss.org/views templates with master branch

var Benchmark = require('benchmark');
var suite = new Benchmark.Suite;
var fs = require('fs');

var puga = require('./pug-master/packages/pug');
var pugb = require('./pug-babel/packages/pug');


var dir = './esdiscuss.org/views/'

var tpl = fs.readdirSync(dir).filter(function(file){
    return ~file.indexOf('.pug');
  })

suite
.add('compile all esdiscuss.org/views templates with master branch', function() {
  tpl.forEach(function(file) {
     puga.compileFile(dir+file)
  })
})
.add('compile all esdiscuss.org/views templates with babel branch', function() {
  tpl.forEach(function(file) {
     pugb.compileFile(dir+file)
  })
})
.on('cycle', function(event) {
  console.log(String(event.target));
})
.on('complete', function() {
  console.log('Fastest is ' + this.filter('fastest').map('name'));
})
.run({ 'async': true });

@rollingversions (bot) commented May 1, 2020

pug-code-gen (3.0.1 → 3.1.0)

New Features

  • Rewrite pug-code-gen as a babel AST generator

Packages With No Changes

The following packages have no user facing changes, so won't be released:

  • pug
  • pug-attrs
  • pug-error
  • pug-filters
  • pug-lexer
  • pug-linker
  • pug-load
  • pug-parser
  • pug-runtime
  • pug-strip-comments
  • pug-walk


@jeromew (Contributor Author) commented May 27, 2020

@ForbesLindesay I still have this branch in mind. Can you share your thoughts on it? Do you still think it would be a good idea to have a native AST version of the generator, or do you think it is better to stay with the string-based version?

@jeromew (Contributor Author) commented Jun 15, 2020

@ForbesLindesay for info, I rebased the AST code-gen PR on master and upgraded it to babel 7. Please tell me what, in your opinion, I could do to move forward and reach an acceptable state.

I understand that this PR has way too many commits now and that it needs better organisation for review. Ideas welcome.

@jeromew (Contributor Author) commented Jun 16, 2020

@ForbesLindesay for info, I added several performance improvements to the AST version of pug-code-gen. Right now a flamegraph of a compilation gives me the following relative times:

1/8 : generation of the code after the AST is built
1/8 : post processing to concatenate duplicates of pug_html = pug_html + ...
1/8 : with
2/8 : AST building (visit)
3/8 : lexer/parser

I would be happy to discuss this if you can find some time.

@jeromew (Contributor Author) commented Jun 17, 2020

@ForbesLindesay I found some other improvements (adding some AST JSON caching for frequent small strings) that are not committed yet. They improve things but are not game changers performance-wise.

On another front, there is probably a big improvement to be found by rewriting the attribute building to make it more AST-aware (it currently mostly builds the attribute string, which then gets reparsed). I think this is a performance issue because we invoke many things (isConstant, toConstant, parseExpression, ...) on code that has already been parsed at the lexer level. The modification would need quite some work, so I would prefer a discussion and feedback on this PR first, if you can find some time for it.
