Rewrite pug-code-gen as a babel AST generator #3019
base: master
Conversation
packages/pug-code-gen/index.js
  ])
},
ast_return: function() {
  return [t.emptyStatement(), t.returnStatement(t.identifier('pug_html'))];
This shouldn't need the `emptyStatement`. The `;` was there originally because I wasn't confident that the code for the template would have ended in `;` or a newline. Since it's all a babel AST, we can drop it now.
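For illustration, the simplified version suggested here would just be (a sketch, assuming the same babel-types helper `t` used elsewhere in this file):

```js
ast_return: function() {
  // no leading emptyStatement: the `;` safeguard is unnecessary once the
  // surrounding template code is assembled as an AST rather than a string
  return [t.returnStatement(t.identifier('pug_html'))];
},
```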
Yes, I was trying to reach source parity when I developed this. The point was to have all the tests pass.
This is an awesome start. One thing I'd like to try to do in this refactor, though: instead of each "visitor" pushing into `this.ast`, let's try to have them return an array of AST nodes. That way we won't need to mess around with all the `replaceAST` things, which I think add a lot of complication.

This would make the "buffer" and "bufferExpression" methods a little harder. I think the answer might be to remove all the optimisation from those two methods. Instead, we could run a separate "optimisation pass" on the array of statements returned by visitors. This optimisation pass could swap `'some' + 'string'` for `'somestring'` and swap `x = x + 'foo'; x = x + v;` for `x = x + 'foo' + v;`. That optimiser would be a separate module, which would greatly simplify pug-code-gen.

As for re-using already parsed ASTs, let's leave that off this initial change. Once we are happy that this is stable, we can start thinking about that optimisation.
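As an illustration of the second rewrite, such an optimisation pass could be written as a standalone babel plugin along these lines (a sketch only, not code from this PR; a similar `BinaryExpression` visitor could handle the constant `'some' + 'string'` folding):

```js
// Sketch: merge consecutive `x = x + <expr>;` statements on the same
// identifier, e.g. `x = x + 'foo'; x = x + v;` becomes `x = x + 'foo' + v;`.
module.exports = function(babel) {
  var t = babel.types;

  // When stmt looks like `name = name + <expr>;`, return {name, value}.
  function appendedValue(stmt) {
    if (!t.isExpressionStatement(stmt)) return null;
    var e = stmt.expression;
    if (
      t.isAssignmentExpression(e, {operator: '='}) &&
      t.isIdentifier(e.left) &&
      t.isBinaryExpression(e.right, {operator: '+'}) &&
      t.isIdentifier(e.right.left, {name: e.left.name})
    ) {
      return {name: e.left.name, value: e.right.right};
    }
    return null;
  }

  return {
    visitor: {
      'BlockStatement|Program': function(path) {
        var out = [];
        var last = null; // the assignment we can keep appending to
        path.node.body.forEach(function(stmt) {
          var curr = appendedValue(stmt);
          if (curr && last && curr.name === last.name) {
            // fold this chunk into the previous assignment: x = x + a + b
            last.assign.right = t.binaryExpression('+', last.assign.right, curr.value);
            return;
          }
          out.push(stmt);
          last = curr ? {name: curr.name, assign: stmt.expression} : null;
        });
        path.node.body = out;
      }
    }
  };
};
```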
  lit.extra = { rawValue: lit.value, raw: stringify(lit.value) };
  return lit;
},
wrapCallExpression: function(node) {
Is this intended to act as a plugin hook at some point?
Sort of. This is one of the main inheritance points for then-pug:

Compiler.prototype.wrapCallExpression = function(node) {
  return t.yieldExpression(node, true);
}
How do you want this to proceed? I agree that simplifying the buffer concatenations should be extracted down the chain into an optimisation pass. Maybe we can find other optimisations too. I am not totally sure how to do that elegantly, though (not familiar enough with babel). As you said, it could be better to have these kinds of AST-returning visitors. I don't like the in-place AST replacing either; it came out of the way the code was organised before. I have no idea whether such a rewrite would lead to a dead end or not.

Do you want me to commit additional modifications on this branch (this way we would not lose the successive versions), or to squash the babel branch after a bit more work?
I think we should proceed in this pull request for now. I don't want to merge it into master until it's almost ready to go live.
@ForbesLindesay I refactored the code to have visitors return ASTs (instead of the replaceAST dance). The whole thing seems to be 2x slower (probably because of the many `concat` calls I have used, but also because the removal of the buffer optimizations leads to longer ASTs). I did not try to improve this with `push` or other methods. What do you think? `concat` is maybe the wrong idea here.

Note: maybe this plugin https://github.com/laysent/babel-plugin-transform-string-join is a good start for what we want to achieve for the buffer optimization. It's not what we need, but the idea looks close.
@ForbesLindesay I could add a babel plugin (my first babel plugin, so I hope it is correct) to add the

I'll wait for your review to see where this can go.
Thanks for doing all this. I would definitely expect it to be a little slower. Have you benchmarked the compile times separately from the render times? With the babel plugin optimisation, I would hope to see render times remain pretty much exactly the same. I'm not sure how much of a slowdown is OK for the code-gen.
No, I did not measure the timing of compile vs render. Is there a script that does this already?
Yes, they are parsed earlier. Anywhere you see

I can't remember where we have any benchmarks. I'm sure some have been written over the years, but I've been super bad at keeping track of them.
Regarding the compile time, it's always better when it's fast for big code bases. As far as I am concerned, I use a sort of JIT mixin compiler during development with

Do you mind if I replace
@ForbesLindesay I started the lexer --> code-gen short path for AST. Tell me what you think regarding this. Should
That would be great, please go ahead.
I think we should allow for a plugin that still uses strings.
Looking at the code, I think you have a good approach. I'd be very interested to know how much difference it makes :)
},

assertExpression: function (exp, noThrow) {
  //this verifies that a JavaScript expression is valid
  try {
-   this.callLexerFunction('isExpression', exp);
-   return true;
+   return this.callLexerFunction('parseExpression', exp);
  } catch (ex) {
We may need to update the error handling code. Specifically, the comment saying "not coming from acorn" is definitely out of date now.
I removed the message. The rest of the code is working.
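For context, a hedged reconstruction of what the updated lexer method could look like after this change (the error-handling details here are assumptions, not the exact code in the branch):

```js
assertExpression: function (exp, noThrow) {
  // this verifies that a JavaScript expression is valid, and now also returns
  // the parsed AST so it can be cached instead of being re-parsed later
  try {
    return this.callLexerFunction('parseExpression', exp);
  } catch (ex) {
    if (noThrow) return false;
    // the real lexer reports a proper syntax error with position information;
    // rethrowing is only a placeholder in this sketch
    throw ex;
  }
},
```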
if (!code.buffer) this.buf.push('{');
this.visit(code.block, code);
if (!code.buffer) this.buf.push('}');
var val = code.val.trim();
I need to think some more about how we handle unbuffered code blocks. I suspect this may be a big part of where time is spent parsing (we'd need data to confirm, though).
It was a year ago, but I remember that I had difficulties with an unbuffered code block as a child of an unbuffered code block, plus unmatched braces:

- if (true) {
  - a = 3
- } else {
- }

Not implementing this was faster from what I remember, but it currently works and is untested in pugjs.
I did a small benchmark using

original pugjs
AST pugjs

So we have the same numbers for the rendering of a precompiled template (this is normal since the code is supposed to be the same), but an easy 10x slowdown on the compile operation for this template :-(
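The exact script used for this measurement isn't shown here; a minimal sketch in the same benchmark.js style, separating compile time from the render time of a precompiled template, might look like this (the template, locals and package path are placeholders):

```js
var Benchmark = require('benchmark');
var suite = new Benchmark.Suite;
var pug = require('./pug-babel/packages/pug'); // or ./pug-master/packages/pug to compare branches

// placeholder template and locals, for illustration only
var tpl = 'div #{var1}';
var locals = { var1: 'hello' };
var precompiled = pug.compile(tpl);

suite
  .add('compile only', function() {
    pug.compile(tpl);
  })
  .add('render precompiled template', function() {
    precompiled(locals);
  })
  .on('cycle', function(event) {
    console.log(String(event.target));
  })
  .run({ 'async': true });
```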
Hmm, that compile time performance is disappointing, but not necessarily super surprising. I wonder how this scales: if you were to use a very big template, would it still be 10x slower (which would be a problem), or is it more of a constant overhead (which may be manageable)?

The other big thing we should probably try to answer is: which stage is slow? Is it all the extra parsing of JS expressions? Is it the somewhat complex handling of unbuffered JS code? Is it the AST transform optimisations to pre-concatenate the strings? Is it the part where babel actually generates JavaScript code? If we can figure out where most of the time is spent, we can look at ways to improve that aspect (or, failing that, cache it intelligently).
Yes, I agree that we have to look at the moving parts to understand the timing differences.

Overall, when you look at https://travis-ci.org/pugjs/pug/builds, it looks to be bounded by a 2x slowdown for the whole test suite, which could maybe be acceptable. What do we win with the AST-first approach?
I'll spend some time on it in the following days. @TimothyGu, as I understand it you have good knowledge of optimisation issues and benchmarks with pugjs; I'd appreciate it if you have some advice on this. The whole point of the
Regarding the lexer, the speed should have improved after the replacement of

babylon.parseExpression x 15,762 ops/sec ±21.90% (73 runs sampled)

I will test the lexer to confirm this.

var Benchmark = require('benchmark');
var suite = new Benchmark.Suite;
var babylon = require('babylon');
var isExpression = require('is-expression');
var expr = [`a`, `a+b`, 'true', '2+b', '(true||a)&&b']
suite
  .add('babylon.parseExpression', function() {
    expr.forEach(function(e) {
      babylon.parseExpression(e)
    });
  })
  .add('is-expression', function() {
    expr.forEach(function(e) {
      isExpression(e)
    });
  })
  .on('cycle', function(event) {
    console.log(String(event.target));
  })
  .on('complete', function() {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  // run async
  .run({ 'async': true });
The lexer indeed seems a little faster on the babel branch:

lexer on master branch x 1,265 ops/sec ±10.00% (76 runs sampled)

The difference is not very important, probably because the template used does not have many expressions, so the time spent in isExpression or parseExpression does not have a huge impact. Nevertheless, babylon.parseExpression seems faster, and the babel branch version has cached a full AST version of the expressions.

var Benchmark = require('benchmark');
var suite = new Benchmark.Suite;
var puga = require('./pug-master/packages/pug-lexer');
var pugb = require('./pug-babel/packages/pug-lexer');
var tpl = `
mixin a()
  div b
div
  div hello
  +a()
  +a()
if a+2>7
  div #{var1}
else
  div #{var2}
`
suite
  .add('lexer on master branch', function() {
    puga(tpl)
  })
  .add('lexer on babel branch', function() {
    pugb(tpl)
  })
  .on('cycle', function(event) {
    console.log(String(event.target));
  })
  .on('complete', function() {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  // run async
  .run({ 'async': true });
The parser, which is nearly identical in the master and babel branches, does not have a noticeable impact.

parser on master branch x 1,087 ops/sec ±8.22% (76 runs sampled)

var Benchmark = require('benchmark');
var suite = new Benchmark.Suite;
var master_lexer = require('./pug-master/packages/pug-lexer');
var master_parser = require('./pug-master/packages/pug-parser');
var babel_lexer = require('./pug-babel/packages/pug-lexer');
var babel_parser = require('./pug-babel/packages/pug-parser');
var tpl = `
mixin a()
  div b
div
  div hello
  +a()
  +a()
if a+2>7
  div #{var1}
else
  div #{var2}
`
suite
  .add('parser on master branch', function() {
    master_parser(master_lexer(tpl))
  })
  .add('parser on babel branch', function() {
    babel_parser(babel_lexer(tpl))
  })
  .on('cycle', function(event) {
    console.log(String(event.target));
  })
  .on('complete', function() {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  .run({ 'async': true });
I tried to compare
knowing that

The test results end up well in favour of

addWith x 5,938 ops/sec ±24.56% (70 runs sampled)

var Benchmark = require('benchmark');
var suite = new Benchmark.Suite;
var addWith = require('with');
var babylon = require('babylon');
var babel = require('babel-core');
var babelPluginTransformWith = require('babel-plugin-transform-with');
var src = `
if (var1 > 2) {
  pug_escape(var2);
}
`
var globals = ['pug_escape'];
var tpl = "// @with exclude: " + globals.join(",") + "\n{\nlocals || {};\n" + src + "}"
var ast = babylon.parse(tpl);
suite
  .add('addWith', function() {
    var str = addWith('locals', src, ['pug_escape']);
  })
  .add('babel-plugin-transform-with', function() {
    var str = babel.transformFromAst(ast, null, {
      code: false,
      plugins: [
        babelPluginTransformWith
      ]
    });
  })
  .on('cycle', function(event) {
    console.log(String(event.target));
  })
  .on('complete', function() {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  .run({ 'async': true });
@jeromew
@ForbesLindesay I committed a modification of the way I was building the AST for the

On my machine, I went from
when the master branch is at
so this is a big win.

Now I benchmarked a simple

var w = babel.transformFromAst(file, null, {
  code: false,
  plugins: plugins
});

basically, for a simple

compile master x 2,915 ops/sec ±8.35% (75 runs sampled)

when both plugins are activated (compaction & addWith)

compile master x 3,001 ops/sec ±9.66% (78 runs sampled)

when transformation is applied, but with no plugins at all, so the plugin execution in itself does not seem to have a huge impact.

compile master x 2,937 ops/sec ±9.89% (76 runs sampled)

when no transformation is applied (transformFromAst is not called)

compile master x 2,992 ops/sec ±10.20% (76 runs sampled)

when no transformation is applied and code generation is removed

Do you know if there is a real-world pug application on GitHub that I could use as a benchmark for real-world usage? I'll dig a bit more, but any ideas are welcome!
https://github.com/esdiscuss/esdiscuss.org is a real-world application using pug. That improvement is fantastic. I think you're correct that the babel-template module is pretty slow. I wonder if we could write our own AST transformation without using babel itself and get things running faster, given that the
I am trying to understand why manually modifying the AST for the compaction should be relatively easy. The addWith case is a bit more work, I suppose. Do you think it would make sense to modify the
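To illustrate the idea of doing the compaction by hand, without babel, here is a small sketch of a recursive walker over a babylon-produced AST that folds constant string concatenations ('some' + 'string' into 'somestring'). It is an illustration only, not code from this branch:

```js
// Post-order walk over a babel-style AST, replacing `StringLiteral + StringLiteral`
// binary expressions with a single StringLiteral node.
function foldStrings(node) {
  if (!node || typeof node.type !== 'string') return node;

  // Recurse into child nodes first, replacing them with their folded result.
  Object.keys(node).forEach(function(key) {
    var child = node[key];
    if (Array.isArray(child)) {
      node[key] = child.map(function(c) {
        return c && typeof c.type === 'string' ? foldStrings(c) : c;
      });
    } else if (child && typeof child.type === 'string') {
      node[key] = foldStrings(child);
    }
  });

  if (
    node.type === 'BinaryExpression' &&
    node.operator === '+' &&
    node.left.type === 'StringLiteral' &&
    node.right.type === 'StringLiteral'
  ) {
    var value = node.left.value + node.right.value;
    return {
      type: 'StringLiteral',
      value: value,
      extra: { rawValue: value, raw: JSON.stringify(value) }
    };
  }
  return node;
}

// usage sketch: foldStrings(require('babylon').parse(src).program)
```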
transformFromAst seems to be invoking a default plugin

The cost of transformFromAst is important, but increasing the list of plugins invoked does not have such an impact. I feel like there will always be a transformFromAst down the line, because people will want to apply such and such plugin (isn't that the reason an AST-first approach would be nice?). Maybe babel 7 is worth a try to see what happens; maybe there are performance improvements that could have an impact here.
@ForbesLindesay on my machine:

compile all esdiscuss.org/views templates with master branch x 2.58 ops/sec ±7.94% (11 runs sampled)

If I leave the transformation, but without any plugin, I get:

compile all esdiscuss.org/views templates with master branch x 2.59 ops/sec ±7.38% (11 runs sampled)

If I remove the transformation (no call to transformFromAst), I get:

compile all esdiscuss.org/views templates with master branch x 2.60 ops/sec ±8.38% (11 runs sampled)

var Benchmark = require('benchmark');
var suite = new Benchmark.Suite;
var fs = require('fs');
var puga = require('./pug-master/packages/pug');
var pugb = require('./pug-babel/packages/pug');
var dir = './esdiscuss.org/views/'
var tpl = fs.readdirSync(dir).filter(function(file){
  return ~file.indexOf('.pug');
})
suite
  .add('compile all esdiscuss.org/views templates with master branch', function() {
    tpl.forEach(function(file) {
      puga.compileFile(dir + file)
    })
  })
  .add('compile all esdiscuss.org/views templates with babel branch', function() {
    tpl.forEach(function(file) {
      pugb.compileFile(dir + file)
    })
  })
  .on('cycle', function(event) {
    console.log(String(event.target));
  })
  .on('complete', function() {
    console.log('Fastest is ' + this.filter('fastest').map('name'));
  })
  .run({ 'async': true });
pug-code-gen (3.0.1 → 3.1.0)

New Features

Packages With No Changes

The following packages have no user facing changes, so won't be released:
@ForbesLindesay I still have this branch in mind. Can you share your thoughts on this? Do you still think it would be a good idea to have a native AST version of the generator, or do you think it is better to stay with the string-based version?
@ForbesLindesay for info, I rebased the AST code-gen PR on master and upgraded it to babel 7. Please tell me what, in your opinion, I could do to go forward and reach an acceptable state. I understand that this PR has way too many commits now and that it needs a better organisation for review. Ideas welcome.
@ForbesLindesay for info, I added several performance improvements to the AST version of pug-code-gen. Right now a flamegraph of a compilation gives me the following relative times:

1/8 : generation of the code after the AST is built

I would be happy to discuss this if you can find some time.
@ForbesLindesay I found some other improvements (by adding some AST JSON caching for frequent small strings) that are not committed yet. They improve things but are not game changers regarding performance.

On another aspect, there is probably a big improvement to be found by rewriting the attribute building to make it more AST aware (it currently mainly builds the attribute string, which is then reparsed). I think this is a performance issue because we invoke many things (isConstant, toConstant, parseExpression, ...) on things that have already been parsed at the lexer level. The modification would need quite some work, so I would prefer a discussion and feedback on this PR first, if you can find some time for this.
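As a rough illustration of what "AST-aware" attribute building could mean (assumptions: babel-types as `t`, pug's runtime pug_attr(key, val, escaped, terse) helper, and an attribute value AST already cached by the lexer; this is not the code in the branch):

```js
// Build the runtime call for one attribute directly as AST nodes, reusing the
// value expression parsed at the lexer level instead of concatenating a source
// string and re-parsing it.
function attrCall(name, valueAst, escaped, terse) {
  return t.callExpression(t.identifier('pug_attr'), [
    t.stringLiteral(name),
    valueAst, // AST cached by the lexer, so no isConstant/parseExpression round trip on a string
    t.booleanLiteral(!!escaped),
    t.booleanLiteral(!!terse)
  ]);
}
```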
cf #2708
Please consider this a first working draft. It passes all tests but may appear a bit rough in some parts.

Main difficulties were:

I do not know the babel tools very well, so I was more in an "I need to get this working" mode. Maybe you will be able to see how we can simplify some parts.

In this version I have not tried to re-use already parsed ASTs. I do not know where we're at on this.

You will wonder why I used generators in some interim wrappers. These can be moved to properties.

Please keep in mind that my objective was to be able to inherit easily from code-gen to create then-pug. You can see the implementation of the then-pug code-gen here to see how easy it is to implement the then-pug version: https://github.com/jeromew/pug/blob/master/packages/then-pug/lib/pug-code-gen.js