-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Discussion] Compilation, intermediary functions, statements, expressions, etc #15
Comments
For reference, the previous attempt to use statement-oriented syntax is document here. The implementation used a similar approach but the result was far from perfect - there were a lot of unnecessary intermediate variables and block scopes. I'm not sure how of a performance impact that would have had.
The Previous experiments showed calling the
If you had to use special syntax, function, macro when using I/O in ShenScript, that would mean that any Shen code using I/O wouldn't be portable. That's quite a lot of the code since There's some attempt to cut down on the amount of async-ing and await-ing with this fixed list of known non-async functions, but it's limited and hard to extend since functions and be re-defined to start using I/O even if they originally didn't. Lines 184 to 196 in 0e99050
|
But can't those Shen functions just use the blocking I/O versions of the underlying platform? this wouldn't require any special syntax, just the regular one, and those calls would behave as expected without incurring any overhead. This would not prevent the use of the underlying async functionality, it would just not be done through Shen's primitives. If you wanted to for example write a Web server in Shen, you would not use those primitives anyway, or even worry about portability much, because the way it will be written to perform and behave well will depend a lot on the characteristics of the underlying platform. Will go through those links latter, just wanted to ask about this but this is the part I'm a bit confused about. |
Ok, from the description, Fabrs look very much like what I have in mind, so back then you were already way ahead of where I am now. I understand that the resulting code will be bigger because statements require more bytes than expressions. What I don't understand is what kind of runtime overhead statements could have. Lets say the Javascript compiler is really good at optimizing IIFEs, what it will produce is equivalent to the statements version, right? it will not be faster. Did those benchmarks include compilation time? Thats where I can see some overhead happening.
Ok, thats great. I saw that there was type tracking code, but for the functions I tried the wrapping functions were still present, but now that I think about it it was obvious why (the variables where parameters to the functions, not results of any operation). |
I didn't want to design around special cases where you happen to be able to do sync I/O for a particular operation on a particular platform. The idea was that a function like
...eh, we seem to be of two minds here. I remember the main original Shen website saying "write once, run everywhere" and every platform-specific thing makes that less true. This goes back to comments I've made before like when we say Shen, is supported on platforms X, Y and Z what's "Shen" in each of those cases? Is it just the kernel? Which version of the kernel? According to the color-coded chart on the open source site, "Shen" is a little different per platform so which "Shen" are we talking about? And to a developer, "Shen" might mean the kernel plus common libraries, which might be different per-platform... it's kind of a mess and I don't want to make it any worse. |
They did - the only AOT compilation in ShenScript is when you build it - only the kernel is pre-rendered. All other code gets read and compiled JIT, just like JavaScript.
That's what I was thinking, but the results spoke for themselves. Maybe the code getting generated was excessively noisy in some way, but I'm not sure how. There were a lot of intermediate temp |
Ideally, yes, but once you start building anything useful that goes beyond manipulation of data, you are going to have to depend on functionality provided by the platform. This is already a problem in Scheme, which is far more defined than Shen is, has been around for much longer and is less varied when it comes to platforms on which it runs on. When you want to write code to solve a more complicated problem that involves interaction with the outside world, it is far more likely for it to be non-portable than portable. To make Scheme code portable, what you do is write implementation-specific code for all implementations you want to support, and use But then again, the kind of stuff you see that can be made portably, is not stuff that talks to the operating system, it is stuff like XML parsers, pattern matching macros, etc, not HTTP servers or libraries to talk to databases or anything like that. All the above applies to Common Lisp too, but it is just not as bad, because the Common Lisp spec includes faaaaaar more than Scheme's or Shen, so there is more common ground covered. But even to a lesser degree the problem exists there too. There are only a few I/O primitives in Shen, if you want to write something more complicated, those will be enough, and in that case you will have to depend on Nodejs stuff (same if you were to write something that runs on the browser and manipulates the DOM). |
Ok, I'm going to do the following. I'm going to compile a bunch of Shen functions, then try to manually translate them to a statement oriented version, then run then and compare (using a profile if I can figure out how to do it in node). |
Just a quick update on this. I tried the following functions: (define do-times
0 F -> done
N F -> (do (F void)
(do-times (- N 1) F)))
(define simplify
[Op A B] -> (s Op (simplify A) (simplify B))
A -> A)
(define s
+ M N -> (+ M N) where (and (number? M) (number? N))
+ 0 F -> F
+ F 0 -> F
+ A [+ B C] -> (simplify [+ [+ A B] C])
* M N -> (* M N) where (and (number? M) (number? N))
* 0 F -> 0
* F 0 -> 0
* F 1 -> F
* 1 F -> F
* A [* B C] -> (simplify [* [* A B] C])
Op A B -> [Op A B])
(let Exp [* x [+ [+ [* 12 0] [+ 23 8]] y]]
(time (do-times 200000 (/. _ (simplify Exp))))) I ran this loop, then a modified version without all the async/await, and then one where the code was translated into the statements equivalent, which looks like this: ((s$c, simplify$c) =>
$.d('simplify', $.l(V2 => {
if ($.isCons(V2) && ($.isCons(V2.tail) && ($.isCons($.asCons(V2.tail).tail) && null === $.asCons($.asCons(V2.tail).tail).tail))) {
return $.b(s$c.f, $.asCons(V2).head, $.t(simplify$c.f($.asCons($.asCons(V2).tail).head)), $.t(simplify$c.f($.asCons($.asCons($.asCons(V2).tail).tail).head)));
} else {
return V2;
}
})))($.c('s'), $.c('simplify'));
(($2b$s, simplify$c, $2a$s) =>
$.d('s', $.l((V6, V7, V8) => {
if ($2b$s === V6 && ($.isNumber(V7) && $.isNumber(V8))) {
return $.asNumber(V7) + $.asNumber(V8);
} else if ($2b$s === V6 && 0 === V7) {
return V8;
} else if ($2b$s === V6 && 0 === V8) {
return V7;
} else if ($2b$s === V6 && ($.isCons(V8) && ($2b$s === V8.head && ($.isCons(V8.tail) && ($.isCons($.asCons(V8.tail).tail) && null === $.asCons($.asCons(V8.tail).tail).tail))))) {
return $.b(simplify$c.f, $.r([$2b$s, $.r([$2b$s, V7, $.asCons($.asCons(V8).tail).head])], $.asCons($.asCons(V8).tail).tail));
} else if ($2a$s === V6 && ($.isNumber(V7) && $.isNumber(V8))) {
return $.asNumber(V7) * $.asNumber(V8);
} else if ($2a$s === V6 && 0 === V7) {
return 0;
} else if ($2a$s === V6 && 0 === V8) {
return 0;
} else if ($2a$s === V6 && 1 === V8) {
return V7;
} else if ($2a$s === V6 && 1 === V7) {
return V8;
} else if ($2a$s === V6 && ($.isCons(V8) && ($2a$s === V8.head && ($.isCons(V8.tail) && ($.isCons($.asCons(V8.tail).tail) && null === $.asCons($.asCons(V8.tail).tail).tail))))) {
return $.b(simplify$c.f, $.r([$2a$s, $.r([$2a$s, V7, $.asCons($.asCons(V8).tail).head])], $.asCons($.asCons(V8).tail).tail));
} else {
return $.r([V6, V7, V8]);
}
})))($.s `+`, $.c('simplify'), $.s `*`); Times:
It seems async/await adds considerable overhead, not sure about functions yet as this code I tried happens to not introduce intermediary functions, so the runtime for both non-async versions is the same. Next time I will try something with a bunch of |
On another topic, based on the code I see on * M N -> (* M N) where (and (number? M) (number? N)) but I got this: $2b$s === V6 && ($.isNumber(V7) && $.isNumber(V8)) ?
$.asNumber(V7) + $.asNumber(V8) Just wanted to point that out. |
Using (define let-test
N -> (let N+1 (+ N 1)
N+2 (+ N+1 1)
N+3 (+ N+2 1)
N+4 (+ N+3 1)
N+5 (+ N+4 1)
N+6 (+ N+5 1)
N+7 (+ N+6 1)
N+8 (+ N+7 1)
N+9 (+ N+8 1)
N+9))
(time (do-times 2000000 (/. _ (let-test 100)))) Statements version: $.d('let-test', $.l(V22 => {
const N$2b1 = $.asNumber(V22) + 1; {
const N$2b2 = $.asNumber(N$2b1) + 1; {
const N$2b3 = $.asNumber(N$2b2) + 1; {
const N$2b4 = $.asNumber(N$2b3) + 1; {
const N$2b5 = $.asNumber(N$2b4) + 1; {
const N$2b6 = $.asNumber(N$2b5) + 1; {
const N$2b7 = $.asNumber(N$2b6) + 1; {
const N$2b8 = $.asNumber(N$2b7) + 1; {
const N$2b9 = $.asNumber(N$2b8) + 1;
return N$2b9;
}}}}}}}}})) Times:
So, the intermediary functions add some overhead, but it is much less than async/await. What commit/tag do you recommend for me to check the "fabrs" produced code? I want to try more complicated code examples, but translating them by hand gets tedious. |
Just out of curiosity, I tried the version with the Btw on this example I also got those wrappers, but it was not what I was expecting. |
Just wanted to update on this, turns out I was very wrong here, it seems Node doesn't provide what I thought it provided, and a C extension would be required, which is not ideal. I will keep thinking about this and let you know if I figure out anything. |
Opening this issue so that we can share our ideas here, may help with figuring out the best strategy.
My understanding (and correct me if I'm wrong) is that a lot (it not all) the intermediary functions introduced inside the body of a compiled
defun
are compiledlet
forms.And since most functions end being async, this also introduces a lot of asyncs once those functions have been introduced.
If we have an expression, most of the time it can be converted into an equivalent statement, with its last expression becoming a
return
or an assignment.When some piece of code can be compiled into a Javascript expression, then it is better to do so because it results in a smaller code size, statements would be used when it is not possible to produce an expression.
What comes ahead is an incomplete analysis, and there are probably many holes in my undestanding, so please point those out.
let
expressionsSince
let
s are expressions, using blocks requires inserting a return to the expresion thatproduces the value of the
let
expression.Example:
Gets compiled to:
With blocks and statements the equivalent would be:
But this only works when the result of the
let
expression is also the result of thedefun
.If the
let
expression shows up someplace where its result will be bound to a variable or used as an argument to a function call, then a return cannot be used. What can be done instead, in the case of a variable bind, is to declare the variable aslet Var;
(instead ofconst
), and then introduce the block as before, but with an assigment toVar
instead of areturn
.Example:
Source:
Current:
With statement
The compiler should include this information in the context, so that when the
let
is compiled it knows what to do (eitherreturn
or variable assignment).Since expressions are compiled "from the outside to the inside", annotating the context with this information shouldn't be hard.
The situation where the result of the
let
is used as a function call parameter is more complicated:(+ (let X 1 X) 2)
.If the expression doesn't have side-effects, then it can just be assigned to a gen-symed variable using the method described above, and then that variable used as an argument.
But if there are side effects, this will not work because by doing this transformation naively we are changing the order of execution (Kl should be evaluated left-to-right).
A solution in such cases is to perform a conversion to A-normal form, so that every argument passed to the function call is bound to a variable first, then the variable passed (the transformation is easy to do, I wrote code to do this long ago because it was required for Chibi because it evaluates right-to-left, and also an OCaml port I was working on, this conversion can be done to the Kl code before the compiler even sees it).
It also has a cascading effect, every time this conversion is done, the function call that was an expression becomes wrapped in a
let
, which gets compiled into a statement.And unless side-effectful code is tracked, this conversion would be required every time a
let
shows up as an argument to a function.The good news is that such
let
s are very rare, so all this would only happen in very few cases, if at all.Example conversion:
becomes:
if
expressionsif
expressions are compiled into Javascript's ternary operator, which work quite well because it is an expression. But statements cannot show up inside such expressions, which is a problem now oncelet
expressions get compiled into statements.In such cases
if
s can be compiled into Javascript'sif
s, and the same rules as withlet
can be used to handle the results.For example (basic case, this one wouldn't require any of this, but once compiled this way then statements can be part of
if
branches):Source:
Current:
Proposed:
trap-error
expressionsI see these are wrapped in a function too, but for a
try/catch
same rules as above would apply.And throwing is done with
$.raise
, so throwing an exception doesn't convert an expression in a statement like usingthrow
would.Factorization
If intermediary functions are removed, the constructs introduced by the
factorise-defun
extension can be compiled trivially into labelled blocks and labelled breaks.Since the constructs are new and separate there is no ambiguity and the transformation is direct (none of the analysis required for the above cases is required here, and at the same time, the above cased cannot be confused with these constructs).
The translation is described here: #13 (comment)
Since the conversion done by
factorise-defun
already adds%%return
expressions, the Kl->Js compiled will not have to figure out where to insert returns.Sync/Async
This part is something I'm not very clear on. Is it important for the async stuff to be implicit? Is it possible to instead use sync versions of the primitives that Shen requires, and then provide a new construct to make async calls explicit? if yes, what would the drawbacks be?
I'm adding libuv support on Shen/Scheme, so I'm interested in this not just because of ShenScript, and it is something I haven't given much tought to.
Code size
The statement versions require more bytes than the expressions, but I don't think the difference is THAT big, and if it results in considerably faster code, I think it is worth it.
Many calls that are repeated use functions that are properties of
$
, I think that with some analysis some can be removed (at least when typeckecking and theoptimise
flag are enabled).Also I think expressions like
(+ A B)
can be compiled into$.add(A, B)
instead of$.asNumber(A) + $.asNumber(B)
.The text was updated successfully, but these errors were encountered: