Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variables & Types 2 #383

Open
MichalMarsalek opened this issue Apr 4, 2024 · 0 comments
Open

Variables & Types 2 #383

MichalMarsalek opened this issue Apr 4, 2024 · 0 comments
Labels
architecture enhancement New feature or request high priority nonlocal-analysis Scopes or flow analysis

Comments

@MichalMarsalek
Copy link
Collaborator

MichalMarsalek commented Apr 4, 2024

Here's a new, more specific proposal superseding #354
Implementing this would close #272 #124 #277 #346 #328
I no longer think Polygolf should have scopes or variable declarations. Declarations should be added as dictated by the target langs. Functions should obviously have their own scope, but I woudn't mind removing them for now.
Types of variables should be refined on each assingment. Dead code should be eliminated. Assignments that don't cause side effects (which don't really exist in current Polygolf anyway - except for read opcodes) should be inlined correctly if shorter. Variables should be merged into one if they are not interfering if that's shorter (less declarations etc.).
Examples

Polygolf

$x <- (@0):Ascii;
if ((#$x) < 99) {
  while ((#$x) < 99) {
    $x +<- $x;
  };
  $branch <- ("first" + $x);
}{
  $branch <- "second";
};
println $x;
$y <- (@1);
while (contains $y "  ") {
  $y <- (replace $y "  " " ");
};
println $y;
println $branch ;

Nim

import os
var
 x=1.paramStr
 b="second"
if x.len<99:
 while x.len<99:x+=x
 b="first"+x
x=2.paramStr
while"  "in x:x=x.replace("  "," ")
x.echo
b.echo

Polygolf

$a <- (@0);
$b <- ($a + "!");
$a <- "fail";
println $b;

Python

import sys
print(sys.argv[1]+"!")

Static single assignment

For better understanding of the input programs and achieving the goals outlined above, we need SSA.
Programs should be converted to SSA as the very first step - even before typechecking. Typechecking & inference is then done on the SSA form. This improves the inference because at each use, we know the type depends on the last def (assingment) only. At merge points, the types are unioned, if the union is not representable in Polygolf, (Int | Text) that's an error. Refering to a variable that is not defined along every path that leads to the usage is an error. Ssa bindings are identified by distinct integers.

The above programs would be

$1 <- (@0):Ascii;
if ((#$1) < 99) {
  $3 <- (phi $1 $2);
  while ((#$3) < 99) {
    $2 +<- $3;
  };
  $4 <- (concat "first" $3);
}{
  $5 <- "second";
};
$6 <- (phi $4 $5);
println $3;
$7 <- (@1);
$9 <- (phi $7 $8);
while (contains $7 "  ") {
  $8 <- (replace $9 "  " " ");
};
println $9;
println $6;
$1 <- (@0);
$2 <- ($1 + "!");
$3 <- "fail";
println $2;

I'm thinking the typechecking phase (plugin) could annotate the inferred type on the SSA read (use) nodes. After that a type of each node could be calculated by just examining its descendants, rather than having to lookup the bindings in the entire program.

Type inference

How does one infer types for this program

$i <- 64;
while ($i > 10) {
  $i <- (($i * 2) mod 103);
};

in SSA

$1 <- 64;
$2 <- (phi $1 $3);
while ($2 > 10) {
  $3 <- (($2 * 2) mod 103);
};

?
As we can see, the type of $2 depends on the type of $3 and vice versa. To resolve this, we narrow the type of each ssa binding iteratively.
Here, we

  • infer $1 as 64..64
  • infer $3 as Int because otherwise the type of (phi $1 $3) is not representable
  • infer $2 as Int because that's the union of types of $1 & $3
  • infer $3 as 0..102 because of the result of the modulo
  • infer $2 as 0..102 because that's the union of types of $1 & $3.

We could try guessing some common subtype of the top type (like Ascii, 0..oo, etc.) instead and only fallback to the top type on failure.

Coming out of SSA

When golfing on the SSA is done, we need to get rid of the ssa nodes and introduce variables again.

Variable name allocations

The plugin doing this should be parametrizable by a list of constraints that prevent the ssa binding to be allocated to the same variable. Except for the implicit constraint that the two definitions must not be interfering, these include

  • different original names - this is a very weak constraint that's not needed for correctness, but this is could be used for Polygolf & Python,
  • different target types - this is needed for statically typed langs so that we don't end up sharing int64 & bigint in a single variable for example. If a language relies on this, it should apply a plugin that assings each node its targetType
  • different nature of binding - for example the foreach loop var is immutable in some langs, but not others.

I'm not sure about the exact algorithm, but I would try first "merging" variables that occur together as arguments of a single phi function.

Introducing scopes & var declarations

Some langs (Python) would not apply such plugin.
Others would need to add var declarations. Note that one can not just add the declaration to the first assignment, as is done now, because of there might other assingments that are in a different scope. The simplest implementation would declare all variables at the start of the program. Better algorithms for declaration placement could be implemented, that shoudn't be very hard.

Control flow type narrowing

Types could be narrowed by conditions in if, while & conditional nodes but I feel like that's kinda orthogonal to everything else.

Other paradigms

I believe using the SSA as the primary representation should be good for implementing functional languages. Allocating as few variables (registers) as possible is crucial for languages like brainfuck & Hexagony. Some other help functions and code representations that would be added to make this work should be helpful for some languages as well. For 2D languages like ><> and Hexagony, you need to understand the control flow graph and lay it out in the plane as tightly as possible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
architecture enhancement New feature or request high priority nonlocal-analysis Scopes or flow analysis
Projects
None yet
Development

No branches or pull requests

1 participant