Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate with RunKit? #16

Open
benjamn opened this issue May 12, 2016 · 12 comments
Open

Integrate with RunKit? #16

benjamn opened this issue May 12, 2016 · 12 comments

Comments

@benjamn
Copy link
Owner

benjamn commented May 12, 2016

Seamless ES modules support would be a very nice feature for their interactive REPL: https://tonicdev.com

Right now I think the only thing preventing https://tonicdev.com/npm/reify from working is that they display a warning that import and export statements are not yet supported.

@benjamn benjamn changed the title Integrate with Tonic? Integrate with RunKit? May 16, 2017
@benjamn
Copy link
Owner Author

benjamn commented May 16, 2017

Still not working: https://runkit.com/benjamn/591b48fd8e88b80012bfc657

@jdalton
Copy link
Contributor

jdalton commented May 16, 2017

\cc @tolmasky of runkit for help.

@tolmasky
Copy link

I'd love to discuss this in detail (a call would be great). The basic problem is that we can't figure out a way to make import/export really work in (multi-cell) REPLs due to their unique hoisting behavior. I've brought this up with node's discussions on import and their response has basically ben "we're not going to think about this right now": nodejs/node-eps#40 . I've put as much of a summary here as I can, and am very open to discussing strategies.

So, I'll first start with the core issue, then move on to how it affects RunKit uniquely. Apologies if I am explaining anything that is already understood, just laying out the initial conditions to show how it is harder in RunKit. The basic problem is that unlike requires, imports do not "evaluate" in the order they appear. This alone, in my opinion, is already an unfortunate feature, as can be seen from this example:

process.env.DB_KEY = "x";

import db from 'db'

gets evaluated as:

db = require('db') // or inlined or whatever.

process.env.DB_KEY = "x";//too late, code has already executed.

This is clearly an anti-pattern or not a great way to design your library, but the reality is that many libraries currently expect to have side effects upon requirement, and the language has now introduced a new feature that makes predicting the order of those side effects considerably harder. Consider that if you were to move process.env.DB_KEY = "x" into its own setup.js file, the following WOULD now evaluate in the expected order:

import `setup`
import db from `db`

Strangely, an import that is meant to behave more like the inlining of code, actually behaves completely different when it is inlined. Basically, imports seem to be designed under the assumption that they contain side-effect free code, but allow code with side effects. Had the rule been that imports could only have side effect free code (in other words, only exports and const exprs), then this would all be fine and RunKit would already support them. However, given that an import can have a top level file write, the order of execution is important but very difficult to control. You can agree or disagree with this decision, but it becomes more complicated when you then apply the conclusions of this decision to REPLs.


So, now the problem with REPLs. In a REPL, you really can't hoist after the fact. The code you first typed and hit enter has already been evaluated, is too late to go back. So an import after a setting of a variable, will actually happen after the setting of the variable, it can't go back and do it before-hand make the import happen first. In a non-time-traveling REPL (like what ships with node), this could just be considered a acceptable quirk of working in a REPL.

However, RunKit allows you to edit previous cells and re-run from that point. RunKit actually rewinds the state of the computer to that point, and picks up where it left off. So if you've written 10 cells, and change cell 5, it'll only redo the work from there. As such, RunKit can make the separate guarantee that the code behaves identically what would happen if you were to copy paste the contents of all the cells, paste them into a file together, and then just run them with node. RunKit will for example notice when a lower cell would affect an earlier cell (for example, changing the definition of a hoisted function, and re-run from the appropriate point).

As such, import and export offer a difficult challenge: the only way to guarantee deterministic re-running is to rerun the entire document from the start upon the addition of ANY import. Merely importing something on cell 20 means you have to re-run the whole document from the beginning, because there's no way to know ahead of time whether that import has some side effect.

So our options are:

  1. Let people write imports and turn RunKit into a "re-run from scratch" environment every time a new import is made.
  2. Only allow imports on the top cell and then throw an error and tell people to use require/dynamic-import on later cells
  3. Completely change the behavior of import and just turn it into a require.
  4. ???

@tolmasky
Copy link

Just to give one extra example. In RunKit you can require notebooks from each other (since they're literally just modules). So this would highlight the different execution orders: when you are writing your notebook, the imports could happen in cell order (if we mimicked the REPL behavior), but as soon as you required it from another notebook, it would then just be a normal file and would execute in hoisted order.

@benjamn
Copy link
Owner Author

benjamn commented May 17, 2017

Thanks for the background, @tolmasky! I definitely appreciate the challenge you're describing, and I don't see any obvious solution. I also understand that your complaint lies more with the semantics of import declarations generally, and less with using a tool like Reify to simulate those semantics.

Reify does hoist import declarations in modules, since the spec demands it, though it would certainly be easier for the compiler to replace the declarations in place, without moving code around, so that

console.log("before");
import { a, b as c } from "./module";
console.log("after");

would just become

console.log("before");
let a, c;
module.watch(require("./module"), {
  a: v => a = v,
  b: v => c = b
});
console.log("after");

Since it's not feasible (as you explained) to hoist the side effects of an import backwards in time, a Reify-enabled REPL just executes each command sequentially, as if the command were a one-line module that happens to share scope with the other commands you've entered.

That programming model clearly does not preserve the RunKit promise of producing the same behavior as concatenating all the commands together into a single module, but maybe it doesn't have to? In any REPL, I think the programmer expects to have control over the order of side effects, based on the order of commands. So perhaps the importable module generated by the RunKit notebook should preserve that ordering behind the scenes, rather than suddenly re-hoisting the import declarations?

You can agree or disagree with that suggestion, but if you agree that RunKit should generate modules where import declarations are evaluated in the same order as you entered your commands, then Reify might give you an easy way to achieve those semantics. Specifically, if you compile each command with Reify, then concatenate the results into a module, any import (or export) declarations will be replaced with the module.watch and module.export API, which works equally well if you hoist the code or just leave it where it is.

While this feels like a departure from the semantics of import declarations in modules, I believe it can be justified by the different expectations programmers have when using a REPL. The benefits (over just using require) would be (1) support for live bindings, and (2) the ability to check that an imported name has actually been exported by the source module.

Are live bindings and error checking worth the trouble? Do live bindings pose any problems for your programming model of re-runnable cells? I don't know the answers to those questions, but I hope I've had some influence on your thinking about these problems.

@tolmasky
Copy link

tolmasky commented Oct 9, 2017

Sorry for the late reply to this, have been continuing to think about it though. I've been trying to get comfortable with the idea of essentially adding a 3rd evaluation type (to the existing "script" and new "module" types), with REPLs/notebooks. In this world, as you describe in your suggestion, imports would in fact evaluate in-order vs. first.

The main thing I've been trying to figure out is still how to bring that into a module setting since the main issue with diverging module/REPL import behavior, is that RunKit notebooks are actually both. They are REPLs, but you can require other notebooks just like modules. So notebook B can do require("@runkit/username/notebook-a"), at which point either its a special module that still resolves imports sequentially, or it introduces subtle bugs by importing things in a different order than when you interacted with it like a REPL. I think it would have to be the former, hence me wanting to call it "notebook resolution" vs REPL resolution. I've just been trying to figure out if there is a mapping to existing features that allows me to reproduce this effect without just saying "we use require, that's the end of it". One possibility is that cells would map to imports themselves:

import whatever from "./cell-1" // now each internal import happens in the same order as the "REPL"
import whatever from "./cell-2"
import whatever from "./cell-3"
import whatever from "./cell-4"

The source transformation is still complicated in that they need to share hoists and stuff (a function in ./cell-4 needs to be accessible from cell-1".

Anyways, the other main thing preventing me from putting imports into RunKit (aside from just time), is the fact that it seems that node has not yet figured out how they will implement the feature. their .mjs solution seems like it might not be its final form, but who knows.

@jdalton
Copy link
Contributor

jdalton commented Oct 9, 2017

@tolmasky
Node's .mjs decision is solid (no longer up-in the air). They're also landing a --loader hook soon.

@bmeck
Copy link

bmeck commented Oct 9, 2017

/me pops up

Hiya! Yes, we will be landing a loader hook. It is super experimental but needs some userland feedback. I wouldn't go all in on it since it is likely to change before ESM is unflagged.

@tolmasky
Copy link

OK, I think I might have a solution that I could get comfortable with (but I want to make sure it will work in Node as at least in Node's current state it isn't working). To summarize, the goal is to take the REPL's currently implicitly agreed upon non-standard behavior and attempt to make it work in a file. Namely, REPL's execute imports in the order they appear interspersed among normal statements (across different inputs), while in a module all imports would be executed first. If you were to copy-paste the code from the REPL into a file, the ordering would thus differ. To compensate for this, the interpretation of "cells" in a REPL is as separate files, each importing each other. So for example:

> console.log("start");
> import _ from "./test"; console.log("end");

where test.mjs is:

console.log("Normally I'd be first");

Translates to:

// cell1.mjs
console.log("start");
// cell2.mjs
import __unused from "./cell1"
console.log("end");

So far so good. Here it is obvious why the printout is "start\bNormally I'd be first\nend" and not "Normally I'd be first\nstart\nend". So to some extent we've reconciled the two systems. However, we must still account for things like this:

> function x() { console.log(a) }
> import _ from "./test"; let a = 10; console.log(x());

Here things are a bit trickier as our desire to separate the two inputs in order to correctly handle the imports breaks the connection between x and a. This should also be handle-able though through exports if I understand live binding correctly:

// cell1.mjs
function x() { console.log(a) }
export let a;
// cell2.mjs
import { a } from "./cell1"
import _ from "./test"; let a = 10; console.log(x());

There are still a few things to figure out though. For example, had I used const instead of let for a, it would be much more difficult to pull this off since I'd want to pre-declare a in cell 1, but I can't do that if its a const. But some properties magic on the export object may provide a workaround. However, my main concern currently is that at least in Node today, live bindings aren't working this way. Is this just a bug?

If we believe this will be implementable, here are the pros and cons as I see them:

  1. If you are writing just one cell, everything works the way you'd expect (we aren't for example saying "in RunKit, imports are just requires and so execute in statement order").
  2. Embeds, having just one cell, would work identical to modules.
  3. importing a RunKit document should "just work" since its just like importing the first cell (which then goes on and imports the rest of the cells). The only issue here is exporting everything that needs to be exported.

Potential problems:

  1. I still don't like that while imports make sense within cells, they remain behaving differently from files. Although this creates an explanation for it, I still think its unintuitive, in a system thats already complex and hard to understand in my opinion (the semantics of import are significantly more complex than require which works like any other function).
  2. It's not clear to me if live binding will work in node
  3. It's not clear now whether node will be using .mjs or not, once again making me feel like "wait and see" is the best approach
  4. It's unclear to me whether node will actually load modules asynchronously. I saw that written in a proposal, but it seems weird that ticks could fire inbetween imports. I think this actually breaks a lot of expectations in synchronous require. If it is the case, then I too would be adding weird ticks inbetween cells.

@benjamn
Copy link
Owner Author

benjamn commented Jan 15, 2018

Thanks for following up with this!

I think it may be difficult to use live bindings to share variables between cells, since live bindings are designed to be one-way: the exporting module can change the exported values later, but importing modules cannot reassign any imported identifiers (at least not in a way that changes the exported value observed by other modules).

Perhaps these REPL cells/files could all share a common scope, and variables like a and x could be hoisted into that scope? Practically speaking, the Node REPL currently has a context object which could maybe serve that purpose, though unfortunately by default context === global in Node (that is, when you run node to get the usual REPL).

There's been some discussion of REPL context and coming up with a better REPL spec in the context of this recent PR to add import support to the Node REPL: nodejs/node#17285 (comment)

In response to that discussion, @bmeck created this repo to discuss/draft a REPL specification: https://github.com/bmeck/js-repl-goal

Summary responses to your numbered problems:

  1. Agreed, this is the main source of confusion to resolve with a better REPL specification.
  2. Immutability of live bindings on the importing side is most likely how Node will continue to work.
  3. Yep, I too am waiting for the npm fork to be considered fully.
  4. REPL commands are processed with an async callback style in Node, so the REPL can handle async evaluation pretty easily (that's how repl: add top-level static import nodejs/node#17285 works: import declarations are extracted from the command and loaded first, and then the rest of the command evaluates in the next tick). I think this principle would be worth formalizing in any REPL specification.

@tolmasky
Copy link

tolmasky commented Jan 15, 2018

Yeah I think you are right about the binding, just misunderstood it originally. This is unfortunate since it once again puts me in a weird place as we really need REPL and file semantics to at the very least have a sensible high level transform between the two, since in RunKit you can require a notebook as a module just doing require("@runkit/username/notebook"), and additionally, we are also working on allowing anyone to npm install a notebook as well with npm install @runkit/tolmasky/install-test. Unfortunately doing some fancy scope tricks (while potentially possible) breaks down in these cases if we can't end up with some artifact that you can drop into a larger project that works the same. Certainly any sort of v8 hacking is out since we won't control the user's end environment.

Our desire is really to not have a "third execution context" (CJS, ESM, REPL), but to have a REPL be an understandable interpretation of a file with "breaks" as if you were just inserting console.log statements where the outputs show up - for a REPL to be elevated to a "living file". I'm aware that this may not be in line with the desires of node's standard REPL, and this may seem like stubbornness on our part, but this was one of the "main points" of RunKit: to really push the "playground" experience that we deeply believe is critical for early users. We have the long term vision of being able to open arbitrary files and inspecting content (file -> repl). Its currently really nice to start something as a hacky playground and "end up" with a usable module others can use (repl -> file) -- this is much harder if fundamental semantics change in the process.

We see lots of people begin their programming experience (or even their projects) in a REPL, then eventually have to move away from the fun experimental environment to the much more rigorous and strict file environment. Although the REPL wasn't necessarily originally intended to be close to a file, it accidentally ended up being that way such that a true playground experience was at hand, and this import stuff really makes it difficult to continue, especially when REPLs were explicitly not considered in the original proposal.

Worst case scenario we could just say "you can only import in the first cell, and then in further cells you have to await import()". This sucks because we don't want you thinking about all your imports ahead of time, but to be fluidly interacting with data and bringing in dependencies as needed - weird cell 1 vs cell 2 rules will only add friction to that. I feel import is already much more complex to understand than require (with its re-arrangement rules and parallel but slightly different destructuring syntax), without having to append RunKit rules. I'll keep thinking about it though.

@bmeck
Copy link

bmeck commented Jan 15, 2018

@tolmasky if it matters the REPL goal was a thing that has been talked about from time to time but avoided. V8 and Node were interested in implementing a 3rd goal since REPL implementations are diverging in behavior and that isn't desirable. It will be presented to TC39 in March by myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants