Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infrastructure tracking: compilation, source mapping, and TypeScript #1338

Open
aschampion opened this issue Jun 8, 2016 · 25 comments
Open

Comments

@aschampion
Copy link
Contributor

aschampion commented Jun 8, 2016

This is an internal tracking ticket for infrastructure for compiling client code through Django-pipeline @tomka and I have been discussing for some time.

Motivation

  • We would like to take advantage of the type safety and productivity benefits of TypeScript for central libraries and APIs in CATMAID, while allowing widgets and experimental features to still be prototyped quickly in plain JS as before.
  • We would like to factor more components into modules to decrease complexity, better delegate responsibilities to components, make composition easier, and allow for reuse outside of the full CATMAID client.

Requirements

These are the capabilities any compiled pipeline must maintain:

  • The client must recompile when edited in a development environment without any explicit action.
  • Source mapping in the browser console.
  • Likewise, no mangling or breaking of the API in the browser console for scripting use.

NOTE: The below plan is no longer valid. See the discussion beginning in 2019 below for a webpack-based approach.

Draft Plan

  • TS Compilation
    • Pipeline plugin
    • Pipeline's development mode should recompile on page load, but if this is too slow there are alternatives, e.g, using gulp
  • Source mapping
    • The Atlantic has a PR adding source mapping support to pipeline (finally)
    • Since we bundle multiple source files together, we need a compressor that is also source-map aware. The PR above uses closure's compressor. To use a compressor like closure we'd need to go ahead and annotate all our modules and exports so that they're still callable from the console. Non-mangling compressors like JSMin produce source maps but don't consume them from earlier steps, or if they do consume only one (like UglifyJS). This may be a non-issue if we can just compile a single TypeScript entry point file, so then there is only a single source map to pass through. Also annoying is that pipeline can't apply different compilers and compressors to different sets of files.
    • Closure can take advantage of TypeScript's typing through JSDoc annotations, for example google has a project that does just that
  • Dependencies
    • We prefer to keep our dependencies in-repo rather than packaged from NPM or bower, but would also want matching TS definitions through Typings

Drawbacks

  • Requires installation of TS and possibly closure compilers for non-docker servers
  • No in-browser editing of compiled files
  • Potentially more latency in the edit > test > edit cycle for developers

Alternatives

  • Django pipeline is unergonomic for modern Javascript ecosystems for SPAs, feeling more oriented towards classic HTML + JS + CSS full page delivery. It might be better to adopt a different asset management strategy.
@aschampion
Copy link
Contributor Author

@tomka I was looking at the webpack feature branch again curious how much work it would be so that we could move dependencies out-of-tree and be able to support TypeScript, etc. It actually seems somewhat straightforward now to be able to avoid name mangling and still separate libraries into separate bundles, and we're most of the way there if we wanted to incrementally migrate libraries or code out of pipeline.

  1. If I PR'd the branch with changes doing this, do you feel likely to merge it, or would the overhead of needing to npm install and webpack outweigh the benefits at this stage?
  2. If yes to 1, as a MVP, would it be better to migrate a dependency (i.e., using NPM instead of an in-tree copy) or to move some small, isolated part of CATMAID-lib to a webpack module?
  3. If no to 1, is there some more elaborate MVP that would make the benefits more compelling? Something like converting some part of CATMAID-lib to TypeScript, or changing a cumbersome form somewhere to React with hot loading?

@clbarnes
Copy link
Contributor

For testing and benchmarking alternative implementations of Arbor.js and friends, I wrote https://github.com/clbarnes/arbor-harness / https://www.npmjs.com/package/arbor-harness, which fetches and mangles the original implementation from raw.github . If those bits of catmaid-lib (arbor, arbor-parser, synapse-clustering) were on npm, that would get easier. Hardly isolated, though!

Would we run into issues surrounding the order in which everything is imported, IIFEs executed, and stuff added to the CATMAID object?

@aschampion
Copy link
Contributor Author

If those bits of catmaid-lib (arbor, arbor-parser, synapse-clustering) were on npm, that would get easier

Making components available elsewhere as their own libraries would be a goal (besides better tooling and organization within CATMAID). First we'd separate them into modules in-tree to sort things out, then later move them out.

Would we run into issues surrounding the order in which everything is imported

It's a consideration, but my plan for incremental conversion would be to start from the "top" of the import order, and have a pseudo webpack entry point before the real legacy CATMAID scripts that would just import whatever is ES6 moduled/webpackified. It's similar to how we've done several other modernization/big cleanup sweeps in the codebase.

If everything were converted at once, it wouldn't be a problem. But I think that would be too high of an activation energy.

@tomka
Copy link
Contributor

tomka commented Feb 21, 2019

Yes, I would merge such a change. The features/webpack branch looks indeed pretty clean. Being able to use TypeScript would be great!

npm install

During regular development, I suppose, that would only be needed if NPM dependencies have changed (or initial setup), i.e. I don't need to call it normally?

webpack

This would also only be need to be called if if the NPM dependencies or our own TypeScript and webpack bundles change, wouldn't it? If so, unless, work is done in TypeScript or bundles, there is no need to call this a lot either.

Both would of course be part of a regular production server update.

but my plan for incremental conversion would be to start from the "top" of the import order

👍

If everything were converted at once, it wouldn't be a problem. But I think that would be too high of an activation energy.

I believe so too and think we should try to do this as an incremental update.

@aschampion
Copy link
Contributor Author

TypeScript would be great!

I was doing something yesterday that required me editing on one computer then reloading and debugging very slowly on another, and waiting a minute to refresh a page and setup some widgets just to find out I'd typoed a property name was very motivating to want to use TS.

During regular development, I suppose, that would only be needed if NPM dependencies have changed (or initial setup), i.e. I don't need to call it normally?

Correct, basically pip install for our JS dependencies.

This would also only be need to be called if if the NPM dependencies or our own TypeScript and webpack bundles change, wouldn't it?

Also correct, although there is also a server/watch mode that will watch for changes and automatically do incremental updates to the bundles, which would be useful if frequently edited code is bundled. The "neat" deep end of this (not that we'd use it) is being able to do thing like React component hot reloading, where you edit React components, and they live-update in the browser because the state of the component is separate from the UI/controls that are getting reloaded.

@clbarnes
Copy link
Contributor

I think it would be nice to move towards using the webpack dev server; by constraining usage of TS / modern JS to a small set of new files to avoid having to manually recompile all the time, we're obviously losing out on a lot of the incentive to switch.

Related, I've been fumbling with catmaid-lib in this context today and have got it to a stage where webpack can bundle it and it doesn't immediately error when loaded in a browser. I haven't really touched any of the IIFEs, so the modules don't really add any value, but it's there as an experiment. But there's still bits of catmaid-lib which depend on globals or CATMAID members which are defined in the other catmaid code, which doesn't seem like it should be the point of catmaid-lib. requestQueue probably won't work because the global needs to be overwritten to set up CSRF.

@aschampion
Copy link
Contributor Author

I've been fumbling with catmaid-lib in this context today

Would it be easier to factor out smaller libraries (like arbor) first and bundle those, or do you think it's best to bundle all of CATMAID-lib at once, then factor it into focused libraries later?

But there's still bits of catmaid-lib which depend on globals or CATMAID members which are defined in the other catmaid code, which doesn't seem like it should be the point of catmaid-lib. requestQueue probably won't work because the global needs to be overwritten to set up CSRF.

Agreed, it makes sense for CATMAID-lib to define the RequestQueue type, but the global default queue instance should be in the non-lib CATMAID, and any requests issued from the lib should take a queue explicitly. Beyond just organizing into libraries, making these coupling points explicit is useful now that we want to talk to multiple backends from one client as part of the federation features.

@aschampion
Copy link
Contributor Author

To add, a concern with bundling all of CATMAID-lib at once and moving it to an NPM package is that some parts of CATMAID-lib still have high churn. The update process is a bit more cumbersome for those if it requires editing and publishing in a separate repo, publishing to NPM, and updating in CATMAID. That would make sense for stable parts like arbor.

However, there may be nothing stopping us from have multiple packages in-tree here. So CATMAID-lib can stay in this repo, with local dependencies from the main CATMAID front-end package to the CATMAID-lib packages, and each of the individual CATMAID-lib packages get published separately to NPM as part of the release process. I like that option if we can get it set up, as there's no new impedance to our existing workflow but all the benefits of separate, published libraries.

@clbarnes
Copy link
Contributor

clbarnes commented Feb 22, 2019

Would it be easier to factor out smaller libraries (like arbor) first and bundle those, or do you think it's best to bundle all of CATMAID-lib at once, then factor it into focused libraries later?

Yes, probably. My current factoring doesn't really benefit from the modularisation because it's still dependent on the import order, and most of the imports are only being done for the side-effect of adding stuff to the global CATMAID object. Long-term I think the goal would be to completely decouple CATMAID-lib from that global object, and just let the caller populate it.

The exports are:

  • CATMAID
  • requestQueue (global instance)
  • Arbor
  • CircuitGraphAnalysis
  • SynapseClustering
  • InstanceRegistry
  • fetchSkeletons

Those in bold are good targets for excising first.

Beyond just organizing into libraries, making these coupling points explicit is useful now that we want to talk to multiple backends from one client as part of the federation features.

Agreed; this is an opportunity to take a more systematic approach and separate them properly. The other such point I found is that active_skeleton.js needs the global SkeletonAnnotations from catmaid, and access to CATMAID.TracingTool which is also defined in catmaid.

Minor point, here for documentation: d3 will need to remain an in-tree dependency because node 7+ can't build d3 <4. There may be similar issues with other dependencies; I haven't looked too closely at their version constraints yet.

there may be nothing stopping us from have multiple packages in-tree here

https://github.com/lerna/lerna seems to be popular for managing interdependent JS projects in a monorepo, and handles symlinking to each other as local dependencies, deduplicating shared external dependencies, synchronising version updates and publishing changes.

I've seen a fair bit of advice recommending that frontend and backend are treated essentially as separate projects, which took me a while to get my head around given our current repo structure. If we're considering making "real" JS libraries and building the frontend, it might be worth separating the concerns a bit by giving them (closer to) top-level directories rather than keeping them buried in django/applications/catmaid/static/thekitchensink/etc ; especially if we were to move towards not having them managed by django at all. The compiled bundles could always go into those django/etc paths: that could make our lives a bit simpler when it comes to not serving source files.

So our repo could look more like

django/
docs/
scripts/
sphinx-doc/
frontend/
  catmaid/
    package.json
    webpack.config.json
    src/
  catmaid-lib/
    package.json
    src/
  arbor/
    package.json
    src/
  vendor/
    d3/
package.json
lerna.json
README.md
LICENSE
.travis.yml
...etc

For reference: https://github.com/dan-kez/lerna-webpack-example

@clbarnes
Copy link
Contributor

Removing the IIFE could (/should) mean unindenting most of the file, thus trashing the git blame anyway, thus removing the block against using an autoformatter.

@aschampion
Copy link
Contributor Author

While I'm a fan of autoformatting, because of the peculiarities of JS, especially the heavy use of anon functions, I'm initially skeptical that one that wouldn't hurt readability exists. Happy to be persuaded, though.

OTOH we should autoformat and static analyze the hell out of the backend. I aspirationally put flake8 in Travis years ago and for awhile played the minigame of reducing warnings.

@aschampion
Copy link
Contributor Author

Also with any formatting tools, they have to be compatible with all of our dev environments, which can be a tall order:

dev editors
Tom vim
Andrew Sublime, vim
Chris PyCharm
Will VS Code

Even getting jshint working for everyone wasn't painless.

@aschampion
Copy link
Contributor Author

Relatedly, if we ES6ify a lot of classes at the same time we remove the IIFE, not that much indentation should change.

@clbarnes
Copy link
Contributor

Yeah, I have noticed prettier can be pretty funky about putting way too many lines around short expressions. I don't envision the environment being a big issue; if we had an npm script to do it, called in a pre-commit hook (and then it was checked in CI to make sure people were using the hook) the editor wouldn't need to be involved.

It's a good point about saving ourselves the blame churn by ES6ing stuff at the same time, though.

@aschampion
Copy link
Contributor Author

aschampion commented Feb 22, 2019

The only decent JS autoformatter I've been able to find is Prettier. A big personal nit is that when it line breaks single statements, it uses one tab-width instead of two, and there's no way to configure this:

myObject
  .foo();

But I could probably bring myself to live with that. More annoying is that it changes code that looks like this:

      if (stackViewer.isLayerRemovable(key)) {
        container.append($('<div class="layerClose">')
            .append($('<input type="button" value="x" class="remove"/>')
                .click(makeRemoveHandler(stackViewer, key))));
      }

Into this

      if (stackViewer.isLayerRemovable(key)) {
        container.append(
          $('<div class="layerClose">').append(
            $('<input type="button" value="x" class="remove"/>').click(
              makeRemoveHandler(stackViewer, key),
            ),
          ),
        );
      }
Config

{
  "arrowParens": "always",
  "bracketSpacing": true,
  "htmlWhitespaceSensitivity": "css",
  "insertPragma": false,
  "jsxBracketSameLine": false,
  "jsxSingleQuote": false,
  "parser": "flow",
  "proseWrap": "preserve",
  "requirePragma": false,
  "semi": true,
  "singleQuote": true,
  "tabWidth": 2,
  "trailingComma": "all",
  "useTabs": false,
  "printWidth": 88
}

To me the former is much more readable, and I think it's less preference based.

@tomka
Copy link
Contributor

tomka commented Feb 22, 2019

I am not in favor of having an autoformatter or enforcing a particular style. Of course, everyone is free to use them on their code personally, but I feel the general style decisions we make and the code style we follow are reasonable enough. Leaving some room for variation and evolution of styles is useful to in my opinion.

Also, to me the idea of introducing webpack/etc first for a small set of our library functions or just a dependency seems preferable over changing everything in one go. I'd like to gain some experience with the new workflow and design options first, before we decide if it is a good idea to change the whole code base at once. I won't be able to spend much time on this and can't really judge the implications and required works for all this yet.

The same is true for general project layout and publishing options (npm publishing, lerna, versions, etc.). I am curious and happy to discuss these things, but I feel like it would be useful if we could decouple this decision from including npm as JavaScript dependency manager and webpack for its TypeScript support and bundling capabilities.

Generally I would prefer fewer tools we need to build and setup things. When updating a lot of production servers and having to remember an >5 step update process isn't ideal. Of course it could be automated and abstracted, but simpler would be nicer I believe.

Agreed, it makes sense for CATMAID-lib to define the RequestQueue type, but the global default queue instance should be in the non-lib CATMAID, and any requests issued from the lib should take a queue explicitly. Beyond just organizing into libraries, making these coupling points explicit is useful now that we want to talk to multiple backends from one client as part of the federation features.

👍

I've seen a fair bit of advice recommending that frontend and backend are treated essentially as separate projects

That's certainly reasonable for the type of application CATMAID is and reorganizing our code base accordingly is certainly useful and something I would support (e.g. like you suggested above). Such a codebase update would certainly also benefit from a more modular design options, possibly even with individually publishable components. How we would do this exactly is also affected by how well weback and npm dependencies work for us. Therefore, I'd like to gain some experience with webpack and npm dependencies first before I can offer a more qualified opinion on such a change. :-)

@clbarnes
Copy link
Contributor

clbarnes commented Feb 22, 2019

Generally I would prefer fewer tools we need to build and setup things. When updating a lot of production servers and having to remember an >5 step update process isn't ideal. Of course it could be automated and abstracted, but simpler would be nicer I believe.

This is a good point. There's a school of thought in favour of committing the bundles (webpack's output) to git. If we did this, then the deployment complexity wouldn't change at all - the only difference would be that the javascript code living in django/applications/catmaid/static would be webpack bundles built on the dev machines rather than source, which would live elsewhere, and collectstatic wouldn't need to do any concatenation etc.. I'd be in favour of this; we'd just need to run the webpack build on CI as well to could prove that the bundles are in sync with the source. That would also mean that our dependencies in the output files are in-tree just like they are now, agnostic of whether they are fetched on the dev machine by npm or from in-tree.

@aschampion
Copy link
Contributor Author

I am not in favor of having an autoformatter or enforcing a particular style. Of course, everyone is free to use them on their code personally, but I feel the general style decisions we make and the code style we follow are reasonable enough.

Agreed on this front, at least in that it's a completely independent topic from the webpack infrastructure being discussed here, it would have to be an unanimous decision, and there's no compelling reason at the moment.

I like autoformatting in some contexts, because it does a few things:

  • Making all code uniform in style can increase overall readability, because you quickly become very optimized in reading that style. This can outweigh specific cases where readability suffers.
  • With practice it removes an entire decision set from your coding process. You're never thinking, "should I line break before the function call here or after I open the closure block."
  • Generalizing on both of those points, the code becomes much more content focused.

That said, I don't think there's good reason to do it in the CATMAID frontend now, and I don't think nontrivial JS is suited to it because of the syntax quirks and idioms in the language, unlike Rust or Python. JS also doesn't have much syntax that's difficult to manually format consistently (unlike, e.g., fn signatures in Rust with multiple generics, HRTBs, bounds). I ran several files through Prettier, and with the exception of some specific cases with lots of existing syntax noise, I was almost always happier with the original.

If we still had a lot of ice age code from pre-strict-mode, pre-jshint, pre-IIFE, I would be in favor of a one-off run, but as things are I agree we've converged on a reasonable style.

There's a school of thought in favour of committing the bundles (webpack's output) to git.

Honestly I dislike this practice. It makes the repo huge, requires setting additional defaults on git grep to avoid duplicated hits, makes merging more tedious. Pixi does/did and I often had problems when bisecting it that their bundles were out of sync with the code because unless your CI is set up right and you have a CI-pass-gated-merge dev branch it's likely to happen. While I sympathize with wanting the install/update process to be as simple as possible and have done many commits to that end, adding a npm install and webpack build seems reasonable considering CATMAID is 70% a javascript application.

Having a script that does

pip install -r requirements.txt
manage.py migrate --all
npm install
webpack
manage.py collectstatic

etc., possibly with a mode that does a non-modifying version of each first to check and display which needs to run seems like a better amelioration for these tasks.

@clbarnes
Copy link
Contributor

Agreed that formatting is not a priority here; it was a throwaway comment I made when I recognised that my catmaid-lib experiments were necessarily blowing away the git history anyway.

A third way of doing deployment (although whether it's simpler is up for debate) would be for travis to bundle the static files and throw a tarball onto an FTP server (on tagged releases) which could be untarred onto each catmaid server. Saves having to manage a node environment as well as python on every server, and the build step in deployment. But it would make it harder to include third-party apps, it would make the development deployment very different to production, and we'd probably want to put all of our CSS, images etc. through the same wringer, which doesn't sound much like an MVP for now.

@clbarnes
Copy link
Contributor

Noting down issues as I hit them while experimenting:

If we end up using lerna (which I think is a good idea if we were to split out and put components on npm), in-tree dependencies (as d3 will have to be) would need to be owned by particular subproject. If that dependency was needed by multiple subprojects, we'd either need to duplicate it (unpleasant), or have it owned by one subproject and re-exported for consumption by others (introduces more cross-dependencies than strictly necessary).

@clbarnes
Copy link
Contributor

Here's my experimentation repo. I haven't tested it very hard but it loads without error...

It includes 3 subprojects: Arbor (which includes SynapseClustering), catmaid-lib (depends on local Arbor, vendorised d3, and some npm libraries), and CATMAID (depends on catmaid-lib, handles all the globals). catmaid-lib has been refactored to strip out IIFEs and use ES6 imports. CATMAID has some typescript and ESnext in it, which gets transpiled down to target IE11 and a few versions back of major desktop browsers; it's just got a few stubby classes which are required by catmaid-lib.

I haven't looked at how bad the name mangling is. The typing isn't enforced (cf mypy).

The output bundles include sourcemaps. I haven't checked whether common dependencies are deduplicated.

https://github.com/clbarnes/CATMAID-build-prototype

@aschampion
Copy link
Contributor Author

Chris and I talked about the next iteration of the experiment targeting ES6 straight from TypeScript so there's only bundling, no babel, and thus minimal/no name mangling.

@clbarnes
Copy link
Contributor

clbarnes commented Dec 10, 2019

Further offline discussions:

As per #1955, an option for a performant 64-bit ndarray implementation is a wrapper around rust's ndarray; a build system would be useful so we don't have to commit another large wasm blob to the source tree. Typing still has many advantages for entering, maintaining, and expanding the codebase.

The build prototype doesn't represent an MVP - we could simplify it significantly for the first pass by keeping all of our JS in the same package, with vendorised dependencies (possibly switching one small, modern library over to an npm dependency, as a proof of concept). This avoids the extra layer of config and redirection that is lerna; we could then split out libraries at a later date if we wanted. The downside of keeping the vendorised JS is that it all works by mutating the global scope. Side-effect-only imports are an option, although I believe we'll lose the benefits of typescript for libraries which function that way. Heavily-used features could get typescript type definition files (cf mypy stubs) in the interim. We would still benefit from using typescript in our own code, or even just the import system which lends itself to static analysis. Also for the interim period, we could have an index.js which just imports everything and inserts it into the CATMAID namespace object. Commonly used components like the request queue, skeleton source and so on could be early targets for typing. Like mypy, the incorporation of type annotations can be gradual and optional to begin with.

Uncertain points (for me at least) is how much effort it'll take for typescript to become useful when so much of what we touch can't make use of it without us adding definitions, and also what the development workflow will look like (replacing django-pipeline, updating the extension system etc.).

@clbarnes
Copy link
Contributor

A bit more tracking:

The build prototype above is just about splitting and refactoring existing JS code into a webpack-y fashion - it gets us closer to having resolvable dependencies and so on once webpack is integrated with CATMAID, but doesn't do anything to get us there, and so is probably more of a stage 2 thing.

I did do some work on a prototype to get django and webpack playing with each other, and got it working to an extent. Basically, you run a webpack dev server alongside the django dev server. The webpack dev server detects changes to frontend files and recompiles them in memory, while dumping out a file giving information on its last build. The django dev server reads that file and knows to make requests for static files to the webpack dev server rather than its own static directory. The webpack bundles can still happily make requests to django's REST endpoints: the landing page is handled by django's templating, but loads the static files from the webpack dev server. In theory, all we need to do is write a monster index.js which loads everything from all of the JS files in the project.

I'll push it when I'm home so you can take a look. Still some outstanding queries:

  • import executes files in their own namespace, so things which aren't explicitly exported don't enter the globals. Will that break all of our vendorised dependencies, which presumably work by mutating the global namespace? Or do they explicitly mutate window/ document (which I think should still work)? We don't want to blow away our entire git history with unindents and have to refactor every javascript file for explicit exports: we could instead export the IIFE (so that it's not II'd) and pass it the CATMAID object.

e.g.

(function(CATMAID) {
  CATMAID.variable = "potato";
})()

becomes

export default function(CATMAID) {
  CATMAID.variable = "potato";
};

and index.js would have

const CATMAID = {};

import myLibFn from "./myLib.js";
myLibFn(CATMAID)

import myOtherLibFn from "./myOtherLib.js";
myOtherLibFn(CATMAID)

...

This would just be for the first pass of getting webpack to do what pipeline is doing currently. In the longer term we want files to explicitly import what they need from each other. Exports not currently tied to the CATMAID global object can probably be converted to real exports, as there's few enough of them.

  • Some files are straddling the line between backend and frontend. HTML templates rely on stylesheets, which ideally would be bundled by webpack, but are templated/served directly by django. For now, we can keep CSS with the backend, but there may be cases where templates rely on stylesheets relating to dependencies (which would preferably be handled by webpack).

Re. timeline: we may possibly be having a CATMAID hackathon some time around May, possibly with a view to hiring another dev or two. If we're going to make significant changes to the dev workflow, we probably want to do that before undertaking hackathon projects and introducing people to the codebase.

@clbarnes
Copy link
Contributor

clbarnes commented Jan 20, 2020

Django + webpack prototype, with a CATMAID-y backend layout, but frontend files put into their own top-level directory. This is a proof of concept for a dev setup: haven't tested a production deployment.

https://github.com/clbarnes/django-webpack

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants