TypeScript Bytecode Interpreter / Runtime Types #47658
Comments
That is an impressive write-up, but I have to question the purpose of opening yet another issue about a feature that has been explicitly rejected again and again by the TypeScript team. |
As you can see the main discussion issue #3628 has never been rejected and is still open, just like many other feature requests that although never confirmed neither rejected are still open. Also, this issue is not about a vague feature request, but a concrete implementation and analysis of the market demand plus a proposal for a bytecode interpreter which can be used for runtime types but also for other scenarios like performance improvements as well. So, quite different to anything seen yet, in terms of elaborateness, analysis, design proposal, and implementation demonstration. To even remotely compare this to any other issue regarding runtime types is mystifying to me. |
This is awesome. However, it is worth noting that there is an explicit language design non-goal to:
I'm not on the TS team and am not the one making decisions about whether to leave issues open or close them, but if I had to wager on it I'd bet that this will be declined quickly, no matter how amazing it is. We'll see, I guess. |
@jcalz yes, indeed. I've written about that in the original post, and at the end. Getting a stance on that defines if and how this will be made available to the users. It doesn't necessarily have to be in TypeScript's core; it's good enough if the TypeScript team decides to support the undertaking in other ways. I think it has already been shown that this non-goal is not realistic. Maybe at the beginning, 10 years ago, it was plausible for JavaScript, but not today. In the end the user/market decides how a language will be used, and it (see Demand section in OP) clearly showed that type information is used at runtime and a lot of people and projects use and rely on it in all sorts of programs. |
Sorry, I don't see where you acknowledged the explicit non-goal that essentially prohibits this endeavor in TS proper. Looks like you mentioned goals 3, 4, and (checks notes) 9, but not non-goal #5. I might have missed it though. |
That's fine. I'm more interested in what the TypeScript team is saying about that and especially about the last paragraph of the original post. It's not a must-have to have this in TypeScript itself. However, there's not much value in sticking to years old goals that might be meanwhile obsolete due to market movements. So let's concentrate on making the use cases of packages that have 76mio downloads/month possible in better ways instead of finding ways against it and why it shouldn't even be considered in the first place due to abstract nice-to-have goals. |
First off, this was a super fun and interesting read, so thank you for posting it. It's great to see people exploring what's possible, and this looks like a very promising project. To cut straight to the point: Is this in scope for us, now? Still no. The absolute plethora of options in this space, both for native-to-TS options like io-ts and 3rd-party-to-TS like JSON schema validators, shows that there's a wide range of what people want in the space, and that many projects with different approaches are capable of delivering tools with value. Regarding design goals - we're extremely committed to the erasability of the type system. Keeping the type system erasable is a very strong invariant that allows us to do other important things. One of the reasons we're able to make the type system more powerful and expressive from version to version is that we have the flexibility to e.g. change type inference at a particular position from This isn't a theoretical concern: people already |
@RyanCavanaugh note the last paragraph of the original post:
Would you recommend that @marcj flesh out these features into concrete suggestions and raise them in new issues (assuming such issues do not already exist)? Or are those also out of scope? |
@RyanCavanaugh I'd also like to highlight this ask. While "runtime types" indeed seems like a risky and "Out of Scope" feature to add to the default typescript build, why not let experiments happen outside the typescript tree? The community needs just a little bit of help. The upside is that a lot of useful innovation will happen that is otherwise not possible, and sometimes even discouraged today. |
We do take API requests through issues and have added a fair number of functions to the public API surface upon being asked. Emit plugin support is also available and we'd take issues on things there if things are needed. We don't support transform plugins in tsconfig.json because we think this operation should not be a risky one:
|
Why not support |
I have to mention my project (not listed in the table above) tst-reflect. I'm missing reflection in TypeScript(/JavaScript) because of Dependency Injection. TypeScript will never implement reflection, sorry to all of us, I'm just realist. So I've created my own quite huge reflection system.
Generic types supported! Usage is inspired by C# reflection. Here is REPL with example. Runtime Type, even runtime generic
function printClassInfo<TType>()
{
    const type = getType<TType>(); // <<== Here is the generic type used!

    if (!type.isClass())
    {
        return;
    }

    console.log("class " + type.name);
    console.log("full type identifier: " + type.fullName);

    const properties = type.getProperties();
    const methods = type.getMethods();

    console.log("Properties");
    console.log(
        properties.map(prop =>
            `${AccessModifier[prop.accessModifier]} ${Accessor[prop.accessor]} ${prop.name}: ${prop.type.name}`
        ).join("\n")
    );

    console.log("Methods");
    console.log(
        methods.map(method => AccessModifier[method.accessModifier] + " " + method.name
            + "("
            + method.getParameters().map(param => param.name + ":" + param.type.name).join(", ")
            + "): " + method.returnType.name
            + (method.optional ? " [optional]" : "")
        ).join("\n")
    );
}

class Bar {
    foo: string;
    bar: any;
    toSomething(): void {}
}

printClassInfo<Bar>(); |
Deserialization is the one thing that absolutely requires some runtime representation of compile-time types. There are many different approaches to deserialization, but they all have the same general shape: Perhaps having a simple standard type like this would help. An advanced proposal could also include an API for composing these types: e.g. a standard operation for constructing a |
Ok, cool, was worth a try. Since runtime type information is out of scope and the others points weren't addressed, I assume there is no interest in any of this, so I'm going to close this one. Thanks for the fast feedback. |
FYI, this TypeScript bytecode interpreter has now been released together with a full-featured framework that tries to utilise its potential to the fullest: https://deepkit.io/blog/introducing-deepkit-framework |
Mark |
For anyone reading this: This has been tested last year and a proof of concept high-performance type-checker has been built using the byte-code approach outlined in this issue which confirms this theory actually works. It can be found here: https://github.com/marcj/TypeRunner. It shows how it can speed up type checking by many hundreds to thousands of times. I can not build it alone as it's too time-consuming. But if we could make it happen it would not only shape the future of TypeScript itself but the whole industry, bringing so many architectures to the next level, allowing debugging of types interactively, increase the Interoperability of many languages (replacing JSON schema with TS), and rendering so many transpilers (esbuild, SWC) obsolete. We have meanwhile dozens of JS packages that make type information available in runtime in all sorts of bizarre ways. Together they have a staggering 180 million installs per month! Much more than TypeScript itself. I think it couldn't be more clear that people really want this. |
@marcj Hey, still subbed to this thread. TypeRunner looks amazing. Do you plan on publishing a specification for the bytecode? |
@sinclairzx81 not planned to make one, although I have a subset more or less standardised since they are used in backward compatible ways in Deepkit's runtime types since now two years. |
Hi All, so just banging my head around this issue. It is really a shame that we don't have reflection when using TypeScript. So much effort is put into strongly and correctly typing everything, and then all that info is lost at runtime 🥲. Anyway... from what I can see the biggest issue from the TS team's point of view is this one:
Now, here's my idea for a simple API that could tackle this issue. It's straightforward and, I think, not too tricky to implement.
import { reflect } from 'tsc';
interface User {
name: string;
age: number;
address?: string;
}
const user: User = { name: 'John', age: 33 };
// 'reflect' is a special function provided by 'tsc' for runtime reflection; the JS output would be an object literal instead of a function call.
const userMetadata = reflect<typeof user>();
console.log(userMetadata.kind); // Outputs: 'interface' (or a similar kind identifier)
This way, we're not messing with TypeScript's design goals of keeping type information erasable, as For simplicity the output could be a serializable version of the AST with all nodes that are not related to types removed (or something similar) and all type dependencies embedded (not ideal from a memory point of view but at least we would have reflection). This way there would be NO need to emit anything beyond the You could argue that the typescript compiled output
const user = { name: 'John', age: 33 };
const userMetadata = {
kind: X,
flags: [Z, Y],
parent: null, // or other node if there are type dependencies
children: [
name: { kind: X, flags: XWZ, parent: userMetadata}
age: ....
address: .....
]
};
console.log(userMetadata.kind); // ....
Anyway, just my two cents for an elegant solution to the problem. Unfortunately, I'm not a tsc compiler guru, so I can't whip up a pull request right now. |
TypeScript currently solves a lot of issues with working with JavaScript today: from type checking itself and transpiling to AST and language services for compilers, linters, and editors. The gradual and structural typing is unique in a way and seems a perfect fit for JavaScript. However, two major aspects are currently unsolved: a runtime type (aka reflection) system and performance issues. Having types available at runtime enables use cases that are currently simply not possible and for which many hundreds of expensive, complex, and unsexy workarounds have been built.
In this post I'd like to talk about the use cases and current state of runtime type solutions, a proposal for a full TypeScript runtime type system (aka type reflection solution), and a bytecode interpreter with an instruction set specification as a by-product that can potentially be used as an alternative to the current type checker with performance improvements, and that allows adoption of TypeScript in completely new non-JavaScript environments.
Runtime Types
What does it mean to have runtime types? There are several levels of reflection systems that could be built: from very basic reflection like with emitDecoratorMetadata, to a reasonable one that emits all types of functions and classes, to a more verbose one that emits interfaces, type aliases, and type functions, to very verbose ones that essentially annotate each variable plus control flow analysis. What is necessary is defined by the use case at hand. In a perfect world, we would have access to all types of all symbols (including variables), but that's not necessarily possible or desired due to restrictions such as bundle size.
I categorise possible reflection systems in four levels:
Level 1 is already integrated in JavaScript itself via `typeof`.
Level 2 can be partially achieved by using either class decorators and `emitDecoratorMetadata`, or fully via schema declaration libraries. However, it means a lot of additional boilerplate code and complexity.
Level 3 is currently not available. Although there are a few PoC transformers that try to extract the most basic type information, they are far from being able to reflect all TypeScript types and also impose substantial overhead.
Level 4 is somewhat unreasonable, very excessive, and probably has no use case where level 3 is not already sufficient.
I consider level 3 the most reasonable and doable, as it makes the whole TypeScript type system available at runtime in a way that supports all type expressions while not emitting excessive reflection data for each and every variable or expression. This level is what this proposal is trying to achieve.
Demand
There are by now a lot of libraries trying to solve this issue and make types available at runtime, in one way or another. Some provide basic type support, while others additionally support validation and database information, and a few support relatively complex type expressions (like unions and intersections).
Here is a list of known libraries and their download count:
State of 28. January 2022
There are many hundreds more that all re-implement a way to make types available at runtime. The space is quite fragmented and the APIs are not streamlined. If they were all able to use TypeScript types at runtime, they probably would.
Although it can be considered a good thing that types are erasable per default, it’s clear that types have value. And removing them destroys the value they have and makes it necessary to make them available again in an expensive way. The value depends highly on the project at hand. For many projects and people that value is so big that they create new libraries making the types available in runtime in one way or another, and invest in total many million dollars doing so. The value proposition is self-explanatory once you have a use case at hand that deals with types. TypeScript should thus provide an official way of making them accessible in runtime.
Considering the total of 76 million downloads/month for these libraries compared to TypeScript's 123 million downloads/month, the picture of the huge demand for runtime types is clear. Including ajv, which essentially uses json-schema, makes the demand look even larger, because json-schema could be replaced in the JavaScript world with TypeScript using this proposal. However, not the whole user base of ajv can be included in the calculation, since many just consume an external json-schema or one that is shared with another non-JavaScript tech stack.
Workarounds
There are several ways to get runtime types. Some are much more complex than others. They differ in how much of TypeScript's type system is actually expressible and available at runtime.
Classes with decorators
For more complex types like an array of strings, the type needs to be annotated manually.
The idea is that a decorator is used at a class property and `emitDecoratorMetadata` is enabled so that basic type information is automatically available at runtime. The overhead is substantial, as a lot of code is generated for relatively little functionality. The functionality is not only limited in type expressiveness (anything more complex needs to be annotated manually); other key functionality like circular imports is simply not supported and needs a workaround usually called "forward reference", where an arrow function is used to defer symbol resolution, e.g. `forwardRef(() => MyType)`. There are other disadvantages: property and parameter names do not survive minification, and functions/interfaces are not supported.
Custom DSL (domain specific language) with code generator
A code generator then generates TypeScript types. This has the obvious disadvantage of having to learn yet another type language and to run a code generator for every change. This increases the project complexity and the mental load enormously. Also, those DSLs are usually much less powerful than TypeScript, so the expressiveness of types and the program suffer.
Schema declaration
Declaration of a schema happens usually via method chaining API.
There are usually helper types included in such libraries to extract the declared type for the type system.
Transformers
Transformers parse the AST of the TypeScript source or Checker.ts types, and emit some way of runtime type information. That information could look like that:
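As an illustration only, here is a sketch of what such verbose emitted type information might look like for a small interface. The shape (`TypeInfo`, `PropertyInfo`, `InterfaceInfo`) and the `User` interface are assumptions made up for this example, not the output format of any particular transformer.

```typescript
// Source type a transformer would see at compile time.
interface User {
    id: number;
    username: string;
    tags?: string[];
}

// Hypothetical verbose runtime representation (assumed shape, for illustration).
type TypeInfo =
    | { kind: 'string' }
    | { kind: 'number' }
    | { kind: 'array'; elementType: TypeInfo }
    | InterfaceInfo;

interface PropertyInfo {
    name: string;
    optional: boolean;
    type: TypeInfo;
}

interface InterfaceInfo {
    kind: 'interface';
    name: string;
    properties: PropertyInfo[];
}

// What could be emitted next to the compiled JavaScript for `User`.
const userType: InterfaceInfo = {
    kind: 'interface',
    name: 'User',
    properties: [
        { name: 'id', optional: false, type: { kind: 'number' } },
        { name: 'username', optional: false, type: { kind: 'string' } },
        { name: 'tags', optional: true, type: { kind: 'array', elementType: { kind: 'string' } } },
    ],
};

// A consumer can navigate the structure without any decoding step.
const propertyNames = userType.properties.map(property => property.name);
```

Every property repeats the same keys as plain object literals, which is why this representation grows quickly with the size of the code base.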
This type of data structure can get big pretty fast and is not suitable for a big and complex code base. Its structure is very inefficient in terms of size and runtime overhead. Also, generics with type functions (conditional types, mapped types, etc.) are not supported with such an arrangement. On the other hand, this is very user-friendly, as one can simply write TypeScript without learning anything fundamentally new. It just works, which is a fundamental advantage.
Relevant Github Issues
There are many GitHub issues in the TypeScript repository regarding runtime type data in one way or another. They serve as further proof of demand, besides the list above of projects that have actually invested tons of resources into making it happen.
All of these issues can be solved with the proposed solution. The list is not complete, but should indicate how broad the demand is.
Design Goals of TypeScript
TypeScript has several design goals that need to be considered before implementing a solution for type reflection. Strictly speaking there are three points that oppose a runtime type system, namely:
Impose no runtime overhead on emitted programs.
Emit clean, idiomatic, recognizable JavaScript code.
Use a consistent, fully erasable, structural type system.
TypeScript's design goals explicitly state in point 9 that type information should be erasable. With this proposal this goal is not defeated. It is and should still be possible to erase type information completely if the user has no need for it.
While goals 3 and 4 are among the reasons for TypeScript's success and fundamentally a good thing, it's inevitable to break or bend them in order to make types available at runtime. Runtime types create an overhead; done correctly it is a very small one, but it is still an overhead. It has a cost. The cost can be justifiable, but it is still a cost. This proposal tries to produce as little runtime overhead as possible. TypeScript types are not JavaScript and probably never will be, hence any structure or bytecode emitted in JavaScript that represents TypeScript types will not be clean, idiomatic, and recognizable JavaScript code. It will be valid JavaScript though. However, since the trend is to minify and further optimise the generated JavaScript code from the TypeScript compiler up to the point of no longer being readable, those points matter less and less. A valid point still is that emitted JavaScript should not contain unnecessary code that will never be used, so making the runtime types configurable and tree-shakeable is crucial.
Use Cases
A lot of use cases can be found where runtime types could be a game changer regarding new functionality that currently is just not possible, regarding user experience improvements and code quality in general.
Serialization/deserialization
Even basic types like a class with date properties `class {created: Date}` need a proper deserializer to get the correct `Date` type back when deserializing JSON. Serialization to JSON and deserialization back to JavaScript objects is a core functionality that almost every application needs. A use case is receiving an HTTP request with a JSON body in a backend application and deserializing it correctly.
Type casts
Type casts (`x as string`) in TypeScript are no real casts. They do not change anything at runtime. However, if the types are available at runtime, there could be a function that does actual type casting.
Validation
With types that add meta-data one can annotate models with validation information and use them in a validator to validate arbitrary data against very complex types. Type and actual data validation is then very easily possible.
Validation can go as far as automatically validating each function argument and throwing an error if it is invalid, as other languages such as PHP do. It would further be possible to automatically resolve overloaded functions to their correct implementation. Currently, this needs to be done manually.
Automatic type guards
Type guards need to be written manually, but with a runtime type system they can be generated on-demand automatically.
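A minimal sketch of how a type guard could be derived from a runtime type object. The `Type` shape, the `is` and `guard` helpers, and the hand-written `userType` are assumptions for illustration; with this proposal the type object would come from the runtime type system instead of being written by hand.

```typescript
// Hypothetical runtime type representation (assumed shape).
type Type =
    | { kind: 'string' }
    | { kind: 'number' }
    | { kind: 'boolean' }
    | { kind: 'array'; elementType: Type }
    | { kind: 'union'; types: Type[] }
    | { kind: 'object'; properties: { name: string; optional: boolean; type: Type }[] };

// Structural check of a value against a runtime type.
function is(value: unknown, type: Type): boolean {
    switch (type.kind) {
        case 'string': return typeof value === 'string';
        case 'number': return typeof value === 'number';
        case 'boolean': return typeof value === 'boolean';
        case 'array': {
            const { elementType } = type;
            return Array.isArray(value) && value.every(item => is(item, elementType));
        }
        case 'union': return type.types.some(member => is(value, member));
        case 'object': {
            if (typeof value !== 'object' || value === null) return false;
            const record = value as Record<string, unknown>;
            return type.properties.every(property => {
                const propertyValue = record[property.name];
                return propertyValue === undefined ? property.optional : is(propertyValue, property.type);
            });
        }
    }
}

// Turns a runtime type into a reusable TypeScript type guard.
function guard<T>(type: Type): (value: unknown) => value is T {
    return (value: unknown): value is T => is(value, type);
}

interface User {
    name: string;
    age: number;
    address?: string;
}

// Written by hand here; a runtime type system would provide it for `User`.
const userType: Type = {
    kind: 'object',
    properties: [
        { name: 'name', optional: false, type: { kind: 'string' } },
        { name: 'age', optional: false, type: { kind: 'number' } },
        { name: 'address', optional: true, type: { kind: 'string' } },
    ],
};

const isUser = guard<User>(userType);
```

Calling `isUser(data)` then narrows `data` to `User` inside an `if` block, just like a manually written guard.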
Dynamic type computation
By supporting dynamic type computation, arbitrary type functions can be used at runtime to generate types based on runtime data. For example, it is possible to construct an object literal based on a schema stored in a database, which can then be used for all other use cases outlined here.
Even generic types can be instantiated with arbitrary runtime types as arguments.
API documentation
By having type information available in runtime, an API documentation tool can be generated on-the-fly without a build process (like typedoc).
Dependency Injection
Runtime types allow for the first time to write code/services against abstractions/interfaces instead of against implementations and to auto-wire them. Most DI container libraries use workarounds like string/symbol identifiers to abstract that away, which is not ideal. Also, many use decorators and `emitDecoratorMetadata`, which requires additional boilerplate code.
An example of a dependency injection container with interface support could look like that:
Configuration system
By adding resolution information to computed types in the runtime type system it's possible to resolve configuration values automatically in the easiest way possible.
When getting the runtime type information of `Service.host`, its type information includes the information that its type `string` was created via an index operator of `MyAppConfiguration` and `'host'`, so that there is a link to the configuration class itself. A service container can then auto-wire the configuration values.
HTTP router parameter matching
When dealing with HTTP route parameters, a lot of times there are placeholders and query parameters involved. Those are used to match a route from a list of all registered routes and automatically validated and deserialized.
Type safe RPC
There are many RPC implementations and possible transport encodings available to implement one: GraphQL, Protocol Buffers/gRPC, etc. With all function types available at runtime, a library can use that information to automatically serialize parameters and deserialize/validate them in the backend.
ORM entities
ORM entities try to map relational tables or document collections to objects. For this a lot of meta-data is necessary to describe the table correctly: primary key, auto-increment, index, unique, collation, constraints, foreign key, and more. With types that add meta-data it is possible to have that information available at runtime so that query builders, data mappers, and SQL migrations can work correctly.
Type reflection
With a general type reflection API that makes it easy to discover and navigate through types, it's possible for library authors to implement new use cases easily.
Bytecode Interpreter
I thought a lot about how to actually emit and represent the type information in JavaScript without imposing too much initial runtime overhead (in terms of memory, parsing, and executing) and too much code size (bundle size is important). I asked myself "What could be the smallest possible encoding for types?" The obvious answer is a number, a simple byte. A string? A byte. A number? A byte. An array of numbers? Probably two bytes. A union of string|number? Multiple bytes arranged cleverly. An object with properties? Probably a lot of bytes. Having such an arrangement of raw bytes implies having some kind of interpreter that reads them and constructs handy type objects, e.g. `{kind: 'string'}` instead of just having a simple number like `0`, especially when the type is more complex.
Primitives
So, bytes encode a type. For primitive types like `string`, `number`, `boolean`, `bigint`, etc. this is rather trivial. When giving each of them a unique number, an interpreter can easily construct type objects. For example:
In JavaScript itself, it can then be encoded just with an array of numbers:
And an interpreter can easily read them and construct a type. The interpreter aka processor is a stack based virtual machine. A very simple implementation could look like that:
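A minimal sketch of such a processor for primitive types. Only `string = 0` is taken from the text; the other op numbers, the `Type` shape beyond `{kind: 'string'}`, and the name `processor` are assumptions for illustration.

```typescript
// Op numbers: `string = 0` follows the text, the others are assumed.
enum TypeOp {
    string = 0,
    number = 1,
    boolean = 2,
    bigint = 3,
}

type Type =
    | { kind: 'string' }
    | { kind: 'number' }
    | { kind: 'boolean' }
    | { kind: 'bigint' };

// `type A = string` encoded as a program consisting of a single op.
const typeA: number[] = [TypeOp.string];

// A very small stack machine: each op pushes one type object,
// and the entry left on top of the stack is the result.
function processor(ops: number[]): Type {
    const stack: Type[] = [];
    for (const op of ops) {
        switch (op) {
            case TypeOp.string: stack.push({ kind: 'string' }); break;
            case TypeOp.number: stack.push({ kind: 'number' }); break;
            case TypeOp.boolean: stack.push({ kind: 'boolean' }); break;
            case TypeOp.bigint: stack.push({ kind: 'bigint' }); break;
            default: throw new Error(`Unknown op ${op}`);
        }
    }
    const result = stack.pop();
    if (!result) throw new Error('Program produced no type');
    return result;
}
```

Here `processor(typeA)` yields the type object `{kind: 'string'}`.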
This seems rather straightforward.
Who creates those number arrays? The compiler. A TypeScript transformer reads the AST and knows which AST node can be encoded to which TypeOp. This transformer is what I call the type compiler. It creates little byte programs that can be interpreted by a runtime stack machine. The compiler and processor are tightly coupled by a contract that precisely describes what byte ops are available and how they behave in the processor.
What about more complex types? Let's think about an array:
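A minimal sketch of the array case, using the op `array = 4` that the next paragraph introduces. The alias `type A = string[]`, the `Type` shape, and the error handling are assumptions for illustration.

```typescript
enum TypeOp {
    string = 0,
    number = 1, // assumed numbering
    array = 4,
}

type Type =
    | { kind: 'string' }
    | { kind: 'number' }
    | { kind: 'array'; elementType: Type };

// type A = string[]  is encoded as  [string, array]  =  [0, 4]
const typeA: number[] = [TypeOp.string, TypeOp.array];

function processor(ops: number[]): Type {
    const stack: Type[] = [];
    for (const op of ops) {
        switch (op) {
            case TypeOp.string: stack.push({ kind: 'string' }); break;
            case TypeOp.number: stack.push({ kind: 'number' }); break;
            case TypeOp.array: {
                // Pop the last stack entry and use it as the element type.
                const elementType = stack.pop();
                if (!elementType) throw new Error('array op needs one type on the stack');
                stack.push({ kind: 'array', elementType });
                break;
            }
            default: throw new Error(`Unknown op ${op}`);
        }
    }
    const result = stack.pop();
    if (!result) throw new Error('Program produced no type');
    return result;
}
```

With this sketch, `processor([0, 4])` returns `{kind: 'array', elementType: {kind: 'string'}}`, and an invalid program such as `[4]` throws instead of producing a broken type.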
We introduce a new op `array = 4`, whose implementation pops the last stack entry and uses it as the element type. Let's implement the op in our processor:
So, with the bytes `[0, 4]` (which is `[string, array]`) we get an object back of the structure:
because the first op `0` pushes `{kind: 'string'}` onto the stack, and the second op `4` pops that type and pushes a new one using that type as its `elementType`.
As one can see, the op `array` needs at least one type on the stack or it will fail. This implies that the given bytes need to be valid. The compiler is responsible for creating a valid bytes array so that the processor never crashes.
Stack references and op parameters
When dealing with type literals, e.g. `'abc'`, a new concept in the bytes array arrangement is required. Let's first define a new TypeOp for literal:
We suddenly need the string `'abc'` somehow in our processor. A solution could be to just put `'abc'` in the array: `['abc', 5]`. But how do we know where to start processing ops then, and how do we know to pick the first entry of the array? We need to introduce op parameters and an initial (input) stack frame for references.
We change the byte array structure to the following:
Here two things changed: First the actual byte array of ops moved to the last entry of the array, and the op `5` has a `0` as parameter. The op `5` always expects an additional value behind it which it will read and then jump over. The read value (in this case `0`) is then used as array index to access the actual literal value `'abc'`.
We could now call this array of `typeD` a program. It has a memory with values and an actual program with ops. As a convention we say that the last entry of this array is always the actual ops byte array and everything in front of it is a value or reference to something. It can be a string, number, boolean, bigint, or a reference to a class, function, or another program.
An implementation could look like that:
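A minimal sketch of a processor for such programs. The op `literal = 5` with one parameter and the layout `['abc', [5, 0]]` follow the text; the `Program` and `Type` shapes and the restriction of memory values to string, number, and boolean are simplifying assumptions.

```typescript
enum TypeOp {
    string = 0,
    array = 4,
    literal = 5,
}

type Literal = string | number | boolean;

type Type =
    | { kind: 'string' }
    | { kind: 'array'; elementType: Type }
    | { kind: 'literal'; literal: Literal };

// Memory values first, the op array always as the last entry.
type Program = (Literal | number[])[];

// type D = 'abc'
const typeD: Program = ['abc', [TypeOp.literal, 0]];

function processor(program: Program): Type {
    const ops = program[program.length - 1];
    if (!Array.isArray(ops)) throw new Error('Last program entry must be the op array');

    const stack: Type[] = [];
    for (let i = 0; i < ops.length; i++) {
        switch (ops[i]) {
            case TypeOp.string: stack.push({ kind: 'string' }); break;
            case TypeOp.array: {
                const elementType = stack.pop();
                if (!elementType) throw new Error('array op needs one type on the stack');
                stack.push({ kind: 'array', elementType });
                break;
            }
            case TypeOp.literal: {
                // Read the parameter behind the op and jump over it.
                const index = ops[++i];
                if (index === undefined) throw new Error('literal op needs one parameter');
                const value = program[index];
                if (value === undefined || Array.isArray(value)) {
                    throw new Error('literal op must reference a memory value');
                }
                stack.push({ kind: 'literal', literal: value });
                break;
            }
            default: throw new Error(`Unknown op ${ops[i]}`);
        }
    }
    const result = stack.pop();
    if (!result) throw new Error('Program produced no type');
    return result;
}
```

Because the parameter is consumed by `++i`, the loop never interprets the index `0` as the op `string`.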
Stack frames: Classes and unions
For types like object literals, classes, or functions those op byte arrangements are more verbose and a bit more complex, but follow the same pattern: There will be an op `property` and an op `class` that interact with each other, so that at the end the last op `class` reads the stack, which contains only `property` types, and constructs a type object `{kind: 'object', types: properties}`.
The `array` op pops only and always a single item from the stack. A `class` or a `union` op can however have an arbitrary number of members, i.e. an arbitrary number of types it needs to pop from the stack. There are two ways to encode this: either by giving the `class`/`union` op an additional parameter that indicates how many stack entries it should pop, or by introducing stack frames, so that those two ops essentially always pop a whole stack frame. I chose to use stack frames so that previous operators can push an arbitrary number of types onto the stack without hard-coding the amount in the program itself. This makes it somewhat more dynamic.
First introduce new bytecode ops.
A program for representing a class and union could look like that:
Then extend the processor to support stack frames and implement the new ops.
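A minimal sketch of stack frames with `class` and `union`. The op numbers (`frame = 6`, `union = 7`, `property = 8`, `class = 9`), the idea that `property` takes the index of its name as a parameter, and the two example programs are all assumptions for illustration; only the resulting `{kind: 'object', types: properties}` shape is taken from the text.

```typescript
enum TypeOp {
    string = 0,
    number = 1,
    frame = 6,    // assumed: opens a new stack frame
    union = 7,    // assumed: pops the whole frame into a union
    property = 8, // assumed: one parameter, the memory index of the name
    class = 9,    // assumed: pops the whole frame into an object type
}

type Type =
    | { kind: 'string' }
    | { kind: 'number' }
    | { kind: 'union'; types: Type[] }
    | { kind: 'property'; name: string; type: Type }
    | { kind: 'object'; types: Type[] };

type Program = (string | number[])[];

// type U = string | number
const unionProgram: Program = [[TypeOp.frame, TypeOp.string, TypeOp.number, TypeOp.union]];

// class User { name: string; id: number }
const classProgram: Program = ['name', 'id', [
    TypeOp.frame,
    TypeOp.string, TypeOp.property, 0,
    TypeOp.number, TypeOp.property, 1,
    TypeOp.class,
]];

function processor(program: Program): Type {
    const ops = program[program.length - 1];
    if (!Array.isArray(ops)) throw new Error('Last program entry must be the op array');

    // frames[0] is the root frame; the `frame` op opens a new one on top.
    const frames: Type[][] = [[]];
    const top = (): Type[] => {
        const frame = frames[frames.length - 1];
        if (!frame) throw new Error('No stack frame');
        return frame;
    };
    const popFrame = (): Type[] => {
        if (frames.length < 2) throw new Error('No frame to pop');
        return frames.pop() as Type[];
    };

    for (let i = 0; i < ops.length; i++) {
        switch (ops[i]) {
            case TypeOp.string: top().push({ kind: 'string' }); break;
            case TypeOp.number: top().push({ kind: 'number' }); break;
            case TypeOp.frame: frames.push([]); break;
            case TypeOp.union: {
                const types = popFrame();
                top().push({ kind: 'union', types });
                break;
            }
            case TypeOp.property: {
                const nameIndex = ops[++i];
                const name = nameIndex === undefined ? undefined : program[nameIndex];
                const type = top().pop();
                if (typeof name !== 'string' || !type) throw new Error('Invalid property op');
                top().push({ kind: 'property', name, type });
                break;
            }
            case TypeOp.class: {
                const types = popFrame();
                top().push({ kind: 'object', types });
                break;
            }
            default: throw new Error(`Unknown op ${ops[i]}`);
        }
    }
    const result = top().pop();
    if (!result) throw new Error('Program produced no type');
    return result;
}
```

Neither program states how many members it has: `union` and `class` simply take whatever the ops since the last `frame` have pushed.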
Type functions
It's all easy for primitive and simple types like those outlined above. But as soon as type aliases, generics, type functions like conditionals, and mapped types are involved, things get much more complex.
When you think about those more complex types and type functions, it becomes obvious that TypeScript evolved to be an actual language of its own, with variables, functions, and arguments, that is Turing complete. It happens to be the case that the semantics of this language can easily be mapped onto a stack machine with a few registers. All TypeScript types and type functions can be represented in bytecode that can run in a processor like the one shown above. Since TypeScript basically also supports variables and closures, it's necessary to use stack frames. Luckily we already introduced them for class/union. Also, since mapped types and a few other types contain sub function calls, it's necessary to introduce a (function) calling convention into the stack structure.
Inline other types
In order to implement inlining other types (functions) we have to introduce one new op:
The `inline` op expects a single parameter pointing to another program.
The implementation of `inline` could look like that.
Note that this example uses a recursive implementation: This is not ideal and will lead to stack size exceeded errors with very complex types. It also does not support circular types or circular imports, but it should make it clear how it fundamentally works.
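A recursive sketch of `inline`, with the limitations named in the note above. The op number `inline = 10` and the choice to store the referenced program in memory as a function returning it are assumptions for illustration.

```typescript
enum TypeOp {
    string = 0,
    array = 4,
    inline = 10, // assumed op number
}

type Type =
    | { kind: 'string' }
    | { kind: 'array'; elementType: Type };

// Memory entries are strings or references to other programs;
// the op array is always the last entry.
type Program = (string | (() => Program) | number[])[];

// type A = string;
const typeA: Program = [[TypeOp.string]];
// type B = A[];
const typeB: Program = [() => typeA, [TypeOp.inline, 0, TypeOp.array]];

function processor(program: Program): Type {
    const ops = program[program.length - 1];
    if (!Array.isArray(ops)) throw new Error('Last program entry must be the op array');

    const stack: Type[] = [];
    for (let i = 0; i < ops.length; i++) {
        switch (ops[i]) {
            case TypeOp.string: stack.push({ kind: 'string' }); break;
            case TypeOp.array: {
                const elementType = stack.pop();
                if (!elementType) throw new Error('array op needs one type on the stack');
                stack.push({ kind: 'array', elementType });
                break;
            }
            case TypeOp.inline: {
                const index = ops[++i];
                const reference = index === undefined ? undefined : program[index];
                if (typeof reference !== 'function') {
                    throw new Error('inline op must reference another program');
                }
                // Recursive call: run the other program and push its result.
                stack.push(processor(reference()));
                break;
            }
            default: throw new Error(`Unknown op ${ops[i]}`);
        }
    }
    const result = stack.pop();
    if (!result) throw new Error('Program produced no type');
    return result;
}
```

A program that inlines itself would recurse forever in this sketch, which is exactly the circular-type limitation mentioned above.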
Once generic arguments are involved, it gets more complex pretty quickly. First the type with the generic type parameter has this information encoded in the program using variables/function parameters, which is a reserved stack entry of the current stack frame. The caller makes sure that this stack entry is correctly filled when calling the other type.
Variables
Type aliases with type parameters, mapped types, and `infer` introduce variables. Variables are slots on the stack with a known address. To implement variables, we need to introduce a new operator that loads an address and pushes its value onto the stack. Also, to not interfere with the way pop()'ing all types from a stack frame works, we have to make the stack aware of how many variables it contains. We set as a convention that variables are always placed at the beginning of the stack frame and that, when pop()'ing a stack frame, those variables are excluded.
A generic type alias would look like that:
The implementation of processor changes:
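A minimal sketch of variables on a single root frame. The op numbers (`variable = 11`, `loads = 12`), the two `loads` parameters (frame offset and variable index), and the example `type A<T> = T[]` are assumptions for illustration; the `args` array and the `{kind: 'never'}` default are taken from the text that follows.

```typescript
enum TypeOp {
    string = 0,
    array = 4,
    variable = 11, // assumed: reserves the next variable slot
    loads = 12,    // assumed: parameters are frame offset and variable index
}

type Type =
    | { kind: 'never' }
    | { kind: 'string' }
    | { kind: 'array'; elementType: Type };

// Variables always occupy the first `variables` slots of a frame's stack.
interface Frame {
    variables: number;
    stack: Type[];
}

// type A<T> = T[]
const genericA: number[] = [TypeOp.variable, TypeOp.loads, 0, 0, TypeOp.array];

function processor(ops: number[], args: Type[] = []): Type {
    const frames: Frame[] = [{ variables: 0, stack: [] }];
    const frameAt = (offset: number): Frame => {
        const frame = frames[frames.length - 1 - offset];
        if (!frame) throw new Error(`No stack frame at offset ${offset}`);
        return frame;
    };

    for (let i = 0; i < ops.length; i++) {
        const frame = frameAt(0);
        switch (ops[i]) {
            case TypeOp.string: frame.stack.push({ kind: 'string' }); break;
            case TypeOp.array: {
                // Variable slots are excluded from popping.
                if (frame.stack.length <= frame.variables) {
                    throw new Error('array op needs one type on the stack');
                }
                const elementType = frame.stack.pop() as Type;
                frame.stack.push({ kind: 'array', elementType });
                break;
            }
            case TypeOp.variable: {
                // Fill the slot from the passed argument, or default to never.
                frame.stack.push(args[frame.variables] ?? { kind: 'never' });
                frame.variables++;
                break;
            }
            case TypeOp.loads: {
                const frameOffset = ops[++i];
                const index = ops[++i];
                if (frameOffset === undefined || index === undefined) {
                    throw new Error('loads op needs two parameters');
                }
                const target = frameAt(frameOffset);
                const value = index < target.variables ? target.stack[index] : undefined;
                if (!value) throw new Error('loads references an unknown variable');
                frame.stack.push(value);
                break;
            }
            default: throw new Error(`Unknown op ${ops[i]}`);
        }
    }
    const result = frameAt(0).stack.pop();
    if (!result) throw new Error('Program produced no type');
    return result;
}
```

Calling `processor(genericA, [{kind: 'string'}])` corresponds to instantiating `A<string>`; calling it without arguments leaves `T` as `never`.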
To support passing arguments to a program with variables, the `variable` implementation could check an `args` array and read from it instead of setting it to `{kind: 'never'}`.
The type compiler knows where to find a variable identifier and can correctly calculate `loads` call parameters.
Calling convention
A few type functions require to call sub functions of a program: mapped types, distributive conditional types, and conditional types. In JavaScript one could say they are blocks, but in this bytecode interpreter those blocks are compiled as sub functions.
To make this work, we need 3 new ops.
Since the type functions above have sub functions with closures (a function in a function), those sub functions are embedded in the current program at the very beginning. `jump` with one parameter makes sure that the embedded sub functions are skipped initially, `return` marks the end of each embedded function, and `call` with one parameter (the position of the sub function) triggers the calling convention (usually done internally and not emitted as op code).
Before we can compile this type function in bytecodes, we need a few new ops:
The `mappedType` op pops an entry from the stack created by the `keyof` op and tries to loop over it. For each entry of `keyof`, the loop calls the sub function at `2`. The calling convention says that a “call” (to `2` in this case) creates a new stack frame and puts the return address in its first stack entry. Once `return` is called, the return address is read and the processor jumps to that address again. The `mappedType` op uses a registry to track at which position it currently is in order to know if another call to `2` is necessary or if the program can continue. For each iteration it modifies the second variable so that the sub function has the correct value when doing `loads 1 1`.
The implementation of all of these ops is a bit more complex and probably out of scope for this post. To implement the whole TypeScript type system, a lot more ops are required. Currently the instruction set contains over 81 ops. If the TypeScript team decides that this approach is worth considering, I’d love to document each and every op with example code like I did above.
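The calling convention described above can be sketched in isolation. This is a minimal illustration under stated assumptions: the op names in `CallOp` (with `ret` standing in for `return`), the `runWithCalls` function, and the choice to hand the last entry of the callee frame back to the caller are all hypothetical, and the loop logic of `mappedType` is left out:

```typescript
// Hypothetical, simplified types and ops; not Deepkit's actual instruction set.
type CallType = { kind: 'string' } | { kind: 'array'; type: CallType };
type StackEntry = CallType | number; // numbers on the stack are return addresses

enum CallOp { jump, call, ret, string, array }

function runWithCalls(program: number[]): CallType {
    const stack: StackEntry[] = [];
    const frameStarts: number[] = [0]; // start index of every active stack frame
    let pc = 0;

    while (pc < program.length) {
        switch (program[pc]) {
            case CallOp.jump:
                // Skip the embedded sub functions at the start of the program.
                pc = program[pc + 1];
                continue;
            case CallOp.call:
                // New stack frame whose first entry is the return address.
                frameStarts.push(stack.length);
                stack.push(pc + 2);
                pc = program[pc + 1];
                continue;
            case CallOp.ret: {
                // Drop the callee frame, read the return address, keep the result.
                const start = frameStarts.pop()!;
                const entries = stack.splice(start);
                const returnAddress = entries[0] as number;
                stack.push(entries[entries.length - 1] as CallType);
                pc = returnAddress;
                continue;
            }
            case CallOp.string:
                stack.push({ kind: 'string' });
                break;
            case CallOp.array:
                stack.push({ kind: 'array', type: stack.pop() as CallType });
                break;
        }
        pc++;
    }
    return stack[stack.length - 1] as CallType;
}

// Address 0: jump over the sub function. Address 2: sub function producing string[].
// Address 5: the main program, which calls the sub function at 2.
const callProgram = [
    CallOp.jump, 5,
    CallOp.string, CallOp.array, CallOp.ret,
    CallOp.call, 2,
];
const callResult = runWithCalls(callProgram);
```

In this sketch `callResult` is an array type whose element type is `string`. An op like `mappedType` would issue such a call once per key and update a variable slot before each iteration.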
Attach type information
Now that the representation of a type in runtime code is defined, the next problem to solve is how to attach this information to actual JavaScript symbols like classes and functions.
The idea is that for classes and functions I get the full type information when the symbol is passed to a function.
To make this possible, it’s necessary to attach the bytecode program to each function and class. I decided to write it to a property called `__type`, so that emitted JavaScript could look like this:
Interfaces and type aliases could be emitted with their name prefixed with something unique.
This approach has the advantage that no global WeakMap is required and the program is garbage-collected automatically when the function or class is no longer referenced. It is, however, limited to functions and classes only. With this approach it’s not possible to attach type information to object literal expressions or variables. Exporting type aliases and interfaces as const variables makes them tree-shakeable.
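As a rough illustration of attaching a program via `__type`, here is a sketch written in TypeScript rather than emitted JavaScript. The program contents are placeholders, and `PackedProgram`, `UserEntity`, `greet`, and `getProgram` are hypothetical names used only for this example:

```typescript
// Placeholder program format: property names plus op codes (values are made up).
type PackedProgram = (string | number)[];

class UserEntity {
    // The program lives on the class itself, so it is collected together with it.
    static __type: PackedProgram = ['id', 'username', 0];
    id = 0;
    username = '';
}

function greet(who: string): string {
    return 'Hello ' + who;
}
// Functions can carry the program as a plain property as well.
greet.__type = ['who', 1] as PackedProgram;

// Reads the attached program, if any, without needing a global registry.
function getProgram(target: object): PackedProgram | undefined {
    const program = (target as { __type?: unknown }).__type;
    return Array.isArray(program) ? (program as PackedProgram) : undefined;
}

const classProgram = getProgram(UserEntity);
const functionProgram = getProgram(greet);
```

A class or function without a `__type` property simply yields `undefined` here, which a reflection API could treat as "no type information emitted".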
Why
Why is a bytecode interpreter proposed and not something more trivial like serializing a type as JavaScript objects? One can easily emit JavaScript objects that describe a type directly instead of introducing bytecode ops and an interpreter. For example:
The reason I chose a bytecode representation is that the emitted JavaScript code is much smaller and allows dynamic type computation. The latter is not possible when the final computed type is serialized. Resolving generic type arguments also requires some sort of runtime type computation and type checks in order to narrow a generic union down to the actually passed argument. Also, serializing a complex class with many properties and all its attributes (visibility, readonly, initializer, optional) makes the emitted serialized objects huge and the bundle size too large. They also impose a greater runtime overhead in terms of execution and memory footprint. My goal was instead to design a format that has the lowest overhead possible and generates load only when the type is actually requested.
Encoding
The program bytes are encoded in printable ASCII characters, from the code range 33 upwards.
That means the program array structure from above
is actually encoded in JavaScript like this:
This makes a complex type much smaller compared to the numeric array representation.
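The encoding step can be sketched as follows. This is an assumption-laden illustration: `OP_OFFSET`, `encodeOps`, and `decodeOps` are hypothetical names, the op numbers are arbitrary, and the sketch assumes every op code plus 33 stays within the printable ASCII range:

```typescript
// 33 is '!', the first printable, non-space ASCII character.
const OP_OFFSET = 33;

// Turns a list of numeric op codes into one compact string.
function encodeOps(ops: number[]): string {
    return ops.map(op => String.fromCharCode(op + OP_OFFSET)).join('');
}

// Reverses the encoding back into numeric op codes.
function decodeOps(encoded: string): number[] {
    return Array.from(encoded, ch => ch.charCodeAt(0) - OP_OFFSET);
}

const encodedOps = encodeOps([5, 37, 0]); // '&F!'
const decodedOps = decodeOps(encodedOps); // [5, 37, 0]
```

Three ops that take nine characters as the array literal `[5,37,0]` fit into a three-character string this way. Characters such as `"` and `\` fall into this range, so an emitter would presumably have to escape them when writing the string literal.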
Type checking / Performance Improvements
The bytecode interpreter as currently designed and implemented only allows static types to be compiled. That means no inferred types or control flow analysis is implemented. Since it supports the `extends` type operator, it’s able to check whether two types are compatible. This is currently limited to `extends` only, and more detailed type checks such as all type variances are not implemented.
However, with such an architecture it would probably be possible to compile a whole TypeScript file into bytecode and execute it in a very fast VM. That VM can be written in a faster language like C++ or Rust and operate entirely on that bytecode. By implementing error collection, it would further be possible to collect validation errors in this VM, which basically converts it into a very fast type checker.
This approach of compiling a TypeScript file into bytecode, putting it as a binary file on disk, and running it on demand in an ultra fast VM probably enables much faster type checking and simplifies caching.
I say here “probably” because I’ve not tested it yet. If the TypeScript team says it’s worth investing further, I’d love to build a PoC for that. I’d implement the VM in C++ and convert the type compiler from a transformer to a standalone TS program that is based on the TS compiler AST.
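To illustrate the kind of `extends` check mentioned at the start of this section, here is a heavily simplified sketch over hypothetical runtime type objects. `ExtType` and `isExtendable` are illustrative names, only a handful of kinds are covered, and the rules are a rough approximation rather than TypeScript's full assignability logic:

```typescript
// Hypothetical, simplified runtime type objects.
type ExtType =
    | { kind: 'never' }
    | { kind: 'any' }
    | { kind: 'string' }
    | { kind: 'number' }
    | { kind: 'literal'; literal: string | number }
    | { kind: 'union'; types: ExtType[] }
    | { kind: 'objectLiteral'; properties: { [key: string]: ExtType } };

// Answers "left extends right" for the simplified kinds above.
function isExtendable(left: ExtType, right: ExtType): boolean {
    if (left.kind === 'any' || right.kind === 'any' || left.kind === 'never') return true;
    // Every member of a left-hand union has to extend the right side.
    if (left.kind === 'union') return left.types.every(member => isExtendable(member, right));
    // One matching member of a right-hand union is enough.
    if (right.kind === 'union') return right.types.some(member => isExtendable(left, member));
    if (left.kind === 'literal') {
        if (right.kind === 'literal') return left.literal === right.literal;
        // A literal extends the primitive it belongs to.
        return right.kind === typeof left.literal;
    }
    if (left.kind === 'objectLiteral' && right.kind === 'objectLiteral') {
        const leftProperties = left.properties;
        const rightProperties = right.properties;
        // Left needs every property of right, each with a compatible type.
        return Object.keys(rightProperties).every(key => {
            const candidate = leftProperties[key];
            return candidate !== undefined && isExtendable(candidate, rightProperties[key]);
        });
    }
    return left.kind === right.kind;
}

const literalA: ExtType = { kind: 'literal', literal: 'a' };
const stringType: ExtType = { kind: 'string' };
const literalExtendsString = isExtendable(literalA, stringType); // true
const stringExtendsLiteral = isExtendable(stringType, literalA); // false
```

A type checker built on the same idea would collect an error at every point where such a check returns `false` instead of just returning a boolean.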
Beyond the Stars
A bytecode interpreter, or rather a bytecode compiler and an official spec of an instruction set for a virtual machine, could allow new ways of using TypeScript in non-JavaScript environments. For example, it could be possible to define types in TypeScript and use them in cross-language environments for e.g. serialization and validation, similar to what JSON Schema is currently used for. Concretely, this allows defining types in TypeScript (a subset of TypeScript, essentially only the type system, not the JavaScript semantics) and having those types as a single source of truth sitting in a .ts file. Other languages like Go, C++, or Rust can read the bytecode and compute the type in their runtime, allowing them to work with those types to e.g. create serialization or validation functions.
It would enable writing ORM entities, JSON schemas, Protocol Buffers definitions, and many other DSLs in TypeScript without having the full-fledged TypeScript library or JavaScript as a dependency.
One could go as far as to argue that this instruction set and bytecode interpreter could be standardised and implemented in JavaScript engines like V8 directly. Based on that, further ECMAScript features could be implemented (like overloaded functions, function parameter validation, and more). I think TypeScript is here to stay, and making JavaScript itself more powerful based on successful patterns TypeScript introduced is something the whole industry would appreciate.
Implementation Proof of Concept
I implemented most of the outlined use cases already in a library called Deepkit. It contains `@deepkit/type` for serialization/validation/type guards, `@deepkit/orm` for an ORM that supports classes and interfaces as entities, `@deepkit/rpc` that supports a fully type-safe RPC implementation with automatic serialization/validation, and `@deepkit/injector`, a dependency injection library that supports auto-wiring, interfaces as dependencies, and more.
The currently released versions of these libraries use a custom type system based on classes with decorators; however, I’ve rewritten almost all libraries on top of the new runtime type system in the branch `feature/autotype`, which works as described above. The core functionality is in this folder: https://github.com/deepkit/deepkit-framework/tree/feature/autotype/packages/type/src/reflection
It contains the type compiler (that generates the bytecode from the AST in a TypeScript transformer), the processor (the stack machine that processes the bytecode and generates types), as well as some handy reflection classes and type utility functions.
Some important characteristics of this implementation:
`as` is not supported yet. Also, there are a few differences, and not all types are exactly what TypeScript would emit; see for example: Template literal wrong inferred value when multiple placeholders #47048.
API
To get the runtime type information from a type, a function call is necessary. For arbitrary types, `typeOf<T>()` can be used. For classes and functions there are two ways to get type information: raw type objects and handy Reflection classes:
The various `Reflection*` classes are inspired by other reflection APIs like the one of PHP. The code of those classes can be found here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/reflection/reflection.ts
Compiler
The compiler can be found here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/reflection/compiler.ts
It’s a rather complex transformer that needs access to various internal types from the TypeScript compiler API in order to get all necessary information.
All bytecode ops can be seen here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/reflection/type.ts#L1976-L2141
Processor
The processor can be found here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/reflection/processor.ts#L248
Caching
Types in this implementation are cached. The cache is attached to the bytecode array directly and only set when the program is not generic.
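A minimal sketch of such a cache, assuming a hypothetical `cachedType` property on the program array and an `isGeneric` flag supplied by the caller (neither name is taken from the implementation):

```typescript
// Hypothetical stand-in for a computed type object.
type CachedSketchType = { kind: string };

function resolveWithCache(
    program: number[],
    isGeneric: boolean,
    compute: (program: number[]) => CachedSketchType,
): CachedSketchType {
    // The cache slot sits on the array itself, so it disappears together with the program.
    const holder = program as number[] & { cachedType?: CachedSketchType };
    if (!isGeneric && holder.cachedType) return holder.cachedType;

    const result = compute(program);
    // Generic programs depend on their arguments, so their result is not stored.
    if (!isGeneric) holder.cachedType = result;
    return result;
}

const plainProgram = [1, 2, 3]; // placeholder op codes
const firstResolve = resolveWithCache(plainProgram, false, () => ({ kind: 'string' }));
const secondResolve = resolveWithCache(plainProgram, false, () => ({ kind: 'string' }));
```

Both calls return the very same object here, because the second one is served from the cache and `compute` is not invoked again.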
Type types
Type objects are simple objects that hold all type information in a raw format. There is a big enum containing all ReflectionKind values, and for each type an interface that has a kind, additional type-specific properties, and parent plus other runtime information like type annotations.
See https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/reflection/type.ts#L21-L452
Reflection
Reflection is the heart of the implementation. Its API and tests can be seen here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/tests/reflection/reflected/integration.spec.ts
Type decorators
In order to implement certain functionality like ORM and validation, it’s necessary to annotate types with additional metadata. For validation, these are validation constraints like minimum, maximum, minLength, etc. For the ORM, it’s metadata like primaryKey, autoIncrement, reference, index, etc. TypeScript already supports branded types, which cannot be used here directly. Instead, the implementation uses a pattern almost identical to branded types, but makes the brand optional.
In the reflection system, intersections with objects that contain a `__meta` property are handled in a special way. Their expressions are stored at the type directly.
With this information stored at the type directly, it’s possible to change the behaviour of certain types in an extensible way.
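The optional-brand pattern can be sketched like this. The annotation types `PrimaryKeyMark` and `MinLengthMark`, the `AccountRow` interface, and the `applyMarkers` helper with its `annotations` map are hypothetical names chosen for this example, not the library's API:

```typescript
// Optional brands: the `__meta` property never has to exist at runtime.
type PrimaryKeyMark = { __meta?: ['primaryKey'] };
type MinLengthMark<N extends number> = { __meta?: ['minLength', N] };

interface AccountRow {
    id: number & PrimaryKeyMark;
    username: string & MinLengthMark<3>;
}

// Because the brand is optional, plain values stay assignable without casts.
const account: AccountRow = { id: 1, username: 'peter' };

// Hypothetical runtime side: markers found in an intersection are stored on the type.
interface MetaMarker { markerName: string; options: unknown[] }
interface AnnotatedType {
    kind: 'string' | 'number';
    annotations: { [markerName: string]: unknown[] };
}

function applyMarkers(kind: 'string' | 'number', markers: MetaMarker[]): AnnotatedType {
    const annotations: { [markerName: string]: unknown[] } = {};
    for (const marker of markers) annotations[marker.markerName] = marker.options;
    return { kind, annotations };
}

// Roughly what `string & MinLengthMark<3>` could resolve to at runtime.
const usernameType = applyMarkers('string', [{ markerName: 'minLength', options: [3] }]);
```

A validator or ORM could then look up `usernameType.annotations.minLength` to decide how to treat the property, without the value itself carrying any brand.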
Serialization
Serialization examples can be seen in the tests: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/tests/reflection/reflected/serializer.spec.ts
The Serializer API itself is highly customizable and the default serializer can be found here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/src/serializer.ts#L1643
Casts
A cast is nothing more than running the deserializer of the default serializer. It supports converting types that are convertible and throws if a value is not convertible.
https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/tests/reflection/reflected/serializer.spec.ts#L210
Type guards
Type guards are an inherent part of serializers since union types need a type check in order to know which serializer function should be used. Its API and tests can be seen here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/tests/reflection/reflected/typeguard.spec.ts
Validation
Type guards are one part of validation. Content validation is another. API and tests can be seen here: https://github.com/deepkit/deepkit-framework/blob/feature/autotype/packages/type/tests/reflection/reflected/validation.spec.ts
Summary
I think this post is already too big, so I’ll stop here, although a lot more could be said about the bytecode interpreter, runtime types, and their possibilities.
If the TypeScript team considers this worth investigating further, I’d love to help integrate this into TypeScript itself and make it a reality for all those users downloading solutions for that millions of times each month.
I understand if this feature is not in TypeScript's interest (as outlined in its goals). However, considering the demonstrated demand in the industry and the opportunities described above, I think it might be worth either rethinking the goals or at least supporting this undertaking with external partners/projects. Even if it will not be part of TypeScript officially, it would help enormously if the TypeScript team accepted this use case as valid, provided certain features (like making certain compiler APIs public or allowing transformers in tsconfig.json), and supported it so that it can be implemented in the best way possible.