-
Ahem. We can embed JSON into Type Args via
We haven't used metadata too much yet, but do remember that the "origin" of metadata is not necessarily the same extension as the op. For example, guppy may store debug locations in metadata. This means ops cannot expect to be able to interpret all their metadata. My view is that
I see your point here, and I agree that there is an argument for leaf ops (`MakeTuple`, `UnpackTuple`, `Tag`, `Noop`, `CallIndirect`, possibly `Call`, `FuncDecl`, `Const`, `LoadConstant`, `LoadFunction`, `Lift`) to be extension ops, for the reasons you state. I do not think that argument extends to "Parent" ops (`DFG`, `CFG`, `FuncDefn`, `TailLoop`, `Conditional`). Disallowing extension ops to have children is an important design choice that:
Regarding the ops in the second half of the above parenthetical (after "possibly"): there is a tradeoff to be made here. Reasoning about a closed set of operations with only a closed set of ways of loading constants, calling functions, etc. is an important strength of keeping these "hardcoded". Keeping classes of functionality in "hardcoded" ops means we don't have to expose this functionality to the extension system. One way of thinking about this, which captures both the "parent" ops and the "hardcoded" leaf ops, is: any ops using non-value edges (i.e. `Static`, `Const`, `Function`, `ControlFlow`, `Hierarchy`) should be "hardcoded".
I agree that this would be great, but it does present implementation difficulties. I agree that our current approach does deserialise on every access, which is not great, but I suggest it can be adequately solved by a caching layer outside
-
Oh. That quite significantly restricts what extensions can do, and excludes almost everything that I wanted to use HUGR for in the first place. Instead of "MLIR with linear types" we would end up with "LLVM with linear types". The following should not be builtin, but would profit immensely from having hierarchical boxes:
If we exclude operations with children from HUGR, those would not be possible. In particular, the Oxford office would be unable to use HUGR for their plans and would have to build their own thing.

Let's for now define "General HUGR" as basically just the data structure, with an infrastructure for rewrites, analysis, serialisation, visualisation, etc. Then there would be a "Restricted HUGR" that is a subset of that, including the "core" operations and simple extensions. If we build "Restricted HUGR" we would have to build "General HUGR" as well. If we build "General HUGR", we can still write analyses that only work on the restricted subset. These passes would then also be applicable to general HUGR modules after lowering. This is not premature generalisation; I have very concrete use cases in mind already that I simply wouldn't be able to implement otherwise.

I agree that the dynamic typing approach is somewhat clunky, but it is not essential. There is another option that is growing on me: we want to have a declarative specification of operations anyway, including operation names, types, docs, etc. There is also already code for this. Yet there also are quite a few
I can experiment with how that would look.
-
Some general responses to the above discussions:
-
I've found a way to recontextualise the "operation properties as type arguments" approach that makes me a bit less skeptical. We have been encoding values as types in order to pass them as properties to operations. This reminds me of datatype promotion in Haskell (see Giving Haskell a Promotion or DataKinds). Any type is given the kind

The signature of an operation is a type schema, consisting of
A prefix of the type variables of the operation's signature is to be supplied as properties to the operation. The operation specifies which type variables belong to this prefix. These can be types, but due to datatype promotion, they can also be values. When a type variable belongs to the operation's properties, it must be explicitly specified. The remaining type variables are inferred.
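The same move exists in plain Rust: const generics let a value appear where a type argument goes, just as a promoted value can appear among an operation's type arguments. A minimal sketch of the analogy (all names here are illustrative, not part of hugr):

```rust
// `N` is a value promoted to the type level, analogous to a promoted value
// among an operation's type arguments; `T` is an ordinary type.
struct Array<T, const N: usize> {
    elems: [T; N],
}

// A consumer can be generic over both the type- and the value-level argument.
fn head<T: Copy, const N: usize>(a: &Array<T, N>) -> T {
    a.elems[0]
}

fn main() {
    // Explicitly supplying the "prefix": `T = u32` (a type), `N = 3` (a value).
    let a = Array::<u32, 3> { elems: [1, 2, 3] };
    assert_eq!(head(&a), 1);
}
```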
Type Constraints as Escape Hatches for Custom Logic

There are operations whose types depend on the operation properties in complex ways. In these situations we can use the type constraint language as an "escape hatch". When an extension requires custom typing, it can declaratively specify a collection
This story does not yet address how to capture operations with nesting. There are some approaches I can think of, but I'd need a bit to mull it over and see how it can be made coherent.

Code Generation

I typically share concerns about maintainability whenever the standard build process is messed with. However, it appears to me that in our case codegen would lead to more maintainable code. We could add an extension and simultaneously generate documentation, specialised Rust code, and convenient Python bindings. We could change implementation details without having to update every extension implementation (be it builtin or custom). I expect the number of operations that we will realistically deal with to be in the low hundreds, so that would be a non-trivial amount of tedious work when done manually. As precedent, I want to point out that Cranelift uses code generation for their opcode and instruction enums (see here). Because Cranelift does not have an extensible instruction set, this is done merely for convenience. With extensibility coming into the mix, the case for codegen would be even stronger.
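To make this concrete, here is what generated code for a declaratively specified extension might look like. This is purely hypothetical: the extension, its operations, the spec file, and the generator are all invented; it only illustrates the kind of boilerplate codegen would take off our hands.

```rust
// Hypothetical output of a codegen step run over a declarative spec
// (say, `quantum.yaml`); none of these names are real hugr APIs.
pub enum QuantumOp {
    H,
    Cx,
    Measure,
}

impl QuantumOp {
    /// Qualified operation name, kept in sync with the spec automatically.
    pub fn name(&self) -> &'static str {
        match self {
            QuantumOp::H => "quantum.H",
            QuantumOp::Cx => "quantum.CX",
            QuantumOp::Measure => "quantum.Measure",
        }
    }
}
```

Documentation and Python bindings would be generated from the same spec, so the three could never drift apart.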
-
Extensibility of the HUGR is not an escape hatch or an attempt at future proofing, but rather a central design goal. Most operations in any HUGR will be in the form of an extension. We aren't building a compiler, but rather a library to create compilers. This issue is to discuss some points where I feel the implementation is currently at odds with this goal, and to propose some general ideas and implementation suggestions on how we can address this.
Egalitarian Data Structure
A benchmark for extensibility is how many of the core operations can be (or even are) implemented with the same mechanisms that would be used for operations provided by extensions. Perhaps "egalitarian" is a good shorthand name for this, measuring the amount of additional performance and usability that is afforded to the core but not to extensions.
We have an `OpType` enum which lists the core operations, plus `CustomOp` for everything else. The custom operations are described by an `ExtensionOp` (or an `OpaqueOp` if the extension is not loaded). Unpacking the layers of indirection (both pointer and conceptual), a custom operation therefore consists of its name (a string) and a list of type arguments. Most operations in a practical HUGR will be custom operations, and thus stringly typed in this way.

Moreover, the only arguments that can be provided to the operation directly are in the form of type parameters. This would prevent us from implementing custom operations that carry additional information which is not a type (unless we apply hacks such as embedding JSON into the type language). Such additional information is possible to provide to core operations, such as in `FuncDecl`:
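(A sketch of `FuncDecl`'s shape, paraphrased from memory; exact paths and field types may differ between versions:)

```rust
use hugr::types::PolyFuncType;

// Paraphrased: a FuncDecl carries data that is not expressible as types.
pub struct FuncDecl {
    /// Name of the function: a string, not a type.
    pub name: String,
    /// Signature of the function: a type schema, not a type.
    pub signature: PolyFuncType,
}
```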
Neither the name of the function (a string) nor its signature (a type schema) is itself a type, and so neither could be passed to a custom operation as a type argument. We therefore could not express `FuncDecl` as an extension operation at all.

A different point where this happens is metadata, which is essentially an uninterpreted JSON value. We do need the capacity to deal with uninterpreted metadata so that we can build tools that do not have to know about all extensions. However, for metadata of known extensions, we currently have no choice but to serialise and deserialise the metadata every time we want to look at it, or to keep it around in external maps. This not only has a performance impact, but is also potentially unergonomic. We do not store the `NodeType` in the JSON metadata map, and for the same reason we should allow similar affordances to extension metadata.

Operations, Attributes and Properties
A suggestion on how to move forward: to optimise for the convenience of users of the hugr library that are mainly concerned with extensions outside of the core, we should implement all core operations via extensions. This makes us feel the pain points of the extension mechanism early, and prevents us from taking shortcuts by hardcoding the core stuff. In particular, I would suggest that the `hugr-core` crate is mainly concerned with the HUGR data structure itself, with any concrete operation living in the `hugr` crate.

We can annotate each node with an operation name and operation properties. If no extension recognises the operation name, the properties remain uninterpreted as a (JSON) value. When an extension does know the operation, it should be able to deserialise the property into a Rust type and store that type directly in the HUGR. Similarly, node metadata should come in the form of attributes, consisting of an attribute name and a value. The value starts off as JSON and can be deserialised into the appropriate Rust type by an extension. Once parsed, we should be able to access properties and attributes by their types. When combined with the appropriate checks, this is an instance of "Parse, don't validate". A minimal sketch of this lifecycle follows.
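Here is one way the once-parsed lifecycle could look, assuming serde for (de)serialisation; `Property` and `RotationProps` are invented names, not part of hugr:

```rust
use serde::de::DeserializeOwned;
use serde::Deserialize;
use serde_json::Value;

/// A node property: starts life as uninterpreted JSON and is parsed at
/// most once into the Rust type an extension registers for it.
enum Property<P> {
    Opaque(Value),
    Parsed(P),
}

/// Example of a typed property an extension might register (invented).
#[derive(Debug, Deserialize)]
struct RotationProps {
    angle: f64,
}

impl<P: DeserializeOwned> Property<P> {
    /// "Parse, don't validate": deserialise on first access, then cache
    /// the typed value in place so later accesses are free.
    fn parsed(&mut self) -> Result<&P, serde_json::Error> {
        if let Property::Opaque(value) = self {
            *self = Property::Parsed(serde_json::from_value(value.clone())?);
        }
        match self {
            Property::Parsed(p) => Ok(p),
            Property::Opaque(_) => unreachable!("parsed above"),
        }
    }
}

fn main() {
    let json = serde_json::json!({ "angle": 0.5 });
    let mut prop: Property<RotationProps> = Property::Opaque(json);
    assert_eq!(prop.parsed().unwrap().angle, 0.5);
}
```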
Implementation Suggestion
There are a few ways to implement this, and I have experimented with several of them over the last couple of days. I now believe that the following solution makes decent tradeoffs. We already have a design that is similar to an ECS (entity component system) pattern: nodes are already identified by their index, and additional metadata is attached via maps that are keyed by the node index. If you view a node, its operation, its properties and its attributes as a row in a table, an ECS-inspired data structure would store each column separately.
Taking inspiration from several existing Rust implementations of ECS, we can make a `Hugr` store the columns in a map that is indexed by type (see for example the anymap crate). For every attribute type `A`, this type-indexed map would contain a store which associates node indices to `A`s. Depending on the attribute, this could be a hash map, a btree map, a vector, a bitset or anything else that is appropriate. While this approach does (internally) use trait objects and dynamic dispatch, we only pay this cost when we retrieve the store for an attribute, not for every attribute value individually. Moreover, any dynamic typing is hidden behind a strictly typed API. This design also works well with the Rust borrow checker, since it allows us to immutably borrow some attributes while writing to another. Potentially this can also help with parallelism in some cases. A sketch of this layout follows.
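To illustrate the shape of this, here is a minimal, self-contained sketch of a type-indexed attribute store built on `std::any` rather than the anymap crate; all names are illustrative, and the serialisation hooks discussed below are omitted:

```rust
use std::any::{Any, TypeId};
use std::collections::HashMap;

type NodeIndex = usize;

/// One column per attribute type `A`: node index -> attribute value.
/// (A real implementation could pick a different store per attribute.)
struct Column<A>(HashMap<NodeIndex, A>);

/// A map of columns indexed by the attribute's type, anymap-style.
#[derive(Default)]
struct Attributes {
    columns: HashMap<TypeId, Box<dyn Any>>,
}

impl Attributes {
    /// Fetch or create the column for `A`. The dynamic-dispatch cost is
    /// paid once per column lookup, not once per attribute value.
    fn column_mut<A: 'static>(&mut self) -> &mut HashMap<NodeIndex, A> {
        let boxed = self
            .columns
            .entry(TypeId::of::<A>())
            .or_insert_with(|| Box::new(Column::<A>(HashMap::new())));
        &mut boxed.downcast_mut::<Column<A>>().expect("type-indexed").0
    }

    fn insert<A: 'static>(&mut self, node: NodeIndex, value: A) {
        self.column_mut::<A>().insert(node, value);
    }

    fn get<A: 'static>(&self, node: NodeIndex) -> Option<&A> {
        self.columns
            .get(&TypeId::of::<A>())?
            .downcast_ref::<Column<A>>()?
            .0
            .get(&node)
    }
}

fn main() {
    #[derive(Debug, PartialEq)]
    struct DebugLocation(u32);

    let mut attrs = Attributes::default();
    attrs.insert(7, DebugLocation(42));
    assert_eq!(attrs.get::<DebugLocation>(7), Some(&DebugLocation(42)));
}
```

Because each attribute type lives in its own column, two different attributes can be borrowed independently, which is what makes this design play well with the borrow checker.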
We cannot use `anymap` directly since we require some additional structure on the attribute stores. In particular, we would still like to be able to serialise an attribute store. Luckily this is not too difficult to implement. A proof of concept can be found here: https://github.com/CQCL/hugr/blob/feat/attributes/hugr-core/src/hugr/attributes.rs

For properties we can take a similar approach. I am still experimenting with how this would look precisely. It should be possible to store properties in such a way that it is very efficient to iterate over all nodes for a particular operation, together with their properties. This is a very common and performance-sensitive access pattern (for example, every pattern match begins with this).