Skip to content

Latest commit

 

History

History
364 lines (315 loc) · 38.5 KB

README.org

File metadata and controls

364 lines (315 loc) · 38.5 KB

FlowCL

A graph-oriented, optionally-typed, optionally-verifiable language built over Common Lisp.

WARNING: FlowCL is currently in beta! Both the code itself and the paradigm it aims to crystallize in code are subject to change.

  • In addition, the project structure is in flux, and it is currently published primarly to get external overviews, criticism, and recommendations on relevant materials (e.g. similar projects or writing on similar concepts). This notification will be removed from the README when enough features have been implemented to provide an idea of how v1 of the API should be designed.

The Flow language is a long-term project on my part, attempting to extend Common Lisp (and hopefully other runtimes in the future) with the ability to interactively build programs around the flow of information.

There are a few core concepts which guide the design and implementation of Flow:

  • Graph-oriented: Flow is intended to [TBC]

FlowCL is intended to integrate with its host languages, allowing integration into existing codebases.

Design principles

Some key principles of the FlowCL project:

  • Compatibility: FlowCL strives to be compatible with code, libraries, and paradigms used by the framework’s users.

    Functionality will be encapsulated as much as possible (e.g. within function/procedure/object definitions), and external endpoints can ideally be added directly to your code with no change in operation or design (save, hopefully, for increased ease-of-coding and potentially also performance).

    FlowCL will also attempt to use libraries such as “trivial-extensible-sequences” to make drop-in replacement even easier, by enabling compatibility with code that was designed for non-FlowCL structures.

  • Scalability: FlowCL is meant to be a basis upon which small and large projects are built. As such, it should allow composition of its elements into larger structures without significant compromises. Ease of use will be a particular target to maintain as code or data structures are composed into larger structures.

    Another target of this goal is reliability. In a mature state, FlowCL will match the reliability of production level code, and be usable in the same.

    Ideally, FlowCL code should also provide efficiency boosts to the user. In some cases this will be outweighed by other factors, such as ease of use. However, we aim to identify opportunities for our code to optimize not only based on its own structure, but also based on information gleaned from how it is being used in a larger context, allowing us to automate a significant portion of the optimization work that would normally be done manually by programmers. This is the same approach used by software such as TensorFlow, to allow users to easly build large systems without unnecessarily large hardware costs or loss of performance.

  • Ease of use: FlowCL aims for developers to easily use it.

    In part, this is a longer-term goal to implement documentation and tutorials to ease comprehension and adoption of the project. In the shorter term, this aspect will be fulfilled primarly through clear, understandable naming and documentation within the code.

    This goal also applies to the design of the framework itself. Endpoints intended for external use should be designed to easily integrate with each other and external code, such as through the creation of utilities for creating or manipulating FlowCL constructs (note that compatibility with native constructs would be a very nice feature for such utilities). In addition, where possible internal functionality should be constructed in such a way as to allow users to easily modify the system, without constantly breaking things or having to deal with excessive or obtuse interdependencies.

  • Extensibility: One of the strengths of CL is its strong culture of backward-compatibility. However, sometimes this can also impede the ability to modify and improve a project.

    There are two main approaches to solve this issue. Firstly, we will design symbols intended as external endpoints to be as independent as possible from the internal implementations. This should allow us more freedom to improve the FlowCL project and the code of its users without any action on their part.

    Secondly, we will not limit ourselves to publishing a single package. Once the FlowCL project has matured enough to have a stable API, we will suffix package names with the “v0.01” convention. The latest stable sub-version will be aliased to its version, and the latest stable version will be aliased to un-suffixed “default” package names (i.e. the package “flow-cl-v0.01” will be aliased to “flow-cl-v0”, which will be aliased to “flow-cl”). Many projects implement this process via Github branches, but our approach allows both active developers and users to track package versions without dealing with branches or commit history, making it significantly easier for package users to contribute improvements.

    • NOTE: “Major version” package names (e.g. “v1”) are intended to be retained indefinitely. However, the minor versions underlying them (e.g. “v0.01”, “v0.02”, etc) are very likely to be discarded, and are mainly for development purposes. Once FlowCL is mature enough for the above system to be implemented, we strongly recommend either pinning code to a major version or just using the default packages.

Installation

Place this repository somewhere ASDF or Quicklisp can find it as a local project, and then use `(ql:quickload :flow-cl)` (or `(asdf:load-system :flow-cl)`, which quicklisp wraps in its quickload feature).

Usage

Progress and Roadmap

Future plans are currently vague, but currently involve the following points:

  • [-] Develop a prototype for node/graph computation
    • [-] Develop an extensible node abstraction
      • [X] Figure out the basic conceptual structure of the core node class
        • Currently thinking about making this just a directed graph of inputs/outputs with a value slot for node data/metadata
      • [ ] Implement a decent prototype of the core node class
      • [-] Figure out extension points
      • [-] Figure out how to define behavior for specific node subclasses
      • [-] Figure out how to contain and execute computations in a node
      • [-] Figure out how to traverse nodes by their connections
        • [X] Implement simple traversals (e.g. BFS, DFS)
        • [ ] Verify that the traversal API is extensible and allows full expression of the power of the underlying node-network systems
      • [ ] Figure out how to execute a network of nodes
        • [ ] Figure out reasonably-efficient execution in the DAG context
          • [ ] Figure out how to deal with parallel execution
            • Maybe mark nodes based on which “parallel stream” they’re assigned to, and only allow pure nodes to have multiple streams? Streams themselves could have lists of start and end nodes, and fulfilling all the start nodes would start stream execution while progressively signaling outputs from particular sub-nodes in the stream to queue into the output nodes of that stream. Non-pure nodes would always have stream 0, which either halts at a stream-0 node when non-zero streams are running with that node as an output (how does this deal with intentional parallelization, where you want the pure streams to have outputs interleaved with the impure streams?) (I suppose if each parallel execution line had its own set of streams, then there would only be one execution line which actually executed impure nodes, so outputs of a stream could just be queued into stream 0 in the order that they were outputted to? That might make race conditions between streams though. Maybe store the outputs as outputs of a stream and just compress the dataflow graph for that transition into the stream nodes? Anyway, there shouldn’t really be duplicate calls to the same impure node (or the same node at all) unless done in a loop, which severely restricts the cases where we would be running multiple streams at once anyway…) or simply allows interleaving inputs from streams with inputs from the base thread (this doesn’t feel like safe-by-default semantics for parallel execution…).
        • [ ] Figure out how to deal with loops
          • Can I just make loops and tail-calls equivalent to each other? This would section off the code inside a loop from being optimized together with the code outside it, which is a performance loss, but makes the compilation process significantly simpler.
            • For instance, it means that a single-threaded execution can be relied upon to not visit the same node twice except when going through a loop, which means I can compress sections of pure nodes into streams fearlessly without losing accuracy.
            • This would mean that you can’t create universes in earlier versions of a loop and then refer back to them later, I think? Or maybe I should allow a function to do this, though the universe would persist across tail-calls in that case, which might lead to a stack overflow on the universe stack even if we have tail-call elimination for loops themselves.
              • Since universes can be manipulated, there’s a simple workaround of just merging the loop universe with the one above it, which could be implemented as a simple macro over the loop form itself. This may impede type inference, however; I suppose the fact that a single iteration of the loop (/ function) could be analyzed without interference from this makes that simpler, but if future iterations depend on things defined in past iterations, either we get an infinite loop of types or we cut inference short somehow and end up with a limited conception of the loop’s output type.
          • I suppose I could just leave compilation of loops as an implementation detail, so future implementors may or may not find case-specific ways to work through this to get context-aware compilation of a loop.
            • Tail-call elimination should be taken as a default, of course, though I suppose it already is
        • [ ] Figure out how to reduce code-graphs based on runtime information
          • Shouldn’t compromise other features like traversal or looping
      • [ ] Figure out how macros work in node networks
        • [ ] Figure out how to structure macroexpansion time vs runtime
          • Should I make a distinction and do multiple expansion passes before the execution pass? Integrate macroexpansion into the type propagation pass, maybe?
        • [ ] Figure out if it’s possible to ergonomically denote the scope of a macro, and if so how
        • [ ] Figure out how macros interact if their range of influence overlaps
      • [ ] Develop a compilation framework for networks of nodes
        • [ ] Figure out how to find patterns in graphs and replace them with more efficient alternatives before/outside-of runtime
          • Preferably this allows both noticing patterns before runtime and just simplifying things based on runtime values…
            • Would the latter (simplifying code based on runtime values) require the full dependent typing system?
        • [ ] Figure out how to denote compilation behavior of individual nodes in relation to their contexts in a graph
          • [ ] Figure out what kinds of behavior and relevant aspects might be needed
            • Cost estimates? How to represent these?
            • (optional) Probability estimates for code-paths / a subset of possible value-types at a point?
              • With the right planning algorithms, this would allow things like reasonable auto-optimization of multi-device computations
          • [ ] Figure out a way to implement compilation and aspect-tracking in an extensible manner
          • [ ] Figure out how to actually denote the compilation behavior
          • [ ] Figure out how to deal with different behavior of different nodes/node-classes
        • [ ] Figure out how to choose between multiple possible compilation directions
        • Resources:
          • [ ] Stream Fusion, to completeness: https://arxiv.org/abs/1612.06668
            • Establishes a framework for optimizing stream-based code to match or exceed hand-optimized performance.
            • Definitely something I want to look at to see where I can use it. Even if the type system ends up interfering with stream fusions this could be implemented locally in sub-graphs where type inference has been fully completed.
          • [ ] Handling non-reducible flow graphs: http://www.faadooengineers.com/online-study/post/cse/compiler-design/227/handling-non-reducible-flow-graphs
            • A very short article (2-3 paragraphs) roughly outlining how duplicating parts of the flow graph allows you to find dominators in otherwise non-reducible graphs.
            • I feel like we could further narrow down where we might need to do these duplications given the syntax of Flow (e.g. only at universe-defining points, since those are the only ones you can arbitrarily transport evaluation to), but I’m not sure
    • [ ] Develop an extensible object/class abstraction for graphs
      • A graph is intended to be a collection of nodes with aggregate behavior.
      • [ ] Figure out how to collect nodes in a graph and refer to a graph’s components
        • [ ] Figure out how to store the nodes
        • [ ] Figure out how to access a graph’s components with reasonable efficiency
        • [ ] Figure out how to execute a graph
        • [ ] Figure out how to compile a graph
          • [ ] Figure out how to denote compilation characteristics of graphs
          • [ ] Figure out how graph compilation interacts with e.g. one graph being contained by another
      • [ ] Figure out graph-specific operations
        • [ ] Comparison between graphs
          • Should probably have a way to determine if graphs have the same structure, and if so call them equal
        • [ ] Copying graphs
        • [ ] Including one graph in another
          • [ ] How do we do this without sharing the same graph object across instances?
          • [ ] How do we update the included graphs if the original changes?
          • [ ] Compilation
      • [ ] Develop a way for graph programs to modify themselves, i.e. graph macros
        • [ ] Figure out how this differs from just networks of nodes
        • [ ] Figure out the semantics of execution times and graph compilation
        • [ ] Figure out how to formalize the concept of modifiers on existing nodes/graphs/graph-segments
    • [-] Develop some proof-of-concepts for the usefulness of the node abstraction in normal CL code
      • [-] Reactive programming
        • [X] Prototype exists
        • [ ] Update after the node semantics are tied down
      • [-] Syntax for making node programs in CL code
        • [X] Prototype exists
        • [ ] Update after the node semantics are tied down
          • If we made markers in the call graph (e.g. with @ as a fundamental reader-macro for traversal to different points in the unevaluated computation graph) we could use a reader stack for linearly reading terms as nodes, and merge them into the graph when the node’s specifications are fulfilled
            • How would we determine nodes (and node properties) from names? We can’t know the argument count of a function without the function already being defined, but that requires processing the function definition form
              • Instead of having a reader stack, maybe rely on the execution passes for stack functionality instead? Each item is read based on reader macros (i.e. numbers, strings, symbols) and starts out with exactly 1 input (the environment from the preceding node) and exactly 1 output, unless we use @ references to link multiple nodes into/out-of a node. The order of the creation of the input/output links could determine the order of the corresponding input/output execution paths/agents (e.g. an if would have the first output-node created in the read pass be true and the second be false). Function application would work like in Haskell, where in theory every function has one input (plus the environment input, in our case) but multi-argument functions generate temporary functions to read the rest of the inputs (which could in theory be optimized away for better parallelization via compiler macros during execution).
                • What do multiple input paths mean???
                • Instead of having 1 input and 1 output, maybe put the inputs before the node itself, just add an additional first argument for the environment, and add a reader macro for collections of inputs with decent semantics for linking things from other places?
                  • Note: presumably we’re allowing renaming of positions, since the read process should be linear. So we could literally name arguments things like @1, @2, etc if we wanted, and then rename them as needed.
                  • Note: How would this work with macros like apply? Should function argument lists be allowed to request a certain number of input nodes (which won’t be given the output of this node as an input) going after them in a linear sequence, bringing back some of the above Haskell-like semantics?
                    • Function argument lists would then include both a before-list (which are all linked in parallel) and an after list (which are structurally positioned as a line after the current node (presumably in terms of semantics they’ll be indistinguishable from the before-list, evaluating in parallel after which we evaluate the node itself)). In cases of macroexpansion the inputs and outputs of the macro node (assuming none of them are at the same execution level as the macro node itself) would be atomic, and so considered to be evaluated.
                • Might still need a reader stack to implement things like backwards arrows (for convenience / syntax-sugar reasons)
            • How would we enclose sets of nodes in the read context?
              • Make syntax sugar for parentheses to become start/end nodes, and then during execution interpret these as enclosing atomic objects for the purposes of forms like if?
            • We’d want to track a jump stack
            • We’d want to define and track named positions
            • How would we determine which node should be pushed to the stack / named-position?
              • Last node added?
                • How does this interact with arguments?
                  • Do we just not add argument nodes in until they’re resolved in the reader stack as part of a call of some sort?
            • Would we store the reader stack along with the node position, and update them both on traversal?
              • Feels like this would interfere with linking different regions together…
            • How would we traverse the universe stack in this case?
              • This should be an in-code traversal, to be honest, as opposed to the reader traversal being described in the grandparent. Maybe do it through a fundamental function for moving the execution context that reaches it to a place specified by its input, similarly to how the grandparent defines a fundamental function for moving the reader context?
                • Said function would have to either have no outputs, ignore its outputs, or have some approach to an execution jump stack which allows returning values when you jump back to somewhere and also forgives cases where you don’t jump back (e.g. we don’t want to block the execution context on that particular node outputting something).
        • Resources:
      • [ ] Compiling program dataflow
        • Possible implementation: implement a simple from-scratch neural network and auto-parallelize and auto-optimize
  • [ ] Develop a framework for execution contexts
    • [ ] Tracking the current execution environment as a graph is traversed
      • [ ] Storing the execution environment
      • [ ] Representing computations that read/write to the execution environment
      • [ ] Efficiency
        • Resources:
          • [ ] Sources/inspiration/etc for embedding reactivity into an interpreter?: https://www.reddit.com/r/ProgrammingLanguages/comments/1b0mkdt/sourcesinspirationetc_for_embedding_reactivity/
            • Answers have some useful links;
              • Interactive FUnctional Programming seems particularly useful as a cross-platform optimization measure, even if I end up having to limit it to pure functions instead of the full code.
                • Should this be by compiler discretion, caching some values for common code paths and using those for adaptation (basically machine learning for performance)? Or should the user define where this optimization is applied?
                  • Among other considerations, the former gives implementors more flexibility
                  • Among other considerations, the latter allows the functionality to be added in the generalized language spec
  • [ ] A simple interprocedural register allocation algorithm and its effectiveness for LISP: https://dl.acm.org/doi/pdf/10.1145/59287.59289
    • [ ] Glicol - A graph-oriented language developed in Rust: https://webaudioconf.com/_data/papers/pdf/2021/2021_8.pdf
      • Glicol uses a clock to avoid recomputation if the value of something is already known at the current clock time (essentially meaning that none of the dependencies have changed since the last computation). A similar system could be used in Flow, since with both language environment and external environment being first-class objects we have the runtime managing all possible inputs to a node
  • [ ] Figuring out the semantics to denote “universes” in the graph
    • [ ] Syntax for this?
    • [ ] How will this actually work?
  • [ ] Inheritance between environments
    • [ ] Figure out limits on which universes can inherit from which others
      • E.g. based on position in the graph
    • [ ] Dealing with inherited values and their updates
    • [ ] Dealing with access syntax/semantics
    • [ ] Multiple inheritance
      • Most likely this will end up creating a DAG of environments
    • [ ] Representing the outside world (i.e. the host language and external environment) in the universe context
      • Likely will be the parent of the runtime’s other universes, with the Flow runtime creating a child universe for itself from which all other universes can inherit
      • [ ] Dealing with external events
        • Examples: filesystem, network, etc
      • [ ] Dealing with interop
  • [ ] Tracking interactions between environment and runtime code
    • [ ] Tracking which parts of the environment are affected by specific code
      • May need to work out the full type system for this…
      • This falls into tracking the purity of code (all interactions with the environment are read-only)
  • [ ] Moving execution between different places on the graph
    • How do I maintain the execution-context continuations in this case?
    • How do we get unwind-protect like functionality, for cases where
  • [ ] Parallel execution of graphs
    • This would need to be worked out along with the execution contexts, to avoid the latter obstructing the former and requiring hacky workarounds.
  • [ ] Develop an optional dependent typing framework for graphs
    • [ ] Study dynamic typing
      • [ ] Figure out the semantics of type representation and manipulation
      • [ ] Figure out the semantics for type tracking
        • [ ] Figure out how these types will behave and be organized
          • [ ] Figure out the basics of how this variant of types work
          • [ ] Figure out how types will interact with the other language features
      • [ ] Figure out if I can get away with just making other types warn unless required by the programmer, and make the entire system dynamic with warnings that way
    • [ ] Study HM typing
      • [ ] Figure out the semantics of type representation and manipulation
      • [ ] Figure out the semantics for type tracking
        • [ ] Figure out how these types will behave and be organized
          • [ ] Figure out the basics of how this variant of types work
            • [ ] Look at Haskell as a potential case-study
          • [ ] Figure out how types will interact with the other language features
      • [ ] Figure out how this typing can be done optionally
      • [ ] Figure out if we need this or can skip directly to fully-provable types
    • [ ] Study dependent typing and proof-based languages
    • [ ] Figure out the interfaces between the different type-systems
    • [ ] Figure out the user-API for this
      • [ ] How do users specify the type system to use?
      • [ ] How do users manipulate each type of type?
      • [ ] How do users make assertions for each type system, and how do those assertions influence compilation?
      • [ ] How can users extend the above type system for different implementaitons and/or extensions of the above concepts?
        • It would be best if I could find a way to make this something that can be added on later, to speed up the development of a language prototype.
    • [ ] Figure out the semantics for interactions bet
    • [ ] Implement the type system
      • [ ] Implement the framework for atoms, functions, macros, etc
      • [ ] Implement the framework for types
      • [ ] Implement the framework for universes
      • Resources:
  • [-] Develop the user interface for this language
    • Needs to be done in tandem with the other steps
    • [-] Syntax?
      • [ ] Look into Lisps
      • [ ] Look into ML
      • [-] Look into Ithkuil
        • https://www.ithkuil.net/
        • Words have a base to which modifiers and parameters are applied.
        • Both here and in general in natural language, sentences seem to be the points where you escape local contexts/parentheses and return to the grounding of the current universe
    • [ ] Representation of data/programs?
    • [ ] Restrictions on user behavior and their bypasses?
  • [ ] Implement sample libraries
    • Goals:
      • Demonstrate how to write pure Flow code
      • Demonstrate how to write external code and near-seamlessly interop with it
    • [ ] Interop with pure CL
    • [ ] Efficient arithmetic / linear algebra
    • [ ] Pure-Flow Machine learning framework
    • [ ] Library for interop with an external Python process through the REPL and filesystem
  • [ ] Narrow down a core kernel for better compatibility
    • [ ] Move as much functionality as possible into the language syntax itself
    • [ ] Operationalize the semantics for functionality that can’t be bootstrapped
    • [ ] Make a sample implementation in another language
  • Complete the lazy programming sub-framework?
    • This may not be worth it, compared to just starting anew
    • This may be better served in a separate library rather than as a part of the FlowCL implementation.
    • Improving the efficiency.

      Time-wise, the lazy-vector structure matches native code by applying functions to its internal vector cache, providing reasonable time efficiency.

      However, the space-efficiency of the laziness system is rather bad; for extremely large inputs, I’ve sometimes run into heap failures without explicit garbage-collection. We need to find ways to reduce the space taken by lazy structures

      • lazy-cons (which is also currently the backbone of lazy-vector) is a significant offender here; reimplementing it to use an internal list cache like lazy-vector, and perhaps using the cdr to store tails when lazy-consing multiple lazy sequences, might work to resolve this issue, and could also allow reusing the lazy-vector code which permits sharing of evaluation where possible.

        However, this has the issue that it forces the user to only use structures which cache their results. This is not necessarily a good thing, considering that they may be using lazy sequences to process data too large to actively keep in memory. Further thought is required here.

      • lazy-vector should benefit from explicitly using a function as its generator rather than shelling out to lazy-cons objects. lazy-vec may not even need to be reworked; if we can figure out another function template for lazy-vectors not derived from lazy-conses, we will likely not need to change the lazy-vector so much that lazy-vec can’t just be tweaked a little to store and retrieve the input lazy-cons in the right place.
      • NOTE: If lazy-vector stops being a subclass of lazy-cons, we will need to rework the class architecture. I think a good approach would be to make a “lazy-sequence” object defining the shared slots and specializing most of the generic methods, with only the lowest-level methods directly specializing on different lazy sequences

      We also might want to optimize their garbage collection (perhaps using weak-pointers for reuse of defunct objects that haven’t been garbage collected?).

    • Improving the feature-set

      Currently, the only features implemented are the core thunk class, and lazy-cons and lazy-vector as replacement sequences for lists and vectors.

      One direction for progress is incorporating streams into the library, allowing them to be wrapped as lazy-sequences. We may need to restrict this functionality considering that you can’t go backwards in a stream, but our lazy stream object should at least allow the composition of mapping, filtration, etc that lazy sequences currently permit.

      • Perhaps this could be done via documentation suggesting restrictions on usage to avoid making calls that are supposed to refer to the same items in the stream, and/or creating a macro for stream interaction and which inherently contains such restrictions.

        The macro approach seems particularly interesting. If the user can specify a set of sequences which are meant to receive the data from the stream, then the macro could internally create a cache object which only retains those aspects of the stream which are necessary given the current state of the various sequences. Including any non-lazy sequences would force full reading and caching of the stream, but lazy-sequences would wait on cache reads and stream-reads (to extend the cache) based on which elements were forced to evaluate.

        Once no objects were waiting on a value of the macro’s internal cache, that value could be discarded (possibly using the :offset value with periodic copy-seq commands, if using a lazy-vector?). This would allow a rolling track on only those values which are necessary, without the user needing to manually program such caching.

        However, this method requires either lazy sequences which discard data automatically (e.g. the current implementation of lazy-cons, if we make the old heads space-efficient enough to be feasible for production); or, preferably, a method to update the list of receiving sequences, so that the user can decide when to discard old data. Perhaps just allowing setf on the defined sequences would be enough for this.

        • Incidentally, having this feature in the core macro would make it much easier to implement a convenience macro on top of it, which tracks the size of the receiving sequences and automatically replaces them with copies (if the user is confident that only the current datapoint and/or a preset amount of history will be relevant).

      We also need to explicitly develop the design to optionally permit interaction without preservation of values (either directly in the code or through making it easy to discard cached information). This is necessary for processing of extremely large values (as mentioned in the musing on stream wrappers above), but seems at odds with the goal of maintaining performance (unless calculating the step function is extremely computation-heavy); further thought is required here.

    • Improving the integration.

      The lazy-programming framework is currently placed literally just in the flow-cl package. The first priority here is moving it to its own package and figuring out how the package architecture is going to work.

      After that, I need to figure out how users will interact with this system. Obviously one mode is to use the enclosed lazy sequences as drop-in sequence objects compatible with Common Lisp sequence functions (assuming your implementation is compatible with trivial-extensible-sequences, or you use the appropriate fallback code).

      Another mode would be to make specific use-cases for lazy-sequences (such as wrapping streams) and writing drop-in macros to implement that specific feature.

      Ideally, another mode will be added using a macro to implement a DSL for lazy programming. The result of code written in that DSL would be the output of the form, allowing for integration into normal CL code. However, this is a minor goal, and takes a backseat to working on other subsystems of the overall FlowCL framework.

Author

  • Swapneil Singh

Copyright

Copyright (c) 2023 Swapneil Singh

License

Licensed under the MIT License.