diff --git a/talos/Readme.rst b/talos/Readme.rst index 95c9d8e5b..dfe2b6d49 100644 --- a/talos/Readme.rst +++ b/talos/Readme.rst @@ -26,3 +26,10 @@ To run ``talos`` you need a Daedalus specification describing the parser. Talos cabal run talos -- -n 4 tests/T003.ddl The ``-n`` flag controls how many example inputs are generated. In this example, we generate four possible inputs that satisfy the parser. + +Developing Talos +---------------- + +See the [implementatino guide]_. + +.. _[implementatino guide]: implementation.md diff --git a/talos/implementation.md b/talos/implementation.md new file mode 100644 index 000000000..1c644c0bd --- /dev/null +++ b/talos/implementation.md @@ -0,0 +1,185 @@ + +Talos implementation +==================== + +The Talos tool synthesises valid documents from DDL grammars. The tool has the following phases: + +1. Parsing the input file and converting to Talos-flavoured Core; +2. Analysing the Core to decompose the grammar into slices; +3. Traversing the Core to synthesize a document, calling into slice model generators when the start of a slice is encountered; + +The implementatino is discussed below + +Text to Core +------------ + +The majority of the translation from the text file into Core is +implemented as part of Daedalus, leaving us with a [Core module](../daedalus-core/src/Daedalus/Core/Decl.hs). + +Talos [performs](src/Talos.hs) a number of standard and specific +Core to Core passes ([Talos.Passes]), namely + +* Fail propagation (making cases partial) +* Polymorphic type monomorphisation +* Constant folding +* Naming arguments to functions; +* Naming the argument to string matches (which is also the result of the match) + +These passes ensure that the analysis phase has a name for the +arguments to a function (as the argument to a function may be the +start of a slice, so it needs a name), and that we can safely ignore +string matches when slicing (as a string match always returns its +argument on success, so we just use the argument directly). + +Slicing +------- + +Briefly, the [Talos.Analysis.Exported][] module contains the data +structures used by the synthesis phase of Talos, using definitions +from [Talos.Analysis.Slice][] The below notes on the implementation of +slicing is not required to use the results. + +The slicing phase lives under [Talos.Analysis][]. The aim of the +slicing phase is to decompose the input grammar into a collection of +independent slices (defined in [Talos.Analysis.Slice][], along with +sliced expressions in [Talos.Analysis.SLExpr][]). A slice is, in +essence, a grammar where some nodes have been replaced by holes. +Slices are independent if each choice in the input grammar (i.e., byte +values and grammar alternatives) occurs in exactly one slice (or no +slice if the choice cannot impact the feasibility of synthesis). +Slices are also closed, in the sense that all (parts of) variables +used in the grammar are defined in the grammar. + +The slicing process has two phases, firstly a fixpoint is constructed; +and secondly, the fixpoint is _exported_ [Talos.Analysis.Exported][]. +The export process simplifies the representation of slices from that +used in the fixpoint construction, including replacing holes in +expressions by default values (which should never actually be +examined), and forgetting which abstract environment (see below) was +used (to avoid carrying around the class in the remainder of Talos). + +The slicing algorithm is abstract in the way in which the grammar is +analyzed; the abstraction is in +[Talos.Analysis.AbsEnv][]. Briefly, an +_abstract environment_ relates variables with _abstract predicates_, +and has a precondition operation which calculates the +abstract environment required for an expression to satisfy an abstract predicate. + +The two main abstract environments are the whole-variable environment +([Talos.Analysis.VarAbsEnv][]) +and the field-sensitive abstract environment +([Talos.Analysis.FieldAbsEnv][]). +The whole-variable abstract predicate carries no information (we want +the whole variable), and the abstract environment is equivalent to a +set of variables --- the abstract precondition operation then just +calculates the set of free variables. + +The field-sensitive abstract predicate projects a subset of the fields +in a struct, including nested fields for fields with structure type. +The abstract precondition operation then calculates which fields of +the free variables are used in an expression. + + + + + +The slicing algorithm uses the following auxiliary modules + * [Talos.Analysis.Eqv][]: notions of equivalence used to determine if a fixpoint has been reached; + * [Talos.Analysis.Merge][]: merging is used to combine slices when they are discovered to be non-independent (i.e., rely on the same choice); + * [Talos.Analysis.Domain][]: the domain over which the slicing algorithm operates, containing the in-progress slices for the current grammar function; + * [Talos.Analysis.Fixpoint][]: a naive algorithm for constructing the fixpoint for the slicing algorithm + * [Talos.Analysis.Monad][]: the monad containing the state for slicing, mostly this builds on Talos.Analysis.Fixpoint. + +Non-dependent Document Generation +--------------------------------- + +The outer synthesis loop is in [Talos.Synthesis][]. + +Slice Model Generation +---------------------- + +A model for a slice is a path ([Talos.SymExec.Path][]), which can be +thought of as a parse tree with holes (corresponding to the holes in +the slice). The path tells the outer synthesis loop about decisions +that were made during slice synthesis (i.e., choices and bytes which +occur in the slice). + +The main modules are: + * [Talos.Strategy][]: Strategy execution and utils (timing etc.) + * [Talos.Strategy.Monad][]: Core monad for strategies (random numners, Core definitions, etc) + * [Talos.Strategy.PathSymbolic][]: Current main symbolic strategy (pathsymb) + * [Talos.Strategy.PathSymbolicM][]: Monad and aux types for the pathsymb strat + * [Talos.Strategy.BTRand][]: Random selection with backtracking strategy. + * [Talos.Strategy.Symbolic][]: Experimental symbolic strategy (probably should be deprecated) + * [Talos.Strategy.SymbolicM][]: Monad for the symbolic strat + +Modules for pathsymb: + * [Talos.Strategy.PathCondition][]: Path condition guards for the + pathsymb strat, corresponding to sets of choices in the slice + * [Talos.Strategy.MuxValue][]: Symbolic values used in pathsymb. + The basic idea is that values in the symbolic execution are sets + of values, each value being guarded by a path condition. + +with helper modules: + * [Talos.Strategy.MemoSearch][]: A memoising search procedure (used by symbolic strat) + * [Talos.Strategy.OptParser][]: An option parser for strat arguments + * [Talos.Strategy.DFST][]: A DFS monad transformer + * [Talos.Strategy.SearchTree][]: An explicit representation of search trees + +Symbolic Execution +------------------ + +These modules are involved in talking to the solver: + * [Talos.SymExec.SolverT][]: Solver support with support for context management + * [Talos.SymExec.StdLib][]: Support SMTLIB definitions + * [Talos.SymExec.ModelParser][]: Reading back models from SMTLIB + * [Talos.SymExec.Expr][]: Symbolic execution of expressions + * [Talos.SymExec.Funs][]: Define functions in SMT from Core functions + * [Talos.SymExec.SemiValue][]: Values with non-value leaves + * [Talos.SymExec.SemiExpr][]: SemiValue with SExpr leaves + * [Talos.SymExec.Type][]: Converting type to SMT types + + + +[Talos.SymExec.SolverT]: src/Talos/SymExec/SolverT.hs +[Talos.SymExec.StdLib]: src/Talos/SymExec/StdLib.hs +[Talos.SymExec.ModelParser]: src/Talos/SymExec/ModelParser.hs +[Talos.SymExec.Expr]: src/Talos/SymExec/Expr.hs +[Talos.SymExec.Funs]: src/Talos/SymExec/Funs.hs +[Talos.SymExec.SemiExpr]: src/Talos/SymExec/SemiExpr.hs +[Talos.SymExec.SemiValue]: src/Talos/SymExec/SemiValue.hs +[Talos.SymExec.Type]: src/Talos/SymExec/Type.hs +[Talos.SymExec.Path]: src/Talos/SymExec/Path.hs +[Talos.Analysis]: src/Talos/Analysis.hs +[Talos.Analysis.AbsEnv]: src/Talos/Analysis/AbsEnv.hs +[Talos.Analysis.VarAbsEnv]: src/Talos/Analysis/VarAbsEnv.hs +[Talos.Analysis.FieldAbsEnv]: src/Talos/Analysis/FieldAbsEnv.hs +[Talos.Analysis.Eqv]: src/Talos/Analysis/Eqv.hs +[Talos.Analysis.Merge]: src/Talos/Analysis/Merge.hs +[Talos.Analysis.Domain]: src/Talos/Analysis/Domain.hs +[Talos.Analysis.Fixpoint]: src/Talos/Analysis/Fixpoint.hs +[Talos.Analysis.Monad]: src/Talos/Analysis/Monad.hs +[Talos.Analysis.Slice]: src/Talos/Analysis/Slice.hs +[Talos.Analysis.SLExpr]: src/Talos/Analysis/SLExpr.hs +[Talos.Analysis.Exported]: src/Talos/Analysis/Exported.hs +[Talos.Strategy]: src/Talos/Strategy.hs +[Talos.Strategy.Monad]: src/Talos/Strategy/Monad.hs +[Talos.Strategy.DFST]: src/Talos/Strategy/DFST.hs +[Talos.Strategy.SearchTree]: src/Talos/Strategy/SearchTree.hs +[Talos.Strategy.SearchT]: src/Talos/Strategy/SearchT.hs +[Talos.Strategy.SymbolicM]: src/Talos/Strategy/SymbolicM.hs +[Talos.Strategy.PathSymbolicM]: src/Talos/Strategy/PathSymbolicM.hs +[Talos.Strategy.MuxValue]: src/Talos/Strategy/MuxValue.hs +[Talos.Strategy.BTRand]: src/Talos/Strategy/BTRand.hs +[Talos.Strategy.Symbolic]: src/Talos/Strategy/Symbolic.hs +[Talos.Strategy.PathSymbolic]: src/Talos/Strategy/PathSymbolic.hs +[Talos.Strategy.PathCondition]: src/Talos/Strategy/PathCondition.hs +[Talos.Strategy.MemoSearch]: src/Talos/Strategy/MemoSearch.hs +[Talos.Strategy.OptParser]: src/Talos/Strategy/OptParser.hs +[Talos.Synthesis]: src/Talos/Synthesis.hs +[Talos.Passes]: src/Talos/Passes.hs + + + + + diff --git a/talos/notes/2023-01-28-code-walkthrough.org b/talos/notes/2023-01-28-code-walkthrough.org new file mode 100644 index 000000000..05439c66e --- /dev/null +++ b/talos/notes/2023-01-28-code-walkthrough.org @@ -0,0 +1,163 @@ + +* Core walkthrough + +** Operation + +1. Run Daedalus +2. Slice +3. Outer loop, calling ... +4. ... strategies + +** Concepts + + - Entangled :: wrt variables: when the choice of a variable impacts + the valid choices for another variable + + +** [[file:~/galois/safedocs/daedalus/daedalus-core/src/Daedalus/Core/Grammar.hs::data Grammar =][Grammar]] and [[file:~/galois/safedocs/daedalus/daedalus-core/src/Daedalus/Core/Expr.hs::data Expr =][Expr]] + + Core data-structures targeted by Daedalus front-end, and used by +Talos. We assume (it should be true) that variable binders are +unique, if only the GUID part of the Name. This means we can use +variables as node identifiers in the AST (e.g. have a Map Name t) + +** Synthesis outer loop + + The synthesis is syntax-directed: we follow the syntax of the +grammar to synthesise bytes until we hit a variable which starts a +slice --- part of the synthesis state is a map from variable to slice. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Synthesis.hs::synthesiseG SelectedHole][synthesisG random mode]] + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/SymExec/Path.hs::data SelectedPathF ch ca a =][SelectdPath]] + + This type is how we record decisions that were made when solving + slices (a SelectedPath is a model for a slice). Essentially + tells which decisions to make. It is strongly (structurally) + related to grammars (fills in bytes and choices). + + When a slice is solved, the result (a SelectedPath) is merged + into any path we have from previous slice solving. The path is + inspected in synthesiseG, and is initially SelectedHole, o.e., we + have made no choices/constraints on future choices. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Synthesis.hs::synthesiseG ::][Worker for Grammar (synthesiseG)]] + +** Slices + + A slice is a fragment of a grammer, by 'fragment' I mean a grammar + where some nodes in the ast are replaced by 'holes'. A slice + occurs when a grammar is partial: all bits of the grammar which + impact the partiality are in the slice, the idea being that the + slice contains all the information needed to synthesise the (bits + of) the variables bound in the slice. + +*** Example + + As an example, consider + + block + a = A + x = P + y = Q + z = R + a < x is true + y > 0 is true + + becomes: + + - 'slice starting at a' + + block + a = A + x = P + y = HOLE + z = HOLE + a < x is true + HOLE + + - 'slice starting at y' + + block + a = HOLE + x = HOLE + y = Q + z = HOLE + HOLE + y > 0 is true + + with 'z' no being used in a partial way. + +*** Notes + + We won't talk about how we generate a slice, it is enough to know + that slices are + + - closed :: contain all information needed to synthesise the bytes + in the slice + - independent :: slices can be synthesised without worrying about + other slices + - composable :: two slices can be merged into a new slice; the sum of + all slices is the dependent part of the grammar (the remnants + always succeeed) + + Note tha SelectedPaths are also composable; merging paths should + result in a model for the merged slices. Because slices are + independent, a choice occurs in at most 1 model, so merging of + models (and slices) is trivial, and is just hole-plugging. + + Slices can be at the field level, so a struct-typed variable may + have mutiple slices, one for each field that is a dependency. One + further wrinkle is that slicing depends on how things are used, and + so there is an additional FInstId which is used to tell the + synthesis which function summary to use. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Analysis/Slice.hs::data Slice' cn sle =][Slice definition]] + + This is what the strategies have to solve, it is parametric in + how calls are represented, and how expressions are represented. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Analysis/Exported.hs::type ExpSlice = Slice' ExpCallNode Expr][Instantiation of Slice']] + + This is how slices are used outside of the slicing algorithm + (hence 'exported' --- all the slicing-specific annotations etc + get removed). + +** Strategies + + A strategy is a way of solving a slice: it is a function from ExpSlices to SelectedPaths. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Strategy.hs::findModel ::][Running Strategies]] + + There is a bit of infrastructure for running strategies, including + timing etc. Strategies can generate many models (it is more + efficient to ask for multiple models from the solver) so we run the + strategy once and consult the cache when we next see that slice. + + The simplest strategy is the random selection with backtracking + strategy, which is made a bit mode complicated by abstracting over + the method of backtracking. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Strategy/BTRand.hs::stratSlice ::][Random strat]] + + We mainly use the PathSymbolic strategy, which constructs a big + solver model along with a parametric SelectedPath, by which I mean + a SelectedPath where the decisions are solver variables. The + solver model is then used to instantiate the parametrict + SelectedPath. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Strategy/PathSymbolic.hs::pathSymbolicStrat :: Strategy][Path symbolic strategy]] + + This strategy is made (much mode) complicated as we avoid having + non-product types in the solver. We can discuss this if there is + time/interest. + + + + + + + + + +