From 5732b219bceb4ea71d9aa7a1cd3a64f0ffc43480 Mon Sep 17 00:00:00 2001 From: Simon Winwood Date: Fri, 20 Jan 2023 00:06:32 +1100 Subject: [PATCH 1/4] talos: Initial implementation guide --- talos/implementation.md | 154 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 154 insertions(+) create mode 100644 talos/implementation.md diff --git a/talos/implementation.md b/talos/implementation.md new file mode 100644 index 000000000..8a1c58a1f --- /dev/null +++ b/talos/implementation.md @@ -0,0 +1,154 @@ + +Talos implementation +==================== + +The Talos tool synthesises valid documents from DDL grammars. The tool has the following phases: + +1. Parsing the input file and converting to Talos-flavoured Core; +2. Analysing the Core to decompose the grammar into slices; +3. Traversing the Core to synthesize a document, calling into slice model generators when the start of a slice is encountered; + +The implementatino is discussed below + +Text to Core +------------ + +The majority of the translation from the text file into Core is +implemented as part of Daedalus, leaving us with a [Core module](../daedalus-core/src/Daedalus/Core/Decl.hs). + +Talos ([performs](src/Talos.hs)) a number of standard and specific +Core to Core passes ([Talos.Passes](src/Talos/Passes.hs)), namely + +* Fail propagation (making cases partial) +* Polymorphic type monomorphisation +* Constant folding +* Naming arguments to functions; +* Naming the argument to string matches (which is also the result of the match) + +These passes ensure that the analysis phase has a name for the +arguments to a function (as the argument to a function may be the +start of a slice, so it needs a name), and that we can safely ignore +string matches when slicing (as a string match always returns its +argument on success, so we just use the argument directly). + +Slicing +------- + +Briefly, the +([Talos.Analysis.Exported](src/Talos/Analysis/Exported.hs)) module +contains the data structures used by the synthesis phase of Talos, +using definitions from +([Talos.Analysis.Slice](src/Talos/Analysis/Slice.hs)). The below +notes on the implementation of slicing is not required to use the +results. + +The slicing phase lives under +([Talos.Analysis](src/Talos/Analysis.hs)). The aim of the slicing +phase is to decompose the input grammar into a collection of +independent slices (defined in +([Talos.Analysis.Slice](src/Talos/Analysis/Slice.hs)), along with +sliced expressions in +([Talos.Analysis.SLExpr](src/Talos/Analysis/SLExpr.hs))). A slice is, +in essence, a grammar where some nodes have been replaced by holes. +Slices are independent if each choice in the input grammar (i.e., byte +values and grammar alternatives) occurs in exactly one slice (or no +slice if the choice cannot impact the feasibility of synthesis). +Slices are also closed, in the sense that all (parts of) variables +used in the grammar are defined in the grammar. + +The slicing process has two phases, firstly a fixpoint is constructed; +and secondly, the fixpoint is _exported_ +([Talos.Analysis.Exported](src/Talos/Analysis/Exported.hs)). The +export process simplifies the representation of slices from that used +in the fixpoint construction, including replacing holes in expressions +by default values (which should never actually be examined), and +forgetting which abstract environment (see below) was used (to avoid carrying +around the class in the remainder of Talos). + +The slicing algorithm is abstract in the way in which the grammar is +analyzed; the abstraction is in +([Talos.Analysis.AbsEnv](src/Talos/Analysis/AbsEnv.hs)). Briefly, an +_abstract environment_ relates variables with _abstract predicates_, +and has a precondition operation which calculates the +abstract environment required for an expression to satisfy an abstract predicate. + +The two main abstract environments are the whole-variable environment +(([Talos.Analysis.VarAbsEnv](src/Talos/Analysis/VarAbsEnv.hs))) +and the field-sensitive abstract environment +(([Talos.Analysis.FieldAbsEnv](src/Talos/Analysis/FieldAbsEnv.hs))). +The whole-variable abstract predicate carries no information (we want the whole +variable), and the abstract environment is equivalent to a set of +variables --- the abstract precondition operation then just calculates +the set of free variables. + +The field-sensitive abstract predicate projects a subset of the fields +in a struct, including nested fields for fields with structure type. +The abstract precondition operation then calculates which fields of +the free variables are used in an expression. + + + + + +The slicing algorithm uses the following auxiliary modules + * ([Talos.Analysis.Eqv](src/Talos/Analysis/Eqv.hs)): notions of equivalence used to determine if a fixpoint has been reached; + * ([Talos.Analysis.Merge](src/Talos/Analysis/Merge.hs)): merging is used to combine slices when they are discovered to be non-independent (i.e., rely on the same choice); + * ([Talos.Analysis.Domain](src/Talos/Analysis/Domain.hs)): the domain over which the slicing algorithm operates, containing the in-progress slices for the current grammar function; + * ([Talos.Analysis.Fixpoint](src/Talos/Analysis/Fixpoint.hs)): a naive algorithm for constructing the fixpoint for the slicing algorithm + * ([Talos.Analysis.Monad](src/Talos/Analysis/Monad.hs)): the monad containing the state for slicing, mostly this builds on Talos.Analysis.Fixpoint. + +Non-dependent Document Generation +--------------------------------- + +The outer synthesis loop is in [Talos.Synthesis] + + +Slice Model Generation +---------------------- + +Symbolic Execution +------------------ + + +[Talos.SymExec.SolverT]: src/Talos/SymExec/SolverT.hs +[Talos.SymExec.StdLib]: src/Talos/SymExec/StdLib.hs +[Talos.SymExec.ModelParser]: src/Talos/SymExec/ModelParser.hs +[Talos.SymExec.Expr]: src/Talos/SymExec/Expr.hs +[Talos.SymExec.Funs]: src/Talos/SymExec/Funs.hs +[Talos.SymExec.SemiExpr]: src/Talos/SymExec/SemiExpr.hs +[Talos.SymExec.SemiValue]: src/Talos/SymExec/SemiValue.hs +[Talos.SymExec.Type]: src/Talos/SymExec/Type.hs +[Talos.SymExec.Path]: src/Talos/SymExec/Path.hs +[Talos.Analysis]: src/Talos/Analysis.hs +[Talos.Analysis.AbsEnv]: src/Talos/Analysis/AbsEnv.hs +[Talos.Analysis.VarAbsEnv]: src/Talos/Analysis/VarAbsEnv.hs +[Talos.Analysis.FieldAbsEnv]: src/Talos/Analysis/FieldAbsEnv.hs +[Talos.Analysis.Eqv]: src/Talos/Analysis/Eqv.hs +[Talos.Analysis.Merge]: src/Talos/Analysis/Merge.hs +[Talos.Analysis.Domain]: src/Talos/Analysis/Domain.hs +[Talos.Analysis.Fixpoint]: src/Talos/Analysis/Fixpoint.hs +[Talos.Analysis.Monad]: src/Talos/Analysis/Monad.hs +[Talos.Analysis.Slice]: src/Talos/Analysis/Slice.hs +[Talos.Analysis.SLExpr]: src/Talos/Analysis/SLExpr.hs +[Talos.Analysis.Exported]: src/Talos/Analysis/Exported.hs +[Talos.Strategy]: src/Talos/Strategy.hs +[Talos.Strategy.Monad]: src/Talos/Strategy/Monad.hs +[Talos.Strategy.DFST]: src/Talos/Strategy/DFST.hs +[Talos.Strategy.SearchTree]: src/Talos/Strategy/SearchTree.hs +[Talos.Strategy.SearchT]: src/Talos/Strategy/SearchT.hs +[Talos.Strategy.SymbolicM]: src/Talos/Strategy/SymbolicM.hs +[Talos.Strategy.PathSymbolicM]: src/Talos/Strategy/PathSymbolicM.hs +[Talos.Strategy.MuxValue]: src/Talos/Strategy/MuxValue.hs +[Talos.Strategy.BTRand]: src/Talos/Strategy/BTRand.hs +[Talos.Strategy.Symbolic]: src/Talos/Strategy/Symbolic.hs +[Talos.Strategy.PathSymbolic]: src/Talos/Strategy/PathSymbolic.hs +[Talos.Strategy.PathCondition]: src/Talos/Strategy/PathCondition.hs +[Talos.Strategy.MemoSearch]: src/Talos/Strategy/MemoSearch.hs +[Talos.Strategy.OptParser]: src/Talos/Strategy/OptParser.hs +[Talos.Synthesis]: src/Talos/Synthesis.hs +[Talos.Passes]: src/Talos/Passes.hs + + + + + From cd806ef3cf117f715b7844f5d12e5b9be79ee90f Mon Sep 17 00:00:00 2001 From: Simon Winwood Date: Fri, 20 Jan 2023 00:23:40 +1100 Subject: [PATCH 2/4] talos: Fix up links in impl doc --- talos/Readme.rst | 7 +++++ talos/implementation.md | 70 ++++++++++++++++++----------------------- 2 files changed, 38 insertions(+), 39 deletions(-) diff --git a/talos/Readme.rst b/talos/Readme.rst index 95c9d8e5b..dfe2b6d49 100644 --- a/talos/Readme.rst +++ b/talos/Readme.rst @@ -26,3 +26,10 @@ To run ``talos`` you need a Daedalus specification describing the parser. Talos cabal run talos -- -n 4 tests/T003.ddl The ``-n`` flag controls how many example inputs are generated. In this example, we generate four possible inputs that satisfy the parser. + +Developing Talos +---------------- + +See the [implementatino guide]_. + +.. _[implementatino guide]: implementation.md diff --git a/talos/implementation.md b/talos/implementation.md index 8a1c58a1f..7fd96292f 100644 --- a/talos/implementation.md +++ b/talos/implementation.md @@ -16,8 +16,8 @@ Text to Core The majority of the translation from the text file into Core is implemented as part of Daedalus, leaving us with a [Core module](../daedalus-core/src/Daedalus/Core/Decl.hs). -Talos ([performs](src/Talos.hs)) a number of standard and specific -Core to Core passes ([Talos.Passes](src/Talos/Passes.hs)), namely +Talos [performs](src/Talos.hs) a number of standard and specific +Core to Core passes ([Talos.Passes]), namely * Fail propagation (making cases partial) * Polymorphic type monomorphisation @@ -34,22 +34,16 @@ argument on success, so we just use the argument directly). Slicing ------- -Briefly, the -([Talos.Analysis.Exported](src/Talos/Analysis/Exported.hs)) module -contains the data structures used by the synthesis phase of Talos, -using definitions from -([Talos.Analysis.Slice](src/Talos/Analysis/Slice.hs)). The below -notes on the implementation of slicing is not required to use the -results. - -The slicing phase lives under -([Talos.Analysis](src/Talos/Analysis.hs)). The aim of the slicing -phase is to decompose the input grammar into a collection of -independent slices (defined in -([Talos.Analysis.Slice](src/Talos/Analysis/Slice.hs)), along with -sliced expressions in -([Talos.Analysis.SLExpr](src/Talos/Analysis/SLExpr.hs))). A slice is, -in essence, a grammar where some nodes have been replaced by holes. +Briefly, the [Talos.Analysis.Exported][] module contains the data +structures used by the synthesis phase of Talos, using definitions +from [Talos.Analysis.Slice][] The below notes on the implementation of +slicing is not required to use the results. + +The slicing phase lives under [Talos.Analysis][]. The aim of the +slicing phase is to decompose the input grammar into a collection of +independent slices (defined in [Talos.Analysis.Slice][], along with +sliced expressions in [Talos.Analysis.SLExpr][]). A slice is, in +essence, a grammar where some nodes have been replaced by holes. Slices are independent if each choice in the input grammar (i.e., byte values and grammar alternatives) occurs in exactly one slice (or no slice if the choice cannot impact the feasibility of synthesis). @@ -57,29 +51,28 @@ Slices are also closed, in the sense that all (parts of) variables used in the grammar are defined in the grammar. The slicing process has two phases, firstly a fixpoint is constructed; -and secondly, the fixpoint is _exported_ -([Talos.Analysis.Exported](src/Talos/Analysis/Exported.hs)). The -export process simplifies the representation of slices from that used -in the fixpoint construction, including replacing holes in expressions -by default values (which should never actually be examined), and -forgetting which abstract environment (see below) was used (to avoid carrying -around the class in the remainder of Talos). +and secondly, the fixpoint is _exported_ [Talos.Analysis.Exported][]. +The export process simplifies the representation of slices from that +used in the fixpoint construction, including replacing holes in +expressions by default values (which should never actually be +examined), and forgetting which abstract environment (see below) was +used (to avoid carrying around the class in the remainder of Talos). The slicing algorithm is abstract in the way in which the grammar is analyzed; the abstraction is in -([Talos.Analysis.AbsEnv](src/Talos/Analysis/AbsEnv.hs)). Briefly, an +[Talos.Analysis.AbsEnv][]. Briefly, an _abstract environment_ relates variables with _abstract predicates_, and has a precondition operation which calculates the abstract environment required for an expression to satisfy an abstract predicate. The two main abstract environments are the whole-variable environment -(([Talos.Analysis.VarAbsEnv](src/Talos/Analysis/VarAbsEnv.hs))) +([Talos.Analysis.VarAbsEnv][]) and the field-sensitive abstract environment -(([Talos.Analysis.FieldAbsEnv](src/Talos/Analysis/FieldAbsEnv.hs))). -The whole-variable abstract predicate carries no information (we want the whole -variable), and the abstract environment is equivalent to a set of -variables --- the abstract precondition operation then just calculates -the set of free variables. +([Talos.Analysis.FieldAbsEnv][]). +The whole-variable abstract predicate carries no information (we want +the whole variable), and the abstract environment is equivalent to a +set of variables --- the abstract precondition operation then just +calculates the set of free variables. The field-sensitive abstract predicate projects a subset of the fields in a struct, including nested fields for fields with structure type. @@ -91,16 +84,16 @@ the free variables are used in an expression. The slicing algorithm uses the following auxiliary modules - * ([Talos.Analysis.Eqv](src/Talos/Analysis/Eqv.hs)): notions of equivalence used to determine if a fixpoint has been reached; - * ([Talos.Analysis.Merge](src/Talos/Analysis/Merge.hs)): merging is used to combine slices when they are discovered to be non-independent (i.e., rely on the same choice); - * ([Talos.Analysis.Domain](src/Talos/Analysis/Domain.hs)): the domain over which the slicing algorithm operates, containing the in-progress slices for the current grammar function; - * ([Talos.Analysis.Fixpoint](src/Talos/Analysis/Fixpoint.hs)): a naive algorithm for constructing the fixpoint for the slicing algorithm - * ([Talos.Analysis.Monad](src/Talos/Analysis/Monad.hs)): the monad containing the state for slicing, mostly this builds on Talos.Analysis.Fixpoint. + * [Talos.Analysis.Eqv][]: notions of equivalence used to determine if a fixpoint has been reached; + * [Talos.Analysis.Merge][]: merging is used to combine slices when they are discovered to be non-independent (i.e., rely on the same choice); + * [Talos.Analysis.Domain][]: the domain over which the slicing algorithm operates, containing the in-progress slices for the current grammar function; + * [Talos.Analysis.Fixpoint][]: a naive algorithm for constructing the fixpoint for the slicing algorithm + * [Talos.Analysis.Monad][]: the monad containing the state for slicing, mostly this builds on Talos.Analysis.Fixpoint. Non-dependent Document Generation --------------------------------- -The outer synthesis loop is in [Talos.Synthesis] +The outer synthesis loop is in [Talos.Synthesis][]. Slice Model Generation @@ -108,7 +101,6 @@ Slice Model Generation Symbolic Execution ------------------ - [Talos.SymExec.SolverT]: src/Talos/SymExec/SolverT.hs [Talos.SymExec.StdLib]: src/Talos/SymExec/StdLib.hs From aca04ac510ef3361eaa7d75b2ec01cd927ac5b5c Mon Sep 17 00:00:00 2001 From: Simon Winwood Date: Fri, 20 Jan 2023 01:02:42 +1100 Subject: [PATCH 3/4] talos: fleshing out implementation doc --- talos/implementation.md | 41 ++++++++++++++++++++++++++++++++++++++++- 1 file changed, 40 insertions(+), 1 deletion(-) diff --git a/talos/implementation.md b/talos/implementation.md index 7fd96292f..1c644c0bd 100644 --- a/talos/implementation.md +++ b/talos/implementation.md @@ -95,13 +95,52 @@ Non-dependent Document Generation The outer synthesis loop is in [Talos.Synthesis][]. - Slice Model Generation ---------------------- +A model for a slice is a path ([Talos.SymExec.Path][]), which can be +thought of as a parse tree with holes (corresponding to the holes in +the slice). The path tells the outer synthesis loop about decisions +that were made during slice synthesis (i.e., choices and bytes which +occur in the slice). + +The main modules are: + * [Talos.Strategy][]: Strategy execution and utils (timing etc.) + * [Talos.Strategy.Monad][]: Core monad for strategies (random numners, Core definitions, etc) + * [Talos.Strategy.PathSymbolic][]: Current main symbolic strategy (pathsymb) + * [Talos.Strategy.PathSymbolicM][]: Monad and aux types for the pathsymb strat + * [Talos.Strategy.BTRand][]: Random selection with backtracking strategy. + * [Talos.Strategy.Symbolic][]: Experimental symbolic strategy (probably should be deprecated) + * [Talos.Strategy.SymbolicM][]: Monad for the symbolic strat + +Modules for pathsymb: + * [Talos.Strategy.PathCondition][]: Path condition guards for the + pathsymb strat, corresponding to sets of choices in the slice + * [Talos.Strategy.MuxValue][]: Symbolic values used in pathsymb. + The basic idea is that values in the symbolic execution are sets + of values, each value being guarded by a path condition. + +with helper modules: + * [Talos.Strategy.MemoSearch][]: A memoising search procedure (used by symbolic strat) + * [Talos.Strategy.OptParser][]: An option parser for strat arguments + * [Talos.Strategy.DFST][]: A DFS monad transformer + * [Talos.Strategy.SearchTree][]: An explicit representation of search trees + Symbolic Execution ------------------ +These modules are involved in talking to the solver: + * [Talos.SymExec.SolverT][]: Solver support with support for context management + * [Talos.SymExec.StdLib][]: Support SMTLIB definitions + * [Talos.SymExec.ModelParser][]: Reading back models from SMTLIB + * [Talos.SymExec.Expr][]: Symbolic execution of expressions + * [Talos.SymExec.Funs][]: Define functions in SMT from Core functions + * [Talos.SymExec.SemiValue][]: Values with non-value leaves + * [Talos.SymExec.SemiExpr][]: SemiValue with SExpr leaves + * [Talos.SymExec.Type][]: Converting type to SMT types + + + [Talos.SymExec.SolverT]: src/Talos/SymExec/SolverT.hs [Talos.SymExec.StdLib]: src/Talos/SymExec/StdLib.hs [Talos.SymExec.ModelParser]: src/Talos/SymExec/ModelParser.hs From 2fc50793775de7f5c2aec93bf4c7ee213c74c43c Mon Sep 17 00:00:00 2001 From: Simon Winwood Date: Mon, 30 Jan 2023 21:45:59 +1100 Subject: [PATCH 4/4] talos: added walkthrough notes --- talos/notes/2023-01-28-code-walkthrough.org | 163 ++++++++++++++++++++ 1 file changed, 163 insertions(+) create mode 100644 talos/notes/2023-01-28-code-walkthrough.org diff --git a/talos/notes/2023-01-28-code-walkthrough.org b/talos/notes/2023-01-28-code-walkthrough.org new file mode 100644 index 000000000..05439c66e --- /dev/null +++ b/talos/notes/2023-01-28-code-walkthrough.org @@ -0,0 +1,163 @@ + +* Core walkthrough + +** Operation + +1. Run Daedalus +2. Slice +3. Outer loop, calling ... +4. ... strategies + +** Concepts + + - Entangled :: wrt variables: when the choice of a variable impacts + the valid choices for another variable + + +** [[file:~/galois/safedocs/daedalus/daedalus-core/src/Daedalus/Core/Grammar.hs::data Grammar =][Grammar]] and [[file:~/galois/safedocs/daedalus/daedalus-core/src/Daedalus/Core/Expr.hs::data Expr =][Expr]] + + Core data-structures targeted by Daedalus front-end, and used by +Talos. We assume (it should be true) that variable binders are +unique, if only the GUID part of the Name. This means we can use +variables as node identifiers in the AST (e.g. have a Map Name t) + +** Synthesis outer loop + + The synthesis is syntax-directed: we follow the syntax of the +grammar to synthesise bytes until we hit a variable which starts a +slice --- part of the synthesis state is a map from variable to slice. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Synthesis.hs::synthesiseG SelectedHole][synthesisG random mode]] + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/SymExec/Path.hs::data SelectedPathF ch ca a =][SelectdPath]] + + This type is how we record decisions that were made when solving + slices (a SelectedPath is a model for a slice). Essentially + tells which decisions to make. It is strongly (structurally) + related to grammars (fills in bytes and choices). + + When a slice is solved, the result (a SelectedPath) is merged + into any path we have from previous slice solving. The path is + inspected in synthesiseG, and is initially SelectedHole, o.e., we + have made no choices/constraints on future choices. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Synthesis.hs::synthesiseG ::][Worker for Grammar (synthesiseG)]] + +** Slices + + A slice is a fragment of a grammer, by 'fragment' I mean a grammar + where some nodes in the ast are replaced by 'holes'. A slice + occurs when a grammar is partial: all bits of the grammar which + impact the partiality are in the slice, the idea being that the + slice contains all the information needed to synthesise the (bits + of) the variables bound in the slice. + +*** Example + + As an example, consider + + block + a = A + x = P + y = Q + z = R + a < x is true + y > 0 is true + + becomes: + + - 'slice starting at a' + + block + a = A + x = P + y = HOLE + z = HOLE + a < x is true + HOLE + + - 'slice starting at y' + + block + a = HOLE + x = HOLE + y = Q + z = HOLE + HOLE + y > 0 is true + + with 'z' no being used in a partial way. + +*** Notes + + We won't talk about how we generate a slice, it is enough to know + that slices are + + - closed :: contain all information needed to synthesise the bytes + in the slice + - independent :: slices can be synthesised without worrying about + other slices + - composable :: two slices can be merged into a new slice; the sum of + all slices is the dependent part of the grammar (the remnants + always succeeed) + + Note tha SelectedPaths are also composable; merging paths should + result in a model for the merged slices. Because slices are + independent, a choice occurs in at most 1 model, so merging of + models (and slices) is trivial, and is just hole-plugging. + + Slices can be at the field level, so a struct-typed variable may + have mutiple slices, one for each field that is a dependency. One + further wrinkle is that slicing depends on how things are used, and + so there is an additional FInstId which is used to tell the + synthesis which function summary to use. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Analysis/Slice.hs::data Slice' cn sle =][Slice definition]] + + This is what the strategies have to solve, it is parametric in + how calls are represented, and how expressions are represented. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Analysis/Exported.hs::type ExpSlice = Slice' ExpCallNode Expr][Instantiation of Slice']] + + This is how slices are used outside of the slicing algorithm + (hence 'exported' --- all the slicing-specific annotations etc + get removed). + +** Strategies + + A strategy is a way of solving a slice: it is a function from ExpSlices to SelectedPaths. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Strategy.hs::findModel ::][Running Strategies]] + + There is a bit of infrastructure for running strategies, including + timing etc. Strategies can generate many models (it is more + efficient to ask for multiple models from the solver) so we run the + strategy once and consult the cache when we next see that slice. + + The simplest strategy is the random selection with backtracking + strategy, which is made a bit mode complicated by abstracting over + the method of backtracking. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Strategy/BTRand.hs::stratSlice ::][Random strat]] + + We mainly use the PathSymbolic strategy, which constructs a big + solver model along with a parametric SelectedPath, by which I mean + a SelectedPath where the decisions are solver variables. The + solver model is then used to instantiate the parametrict + SelectedPath. + + - [[file:~/galois/safedocs/daedalus/talos/src/Talos/Strategy/PathSymbolic.hs::pathSymbolicStrat :: Strategy][Path symbolic strategy]] + + This strategy is made (much mode) complicated as we avoid having + non-product types in the solver. We can discuss this if there is + time/interest. + + + + + + + + + +