diff --git a/TODO.md b/TODO.md index 81a4b14..7c6934f 100644 --- a/TODO.md +++ b/TODO.md @@ -4,3 +4,5 @@ Place for TODOs/FIXMEs, especially if not expressed in other places. See README. * FIXME: repeating `newtype` and `newcons` definitions should be errors. * TODO: calibrate timeout parameters (number of iterations before forced exit from simple abduction, joint abduction etc.) * FIXME: `separate_subst` has a default argument `keep_uni=false`. Rethink for each use case if it is the correct argument. +* FIXME: export the type of tuples to OCaml using asterisks instead of commas. +* FIXME: either mute the "unification" type variables on let-in node annotations by printing them as `_`, or move them to the other side of `=` if that helps. \ No newline at end of file diff --git a/doc/invargent-manual.pdf b/doc/invargent-manual.pdf new file mode 100644 index 0000000..21c5fd9 Binary files /dev/null and b/doc/invargent-manual.pdf differ diff --git a/doc/invargent-manual.tm b/doc/invargent-manual.tm new file mode 100644 index 0000000..e98d611 --- /dev/null +++ b/doc/invargent-manual.tm @@ -0,0 +1,552 @@ + + + + +<\body> + |||<\author-affiliation> + Institute of Computer Science + + University of Wrocªaw + >>> + + <\abstract> + InvarGenT is a proof-of-concept system for invariant generation by full + type inference with Guarded Algebraic Data Types and existential types + encoded as automatically generated GADTs. This user manual discusses + motivating examples, briefly presents the syntax of the InvarGenT + language, and describes the parameters of the inference process that can + be passed to the InvarGenT executable. + + + + + Type systems are an established natural deduction-style means to reason + about programs. Dependent types can represent arbitrarily complex + properties as they use the same language for both types and programs, the + type of value returned by a function can itself be a function of the + argument. Generalized Algebraic Data Types bring some of that expressivity + to type systems that deal with data-types. Type systems with GADTs + introduce the ability to reason about return type by case analysis of the + input value, while keeping the benefits of a simple semantics of types, for + example deciding equality can be very simple. Existential types hide some + information conveyed in a type, usually when that information cannot be + reconstructed in the type system. A part of the type will often fail to be + expressible in the simple language of types, when the dependence on the + input to the program is complex. GADTs express existential types by using + local type variables for the hidden parts of the type encapsulated in a + GADT. + + Our type system for GADTs differs from more pragmatic approaches in + mainstream functional languages in that we do not require any type + annotations on expressions, even on recursive functions. Our + implementation: InvarGenT, see , also includes linear + equations and inequalities over rational numbers in the language of types, + with the possibility to introduce more domains in the future. + + + + The concrete syntax of InvarGenT is similar to that of OCaml. However, it + does not currently cover records, the module system, objects, and + polymorphic variant types. It supports higher-order functions, algebraic + data-types including built-in tuple types, and linear pattern matching. It + supports conjunctive patterns using the keyword, but it does + not support disjunctive patterns. It does not currently support guarded + patterns, i.e. no support for the keyword of OCaml. + + The sort of a type variable is identified by the first letter of the + variable. ,,,,,,,... + are in the sort of terms called , i.e. ``types proper''. + ,,,,,,,... + are in the sort of linear arithmetics over rational numbers called + . Remaining letters are reserved for sorts that may be added + in the future. Type constructors and value constructors have the same + syntax: capitalized name followed by a tuple of arguments. They are + introduced by and respectively. The + declaration might be misleading in that it only lists + the sorts of the arguments of the type, the resulting sort is always + . Values assumed into the environment are introduced by + . There is a built-in type corresponding to declaration + and definitions of numeric constants + ... The programmer can use + declarations to give the semantics of choice to the + data-type. + + In examples here we use Unicode characters. For ASCII equivalents, take a + quick look at the tables in the following section. is a + function comparing values provided representation of their types: + + <\code> + newtype Ty : type + + newtype Int + + newtype List : type + + newcons Zero : Int + + newcons Nil : >a. List a + + newcons TInt : Ty Int + + newcons TPair : >a, b. Ty a * Ty b + > Ty (a, b) + + newcons TList : >a. Ty a > Ty + (List a) + + newtype Boolean + + newcons True : Boolean + + newcons False : Boolean + + external eq_int : Int > Int > + Bool + + external b_and : Bool > Bool > + Bool + + external b_not : Bool > Bool + + external forall2 : >a, b. (a > b + > Bool) > List a + > List b > Bool + + \; + + let rec equal1 = function + + \ \ \| TInt, TInt -\> fun x y -\> eq_int x y + + \ \ \| TPair (t1, t2), TPair (u1, u2) -\> \ + + \ \ \ \ (fun (x1, x2) (y1, y2) -\> + + \ \ \ \ \ \ \ \ b_and (equal (t1, u1) x1 y1) + + \ \ \ \ \ \ \ \ \ \ \ \ \ \ (equal (t2, u2) x2 y2)) + + \ \ \| TList t, TList u -\> forall2 (equal (t, u)) + + \ \ \| _ -\> fun _ _ -\> False + + + InvarGenT returns an unexpected type: >a,b.(a,b)>b>b>Bool>, + one of four maximally general types of . This illustrates + that unrestricted type systems with GADTs lack principal typing property. + + InvarGenT commits to a type of a toplevel definition before proceeding to + the next one, so sometimes we need to provide more information in the + program. Besides type annotations, there are two means to enrich the + generated constraints: syntax for providing + negative constraints, and syntax for including constraints + of use cases with constraint of a toplevel definition. To ensure only one + maximally general type for , we use both. We add the lines: + + <\code> + \ \ \| TInt, TList l -\> (function Nil -\> assert false) + + \ \ \| TList l, TInt -\> (fun _ -\> function Nil -\> assert false) + + test b_not (equal (TInt, TList TInt) Zero Nil) + + + Actually, InvarGenT returns the expected type + >a,b.(a,b)>a>b>Bool> + when either the two clauses or the + clause is added. When using , the program should declare the + type and constant . In a future version, + this will be replaced by a built-in type with constants + and , exported into OCaml as type + with constants and . + + Now we demonstrate numerical invariants: + + <\code> + newtype Binary : num + + newtype Carry : num + + newcons Zero : Binary 0 + + newcons PZero : >n[0>n]. Binary(n) + > Binary(n+n) + + newcons POne : >n[0>n]. Binary(n) + > Binary(n+n+1) + + newcons CZero : Carry 0 + + newcons COne : Carry 1 + + \; + + let rec plus = + + \ \ function CZero -\> + + \ \ \ \ (function Zero -\> (fun b -\> b) + + \ \ \ \ \ \ \| PZero a1 as a -\> + + \ \ \ \ \ \ \ \ (function Zero -\> a + + \ \ \ \ \ \ \ \ \ \ \| PZero b1 -\> PZero (plus CZero a1 b1) + + \ \ \ \ \ \ \ \ \ \ \| POne b1 -\> POne (plus CZero a1 b1)) + + \ \ \ \ \ \ \| POne a1 as a -\> + + \ \ \ \ \ \ \ \ (function Zero -\> a + + \ \ \ \ \ \ \ \ \ \ \| PZero b1 -\> POne (plus CZero a1 b1) + + \ \ \ \ \ \ \ \ \ \ \| POne b1 -\> PZero (plus COne a1 b1))) + + \ \ \ \ \| COne -\> + + \ \ \ \ (function Zero -\> + + \ \ \ \ \ \ \ \ (function Zero -\> POne(Zero) + + \ \ \ \ \ \ \ \ \ \ \| PZero b1 -\> POne b1 + + \ \ \ \ \ \ \ \ \ \ \| POne b1 -\> PZero (plus COne Zero b1)) + + \ \ \ \ \ \ \| PZero a1 as a -\> + + \ \ \ \ \ \ \ \ (function Zero -\> POne a1 + + \ \ \ \ \ \ \ \ \ \ \| PZero b1 -\> POne (plus CZero a1 b1) + + \ \ \ \ \ \ \ \ \ \ \| POne b1 -\> PZero (plus COne a1 b1)) + + \ \ \ \ \ \ \| POne a1 as a -\> + + \ \ \ \ \ \ \ \ (function Zero -\> PZero (plus COne a1 Zero) + + \ \ \ \ \ \ \ \ \ \ \| PZero b1 -\> PZero (plus COne a1 b1) + + \ \ \ \ \ \ \ \ \ \ \| POne b1 -\> POne (plus COne a1 b1))) + + + We get >i,j,k.Carry + i>Binary j>Binary + k>Binary (i+j+k)>. + + We can introduce existential types directly in type declarations. To have + an existential type inferred, we have to use or + expressions, which differ from and + only in that the (return) type is an existential type. To + use a value of an existential type, we have to bind it with a + .. expression. Otherwise, the existential type + will not be unpacked. An existential type will be automatically unpacked + before being ``repackaged'' as another existential type. + + <\code> + newtype Room + + newtype Yard + + newtype Village + + newtype Castle : type + + newtype Place : type + + newcons Room : Room > Castle Room + + newcons Yard : Yard > Castle Yard + + newcons CastleRoom : Room > Place Room + + newcons CastleYard : Yard > Place Yard + + newcons Village : Village > Place Village + + \; + + external wander : >a. Place a > + >b. Place b + + \; + + let rec find_castle = efunction + + \ \ \| CastleRoom x -\> Room x + + \ \ \| CastleYard x -\> Yard x + + \ \ \| Village _ as x -\> + + \ \ \ \ let y = wander x in + + \ \ \ \ find_castle y + + + We get >a. Place + a> >b. Castle b>. + + A more practical existential type example: + + <\code> + newtype Bool + + newcons True : Bool + + newcons False : Bool + + newtype List : type * num + + newcons LNil : >a. List(a, 0) + + newcons LCons : >n,a[0>n]. a * List(a, n) + > List(a, n+1) + + \; + + let rec filter = fun f -\> + + \ \ efunction LNil -\> LNil + + \ \ \ \ \| LCons (x, xs) -\> + + \ \ \ \ \ \ ematch f x with + + \ \ \ \ \ \ \ \ \| True -\> + + \ \ \ \ \ \ \ \ \ \ let ys = filter f xs in + + \ \ \ \ \ \ \ \ \ \ LCons (x, ys) + + \ \ \ \ \ \ \ \ \| False -\> + + \ \ \ \ \ \ \ \ \ \ filter f xs + + + We get >a,i.(a>Bool)>List + (a, i)> >j[j>i].List (a, + j)>. Note that we need to use both and + above, since every use of or + will force the types of its branches to be equal. In + particular, for lists with length the resulting length would have to be the + same in each branch. If the constraint cannot be met, as for + with either or , the + code will not type-check. + + A more complex example that computes bitwise -- + stands for ``upper bound'': + + <\code> + newtype Binary : num + + newcons Zero : Binary 0 + + newcons PZero : >n [0>n]. Binary(n) + > Binary(n+n) + + newcons POne : >n [0>n]. Binary(n) + > Binary(n+n+1) + + \; + + let rec ub = efunction + + \ \ \| Zero -\> + + \ \ \ \ \ \ (efunction Zero -\> Zero + + \ \ \ \ \ \ \ \ \| PZero b1 as b -\> b + + \ \ \ \ \ \ \ \ \| POne b1 as b -\> b) + + \ \ \| PZero a1 as a -\> + + \ \ \ \ \ \ (efunction Zero -\> a + + \ \ \ \ \ \ \ \ \| PZero b1 -\> + + \ \ \ \ \ \ \ \ \ \ let r = ub a1 b1 in + + \ \ \ \ \ \ \ \ \ \ PZero r + + \ \ \ \ \ \ \ \ \| POne b1 -\> + + \ \ \ \ \ \ \ \ \ \ let r = ub a1 b1 in + + \ \ \ \ \ \ \ \ \ \ POne r) + + \ \ \| POne a1 as a -\> + + \ \ \ \ \ \ (efunction Zero -\> a + + \ \ \ \ \ \ \ \ \| PZero b1 -\> + + \ \ \ \ \ \ \ \ \ \ let r = ub a1 b1 in + + \ \ \ \ \ \ \ \ \ \ POne r + + \ \ \ \ \ \ \ \ \| POne b1 -\> + + \ \ \ \ \ \ \ \ \ \ let r = ub a1 b1 in + + \ \ \ \ \ \ \ \ \ \ POne r) + + + Type: >k,n.Binary k>Binary + n>>:i[n>i>k>i>i>k+n>0>n>0>k].Binary + i>. + + Why cannot we shorten the above code by converting the initial cases to + (efunction b -\ b)>? Without pattern + matching, we do not make the contribution of available. + Knowing i> and not knowing >n>, + for the case 0>, we get: + >k,n.Binary k>Binary + n>>:i[n>i>i>k+n>0>k].Binary + i>. >i> follows from + i>, >k+n> follows from + i> and >k>, but + >i> cannot be inferred from 0> + and i> without knowing that >n>. + + In fact, ... expressions are syntactic sugar for + pattern matching with a single branch. + + Besides displaying types of toplevel definitions, InvarGenT can also export + an OCaml source file with all the required GADT definitions and type + annotations. + + + + Below we present, using examples, the syntax of InvarGenT: the mathematical + notation, the concrete syntax in ASCII and the concrete syntax using + Unicode. + + |||||,\,\,\>>|,,,,,,,...>|>||>|,,,,,,,...>|>||>|>|>||>|>|>||>|>|>||>|>|>||k,nn|]>.\>>|=n].t>>|>k,n[k>n].t>>>||>>|>|>||>>|>|>||\\>>| t2>>|>>>>||>b>>|>|>||n>>|= n>>|> + n>>>||\\>>|>|> b=a>>>>>> + + Toplevel expressions (corresponding to structure items in OCaml) introduce + types, type and value constructors, global variables with given type + \ (external names) or inferred type (definitions). + + |||>>|| List(a,n+1)>>>||>n,a. a * List(a,n) > + List(a,n+1)>>>||>n,a. List(a,n) + \>k[k\=n].List(a,k)>>>||...>>||...>>>>> + + Like in OCaml, types of arguments in declarations of constructors are + separated by asterisks. However, the type constructor for tuples is + represented by commas, like in Haskell but unlike in OCaml. + + For simplicity of theory and implementation, mutual non-nested recursion + and or-patterns are not provided. For mutual recursion, nest one recursive + definition inside another. + + + + \; + + +<\initial> + <\collection> + + + + + +<\references> + <\collection> + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + > + + + +<\auxiliary> + <\collection> + <\associate|toc> + |math-font-series||1> + |.>>>>|> + + + + \ No newline at end of file diff --git a/src/InvariantsTest.ml b/src/InvariantsTest.ml index d8bc5f9..167cc98 100644 --- a/src/InvariantsTest.ml +++ b/src/InvariantsTest.ml @@ -1040,6 +1040,42 @@ let rec filter = fun f g -> ); + "binary upper bound-wrong" >:: + (fun () -> + (* skip_if !debug "debug"; *) + (* We need to expand the branch when the first argument is + [Zero] from [efunction b -> b] to the cases as below, to + convey the fact that the numerical parameter is non-negative. *) + test_case "binary upper bound -- bitwise or" +"newtype Binary : num +newcons Zero : Binary 0 +newcons PZero : ∀n [0≤n]. Binary(n) ⟶ Binary(n+n) +newcons POne : ∀n [0≤n]. Binary(n) ⟶ Binary(n+n+1) + +let rec ub = efunction + | Zero -> (efunction b -> b) + | PZero a1 as a -> + (efunction Zero -> a + | PZero b1 -> + let r = ub a1 b1 in + PZero r + | POne b1 -> + let r = ub a1 b1 in + POne r) + | POne a1 as a -> + (efunction Zero -> a + | PZero b1 -> + let r = ub a1 b1 in + POne r + | POne b1 -> + let r = ub a1 b1 in + POne r)" + [2,"∃n, k. + δ = + (Binary k → Binary n → ∃4:i[n ≤ i ∧ i ≤ (k + n) ∧ + 0 ≤ k].Binary i)"] + ); + "binary upper bound" >:: (fun () -> skip_if !debug "debug";