diff --git a/index.html b/index.html index e15d7b1..3243259 100644 --- a/index.html +++ b/index.html @@ -3,11 +3,11 @@ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
- +LDC
- load constantLDR
- load from registerLDL
- loal localLDS
- load from stackLDLA
- load local adressLDA
- load via adressLDRR
- load register from registerNOP
- noopHALT
- halt programAJS
- adjust stack pointerBRA
- unconditional branchBSR
- branch to subroutineRET
- return from subroutineSTR
- store to registerSTS
- store into stackSTL
- store localLDC
- load constantLDR
- load from registerLDL
- loal localLDS
- load from stackLDLA
- load local adressLDA
- load via adressLDRR
- load register from registerNOP
- noopHALT
- halt programAJS
- adjust stack pointerBRA
- unconditional branchBSR
- branch to subroutineRET
- return from subroutineSTR
- store to registerSTS
- store into stackSTL
- store localHaskell is used because many concept from formal language theory have a direct correspondence in Haskell @@ -357,8 +415,8 @@
An alphabet is a set of symbols that can be used to form sentences @@ -395,8 +453,8 @@
Alphabet: A finite set of symbols. @@ -412,16 +470,16 @@
A grammar is formalism to describe a language inductively. @@ -440,8 +498,8 @@
We consider only restricted grammars: @@ -463,8 +521,8 @@
natural numbers without leading zeros @@ -497,8 +555,8 @@
A grammar where every sentence corresponds to a unique parse tree is called unambiguous. @@ -538,8 +596,8 @@
A grammar transformation is a mapping from one grammar to another, such that the generated language remains the same. @@ -561,12 +619,12 @@
Given a grammar G and a string s, the parsing problem is to decide wether or not \(s\in L(G)\) @@ -577,8 +635,8 @@
Consider this grammar: @@ -601,7 +659,7 @@
Grammar A way to describe a language inductively. @@ -675,12 +733,12 @@
parseDate5 :: Parser Date @@ -729,8 +787,8 @@3.1. Parser data type<
The actual type also has what type of symbol we are trying to parse, usually char. @@ -757,8 +815,8 @@
<$> :: (a -> b) -> Parser a -> Parser b @@ -790,8 +848,8 @@3.3. Implementing <
Examples: @@ -806,8 +864,8 @@
Only succeed if the result of a parser satisfys a given predicate @@ -833,8 +891,8 @@
Parses using either or both parsers @@ -857,8 +915,8 @@
This function isn’t actually in library, but could still be a usefull example for a low level parser @@ -876,8 +934,8 @@
All of these are made for ignoring the result of a parser @@ -905,8 +963,8 @@
Creates a parser that always results in the same value, doesn’t consume anything from the input string @@ -922,8 +980,8 @@
Parser that always fails @@ -936,12 +994,12 @@
satisfy takes a predicate and returns a parser that parses a single symbol satisfying that predicate. @@ -955,8 +1013,8 @@
symbol parses a specific given symbol @@ -970,8 +1028,8 @@
Biased choice. If the left hand side parser succeeds, the right hand side is not considered. @@ -984,8 +1042,8 @@
Monadic bind @@ -1026,8 +1084,8 @@
Because we have defined the bind operator we can also use the do notation! @@ -1056,8 +1114,8 @@
The operations parsers support are very common, many other types support the same interface(s). @@ -1082,8 +1140,8 @@
Parses an optional element. Takes the default value as its second argument. @@ -1096,12 +1154,12 @@
Parses many, i.e., zero or more, occurrences of a given parser. @@ -1114,8 +1172,8 @@
Parser some, i.e., one or more, occurrences of a given parser. @@ -1132,8 +1190,8 @@
Takes a parser p and a separator parser s. Parses a sequence of p’s that is separated by s’s @@ -1158,11 +1216,11 @@
A → u | u | v @@ -1264,12 +1322,12 @@
A production is called left-recursive if the right hand side starts with the nonterminal of the left hand side. @@ -1300,7 +1358,7 @@
First, split the productions for A into left-recursive and others: @@ -1330,12 +1388,12 @@
Consider a grammar for simple equations: @@ -1404,8 +1462,8 @@
The basic idea is to associate operators of different priorities with different non-terminals. @@ -1413,26 +1471,26 @@
For each priority level i, we get -\[E_i → E_i\ Op_i\ E_{i+1}\ |\ E_{i+1} \text{ (for left-associative operators)}\] +\[E_i \rightarrow E_i\ Op_i\ E_{i+1}\ |\ E_{i+1} \text{ (for left-associative operators)}\] or -\[E_i → E_{i+1}\ Op_i\ E_{i}\ |\ E_{i+1} \text{ (for right-associative operators)}\] +\[E_i \rightarrow E_{i+1}\ Op_i\ E_{i}\ |\ E_{i+1} \text{ (for right-associative operators)}\] or -\[E_i → E_{i+1}\ Op_i\ E_{i+1}\ |\ E_{i+1} \text{ (for non-associative operators)}\] +\[E_i \rightarrow E_{i+1}\ Op_i\ E_{i+1}\ |\ E_{i+1} \text{ (for non-associative operators)}\]
Applied to -\[ E → E + E\] -\[ E → E - E\] -\[ E → E * E\] -\[ E → ( E )\] -\[ E → Nat \] +\[ E \rightarrow E + E\] +\[ E \rightarrow E - E\] +\[ E \rightarrow E * E\] +\[ E \rightarrow ( E )\] +\[ E \rightarrow Nat \] we obtain: -\[ E_1 → E_1\ Op_1\ E_2\ |\ E_2 \] -\[ E_2 → E_2\ Op_2\ E_3\ |\ E_3 \] -\[ E_3 → ( E_1 ) | Nat \] -\[ Op_1 → + | - \] -\[ Op_2 → * \] +\[ E_1 \rightarrow E_1\ Op_1\ E_2\ |\ E_2 \] +\[ E_2 \rightarrow E_2\ Op_2\ E_3\ |\ E_3 \] +\[ E_3 \rightarrow ( E_1 ) | Nat \] +\[ Op_1 \rightarrow + | - \] +\[ Op_2 \rightarrow * \]
@@ -1467,8 +1525,8 @@
type Op a = (Char, a -> a -> a) @@ -1499,12 +1557,12 @@4.2.3. A general opera
We would like to create a simple subset of parser combinators @@ -1557,8 +1615,8 @@
The following expressions in the simplified languages can be converted to regex: @@ -1632,8 +1690,8 @@
No parsing @@ -1653,22 +1711,22 @@
We want to create a efficient algorithm for matching regular expressions.
Computers are complicated so instead we consider a Moore Machine
-
We can model the function of a lamp with three buttons using a moore machine @@ -1701,7 +1759,7 @@
runMoore :: Moore inp state out -> [inp] -> state @@ -1780,19 +1838,19 @@6.1.3. Running Moore M
An example Moore Machine for the regular expression a*aaaba*
Not all of regular expressions have a direct and easy translation to DFA, this is why we end up using NFAε @@ -1823,7 +1881,7 @@
We opt to use NFAε instead of DFA for regular expression matching @@ -1919,8 +1977,8 @@
runNFAε :: NFAε symbol state -> [symbol] -> Set state @@ -1936,8 +1994,8 @@6.2.4. Running NFAε
If n = length input and m = length regexp, then… @@ -1955,8 +2013,8 @@
Basically just create a DFA where the state variable is a set of state @@ -1984,8 +2042,8 @@
A compiler roughly has the folowing phases @@ -2011,8 +2069,8 @@
Most common functions over lists can be expressed as folds @@ -2056,8 +2114,8 @@
Consider a grammer with corresponding data type @@ -2133,8 +2191,8 @@
Lets take a simple grammar for arithmetic expressions @@ -2204,8 +2262,8 @@
For a datatype T, we can define a fold function as follows: @@ -2224,8 +2282,8 @@
data Tree a = Leaf a @@ -2264,8 +2322,8 @@7.4.1. Trees example
Dit leek me niet super nuttig, misschien later samenvatten. @@ -2278,13 +2336,135 @@
+Each datatype in the family can have its own result type. +
+ ++example: +
+ ++Result type e for expressions, result type d for declarations +
+ +Add :: E -> E -> E +Neg :: E -> E +Num :: Int -> E +Var :: Id -> E +Def :: D -> E -> E +Dcl :: Id -> E -> D + +type EDAlgebra e d = + (e -> e -> e, + e -> e, + Int -> e, + Id -> e, + d -> e -> e, + Id -> e -> d) ++
+We also need one function per type to traverse the structure: +
+foldE :: EDAlgebra e d -> E -> e +foldE (add, neg, num, var, def, dcl) = fe + where fe (Add e1 e2) = add (fe e1) (fe e2) + fe (Neg e) = neg (fe e) + fe (Num n) = num n + fe (Var x) = var x + fe (Def d e) = def (fd d) (fe e) + fd (Dcl x e) = dcl x (fe e) + +fe :: E -> e +fd :: D -> d ++
+We can also add a list type to one of the constructors: +
+data E = ... + | Def [D] E -- modified + +-- We keep the list in the algebra +type EDAlgebra e d = + ( ..., + [d] -> e -> e, + ...) + +foldE :: EDAlgebra e d -> E -> e +foldE (add, neg, num, var, def, dcl) = fe + where ... + fe (Def ds e) = def (map fd ds) (fe e) + ... ++
+RepMax replaces all the elements of a list with the largest number. +
+ ++We use this function as an example of a sort of ’recursive’ fold +
++It can be implemented using two folds: +
+ +maxAlg :: ListAlgebra Int Int +maxAlg = LAlg {nil = minBound, cons x m = x ‘maximum‘ m} +repAlg :: Int -> ListAlgebra Int [Int] +repAlg m = LAlg {nil = [], cons xs = m : xs} +repMax xs = foldr repAlg (foldr maxAlg xs) xs ++
+It can be implemented using a single fold +
+ +repMaxAlg :: ListAlgebra Int (Int -> ([Int], Int)) +repMaxAlg = LAlg {nil = \max -> ([], minBound) + , cons x f = \max -> + let (ys, maxSoFar) = f max + in (max : ys, x ‘maximum‘ maxSoFar)} + +repMax :: [Int] -> [Int] +repMax xs = maxs + where (maxs, max) = foldr repMaxAlg xs max ++
+Note the recursion in the last line, we the result of the function to the actual function, this can be done in haskell because magic and laziness and stuff. +
+A lot more detailed documentation can be found on the SSM page: @@ -2295,8 +2475,8 @@
The simple stack machine is a virtual machine that executes programs consisting of assembly language instructions @@ -2329,28 +2509,28 @@
LDC
- load constantLDC
- load constantPushes the inline constant on the stack.
LDR
- load from registerLDR
- load from registerPushes a value from a register onto the stack.
LDL
- loal localLDL
- loal localPushes a value relative to the markpointer register. @@ -2364,7 +2544,7 @@
LDL
-
LDL
-
LDS
- load from stackLDS
- load from stackPushes a value relative to the top of the stack. @@ -2396,7 +2576,7 @@
LDS
-
LDS
-
LDLA
- load local adressLDLA
- load local adressPushes the address of a value relative to the markpointer. @@ -2424,8 +2604,8 @@
LDLA
LDA
- load via adressLDA
- load via adressPushes the value pointed to by the value at the top of the stack. The pointer value is offset by a constant offset. @@ -2436,8 +2616,8 @@
LDA
LDRR
- load register from registerLDRR
- load register from registerCopy the content of the second register to the first. Does not affect the stack. @@ -2452,7 +2632,7 @@
LDRR
-
+
@@ -2462,30 +2642,30 @@ 8.3.7. LDRR
-
+
-
-8.3.8. NOP
- noop
+
+8.3.8. NOP
- noop
No operation, does nothing, goes to next instruction.
-
-8.3.9. HALT
- halt program
+
+8.3.9. HALT
- halt program
Machine stops executing instructions.
-
-8.3.10. AJS
- adjust stack pointer
+
+8.3.10. AJS
- adjust stack pointer
Adjusts the stackpointer with fixed amount.
@@ -2500,7 +2680,7 @@
8.3.10. AJS
-
+
@@ -2510,22 +2690,22 @@ 8.3.10. AJS
-
+
-
-8.3.11. BRA
- unconditional branch
+
+8.3.11. BRA
- unconditional branch
Jumps to the destination. Replaces the PC with the destination address.
-
-8.3.12. BSR
- branch to subroutine
+
+8.3.12. BSR
- branch to subroutine
Pushes the PC on the stack and jumps to the subroutine.
@@ -2540,7 +2720,7 @@
8.3.12. BSR
-
+
@@ -2550,14 +2730,14 @@ 8.3.12. BSR
-
+
-
-8.3.13. RET
- return from subroutine
+
+8.3.13. RET
- return from subroutine
Pops a previously pushed PC from the stack and jumps to it.
@@ -2572,7 +2752,7 @@
8.3.13. RET
-
+
@@ -2582,38 +2762,38 @@ 8.3.13. RET
-
+
-
-8.3.14. STR
- store to register
+
+8.3.14. STR
- store to register
Pops a value from the stack and stores it in the specified register. See also ldr.
-
-8.3.15. STS
- store into stack
+
+8.3.15. STS
- store into stack
Pops a value from the stack and stores it in a location relative to the top of the stack.
-
-8.3.16. STL
- store local
+
+8.3.16. STL
- store local
Pops a value from the stack and stores it in a location relative to the markpointer.
-
-8.3.17. Operators
+
+8.3.17. Operators
Operators remove stack arguments and put the result back on the stack.
@@ -2648,14 +2828,14 @@
8.3.17. Operators
-- Binary Operator Example:
+ - Binary Operator Example:
4 and 7 are on top of the stack
-
+
@@ -2669,16 +2849,16 @@ 8.3.17. Operators
-
-8.4. Translating programs
+
+8.4. Translating programs
-
-8.4.1. Translating expressions
+
+8.4.1. Translating expressions
-- Conditionals
+ - Conditionals
Conditionals can be translated like this:
@@ -2748,7 +2928,7 @@
8.4.1. Translating exp
data Expr = ... |
If Expr Expr Expr
-code :: Expr → Code
+code :: Expr -> Code
code = . . .
code (If c t f) = cc ++
[BRF (st + 2)] ++
@@ -2764,7 +2944,7 @@ 8.4.1. Translating exp
-
+
@@ -2778,11 +2958,11 @@ 8.4.1. Translating exp
where
codeAlg :: ExprAlg Code
codeAlg = ExprAlg
- { num = \n → [LDC n]
- , add = \l r → l ++ r ++ [ADD]
- , neg = \l → l ++ [NEG]
- , eq = \l r → l ++ r ++ [EQ]
- , if = \c t f →
+ { num = \n -> [LDC n]
+ , add = \l r -> l ++ r ++ [ADD]
+ , neg = \l -> l ++ [NEG]
+ , eq = \l r -> l ++ r ++ [EQ]
+ , if = \c t f ->
let st = codeSize t
sf = codeSize f
in c ++ [BRF (st + 2)] ++ t ++ [BRA sf] ++ f
@@ -2791,10 +2971,10 @@ 8.4.1. Translating exp
-- Variables and environments
+ - Variables and environments
-To add variables to the code, we chane the type of the code, to include an environment as an argument.
+To add variables to the code, we change the type of the code, to include an environment as an argument.
@@ -2804,19 +2984,19 @@ 8.4.1. Translating exp
code x = foldExpr codeAlg x empty
where
- codeAlg :: ExprAlg (Env → Code)
+ codeAlg :: ExprAlg (Env -> Code)
codeAlg = ExprAlg
- { num = \n → \e → [LDC n]
- , add = \l r → \e → l e ++ r e ++ [ADD]
- , neg = \l → \e → l e ++ [NEG]
- , eq = \l r → \e → l e ++ r e ++ [EQ]
- , if = \c t f → \e →
+ { num = \n -> \e -> [LDC n]
+ , add = \l r -> \e -> l e ++ r e ++ [ADD]
+ , neg = \l -> \e -> l e ++ [NEG]
+ , eq = \l r -> \e -> l e ++ r e ++ [EQ]
+ , if = \c t f -> \e ->
let st = codeSize (t e)
sf = codeSize (f e)
in c e ++ [BRF (st + 2)] ++
t e ++ [BRA sf] ++ f e
- , var = \s → \e → [LDL (e ! s)]
- , leT = \s d b → \e → d e ++ [STL (size e)]
+ , var = \s -> \e -> [LDL (e ! s)]
+ , leT = \s d b -> \e -> d e ++ [STL (size e)]
++ b (insert s (size e) e)
}
@@ -2825,8 +3005,8 @@ 8.4.1. Translating exp
-
-8.4.2. Statements
+
+8.4.2. Statements
We extend our lanuage with statements:
@@ -2850,7 +3030,7 @@
8.4.2. Statements
-- While loops
+ - While loops
Translating while loops can be done in multiple ways: (cc is loop condition, cb is loop body)
@@ -2860,7 +3040,7 @@
8.4.2. Statements
-
+
@@ -2869,7 +3049,7 @@ 8.4.2. Statements
data Stmt = . . .
| While Expr Stmt
-code :: Stmt → Code
+code :: Stmt -> Code
code = ...
code (While c b) = [BRA sb] ++
cb ++
@@ -2888,16 +3068,16 @@ 8.4.2. Statements
data SEAlg s e = SEAlg
- { add :: e → e → e
- , num :: Int → e
- , ifE :: e → e → e → e
- , ifS :: e → s → s → s
- , asg :: String → e → s
- , whl :: e → s → s
- , cal :: String → [e] → s
+ { add :: e -> e -> e
+ , num :: Int -> e
+ , ifE :: e -> e -> e -> e
+ , ifS :: e -> s -> s -> s
+ , asg :: String -> e -> s
+ , whl :: e -> s -> s
+ , cal :: String -> [e] -> s
}
-foldSE :: SEAlg s e → Statement → s
+foldSE :: SEAlg s e -> Statement -> s
foldSE alg {. .} = fs where
fs (IfS c t f) = ifS (fe c) (fs t) (fs f)
fe (IfE c t f) = ifE (fe c) (fe t) (fe f)
@@ -2907,54 +3087,54 @@ 8.4.2. Statements
code x = foldSE codeAlg x empty
where
- codeAlg :: SEAlg (Env → Code) (Env → Code)
+ codeAlg :: SEAlg (Env -> Code) (Env -> Code)
codeAlg = SEAlg
- { asg = λs d e → d e ++ [STL (e ! s)]
- , ifS = λc t f e →
+ { asg = \s d e -> d e ++ [STL (e ! s)]
+ , ifS = \c t f e ->
let st = codeSize (t e)
sf = codeSize (f e)
in c e ++ [BRF (st + 2)] ++
t e ++ [BRA sf] ++ f e
- , whl = λc b e →
+ , whl = \c b e ->
let sc = codeSize (c e)
sb = codeSize (b e)
in [BRA sb] ++ b e ++ c e ++
[BRT (−(sb + sc + 2))]
- , cal = λm ps e →
+ , cal = \m ps e ->
concatMap ($e) ps ++ [BSR m]
, . . .}
-- Method translation
+ - Method translation
A method call:
A method definition:
-- Method translation with local variables
+ - Method translation with local variables
Method call as before.
@@ -2973,7 +3153,7 @@
8.4.2. Statements
-- Example method translation with local variables
+ - Example method translation with local variables
m(7, 12);
@@ -2990,7 +3170,7 @@ 8.4.2. Statements
-
+
@@ -2999,7 +3179,7 @@ 8.4.2. Statements
Then we put the contents of the stack pointer into the mark pointer
-
+
@@ -3008,30 +3188,30 @@ 8.4.2. Statements
Then we adjust the stack pointer by +2, to make space for a and b
-
+
-Then we load 7, the x variables onto the top of the stack, using LDL -3
+Then we load 7, the x variable onto the top of the stack, using LDL -3
Then we call NEG
which will negate the argument at the top of the stack, 7
-
+
-Then we call STL +1 to store -7 in the a variable
+Then we call STL +1 to store -7 in the a variable
-
+
@@ -3041,7 +3221,7 @@ 8.4.2. Statements
-
+
@@ -3050,16 +3230,16 @@ 8.4.2. Statements
Then we pop the old position of the mark pointer and put it into the mark pointer.
-
+
-Then we call STS -2, which stores the current top of the stack (the return pointer/adress) two positions up relative to the top of the stack. We do this to remove the arguments of the function from the stack.
+Then we call STS -2, which stores the current top of the stack (the return pointer/adress) two positions up relative to the top of the stack. We do this to remove the arguments of the function from the stack.
-
+
@@ -3070,7 +3250,7 @@ 8.4.2. Statements
-
+
+9. Validation
+
+
+A compiler will also compile any bugs introduced by a programmer.
+
+
+
+With machine code the error message usually isn’t nice.
+
+
+- Having a clear error message is better
+
+
+
+Where do we look for bugs
+
+
+- In the code
+- Machine code
+- Runtime?
+- The programmer?
+
+
+
+All technique’s ussually has some point of diminishing return.
+
+
+- We should do all!
+
+
+
+This lecture we will only look at checking the source code
+
+
+
+Who check it
+
+
+- programmer
+- 3rd party tester
+- automated/machine
+
+
+
+At code level
+
+
+- Hlint
+
+
+
+At AST level
+
+
+- scope
+- type
+- termination
+- borrow
+
+
+
+At machine code level
+
+
+- SAST bugrpove
+- size-limit test
+
+
+
+9.1. Example checks at AST level
+
+
+Rest van de lecture was niet boeiend, saai voorbeeld van type checker:
+
+
+
+Dit is het uiteindelijke resultaat, niet super boeiend.
+
+
+
+{- Language 3: floating-point/boolean expressions -}
+
+module Demo3 where
+
+import Demo1 (Valid(..))
+
+data Exp
+ = Con Float
+ | Add Exp Exp
+ | Mul Exp Exp
+ | Pow Exp Exp
+ | If Exp Exp Exp
+ | Leq Exp Exp
+ deriving Show
+
+seven =
+ If (Con 3.0 `Leq` Con 4.0)
+ (Con 7.0)
+ (Con 9.0)
+
+eval :: Exp -> Either Float Bool
+eval (Con k) = Left k
+eval (Add e1 e2) = Left (p + q) where Left p = eval e1; Left q = eval e2
+eval (Mul e1 e2) = Left (p * q) where Left p = eval e1; Left q = eval e2
+eval (Pow e1 e2) = Left (p ** q) where Left p = eval e1; Left q = eval e2
+eval (If h t e) = if h' then eval t else eval t where
+ Right h' = eval h
+eval (Leq e1 e2) = Right (p <= q) where Left p = eval e1; Left q = eval e2
+
+{- ✅ eval seven -}
+
+{- 💩 eval is ugly -}
+
+{- 🕵 -}
+
+nseven =
+ If (Con 3.0)
+ (Con 7.0)
+ (Con 9.0)
+
+{- > eval nseven -}
+
+{- 💩 crashes on the user's computer! -}
+
+{- 💡 write type-checker -}
+
+data Type = Float | Bool
+ deriving (Show, Eq)
+
+-- check :: Type -> Exp -> Bool
+-- check Float (Con _) = True
+-- check Float (Add e1 e2) = check Float e1 && check Float e2
+-- check Float (Mul e1 e2) = check Float e1 && check Float e2
+-- check Float (Pow e1 e2) = check Float e1 && check Float e2
+-- check ty (If h t e) = check Bool h && check ty t && check ty e
+-- check _ _ = False
+
+-- safeEval :: Type -> Exp -> Maybe (Either Float Bool)
+-- safeEval t e =
+-- if check t e
+-- then Just $ eval e
+-- else Nothing
+
+{- > safeEval Float nseven -}
+
+{- 💩 safeEval should not need a Type param -}
+{- 💩 'Nothing' is not a nice error message -}
+{- 💩 eval is ugly -}
+
+{- 💡 get rid of the Type param by inferring type -}
+
+-- infer :: Exp -> Maybe Type
+-- infer e@(Con _) = if check Float e then Just Float else Nothing
+-- infer e@(Add _ _) = if check Float e then Just Float else Nothing
+-- infer e@(Mul _ _) = if check Float e then Just Float else Nothing
+-- infer e@(Pow _ _) = if check Float e then Just Float else Nothing
+-- infer e@(Leq _ _) = if check Bool e then Just Bool else Nothing
+-- infer (If h t e) = do
+-- t_ty <- infer t
+-- e_ty <- infer e
+-- if t_ty == e_ty && check Bool h
+-- then Just t_ty
+-- else Nothing
+
+{- note the re-use of check (to avoid duplication) -}
+
+-- safeEval :: Exp -> Maybe (Either Float Bool)
+-- safeEval e = infer e >> Just (eval e)
+
+{- > safeEval nseven -}
+
+{- 💩 infer, check ugly -}
+{- 💩 'Nothing' is not a nice error message -}
+
+{- 💡 use Maybe () instead of Bool to unlock <*> etc. -}
+
+-- check :: Type -> Exp -> Maybe ()
+-- check Float (Con _) = Just ()
+-- check Float (Add e1 e2) = () <$ check Float e1 <* check Float e2
+-- check Float (Mul e1 e2) = () <$ check Float e1 <* check Float e2
+-- check Float (Pow e1 e2) = () <$ check Float e1 <* check Float e2
+-- check ty (If h t e) = () <$ check Bool h <* check ty t <* check ty e
+-- check _ _ = Nothing
+
+-- infer :: Exp -> Maybe Type
+-- infer e@(Con _) = Float <$ check Float e
+-- infer e@(Add _ _) = Float <$ check Float e
+-- infer e@(Mul _ _) = Float <$ check Float e
+-- infer e@(Pow _ _) = Float <$ check Float e
+-- infer e@(Leq _ _) = Bool <$ check Bool e
+-- infer (If h t e) = do
+-- check Bool h
+-- t_ty <- infer t
+-- t_ty <$ check t_ty e
+
+-- safeEval :: Exp -> Maybe (Either Float Bool)
+-- safeEval e = infer e >> Just (eval e)
+
+{- Unchanged: safeEval nseven -}
+
+{- 💩 'Nothing' is not a nice error message -}
+
+{- 💡 replace Maybe with Either -}
+
+data CheckingErr = Expected Type {- in -} Exp
+ deriving Show
+
+-- check :: Type -> Exp -> Either CheckingErr ()
+-- check Float (Con _) = Right ()
+-- check Float (Add e1 e2) = () <$ check Float e1 <* check Float e2 -- unchanged
+-- check Float (Mul e1 e2) = () <$ check Float e1 <* check Float e2 -- unchanged
+-- check Float (Pow e1 e2) = () <$ check Float e1 <* check Float e2 -- unchanged
+-- check ty (If h t e) = () <$ check Bool h <* check ty t <* check ty e -- unchanged
+-- check expected_ty e = Left $ Expected expected_ty e
+
+-- infer :: Exp -> Either CheckingErr Type
+-- infer e@(Con _) = Float <$ check Float e
+-- infer e@(Add _ _) = Float <$ check Float e -- unchanged
+-- infer e@(Mul _ _) = Float <$ check Float e -- unchanged
+-- infer e@(Pow _ _) = Float <$ check Float e -- unchanged
+-- infer e@(Leq _ _) = Bool <$ check Bool e -- unchanged
+-- infer (If h t e) = do -- unchanged
+-- check Bool h -- unchanged
+-- t_ty <- infer t -- unchanged
+-- t_ty <$ check t_ty e -- unchanged
+
+-- safeEval :: Exp -> Either CheckingErr (Either Float Bool)
+-- safeEval e = infer e >> Right (eval e)
+
+-- {- ✅ safeEval nseven -}
+
+neight =
+ If (Con 3.0)
+ (If (Con 0.0) (Con 0.0) (Con 0.0))
+ (Con 8.0)
+
+{- > safeEval neight -}
+
+{- 💩 only one error reported! -}
+
+{- 💡 bring back Valid -}
+
+check :: Type -> Exp -> Valid [CheckingErr] ()
+check Float (Con _) = OK ()
+check Float (Add e1 e2) = () <$ check Float e1 <* check Float e2 -- unchanged
+check Float (Mul e1 e2) = () <$ check Float e1 <* check Float e2 -- unchanged
+check Float (Pow e1 e2) = () <$ check Float e1 <* check Float e2 -- unchanged
+check ty (If h t e) = () <$ check Bool h <* check ty t <* check ty e -- unchanged
+check expected_ty e = Err [Expected expected_ty e]
+
+infer :: Exp -> Valid [CheckingErr] Type
+infer e@(Con _) = Float <$ check Float e -- unchanged
+infer e@(Add _ _) = Float <$ check Float e -- unchanged
+infer e@(Mul _ _) = Float <$ check Float e -- unchanged
+infer e@(Pow _ _) = Float <$ check Float e -- unchanged
+infer e@(Leq _ _) = Bool <$ check Bool e -- unchanged
+infer (If h t e) = do -- unchanged
+ check Bool h -- unchanged
+ t_ty <- infer t -- unchanged
+ t_ty <$ check t_ty e -- unchanged
+
+safeEval :: Exp -> Valid [CheckingErr] (Either Float Bool)
+safeEval e = infer e >> OK (eval e)
+
+{- ✅ safeEval neight -}
+
+{- 💩 eval is still ugly -}
+
+{- 🏋🏋 implement...
+ * A type Exp_Float allowing *only* Float-typed Exps
+ * validate :: Exp -> Valid [CheckingErr] Exp_Float
+ * evalFloat :: Exp_Float -> Float
+ * safeEval in terms of `validate` and `evalFloat`
+
+Hint: you will also need to implement
+ corresponding defintions for Bool-typed Exp s
+ (or use a GADT)
+-}
+
+
+
+
+
+
+10. TODO Pumping Lemmas, proving (non)regular languages
+
+
+
+10.1. General strategy for proving a language (non) regular
+
+
+Regular language: a language that can be expressed using a regular expression, sometimes defined as a language recognised by a finite automaton.
+
+
+
+Generally, proving that a language does not belong to a certain class is much more difficult than proving that it does.
+
+
+
+In the case of regular languages,
+
+
+
+- to show that a language is regular, we have to give one regular grammar (or regular expression, or DFA, or NFA) that describes the language;
+- to show that a language is not regular, we have to prove that no regular grammar (or regular expression, or DFA, or NFA) is possible that describes the language
+
+
+
+The strategy is as follows:
+
+
+- we expose a limitation in the formalism (in this case, in the concept of finite state automata);
+- from this limitation, we derive a property that all languages in the class (in this case, regular languages) must have;
+- therefore, if a language does not have that property, it cannot be in the class.
+
+
+
+
+10.2. Proving a language non-regular
+
+
+Assume we have a deterministic finite state automaton
+
+
+
+10.2.1. Strategy step 1: limitation in the formalism
+
+
+we expose a limitation in the formalism (in this case, in the concept of finite state automata)
+
+
+
+Any finite state automaton has a finite number of states.
+
+
+
+Assume we have one with n states.
+
+
+
+How many different states do we visit while reading a string that is accepted and has length n?
+
+
+
+n + 1 or less, and if less, we traverse a loop. But there are only n states, so we cannot traverse n + 1 different states. Therefore, we must traverse a loop.
+
+
+
+We have done the first step of the strategy. We have found a limitation in the formalism. Now we have to derive a property for all regular languages from that.
+
+
+
+
+10.2.2. Step 2: property of language class
+
+
+From previous limitation, we derive a property that all languages in the class (in this case, regular languages) must have.
+
+
+
+If we have a word that is accepted and traverses the loop once, then the words that follow the same path and traverse the loop any other number of times are also accepted
+
+
+
+
+
+
+
+
+
+This is an excerpt of the automaton. There may be other nodes and edges.
+
+
+- Both u and w may be empty (i.e. A and S or A and E may be the same state), but v is not empty – there is a proper loop.
+- All words of the form \(uv^iw\) for \(i\in \mathbb{N}\) are accepted.
+
+
+
+A loop has to occur in every subword of at least length n:
+
+
+
+
+
+
+
+
+
+Assume we have an accepted word xyz where subword y is of at least length n
+
+
+
+A loop has to occur in every subword of at least length n:
+
+
+
+
+
+
+
+
+
+- Assume we have an accepted word xyz where subword y is of at least length n.
+- Then y has to be of form uvw where v is not empty and corresponds to a loop.
+- All words of the form \(xuv^iwz\) for \(i\in\mathbb{N}\) are accepted
+
+
+
+Pumping lemma for regular languages:
+
+
+- For every regular language L, there exists an \(n\in \mathbb{N}\)
+
+- (corresponding to the number of states in the automaton)
+
+- such that for every word \(xyz\) in L with \(|y| \ge n\),
+
+- (this holds for every long substring of every word in L)
+
+- we can split y into three parts, \(y = uvw\), with \(|v| > 0\),
+
+- (v is a loop)
+
+- such that for every \(i\in\mathbb{N}\) , we have \(xuv^iwz \in L\)
+
+
+
+
+10.2.3. Step 3: pumping lemma
+
+
+The we proceed with the final step of the strategy. In order to show that a language is not regular, we show that it does not have the pumping lemma property as follows:
+
+
+- We assume that the language is regular.
+- We use the pumping lemma to derive a word that must be in the language, but is not:
+
+- find a word \(xyz\) in L with \(|y| \ge n\),
+- from the pumping lemma there must be a loop in y,
+- but repeating this loop, or omitting it, takes us outside of the language.
+
+- The contradiction means that the language cannot be regular.
+
+
+
+Using the pumping lemma - strategy
+
+
+- For every natural number n,
+
+- because you don’t know what the value of n is
+
+- find a word xyz in L with \(|y| \ge n\) (you choose the word),
+- such that for every splitting \(y = uvw\) with \(|v| > 0\),
+
+- because you don’t know where the loop may be
+
+- there exists a number i (you figure out the number),
+- such that \(xuv^iwz\notin L\) (you have to prove it).
+
+
+
+
+
+10.3. Proving context-free grammar
+
+
+A context-free grammar consists of a sequence of productions:
+
+
+- the left hand side is always a nonterminal,
+- the right hand side is any sequence of terminals and nonterminals.
+
+
+
+One nonterminal of the grammar is the start symbol.
+
+
+
+Context-sensitive grammars drop the restriction on the left hand side:
+
+
+- \(a N b \rightarrow x\)
+
+
+
+Context-sensitive grammars are as powerful as any other computing formalism:
+
+
+- Turing machines
+- λ-calculus
+
+
+
+If we want to prove that a certain language is not context-free, we can apply the same strategy as for regular languages:
+
+
+- we expose a limitation in the formalism (in this case, in the concept of context-free grammars);
+- from this limitation, we derive a property that all languages in the class (in this case, context-free languages) must have;
+- therefore, if a language does not have that property, it cannot be in the class.
+
+
+
+10.3.1. Step 1: limitation in the formalism
+
+
+This time, we analyze parse trees rather than finite state
+automata.
+
+
+- We can produce parse trees of arbitrary depth if we find words in the language that are long enough, because the number of children per node is bounded by the maximum length of a right hand side of a production.
+- Once a path from a leaf to the root has more than n internal nodes, where n is the number of nonterminals in the grammar, one nonterminal has to occur twice on such a path
+
+- consequently a subtree can be inserted as often as desired
+
+
+
+
+
+10.3.2. Step 2: property of language class
+
+
+If the word is long enough, we have a derivation of the form
+
+
+- For a word to be a certain length, some non-terminals must occur twice
+
+
+
+\[S \Rightarrow^* uAy \Rightarrow^* uvAxy \Rightarrow^* uvwxy\]
+
+
+
+where \(|vx| > 0\).
+
+
+
+Because the grammar is context-free, this implies that
+
+
+
+\[A\Rightarrow^*vAx\]
+\[A\Rightarrow^*w\]
+
+
+
+We can thus derive
+
+
+
+\[S\Rightarrow ^* uAy \Rightarrow ^* uv^iwx^iy\]
+
+
+
+for any \(i\in\mathbb N\)
+
+
+
+
+10.3.3. Step 3: pumping lemma
+
+
+Pumping lemma for context-free languages
+
+
+
+For every context-free language L,
+
+
+- there exists a number \(n\in\mathbb N\) such that
+- for every word \(z\in L\) with \(|z| \ge n\)
+- we can split z into five parts, \(z = uvwxy\), with \(|vx| > 0\) and \(|vwx| \ge n\), such that
+- for every \(i \in N\), we have \(uv^iwx^iy \in L\).
+
+
+
+The n lets us limit the size of the part that gets pumped, similar to how the pumping lemma for regular languages lets us choose the subword that contains te loop.
+
+
+
+Using the pumping lemma:
+
+
+- For every number n,
+- find a word z in L with |z| ⩾ n (you choose the word),
+- such that for every splitting z = uvwxy with |vx| > 0 and |vwx| ⩽ n,
+- there exists a number i (you choose the number),
+- such that \(uv^iwx^iy \notin L\) (you have to prove it)
+
+
+
+Example:
+
+
+- The language \(L = \{a^mb^mc^m | m \in \mathbb N\}\) is not context-free.
+- Let n be any number.
+- We then consider the word \(z = a^nb^nc^n\).
+- From the pumping lemma, we learn that we can pump z, and that the part that gets pumped is smaller than n.
+- The part being pumped can thus not contain a’s, b’s and c’s at the same time, and is not empty either. In all these cases, we pump out of the language (for any \(i \ne 1\))
+
+
+
+
+
+10.4. Normal forms
+
+
+Context-free grammars can be wildly complex, in general.
+
+
+
+But all of them can be brought into more normalised forms.
+
+
+
+- We call them normal forms.
+
+
+
+We get to them by applying grammar transformations (see lecture 4).
+
+
+
+10.4.1. Chomksy Normal Form
+
+
+A context-free grammar is in Chomsky Normal Form if each production rule has one of these forms:
+
+
+- A → B C
+- A → x
+- S → ε
+
+
+
+where A, B, and C are nonterminals, x is a terminal, and S is the start symbol of the grammar. Also, B and C cannot be S
+
+
+- No rule produces ε except (possibly) from the start.
+- No chain rules of the form A → B.
+- Parse trees are always binary.
+
+
+
+
+10.4.2. Greibach Normal Form
+
+
+A context-free grammar is in Greibach Normal Form if each production rule has one of these forms:
+
+
+- \(A \rightarrow xA_1A_2 . . . A_n\)
+- \(S \rightarrow \epsilon\)
+
+
+where \(A, A_1, . . . , A_n\) are nonterminals \((n \ge 0)\), \(x\) is a terminal, and S is the start symbol of the grammar and does not occur in any right hand side.
+
+
+- At most one rule produces ε, and only from the start.
+- No left recursion.
+- A derivation of a word of length n has exactly n rule applications (except ε).
+- Generalizes GNF for regular grammars (where n ⩽ 1)
+
+
+
+
+
+
+11. Optimizations
+
+
+
+11.1. Optimization passes
+
+
+What is a compiler optimization?
+
+
+- A bad name
+- A semantics-preverving code transformation
+- Hopefully imporving the code by some metric
+
+
+
+Optimization passes can be visualized as so:
+
+
+
+
+
+11.2. Simple optimizations
+
+
+- Group of simple but effective optimizations
+- Find and replace
+- Usually on low-level instructions
+
+
+
+Examples:
+
+
+- \(x * 2 \Rightarrow x << 1\)
+- \(x * 0 \Rightarrow 0\)
+- \(x \leftarrow 3;x \leftarrow 4 \Rightarrow x \leftarrow 4\)
+
+
+
+11.2.1. Unreachable/dead code elimination:
+
+
+- Uncalled methods/functions
+- Code after a return statement
+- Patterns that cannot be matched
+
+
+
+
+11.2.2. Tail call elimination:
+
+
+Turn a simple recursive function into a loop, removes a lot of overhead from function calls. Also prevent stack from growing.
+
+
+
+int add (int m, int n) {
+ if (m = 0) then
+ return n;
+ else
+ return add (m − 1, n + 1);
+}
+
+
+
+
+int add (int m, int n) {
+ while (m ! = 0) {
+ m = m − 1;
+ n = n + 1;
+ }
+ return n;
+}
+
+
+
+
+
+
+11.3. Loop optimization
+
+
+
+11.3.1. Loop unrolling
+
+
+Removes some overhead from the loop, such as jumps, checking the condition.
+
+
+
+- In this example, n needs to be divisible by 4.
+- Side effects should not be in condition
+
+
+
+Possibly more cache misses.
+
+
+for (int i = 0; i < n; i++)
+{
+ doStuff (i);
+}
+
+
+
+
+for (int i = 0; i < n − 4; i + = 4)
+{
+ doStuff (i);
+ doStuff (i + 1);
+ doStuff (i + 2);
+ doStuff (i + 3);
+}
+
+
+
+
+
+11.3.2. Loop invariant code motion
+
+
+for (int i = 0; i < n; i++)
+{
+ x = 10 * y + cos (0.5);
+ doStuff (i, x);
+}
+
+
+
+
+x = 10 * y + cos (0.5);
+for (int i = 0; i < n; i++)
+{
+ doStuff (i, x);
+}
+
+
+
+
+
+11.3.3. Loop fusion
+
+
+Less overhead from jumps and conditions.
+
+
+
+The functions can influence each other, just make sure the order doesn’t change.
+
+
+
+for (int i = 0; i < n; i++)
+{
+ doStuff1 (i);
+}
+
+for (int i = 0; i < n; i++)
+{
+ doStuff2 (i);
+}
+
+
+
+
+for (int i = 0; i < n; i++)
+{
+ doStuff1 (i);
+ doStuff2 (i);
+}
+
+
+
+
+Vertical fusion:
+
+
+
+- Replacing an array with a scalar
+- Eliminating n array reads and writes
+
+
+
+for (int i = 0; i < n; i++) {
+y [i] = 2 ∗ x [i];
+}
+for (int i = 0; i < n; i++) {
+z [i] = 4 + y [i];
+}
+return z;
+
+
+
+
+
+11.3.4. Loop fission
+
+
+Oposite of fussion.
+
+
+- Sometimes one is better, sometimes the other.
+
+
+
+Can be better for cache reasons.
+
+
+for (int i = 0; i < n; i++)
+{
+ doStuff1 (i);
+ doStuff2 (i);
+}
+
+
+
+
+for (int i = 0; i < n; i++)
+{
+ doStuff1 (i);
+}
+
+for (int i = 0; i < n; i++)
+{
+ doStuff2 (i);
+}
+
+
+
+
+
+
+11.4. Other optimizations
+
+
+
+11.4.1. Inlining
+
+
+let x = 5 in x * y + x
+
+
+
+
+5 * y + 5
+
+
+
+
+
+11.4.2. Common Subexpression Elimination
+
+
+Opposite of inlining: Tradeoff between computation and memory
+
+
+
+cos(5*x)/(1+cos(5*x))
+
+
+
+
+let y=cos(5*x) in y / (1+y)
+
+
+
+
+
+
+11.5. Compiler pipeline
+
+
+
+
+
+
+
+- Source: haskell
+- Desugared: guards, typeclass and do desugared
+- Core: more lambda dan haskell, optimizations are looped. An inline can open up another optimization for example
+- STG: intermediate, basically a primitive version of c
+- CMM: actual build target
+
+
+
+
-Created: 2024-01-08 Mon 17:40
+
+Created: 2024-01-29 Mon 17:32
diff --git a/notes.org b/notes.org
index 57e6aab..021b0bb 100644
--- a/notes.org
+++ b/notes.org
@@ -1904,7 +1904,10 @@ Hint: you will also need to implement
(or use a GADT)
-}
#+END_SRC
-* TODO Pumping Lemmas, proving (non)regular languages
+* Pumping Lemmas, proving (non)regular languages
+** General strategy for proving a language (non) regular
+Regular language: a language that can be expressed using a regular expression, sometimes defined as a language recognised by a finite automaton.
+
Generally, proving that a language does not belong to a certain class is much more difficult than proving that it does.
In the case of regular languages,
@@ -1916,41 +1919,24 @@ The strategy is as follows:
1. we expose a limitation in the formalism (in this case, in the concept of finite state automata);
2. from this limitation, we derive a property that all languages in the class (in this case, regular languages) must have;
3. therefore, if a language does not have that property, it cannot be in the class.
-
+** Proving a language non-regular
Assume we have a [[dfa][deterministic finite state automaton]]
-
-How many different states do we visit...
-- If the string has length 0?
- - One (the start state).
-- If the string has length 1?
- - Two or one. One if for the given terminal, the start state has a transition to itself, i.e., we walk through a loop.
-- If the string has length 2?
- - Three or two or one. If less than three, we visit at least one state twice, i.e., walk through a loop.
+*** Strategy step 1: limitation in the formalism
+we expose a limitation in the formalism (in this case, in the concept of finite state automata)
Any finite state automaton has a finite number of states.
Assume we have one with n states.
-*Question*
-
How many different states do we visit while reading a string that is accepted and has length n?
-*Answer*
-According to the previous considerations, n + 1 or less, and if less, we traverse a loop.
-But there are only n states, so we cannot traverse n + 1 different states. Therefore, we must traverse a loop.
+n + 1 or less, and if less, we traverse a loop. But there are only n states, so we cannot traverse n + 1 different states. Therefore, we must traverse a loop.
We have done the first step of the strategy. We have found a limitation in the formalism. Now we have to derive a property for all regular languages from that.
+*** Step 2: property of language class
+From previous limitation, we derive a property that all languages in the class (in this case, regular languages) must have.
-*Question*
-
-What can we say about a loop in a finite state automaton?
-
-*Answer*
-
-We can traverse it arbitrarily often.
-
-To be more precise: if we have a word that is accepted and traverses the loop once, then the words that follow the same path and traverse the loop any other number of times are also accepted
-
+If we have a word that is accepted and traverses the loop once, then the words that follow the same path and traverse the loop any other number of times are also accepted
[[file:Pumping_Lemmas,_proving_(non)regular_languages/2024-01-20_15-57-18_screenshot.png]]
@@ -1972,16 +1958,15 @@ A loop has to occur in every subword of at least length n:
- Then y has to be of form uvw where v is not empty and corresponds to a loop.
- All words of the form $xuv^iwz$ for $i\in\mathbb{N}$ are accepted
-Pumping lemma for regular languages
-
-For every regular language L, there exists an $n\in \mathbb{N}$
-- (corresponding to the number of states in the automaton)
-such that for every word $xyz$ in L with $|y| \ge n$,
-- (this holds for every long substring of every word in L)
-we can split y into three parts, $y = uvw$, with $|v| > 0$,
-- (v is a loop)
-such that for every $i\in\mathbb{N}$ , we have $xuv^iwz \in L$
-
+*Pumping lemma for regular languages:*
+- For every regular language L, there exists an $n\in \mathbb{N}$
+ - (corresponding to the number of states in the automaton)
+- such that for every word $xyz$ in L with $|y| \ge n$,
+ - (this holds for every long substring of every word in L)
+- we can split y into three parts, $y = uvw$, with $|v| > 0$,
+ - (v is a loop)
+- such that for every $i\in\mathbb{N}$ , we have $xuv^iwz \in L$
+*** Step 3: pumping lemma
The we proceed with the final step of the strategy. In order to show that a language is not regular, we show that it does not have the pumping lemma property as follows:
- We assume that the language is regular.
- We use the pumping lemma to derive a word that must be in the language, but is not:
@@ -1998,7 +1983,7 @@ Using the pumping lemma - strategy
- because you don’t know where the loop may be
- there exists a number i (you figure out the number),
- such that $xuv^iwz\notin L$ (you have to prove it).
-
+** TODO Proving context-free grammar
A context-free grammar consists of a sequence of productions:
- the left hand side is always a nonterminal,
- the right hand side is any sequence of terminals and nonterminals.
@@ -2006,30 +1991,25 @@ A context-free grammar consists of a sequence of productions:
One nonterminal of the grammar is the start symbol.
*Context-sensitive grammars* drop the restriction on the left hand side:
-a N b → x
+- $a N b \rightarrow x$
Context-sensitive grammars are as powerful as any other computing formalism:
-- Turing machines,
-- λ-calculus.
-
-Not interesting from a parsing perspective.
+- Turing machines
+- λ-calculus
If we want to prove that a certain language is not context-free, we can apply the same strategy as for regular languages:
- we expose a limitation in the formalism (in this case, in the concept of context-free grammars);
- from this limitation, we derive a property that all languages in the class (in this case, context-free languages) must have;
- therefore, if a language does not have that property, it cannot be in the class.
-
-If we want to prove that a certain language is not context-free, we can apply the same strategy as for regular languages:
-- we expose a limitation in the formalism (in this case, in the concept of context-free grammars);
-- from this limitation, we derive a property that all languages in the class (in this case, context-free languages) must have;
-- therefore, if a language does not have that property, it cannot be in the class.
-
+*** Step 1: limitation in the formalism
This time, we analyze parse trees rather than finite state
automata.
- We can produce parse trees of arbitrary depth if we find words in the language that are long enough, because the number of children per node is bounded by the maximum length of a right hand side of a production.
- Once a path from a leaf to the root has more than n internal nodes, where n is the number of nonterminals in the grammar, one nonterminal has to occur twice on such a path
-
+ - consequently a subtree can be inserted as often as desired
+*** Step 2: property of language class
If the word is long enough, we have a derivation of the form
+- For a word to be a certain length, some non-terminals must occur twice
$$S \Rightarrow^* uAy \Rightarrow^* uvAxy \Rightarrow^* uvwxy$$
@@ -2045,7 +2025,7 @@ We can thus derive
$$S\Rightarrow ^* uAy \Rightarrow ^* uv^iwx^iy$$
for any $i\in\mathbb N$
-
+*** Step 3: pumping lemma
Pumping lemma for context-free languages
For every context-free language L,
@@ -2053,6 +2033,50 @@ For every context-free language L,
- for every word $z\in L$ with $|z| \ge n$
- we can split z into five parts, $z = uvwxy$, with $|vx| > 0$ and $|vwx| \ge n$, such that
- for every $i \in N$, we have $uv^iwx^iy \in L$.
+
+The n lets us limit the size of the part that gets pumped, similar to how the pumping lemma for regular languages lets us choose the subword that contains te loop.
+
+*Using the pumping lemma:*
+- For every number n,
+- find a word z in L with |z| ⩾ n (you choose the word),
+- such that for every splitting z = uvwxy with |vx| > 0 and |vwx| ⩽ n,
+- there exists a number i (you choose the number),
+- such that $uv^iwx^iy \notin L$ (you have to prove it)
+
+Example:
+- The language $L = \{a^mb^mc^m | m \in \mathbb N\}$ is not context-free.
+- Let n be any number.
+- We then consider the word $z = a^nb^nc^n$.
+- From the pumping lemma, we learn that we can pump z, and that the part that gets pumped is smaller than n.
+- The part being pumped can thus not contain a’s, b’s and c’s at the same time, and is not empty either. In all these cases, we pump out of the language (for any $i \ne 1$)
+** Normal forms
+Context-free grammars can be wildly complex, in general.
+
+But all of them can be brought into more normalised forms.
+
+- We call them normal forms.
+
+We get to them by applying grammar transformations [[*Grammar transformations][(see lecture 4)]].
+
+*** Chomksy Normal Form
+A context-free grammar is in Chomsky Normal Form if each production rule has one of these forms:
+- A → B C
+- A → x
+- S → ε
+
+where A, B, and C are nonterminals, x is a terminal, and S is the start symbol of the grammar. Also, B and C cannot be S
+- No rule produces ε except (possibly) from the start.
+- No chain rules of the form A → B.
+- Parse trees are always binary.
+*** Greibach Normal Form
+A context-free grammar is in Greibach Normal Form if each production rule has one of these forms:
+- $A \rightarrow xA_1A_2 . . . A_n$
+- $S \rightarrow \epsilon$
+where $A, A_1, . . . , A_n$ are nonterminals $(n \ge 0)$, $x$ is a terminal, and S is the start symbol of the grammar and does not occur in any right hand side.
+- At most one rule produces ε, and only from the start.
+- No left recursion.
+- A derivation of a word of length n has exactly n rule applications (except ε).
+- Generalizes GNF for regular grammars (where n ⩽ 1)
* Optimizations
** Optimization passes
What is a compiler optimization?
@@ -2170,10 +2194,10 @@ Vertical fusion:
#+BEGIN_SRC csharp
for (int i = 0; i < n; i++) {
- y [i] = 2 ∗ x [i];
+y [i] = 2 ∗ x [i];
}
for (int i = 0; i < n; i++) {
- z [i] = 4 + y [i];
+z [i] = 4 + y [i];
}
return z;
#+END_SRC
-