From c540944278830f8b1702a76898657cebe93db583 Mon Sep 17 00:00:00 2001 From: Oliver Lee Date: Wed, 9 Apr 2014 16:48:20 -0500 Subject: [PATCH 1/2] corrections to chapter 2 about syntax --- anatomy.lhs | 46 ++++++++++++++++++++++------------------------ 1 file changed, 22 insertions(+), 24 deletions(-) diff --git a/anatomy.lhs b/anatomy.lhs index c51dbf7..11df3ff 100644 --- a/anatomy.lhs +++ b/anatomy.lhs @@ -157,18 +157,17 @@ what negative numbers are or what subtraction or exponentiation do, but there is room for confusion about how to write them down. %Simp5 The conceptual structure of a given expression can be defined much more -clearly using pictures. For example, the following pictures make a +clearly using pictures of trees. For example, the following pictures make a clear description of the underlying arithmetic operations specified in the expressions given above: %Simp6 ![Graphical illustration of abstract structure](figures/abstract_syntax.eps) %Simp7 These pictures are similar to *sentence diagramming* that is taught in grade school -to explain the structure of English. %Simp1 +to explain the structure of English. These trees are called *abstract syntax trees*. %Simp1 -The last picture represents the last two expressions in the previous example. -This is because the pictures do not need parentheses, since the grouping -structure is explicit. %Simp8 +The last tree represents the last two expressions in the previous example. +This is because those two expressions are the same thing conceptually; the only difference is that the first of the two has parentheses around the "8*2". The trees, however, do not need parentheses, because the grouping is explicit in the structure of the tree. %Simp8 ## Syntax @@ -224,7 +223,7 @@ into abstract syntax. 
%Abst8 ### Concrete Syntax and Grammars -The concrete syntax of a language describes how the abstract +The concrete syntax of a language describes how the abstract concepts in the language are represented as text. For example, lets consider how to convert the string |"3+81\*2"| into the abstract syntax |Add (Number 3) (Multiply (Number 81) (Number 2))|. @@ -256,7 +255,7 @@ INCLUDE:BasicToken > | Symbol String > -- %BasicToken -A |Token| is either an integer token or a symbol token with a string. +Here, we define that a |Token| is either an integer token or a symbol token with a string. For example, the tokens from the string |"3 + 81 * 2"| are: %Toke6 > Digits 3 @@ -266,11 +265,10 @@ For example, the tokens from the string |"3 + 81 * 2"| are: %Toke6 > Digits 2 > -- %Toke7 +Program souce code is not broken up into tokens straight out of the box, instead it is merely a text file, which is to say, it is a string of characters. A program that transforms a list of characters into a list of tokens is called a |lexer|. + The [Lexer](./code/Lexer.hs.htm) file contains the code for a simple -lexer that creates tokens in this form. It defines a function |lexer| -that transforms a string (i.e. a list of characters) into a list of tokens. -The |lexer| function takes as input a list of symbols and a list of -keywords. %Toke8 +lexer that creates tokens in the form we speciied above. The |lexer| function takes as input a list of symbols and a list of keywords. %Toke8 #### Grammars @@ -392,17 +390,17 @@ Written out explicitly, this grammar means: %Gram21 Given this lengthy and verbose explanation, I hope you can see the value of using a more concise notation! %Gram29 +Putting a legal expression through a grammar will yield an abstract syntax tree. The process by which a sequential list of tokens is converted into a hierarchical syntax tree is called *parsing*. + +TODO: more extensive explanation of parsing. 
Move the subsection on tokens to the beginning of the section on syntax for clarity, since tokens are the atoms and they should go first. + Just like other kinds of software, there are many design decisions that must be made in creating a grammar. Some grammars work better than others, depending on the situation. %Gram30 #### Ambiguity, Precedence and Associativity -One problem with the straightforward grammar is allows for *ambiguity*. -A sentence is ambiguous if there is more than one that it -can be derived by a grammar. For example, the expression |1-2-3| -is ambiguous because it can be parsed in two ways to create -two different abstract syntax trees [TODO: define "parse"]: %Ambi1 +One problem with grammars is that sometimes they allow for |ambiguity|. A grammar is ambiguous if there exists a sentence that can be correctly parsed by the grammar into more than one syntax tree. For example, the expression "1-2-3" is ambiguous because it can be parsed in two ways to create the following two distinct abstract syntax trees: > Subtract (Number 1) (Subtract (Number 2) (Number 3)) > Subtract (Subtract (Number 1) (Number 2)) (Number 3) @@ -410,14 +408,14 @@ two different abstract syntax trees [TODO: define "parse"]: %Ambi1 TODO: show the parse trees? define "parse tree" %Ambi3 -The same abstract syntax can be generated by parsing |1-(2-3)| and |(1-2)-3|. -We know from our training that the second one is the "correct" version, +The same original sentence (1-2-3) generates two different parse trees with our currently used grammar, the first by parsing it as 1-(2-3), and the second by parsing it as (1-2)-3. +We know from our training in arithmetic that the second one is the "correct" version, because subtraction operations are performed left to right. The technical term for this is that subtraction is *left associative*. (note that this use of the associative is not the same as the mathematical concept of associativity.) 
-But the grammar as it's written doesn't contain any information -associativity, so it is ambiguous. %Ambi4 +But the grammar as it's written doesn't contain any information regarding +associativity, which makes the grammar ambiguous. %Ambi4 Similarly, the expression |1-2*3| can be parsed in two ways: %Ambi5 @@ -425,13 +423,13 @@ Similarly, the expression |1-2*3| can be parsed in two ways: %Ambi5 > Multiply (Subtract (Number 1) (Number 2)) (Number 2) > -- %Ambi6 -The same abstract syntax can be generated by parsing |1-(2*3)| and |(1-2)*3|. -Again we know that the first version is the correct one, because +The grammar generates two different abstract syntax trees by parsing the sentence as |1-(2*3)| and as |(1-2)*3|. +Again we know from our knowledge of arithmetic that the first version is the correct one, because multiplication should be performed before subtraction. Technically, we say that multiplication has higher *precedence* than subtraction. %Ambi7 -The grammar can be adjusted to express the precedence and associativity -of the operators. Here is an example: %Ambi8 +The grammar can be modified to express the precedence and associativity +of the operators and eliminate ambiguity. Here is an example: %Ambi8 INCLUDE:SimpleGrammar > Term : Term '+' Factor { Add $1 $3 } From d09ef559b1ad348b0c9bf759a186a9afb0e7685c Mon Sep 17 00:00:00 2001 From: Oliver Lee Date: Wed, 9 Apr 2014 18:24:18 -0500 Subject: [PATCH 2/2] minor formatting --- anatomy.lhs | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/anatomy.lhs b/anatomy.lhs index 11df3ff..9913f6f 100644 --- a/anatomy.lhs +++ b/anatomy.lhs @@ -265,7 +265,7 @@ For example, the tokens from the string |"3 + 81 * 2"| are: %Toke6 > Digits 2 > -- %Toke7 -Program souce code is not broken up into tokens straight out of the box, instead it is merely a text file, which is to say, it is a string of characters. A program that transforms a list of characters into a list of tokens is called a |lexer|. 
+Program source code is not broken up into tokens straight out of the box; instead, it is merely a text file, which is to say, it is a string of characters. A program that transforms a list of characters into a list of tokens is called a *lexer*. The [Lexer](./code/Lexer.hs.htm) file contains the code for a simple lexer that creates tokens in the form we speciied above. The |lexer| function takes as input a list of symbols and a list of keywords. %Toke8 @@ -400,7 +400,7 @@ work better than others, depending on the situation. %Gram30 #### Ambiguity, Precedence and Associativity -One problem with grammars is that sometimes they allow for |ambiguity|. A grammar is ambiguous if there exists a sentence that can be correctly parsed by the grammar into more than one syntax tree. For example, the expression "1-2-3" is ambiguous because it can be parsed in two ways to create the following two distinct abstract syntax trees: +One problem with grammars is that sometimes they allow for *ambiguity*. A grammar is ambiguous if there exists a sentence that can be correctly parsed by the grammar into more than one syntax tree. For example, the expression |1-2-3| is ambiguous because it can be parsed in two ways to create the following two distinct abstract syntax trees: > Subtract (Number 1) (Subtract (Number 2) (Number 3)) > Subtract (Subtract (Number 1) (Number 2)) (Number 3) @@ -408,7 +408,7 @@ One problem with grammars is that sometimes they allow for |ambiguity|. A gramm TODO: show the parse trees? define "parse tree" %Ambi3 -The same original sentence (1-2-3) generates two different parse trees with our currently used grammar, the first by parsing it as 1-(2-3), and the second by parsing it as (1-2)-3. +The same original sentence |1-2-3| generates two different parse trees under our current grammar: the first by parsing it as |1-(2-3)|, and the second by parsing it as |(1-2)-3|. 
We know from our training in arithmetic that the second one is the "correct" version, because subtraction operations are performed left to right. The technical term for this is that subtraction is *left associative*.