corrections to chapter 2 about syntax #20

Open
wants to merge 2 commits into
base: master
46 changes: 22 additions & 24 deletions anatomy.lhs
@@ -157,18 +157,17 @@ what negative numbers are or what subtraction or exponentiation do, but there is
room for confusion about how to write them down. %Simp5

The conceptual structure of a given expression can be defined much more
clearly using pictures. For example, the following pictures make a
clearly using pictures of trees. For example, the following pictures make a
clear description of the underlying arithmetic operations specified in the
expressions given above: %Simp6

![Graphical illustration of abstract structure](figures/abstract_syntax.eps) %Simp7

These pictures are similar to *sentence diagramming* that is taught in grade school
to explain the structure of English. %Simp1
to explain the structure of English. These trees are called *abstract syntax trees*. %Simp1

The last picture represents the last two expressions in the previous example.
This is because the pictures do not need parentheses, since the grouping
structure is explicit. %Simp8
The last tree represents the last two expressions in the previous example.
This is because those two expressions are conceptually the same; the only difference is that the first of the two has parentheses around the "8*2". The trees, however, do not need parentheses, because the grouping is explicit in the structure of the tree. %Simp8

## Syntax

@@ -224,7 +223,7 @@ into abstract syntax. %Abst8

### Concrete Syntax and Grammars

The concrete syntax of a language describes how the abstract
The concrete syntax of a language describes how the abstract
concepts in the language are represented as text. For example,
let's consider how to convert the string |"3+81\*2"| into the
abstract syntax |Add (Number 3) (Multiply (Number 81) (Number 2))|.
@@ -256,7 +255,7 @@ INCLUDE:BasicToken
> | Symbol String
> -- %BasicToken

A |Token| is either an integer token or a symbol token with a string.
Here we define a |Token| as either an integer token or a symbol token with a string.
For example, the tokens from the string |"3 + 81 * 2"| are: %Toke6

> Digits 3
@@ -266,11 +265,10 @@ For example, the tokens from the string |"3 + 81 * 2"| are: %Toke6
> Digits 2
> -- %Toke7

Program source code does not come broken up into tokens; it is merely a text file, which is to say, a string of characters. A program that transforms a list of characters into a list of tokens is called a *lexer*.

The [Lexer](./code/Lexer.hs.htm) file contains the code for a simple
lexer that creates tokens in this form. It defines a function |lexer|
that transforms a string (i.e. a list of characters) into a list of tokens.
The |lexer| function takes as input a list of symbols and a list of
keywords. %Toke8
lexer that creates tokens in the form we specified above. The |lexer| function takes as input a list of symbols and a list of keywords. %Toke8
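
To make the idea of a lexer concrete, here is a minimal sketch. It is not the code from the Lexer file: the name `simpleLexer` is hypothetical, it takes only a list of symbols (no keywords), and it picks the first symbol in the list that matches rather than the longest one. The `Token` type is repeated so that the block stands alone.

```haskell
import Data.Char (isDigit, isSpace)
import Data.List (isPrefixOf)

-- Mirrors the Token type shown above.
data Token = Digits Int
           | Symbol String
  deriving (Eq, Show)

-- Hypothetical helper for illustration only; the real lexer also takes
-- a list of keywords, which this sketch ignores.
simpleLexer :: [String] -> String -> [Token]
simpleLexer _ [] = []
simpleLexer symbols input@(c:cs)
  -- Whitespace separates tokens but produces none.
  | isSpace c = simpleLexer symbols cs
  -- A run of digits becomes a single integer token.
  | isDigit c =
      let (ds, rest) = span isDigit input
      in Digits (read ds) : simpleLexer symbols rest
  -- Otherwise try each known symbol as a prefix of the remaining input.
  | otherwise =
      case filter (`isPrefixOf` input) symbols of
        (s:_) -> Symbol s : simpleLexer symbols (drop (length s) input)
        []    -> error ("unexpected character: " ++ [c])
```

Under these assumptions, `simpleLexer ["+", "*"] "3 + 81 * 2"` yields `[Digits 3, Symbol "+", Digits 81, Symbol "*", Digits 2]`, the same token list shown above.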

#### Grammars

@@ -392,46 +390,46 @@ Written out explicitly, this grammar means: %Gram21
Given this lengthy and verbose explanation, I hope you can see the value
of using a more concise notation! %Gram29

Putting a legal expression through a grammar will yield an abstract syntax tree. The process in which a sequential list of tokens is converted into a hierarchical syntax tree is called *parsing*.

TODO: more extensive explanation of parsing. Move the subsection on tokens to the beginning of the section on syntax for clarity, since tokens are the atoms and they should go first.

Just like other kinds of software, there are many design
decisions that must be made in creating a grammar. Some grammars
work better than others, depending on the situation. %Gram30

#### Ambiguity, Precedence and Associativity

One problem with the straightforward grammar is allows for *ambiguity*.
A sentence is ambiguous if there is more than one that it
can be derived by a grammar. For example, the expression |1-2-3|
is ambiguous because it can be parsed in two ways to create
two different abstract syntax trees [TODO: define "parse"]: %Ambi1
One problem with grammars is that sometimes they allow for *ambiguity*. A grammar is ambiguous if there exists a sentence that can be correctly parsed by the grammar into more than one syntax tree. For example, the expression |1-2-3| is ambiguous because it can be parsed in two ways to create the following two distinct abstract syntax trees:

> Subtract (Number 1) (Subtract (Number 2) (Number 3))
> Subtract (Subtract (Number 1) (Number 2)) (Number 3)
> -- %Ambi2

TODO: show the parse trees? define "parse tree" %Ambi3

The same abstract syntax can be generated by parsing |1-(2-3)| and |(1-2)-3|.
We know from our training that the second one is the "correct" version,
Under our current grammar, the same sentence |1-2-3| generates two different parse trees: the first by parsing it as |1-(2-3)|, and the second by parsing it as |(1-2)-3|.
We know from our training in arithmetic that the second one is the "correct" version,
because subtraction operations are performed left to right.
The technical term for this is that subtraction is *left associative*.
(Note that this use of the term *associative* is not the same as the
mathematical concept of associativity.)
But the grammar as it's written doesn't contain any information
associativity, so it is ambiguous. %Ambi4
But the grammar as it's written doesn't contain any information regarding
associativity, which makes the grammar ambiguous. %Ambi4

Similarly, the expression |1-2*3| can be parsed in two ways: %Ambi5

> Subtract (Number 1) (Multiply (Number 2) (Number 3))
> Multiply (Subtract (Number 1) (Number 2)) (Number 3)
> -- %Ambi6

The same abstract syntax can be generated by parsing |1-(2*3)| and |(1-2)*3|.
Again we know that the first version is the correct one, because
The grammar generates two different abstract syntax trees by parsing the sentence as |1-(2*3)| and as |(1-2)*3|.
Again we know from our knowledge of arithmetic that the first version is the correct one, because
multiplication should be performed before subtraction. Technically,
we say that multiplication has higher *precedence* than subtraction. %Ambi7
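
A small evaluator shows why both ambiguities matter: the competing trees produce different numbers. This is only a sketch. The type name `Exp`, the function `evaluate`, and the four example names are assumptions made for illustration; the constructors follow the abstract syntax used in this chapter.

```haskell
-- Assumed abstract syntax, using the constructor names from the text.
data Exp = Number Int
         | Add Exp Exp
         | Subtract Exp Exp
         | Multiply Exp Exp
  deriving (Eq, Show)

-- Evaluate a tree by recursively evaluating its subtrees.
evaluate :: Exp -> Int
evaluate (Number n)     = n
evaluate (Add a b)      = evaluate a + evaluate b
evaluate (Subtract a b) = evaluate a - evaluate b
evaluate (Multiply a b) = evaluate a * evaluate b

-- The two parses of 1-2-3.
rightNested, leftNested :: Exp
rightNested = Subtract (Number 1) (Subtract (Number 2) (Number 3))  -- 1-(2-3)
leftNested  = Subtract (Subtract (Number 1) (Number 2)) (Number 3)  -- (1-2)-3

-- The two parses of 1-2*3.
multiplyFirst, subtractFirst :: Exp
multiplyFirst = Subtract (Number 1) (Multiply (Number 2) (Number 3))  -- 1-(2*3)
subtractFirst = Multiply (Subtract (Number 1) (Number 2)) (Number 3)  -- (1-2)*3
```

Here `evaluate rightNested` is 2 while `evaluate leftNested` is -4, and `evaluate multiplyFirst` is -5 while `evaluate subtractFirst` is -3. An ambiguous grammar therefore leaves the meaning of the program undetermined, not just its shape.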

The grammar can be adjusted to express the precedence and associativity
of the operators. Here is an example: %Ambi8
The grammar can be modified to express the precedence and associativity
of the operators and eliminate ambiguity. Here is an example: %Ambi8

INCLUDE:SimpleGrammar
> Term : Term '+' Factor { Add $1 $3 }