First pass at complete chapter
noelwelsh committed Oct 5, 2023
1 parent 5882b16 commit 6cd9fdb
Showing 4 changed files with 169 additions and 20 deletions.
118 changes: 116 additions & 2 deletions src/pages/adt/algebra.md
@@ -1,5 +1,119 @@
## The Algebra of Algebraic Data Types

A question that sometimes comes up is where the "algebra" in algebraic data types comes from. I want to talk about this a little bit and show some of the algebraic manipulations that can be done on algebraic data types.

The term algebra is used in the sense of abstract algebra, an area of mathematics.
Abstract algebra deals with algebraic structures.
An algebraic structure consists of a set of values, operations on that set, and properties that those operations must satisfy.
An example is the integers, with the operations addition and multiplication and the familiar properties of these operations, such as associativity, which says that $a + (b + c) = (a + b) + c$.
The abstract in abstract algebra means that the field doesn't deal with concrete values like integers (that would be far too easy to understand) and instead deals with abstractions that have wacky names like semigroup and monoid.
We'll see a lot more of these soon enough!

Algebraic data types correspond to an algebraic structure called a commutative semiring (sometimes called a rig, because it is a ring without negatives).
A semiring has two operations, which are conventionally denoted with $+$ and $\times$.
You'll perhaps guess that these correspond to sum and product types respectively, and you'd be absolutely correct.
What about the properties of these operations?
Well, they are similar to what we know from basic algebra:

- $+$ and $\times$ are associative, so $a + (b + c) = (a + b) + c$ and likewise for $\times$;
- $+$ and $\times$ are commutative, so $a + b = b + a$ and likewise for $\times$;
- there is an identity $0$ such that $a + 0 = a$;
- there is an identity $1$ such that $a \times 1 = a$; and
- $\times$ distributes over $+$, so $a \times (b + c) = (a \times b) + (a \times c)$.

So far, so abstract.
Let's make it concrete by looking at actual examples in Scala.

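Remember that algebraic data types work with types, so the operations $+$ and $\times$ take types as parameters.
Here $=$ means the two types are isomorphic: they contain the same information, so we can convert between them without losing anything.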
So $Int \times String$ is equivalent to

```scala mdoc:silent
final case class IntAndString(int: Int, string: String)
```

To avoid creating all these names, we can use tuples instead:

```scala mdoc:reset:silent
type IntAndString = (Int, String)
```

We can do the same thing for $+$. $Int + String$ is

```scala mdoc:silent
enum IntOrString {
  case IsInt(int: Int)
  case IsString(string: String)
}
```

or just

```scala mdoc:reset:silent
type IntOrString = Either[Int, String]
```
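Each of the laws above holds up to isomorphism: for each law we can write a pair of conversion functions between its two sides that loses no information. As a sketch, here are witnesses for commutativity of $\times$ and $+$ and for associativity of $\times$, using tuples and `Either`. The function names are hypothetical, chosen only for illustration.

```scala
// Commutativity of ×: A × B = B × A
def swapProduct[A, B](ab: (A, B)): (B, A) = ab.swap

// Commutativity of +: A + B = B + A
def swapSum[A, B](ab: Either[A, B]): Either[B, A] = ab.swap

// Associativity of ×: A × (B × C) = (A × B) × C, in both directions
def assocLeft[A, B, C](t: (A, (B, C))): ((A, B), C) =
  t match { case (a, (b, c)) => ((a, b), c) }

def assocRight[A, B, C](t: ((A, B), C)): (A, (B, C)) =
  t match { case ((a, b), c) => (a, (b, c)) }
```

Converting one way and then back returns the original value, which is what makes the two sides interchangeable.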

#### Exercise: Identities {-}

Can you work out which Scala type corresponds to the identity $1$ for product types?

<div class="solution">
It's `Unit`, because adding `Unit` to any product doesn't add any more information.
So, `Int` contains exactly as much information as $Int \times Unit$ (written as the tuple `(Int, Unit)` in Scala).
</div>

What about the Scala type corresponding to the identity $0$ for sum types?

<div class="solution">
It's `Nothing`, following the same reasoning as products: a case of `Nothing` adds no further information (and we cannot even create a value with this type).
</div>
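As a sketch, both identity laws can be witnessed by conversions in each direction. The function names are hypothetical. The `Right` case of `Either[A, Nothing]` can never occur, because no value of type `Nothing` exists. The compiler still wants that case handled, and because `Nothing` is a subtype of every type, the impossible value can be returned as an `A`.

```scala
// A × 1 = A: Unit carries no information, so we can drop it or add it back
def dropUnit[A](pair: (A, Unit)): A = pair._1
def addUnit[A](a: A): (A, Unit) = (a, ())

// A + 0 = A: the Right case can never be constructed
def dropNothing[A](e: Either[A, Nothing]): A =
  e match {
    case Left(a)        => a
    case Right(nothing) => nothing // Nothing is a subtype of A
  }
def addNothing[A](a: A): Either[A, Nothing] = Left(a)
```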


What about the distribution law? This allows us to manipulate algebraic data types to form equivalent, but perhaps more useful, representations.
Consider this example of a user data type.

```scala mdoc:silent
final case class Person(name: String, permissions: Permissions)
enum Permissions {
  case User
  case Moderator
}
```

Written in mathematical notation, this is

$$
Person = String \times Permissions
$$
$$
Permissions = User + Moderator
$$

Performing substitution gets us

$$
Person = String \times (User + Moderator)
$$

Applying distribution results in

$$
Person = (String \times User) + (String \times Moderator)
$$

which in Scala we can represent as

```scala mdoc:reset:silent
enum Person {
  case User(name: String)
  case Moderator(name: String)
}
```
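To check that both representations hold the same information, we can write conversions in each direction. So that both versions fit in one file, this sketch renames them `PersonRecord` and `PersonEnum`. These are hypothetical names, and so are `distribute` and `factor`.

```scala
// The original representation: String × (User + Moderator)
enum Permissions {
  case User
  case Moderator
}
final case class PersonRecord(name: String, permissions: Permissions)

// The distributed representation: (String × User) + (String × Moderator)
enum PersonEnum {
  case User(name: String)
  case Moderator(name: String)
}

// Push the name into each case of the sum
def distribute(p: PersonRecord): PersonEnum =
  p.permissions match {
    case Permissions.User      => PersonEnum.User(p.name)
    case Permissions.Moderator => PersonEnum.Moderator(p.name)
  }

// Pull the common name back out of each case
def factor(p: PersonEnum): PersonRecord =
  p match {
    case PersonEnum.User(name)      => PersonRecord(name, Permissions.User)
    case PersonEnum.Moderator(name) => PersonRecord(name, Permissions.Moderator)
  }
```

Because `factor(distribute(p))` returns `p`, and `distribute(factor(p))` does too, the two types are isomorphic.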

Is this representation more useful? I can't say without the context of where the code is being used. However, I can say that knowing this manipulation is possible, and correct, is useful.

There is a lot more that could be said about algebraic data types, but at this point I feel we're really getting into the weeds.
I'll finish up with a few pointers to other interesting facts:

- Exponential types exist. They are functions! A function `A => B` is equivalent to $B^A$, because each of the values of `A` can be mapped to any of the values of `B`.
- Quotient types also exist, but they are a bit weird. Read up about them if you're interested.
- Another interesting algebraic manipulation is taking the derivative of an algebraic data type. This gives the type of one-hole contexts for that type, which is the basis of a kind of iterator known as a zipper.
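To make exponentials concrete: `Boolean` has two values, so a function `Boolean => A` corresponds to $A^2 = A \times A$. It holds exactly the same information as a pair of `A`s, one result for `true` and one for `false`. Here is a sketch with hypothetical names:

```scala
// A^2 = A × A: a function Boolean => A is determined by its two results
def toPair[A](f: Boolean => A): (A, A) = (f(true), f(false))

def fromPair[A](pair: (A, A)): Boolean => A =
  b => if (b) pair._1 else pair._2

// Boolean => Boolean has 2^2 = 4 functions, one per pair of Booleans
val allBooleanFunctions: List[Boolean => Boolean] =
  for {
    onTrue  <- List(true, false)
    onFalse <- List(true, false)
  } yield fromPair((onTrue, onFalse))
```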
27 changes: 18 additions & 9 deletions src/pages/adt/applications.md
@@ -16,23 +16,32 @@ which we might use in a system monitoring application and also our old friend th
final case class Uri(protocol: String, host: String, port: Int, path: String)
```

I think these are the simplest examples, but those new to algebraic data types often don't realise how many other use cases there are.
We'll see combinator libraries, an extremely important use, in the next chapter.
Here I want to give a few examples of finite state machines as another use case.

Finite state machines occur everywhere in programming. The state of a user interface component, such as open or closed, or visible or invisible, can be modelled as a finite state machine. That's probably not relevant to most Scala programmers, so let's instead consider a distributed job server. The idea here is that users submit jobs to run on a cluster of computers. A supervisor is responsible for selecting a computer on which to run each job, monitoring it, and collecting the result when the job completes successfully.

In a very simple system, we might represent jobs as having four states:

1. Queued: the job is in the queue to be run.
2. Running: the job is running on a computer.
3. Completed: the job successfully finished and we have collected the result.
4. Failed: the job failed to run to completion.

When using an algebraic data type we're not restricted to a simple enumeration of states.
We can also store data within the states.
So, in our job control system, we could define the job states as follows:

```scala mdoc:silent
import scala.concurrent.Future

enum Job[A] {
  case Queued(id: Int, name: String, job: () => A)
  case Running(id: Int, name: String, host: String, result: Future[A])
  case Completed(id: Int, name: String, result: A)
  case Failed(id: Int, name: String, reason: String)
}
```
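Because each state carries its own data, a transition between states can be written as a function that pattern matches on the current state. Below is a sketch of a hypothetical `start` transition that moves a `Queued` job to `Running`. It is not part of the original example. For simplicity, it runs the work locally in a `Future` on the global execution context and only records the host name. A real supervisor would dispatch the work to that host instead.

```scala
import scala.concurrent.{ExecutionContext, Future}

enum Job[A] {
  case Queued(id: Int, name: String, job: () => A)
  case Running(id: Int, name: String, host: String, result: Future[A])
  case Completed(id: Int, name: String, result: A)
  case Failed(id: Int, name: String, reason: String)
}

// Hypothetical transition: only a Queued job can start running.
// For simplicity the work runs locally in a Future; `host` is only recorded.
def start[A](job: Job[A], host: String)(using ExecutionContext): Job[A] =
  job match {
    case Job.Queued(id, name, work) => Job.Running(id, name, host, Future(work()))
    case other                      => other // any other state is returned unchanged
  }

// Usage: start a queued job on a (hypothetical) host
given ExecutionContext = ExecutionContext.global
val running: Job[Int] = start(Job.Queued(1, "answer", () => 6 * 7), "host-1")
```

The result `Future` only exists in the `Running` state, so code has to match on that state before it can use it.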

If you look around the code you work with, I expect you'll quickly find many other examples.
20 changes: 19 additions & 1 deletion src/pages/adt/conclusions.md
@@ -1,5 +1,23 @@
## Conclusions

We have covered a lot of material in this chapter. The key points are:

- algebraic data types represent data expressed as logical ands and logical ors of types;
- algebraic data types are the main way to represent data in Scala;
- structural recursion gives a skeleton for converting a given algebraic data type into any other type;
- structural corecursion gives a skeleton for converting any type into a given algebraic data type; and
- there are other reasoning principles (primarily, following the types) that help us complete structural recursions and corecursions.

We'll see many more examples throughout the rest of the book, which will help reinforce these concepts.
Below are some references that you might find useful if you want to dig further into the concepts covered in this chapter.

Algebraic data types are standard in introductory material on functional programming, but structural recursion seems to be much less commonly known.
I learned about both from [How to Design Programs](https://htdp.org/).
I'm not aware of any concise reference for algebraic data types and structural recursion.
This original material, whatever it is, seems to be too old to be available online, and the concepts have now become so commonly known that they are assumed background knowledge in most sources.

Corecursion is a bit better documented. [How to Design Co-Programs](https://www.cs.ox.ac.uk/jeremy.gibbons/publications/copro.pdf) covers the main idea we have looked at here. [The Under-Appreciated Unfold](https://dl.acm.org/doi/pdf/10.1145/289423.289455) discusses uses of `unfold`.

[The Derivative of a Regular Type is its Type of One-Hole Contexts](https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=7de4f6fddb11254d1fd5f8adfd67b6e0c9439eaa) describes the derivative of algebraic data types.
24 changes: 16 additions & 8 deletions src/pages/adt/structural-corecursion.md
@@ -336,10 +336,11 @@ counter = 0
MyList.fill(5)(getAndInc())
```


#### Exercise: Iterate {-}

Implement `iterate` using the same reasoning as we did for `fill`.
This is slightly more complex than `fill` as we need to keep two bits of information: the value of the counter and the value of type `A`.

<div class="solution">
```scala
@@ -371,7 +372,7 @@ MyList.iterate(0, 5)(x => x - 1)

#### Exercise: Map {-}

Once you've completed `iterate`, try to implement `map` in terms of `unfold`. You'll need to use the destructors to implement it.

<div class="solution">
```scala
@@ -389,21 +390,28 @@ MyList.iterate(0, 5)(x => x + 1).map(x => x * 2)
```
</div>

Now a quick discussion on destructors. The destructors do two things:

1. distinguish the different cases within a sum type; and
2. extract elements from each product type.

So for `MyList` the minimal set of destructors is `isEmpty`, which distinguishes `Empty` from `Pair`, and `head` and `tail`.
The destructors `head` and `tail` are partial functions, in the conceptual, not Scala, sense: they are only defined for a particular product type and throw an exception if used on a different case. You may also have noticed that the functions we used to implement `fill` are exactly the destructors for natural numbers.
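As a sketch, here are the destructors for a list and for natural numbers, with natural numbers represented as non-negative `Int`s. The `MyList` definition is a minimal stand-in for the chapter's type. The `unfold` signature, which takes separate stop, element, and next-seed functions, is an assumption, and the chapter's own definition may differ in detail. In `fill`, the natural number destructors `_ == 0` and `_ - 1` are what get passed to `unfold`.

```scala
// Minimal stand-in for the chapter's list type (an assumption)
enum MyList[A] {
  case Empty()
  case Pair(head: A, tail: MyList[A])
}

// Destructors for MyList, written as functions. head and tail are partial
// (in the conceptual sense): they throw when given Empty.
def isEmpty[A](list: MyList[A]): Boolean =
  list match {
    case MyList.Empty()    => true
    case MyList.Pair(_, _) => false
  }

def head[A](list: MyList[A]): A =
  list match {
    case MyList.Pair(h, _) => h
    case MyList.Empty()    => throw new NoSuchElementException("head of Empty")
  }

def tail[A](list: MyList[A]): MyList[A] =
  list match {
    case MyList.Pair(_, t) => t
    case MyList.Empty()    => throw new NoSuchElementException("tail of Empty")
  }

// Assumed shape of unfold: a stop condition, the element to emit, and the next seed
def unfold[A, B](seed: A)(stop: A => Boolean, f: A => B, next: A => A): MyList[B] =
  if (stop(seed)) MyList.Empty()
  else MyList.Pair(f(seed), unfold(next(seed))(stop, f, next))

// The natural number destructors: `_ == 0` distinguishes zero from a
// successor, and `_ - 1` extracts the predecessor
def fill[A](n: Int)(elem: => A): MyList[A] =
  unfold(n)(_ == 0, _ => elem, _ - 1)
```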

The destructors are another part of the duality between structural recursion and corecursion. Structural recursion is:

- defined by pattern matching on the constructors; and
- used to take apart an algebraic data type into smaller pieces.

Structural corecursion instead is:

- defined by conditions on the input, which may use destructors; and
- used to build up an algebraic data type from smaller pieces.
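Here is a side-by-side sketch of the duality. It uses a minimal `MyList` stand-in, which is an assumption, as are the names `sum` and `range`. `sum` is a structural recursion that pattern matches on the constructors. `range` is a structural corecursion that tests a condition on its input to decide which constructor to build.

```scala
// Minimal stand-in for the chapter's list type (an assumption)
enum MyList[A] {
  case Empty()
  case Pair(head: A, tail: MyList[A])
}

// Structural recursion: pattern match on the constructors,
// taking the list apart into smaller pieces
def sum(list: MyList[Int]): Int =
  list match {
    case MyList.Empty()          => 0
    case MyList.Pair(head, tail) => head + sum(tail)
  }

// Structural corecursion: a condition on the input decides which
// constructor to use, building the list up from smaller pieces
def range(from: Int, until: Int): MyList[Int] =
  if (from >= until) MyList.Empty()
  else MyList.Pair(from, range(from + 1, until))
```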

One last thing before we leave `unfold`. If we look at definitions of `unfold` in other libraries, we'll usually find a signature like the following.

```scala
def unfold[A, B](in: A)(f: A => Option[(A, B)]): List[B]
```

This is equivalent to the definition we used, but a bit more compact in terms of the interface it presents. We used a more explicit definition that makes the structure of the method clearer.
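For comparison, Scala's standard library has provided `List.unfold` since Scala 2.13, with signature `def unfold[A, S](init: S)(f: S => Option[(A, S)]): List[A]`. Its tuple puts the element first and the next state second, which is the reverse of the signature above. Below is a sketch that implements the compact signature directly and checks it against the standard library version. The names `unfoldCompact`, `countdown`, and `countdownStd` are hypothetical.

```scala
// The compact signature from the text: the Option's tuple holds the
// next seed first and the element second
def unfoldCompact[A, B](in: A)(f: A => Option[(A, B)]): List[B] =
  f(in) match {
    case None               => Nil
    case Some((next, elem)) => elem :: unfoldCompact(next)(f)
  }

// Count down from n, emitting each value
def countdown(n: Int): List[Int] =
  unfoldCompact(n)(i => if (i == 0) None else Some((i - 1, i)))

// The same using the standard library, whose tuple is (element, next seed)
def countdownStd(n: Int): List[Int] =
  List.unfold(n)(i => if (i == 0) None else Some((i, i - 1)))
```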
