[WIP] Relational Algebra example. #395

Open
wants to merge 3 commits into
base: safe-names-dev
114 changes: 114 additions & 0 deletions examples/relational_algebra.dx
@@ -0,0 +1,114 @@
'# The Relational Algebra
Unlike other languages which have bespoke "table" or "dataframe" datatypes,
Dex's record system allows the manipulation of tables of records
to do the sorts of data-munging usually done by packages such as pandas.
duvenaud marked this conversation as resolved.

'This is an attempt to express the basic operations on datasets
described by the relational algebra,
roughly following [Wikipedia](https://en.wikipedia.org/wiki/Relational_algebra).

'Right now there are two major differences from the standard relational algebra:
1. We're not using sets of records, but rather tables. I'm not sure which is better.
Collaborator

This is fine. SQL also doesn't have set semantics, because every table entry has a distinct primary key (which corresponds to the table's index set in Dex!).

2. Pretty much all the output types of these functions need to be flattened in order
to match their standard definitions, but I don't know how to type this.

'### Cartesian Product

def cartesian_product {n m a b} (r:n=>a) (s:m=>b) : (n & m)=>(a & b) =
  -- Follows the set theory definition, not the relational
  -- algebra definition, which would flatten a & b.
Collaborator

But... it does flatten to a & b

Contributor Author

I wrote that poorly. I said it "would flatten a & b", i.e. the relational algebra definition would flatten a & b to (a_1, a_2, ... b_1, b_2...), but this implementation doesn't do that.

  for (i, j). (r.i, s.j)

'#### Cartesian Product Example

table1 = [{name = "Bob", age = 32, size = 3.3},
          {name = "Finn", age = 0, size = 1.1},
          {name = "Bea", age = 2, size = 2.2}]

table2 = [{name = "Bob", bool = True},
          {name = "Bea", bool = False},
          {name = "Qi", bool = True}]

cartesian_product table1 table2
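
'The difference between the nested result above and the flattened relational form can be sketched in plain Python (not Dex). This is only an illustration: dicts stand in for records, and the `left_`/`right_` prefixes are a hypothetical convention, needed because the relational Cartesian product expects the two tables to have disjoint field names and both tables here have a `name` field.
```python
from itertools import product
# Dicts stand in for Dex records (an assumption made for this sketch).
table1 = [{"name": "Bob", "age": 32, "size": 3.3},
          {"name": "Finn", "age": 0, "size": 1.1},
          {"name": "Bea", "age": 2, "size": 2.2}]
table2 = [{"name": "Bob", "bool": True},
          {"name": "Bea", "bool": False},
          {"name": "Qi", "bool": True}]
def cartesian_product_nested(r, s):
    # Set-theoretic form: every row is a pair (r_i, s_j), like (a & b) above.
    return [(x, y) for x, y in product(r, s)]
def cartesian_product_flat(r, s, prefixes=("left_", "right_")):
    # Relational form: one flat record per pair of rows.
    # Prefixing field names is a hypothetical way to keep them disjoint.
    lp, rp = prefixes
    return [{**{lp + k: v for k, v in x.items()},
             **{rp + k: v for k, v in y.items()}}
            for x, y in product(r, s)]
nested = cartesian_product_nested(table1, table2)
flat = cartesian_product_flat(table1, table2)
```
Both results have one row per (i, j) pair; only the shape of each row differs.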


'### Projection

def project {a b c n} (field: Iso ({&} & a) (b & c)) (table:n=>a) : n=>b =
  for i. fst $ appIso (splitR &>> field) table.i

'#### Projection Example

project (#&name &>> #&size) table1
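
'For comparison, a plain-Python (not Dex) sketch of projection. It assumes the example above is meant to keep the `name` and `size` fields of each row; dicts stand in for records and the field list replaces the isomorphism argument.
```python
# Dicts stand in for Dex records (an assumption made for this sketch).
table1 = [{"name": "Bob", "age": 32, "size": 3.3},
          {"name": "Finn", "age": 0, "size": 1.1},
          {"name": "Bea", "age": 2, "size": 2.2}]
def project(fields, table):
    # Keep only the requested fields of every row, in the order requested.
    return [{k: row[k] for k in fields} for row in table]
projected = project(("name", "size"), table1)
```
A missing field raises `KeyError` here, whereas the Dex version would reject it at type-checking time.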


'### Rename
Todo: Consider doing this in-place.

def rename {a b c d n} (oldFieldIso: Iso ({&} & a) (b & c))
  (renameIso: Iso b d) (table:n=>a) : n=>(d & c) =
  for i.
    (old_name_field, rest_fields) = appIso (splitR &>> oldFieldIso) table.i
    renamed_field = appIso renameIso old_name_field
    (renamed_field, rest_fields)

'#### Rename Example
Awkwardly, we need to construct our own isomorphism to wrap
the renaming function.

def myRenameIso {a} : Iso {name:a} {newname:a} =
  MkIso {fwd = \x. {newname = (getAt #name x)},
         bwd = \x. {name = (getAt #newname x)}}

rename #&name myRenameIso table1
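
'A plain-Python (not Dex) sketch of the flattened result a relational rename would give. Unlike the `(d & c)` pair returned above, each row stays a single record. Dicts stand in for records, and the collision check is an added assumption, not something the Dex code does.
```python
# Dicts stand in for Dex records (an assumption made for this sketch).
table1 = [{"name": "Bob", "age": 32, "size": 3.3},
          {"name": "Finn", "age": 0, "size": 1.1},
          {"name": "Bea", "age": 2, "size": 2.2}]
def rename(old, new, table):
    out = []
    for row in table:
        if old not in row:
            raise KeyError(old)
        if new != old and new in row:
            # Hypothetical guard: refuse to overwrite an existing field.
            raise ValueError(f"field {new!r} already exists")
        # Rebuild the row, swapping the field name but keeping field order.
        out.append({(new if k == old else k): v for k, v in row.items()})
    return out
renamed = rename("name", "newname", table1)
```
The input table is left unchanged, so this is not the in-place variant mentioned in the todo.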



'## Inner Join
There are a couple of awkward things about this implementation of inner join:
1. We need to pass the name isomorphism twice to call it.
This is necessary because the types of the other fields in the record
are different for the two tables being joined, even if the join is on the
same record name. There might be a way to generate the second isomorphism
automatically, but I don't know what it is.
2. You need to write your own equality instance for the field being
joined on.
3. As in the other examples, the resulting records should be flattened.


def inner_join_one {b a1 a2 c1 c2} [Eq b]
  -- Would rather have an Eq instance for just the inner
  -- type of b, not its named type.
  (left_iso : Iso ({&} & a1) (b & c1))
  (right_iso: Iso ({&} & a2) (b & c2))
  (left:a1) (right:a2) : Maybe (b & c1 & c2) =
  (left_b, left_c) = appIso (splitR &>> left_iso) left
  (right_b, right_c) = appIso (splitR &>> right_iso) right
  if left_b == right_b
    -- Can we unpack left_b and right_b to compare only
    -- their contents and not their names?
    then Just (left_b, left_c, right_c)
    else Nothing

def inner_join {b a1 a2 c1 c2 n m} [Eq b]
  (left_iso : Iso ({&} & a1) (b & c1))
  (right_iso: Iso ({&} & a2) (b & c2))
  (left_tab:n=>a1) (right_tab:m=>a2) : List (b & c1 & c2) =
  concat for (i, j).
Collaborator

We should just write filter : (n=>(Maybe a)) -> List a. Or maybe call it concatMaybes to match catMaybes from Haskell.

    case inner_join_one left_iso right_iso left_tab.i right_tab.j of
      Nothing -> AsList 0 []
      Just ans -> AsList 1 [ans]
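
'The pattern in the `case` above (drop every `Nothing`, keep the contents of every `Just`) can be pulled out into a helper. A plain-Python (not Dex) sketch of that idea follows. The name `concat_maybes` is hypothetical, and `None` stands in for `Nothing`, so this version cannot represent a present value that is itself `None`.
```python
def concat_maybes(xs):
    # Keep the present values in order; None plays the role of Nothing.
    return [x for x in xs if x is not None]
kept = concat_maybes([1, None, 2, None, 3])
```
The test is `is not None` rather than truthiness, so values such as `0` or `False` are kept.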


'#### Inner Join Example
Right now we have to manually make an equality instance
for the field we want to join on. This can be avoided either
by automatically generating equality instances for records,
or by somehow allowing an unpack operation to directly access
the data held by any field.

-- This bespoke equality instance should be automatically generated.
instance {a} [Eq a] Eq {name: a}
  (==) = \r1 r2. (getAt #name r1) == (getAt #name r2)

inner_join #&name #&name table1 table2
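
'For comparison, a plain-Python (not Dex) sketch of the same join with the flattened result rows a relational inner join would give. Dicts stand in for records and `None` for `Nothing`. It assumes the two tables share no field names other than the join key; if they did, the right-hand value would silently win.
```python
# Dicts stand in for Dex records (an assumption made for this sketch).
table1 = [{"name": "Bob", "age": 32, "size": 3.3},
          {"name": "Finn", "age": 0, "size": 1.1},
          {"name": "Bea", "age": 2, "size": 2.2}]
table2 = [{"name": "Bob", "bool": True},
          {"name": "Bea", "bool": False},
          {"name": "Qi", "bool": True}]
def inner_join_one(key, left, right):
    # Return one merged flat record if the keys match, else None.
    if left[key] != right[key]:
        return None
    left_rest = {k: v for k, v in left.items() if k != key}
    right_rest = {k: v for k, v in right.items() if k != key}
    return {key: left[key], **left_rest, **right_rest}
def inner_join(key, left_tab, right_tab):
    # Try every (i, j) pair, then drop the pairs that did not match.
    candidates = [inner_join_one(key, l, r)
                  for l in left_tab for r in right_tab]
    return [row for row in candidates if row is not None]
joined = inner_join("name", table1, table2)
```
Only Bob and Bea appear in both tables, so the result has two rows. Finn and Qi are dropped.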