[WIP] Relational Algebra example. #395

duvenaud · 2020-12-29T20:01:19Z

In principle Dex's record system could use tables of records to do the sorts of data-munging usually done by packages such as pandas.

In order to match the standard relational algebra, there are a couple of things that I don't think can be expressed in the current type system:

Some functions need to flatten their records in order to match their standard definitions, but I don't know how to type this. I think something like this would suffice: def combine_records (a:{&...l}) (b:{&...r}): ({& ...l ...r}) = todo
but I don't think the current syntax supports this. Any thoughts, @danieldjohnson ?
To rename a field, one needs to manually create a renaming isomorphism. This isn't bad but it's a little boilerplate-y.
We still need to write our own equality instances for records, as mentioned in No support for polymorphism over record fields #258.
Equality instances for records wouldn't be a problem if there was some sort of isomorphism that let one access the data of an arbitrary record.

Another missing feature (as discussed with @apaszke and @dougalm is the ability to create custom index sets. Ideally, instead of tracking primary keys, one would be able to re-index a table of records by one of its columns. Something like:

def indexby (_: Eq b) ?=>
  (field : Iso ({&} & a) (b & c))
  (table:n=>a) : (elementsOfColumn table b)=>c = todo
  
elementsOfColumn table b (_: Eq b) ?=>
  (field : Iso ({&} & a) (b & c))
  (table:n=>a) : Type = todo

but I don't know how to write elementsOfColumn to create a new index set.

examples/relational_algebra.dx

danieldjohnson · 2020-12-29T22:13:12Z

Yeah, the current syntax doesn't support record flattening. I think it's actually a more fundamental issue than that, too, because the concatenation of two records can't be unified with a concrete record type in a unique way (since the fields could come from either l or r). That means that a type like {& ...l ...r} probably couldn't directly be used as a type in the core IR during type inference (under the current design of type inference), and would have to be handled with some other machinery (like typeclass dictionary synthesis, perhaps).

(I think @dougalm and I had a conversation a few months ago where we were wondering if there were any practical use cases for record concatenation. Looks like this may be one!)

My not-fully-thought-through hypothesis is that record concatenation and record equality are both special cases of making it possible to iterate over the fields of arbitrary records in user-space, which could be accomplished by somehow transforming records into a generic representation. But I'm still not sure how much work that would be; it's something I'm considering playing around with in the future but I don't know when I'll get around to it.

Similar to the manual renaming isomorphism, it should be possible right now to manually write out a concatenation for two tables by providing an isomorphism that lists all of the fields from l and moves them one by one into r. But admittedly that's a pretty annoying thing to have to do.

duvenaud · 2020-12-29T23:49:18Z

Thanks for your thoughts, @danieldjohnson !

Yeah, the current syntax doesn't support record flattening. I think it's actually a more fundamental issue than that, too, because the concatenation of two records can't be unified with a concrete record type in a unique way (since the fields could come from either l or r). That means that a type like {& ...l ...r} probably couldn't directly be used as a type in the core IR during type inference (under the current design of type inference), and would have to be handled with some other machinery (like typeclass dictionary synthesis, perhaps).

I suspected as much, or you would have included an example of it in isomorphisms.dx :)

(I think @dougalm and I had a conversation a few months ago where we were wondering if there were any practical use cases for record concatenation. Looks like this may be one!)

Yes, although we should keep in mind that the main use case here, join, might often just be an awkward way of composing indexed tables. A long time ago @dougalm sketched out for me some examples where complicated lookups could be simply (and type-safely) expressed through nested indexing. I think the next step should be to come up with a few more concrete use cases we'd like to be able to support. It might turn out that the relational algebra can be usefully refactored!

My not-fully-thought-through hypothesis is that record concatenation and record equality are both special cases of making it possible to iterate over the fields of arbitrary records in user-space, which could be accomplished by somehow transforming records into a generic representation. But I'm still not sure how much work that would be; it's something I'm considering playing around with in the future but I don't know when I'll get around to it.

I also am getting the sense that moving things to a user-manipulable representation is going to be necessary for usability. However, I've been surprised by how much you've managed to be able to do with the current representation, so I'm not sure. Part of the motivation for this example was for me to see how far I could push this isomorphism framework, and in this case I was able to push it a little further than I thought, but it looks like I'm hitting some real limitations.

Similar to the manual renaming isomorphism, it should be possible right now to manually write out a concatenation for two tables by providing an isomorphism that lists all of the fields from l and moves them one by one into r. But admittedly that's a pretty annoying thing to have to do.

Right, I found that example in isomorphisms.dx. But yes, if anyone should be able to avoid boilerplate, it's languages with dependent types!

apaszke · 2021-01-04T17:55:29Z

examples/relational_algebra.dx

+roughly following [Wikipedia](https://en.wikipedia.org/wiki/Relational_algebra).
+
+'Right now there are two major differences from the standard relational algebra:
+1. We're not using sets of records, but rather tables.  I'm not sure which is better.


This is fine. SQL also doesn't have set semantics, because every table entry has a distinct primary key (which corresponds to table's index set in Dex!).

apaszke · 2021-01-04T17:56:05Z

examples/relational_algebra.dx

+
+def cartesian_product (r:n=>a) (s:m=>b) : (n & m)=>(a & b) =
+  -- Follows the set theory defintion, not the relational
+  -- algebra definition, which would flatten a & b.


But... it does flatten to a & b

I wrote that poorly, I said it "would flatten a & b", i.e. the relational algebra definition would flatten a & b to (a_1, a_2, ... b_1, b_2...), but this implementation doesn't do that.

apaszke · 2021-01-04T17:58:17Z

examples/relational_algebra.dx

+  (left_iso : Iso ({&} & a1) (b & c1))
+  (right_iso: Iso ({&} & a2) (b & c2))
+  (left_tab:n=>a1) (right_tab:m=>a2) : List (b & c1 & c2) =
+  concat for (i, j).


We should just write filter : (n=>(Maybe a)) -> List a. Or maybe call it concatMaybes to match catMaybes from Haskell.

google-cla bot added the cla: yes label Dec 29, 2020

oxinabox reviewed Dec 29, 2020

View reviewed changes

examples/relational_algebra.dx Outdated Show resolved Hide resolved

examples/relational_algebra.dx Show resolved Hide resolved

examples/relational_algebra.dx Outdated Show resolved Hide resolved

apaszke reviewed Jan 4, 2021

View reviewed changes

duvenaud force-pushed the relational branch from 391ab46 to de1fbd3 Compare January 6, 2021 02:17

duvenaud added 3 commits February 7, 2022 21:13

Added draft relational algebgra example.

9933717

Updated relational algebra example to new syntax.

5620470

Updated relational algebra example to new syntax.

0323f13

duvenaud force-pushed the relational branch from de1fbd3 to 0323f13 Compare February 8, 2022 02:20

duvenaud changed the base branch from main to safe-names-dev February 8, 2022 02:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[WIP] Relational Algebra example. #395

[WIP] Relational Algebra example. #395

duvenaud commented Dec 29, 2020

danieldjohnson commented Dec 29, 2020 •

edited

Loading

duvenaud commented Dec 29, 2020 •

edited

Loading

apaszke Jan 4, 2021

apaszke Jan 4, 2021

duvenaud Jan 7, 2021

apaszke Jan 4, 2021

[WIP] Relational Algebra example. #395

Are you sure you want to change the base?

[WIP] Relational Algebra example. #395

Conversation

duvenaud commented Dec 29, 2020

danieldjohnson commented Dec 29, 2020 • edited Loading

duvenaud commented Dec 29, 2020 • edited Loading

apaszke Jan 4, 2021

Choose a reason for hiding this comment

apaszke Jan 4, 2021

Choose a reason for hiding this comment

duvenaud Jan 7, 2021

Choose a reason for hiding this comment

apaszke Jan 4, 2021

Choose a reason for hiding this comment

danieldjohnson commented Dec 29, 2020 •

edited

Loading

duvenaud commented Dec 29, 2020 •

edited

Loading