Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Relational Algebra example. #395

Open
wants to merge 3 commits into
base: safe-names-dev
Choose a base branch
from

Conversation

duvenaud
Copy link
Contributor

In principle Dex's record system could use tables of records to do the sorts of data-munging usually done by packages such as pandas.

In order to match the standard relational algebra, there are a couple of things that I don't think can be expressed in the current type system:

  1. Some functions need to flatten their records in order to match their standard definitions, but I don't know how to type this. I think something like this would suffice: def combine_records (a:{&...l}) (b:{&...r}): ({& ...l ...r}) = todo
    but I don't think the current syntax supports this. Any thoughts, @danieldjohnson ?
  2. To rename a field, one needs to manually create a renaming isomorphism. This isn't bad but it's a little boilerplate-y.
  3. We still need to write our own equality instances for records, as mentioned in No support for polymorphism over record fields #258.
  4. Equality instances for records wouldn't be a problem if there was some sort of isomorphism that let one access the data of an arbitrary record.

Another missing feature (as discussed with @apaszke and @dougalm is the ability to create custom index sets. Ideally, instead of tracking primary keys, one would be able to re-index a table of records by one of its columns. Something like:

def indexby (_: Eq b) ?=>
  (field : Iso ({&} & a) (b & c))
  (table:n=>a) : (elementsOfColumn table b)=>c = todo
  
elementsOfColumn table b (_: Eq b) ?=>
  (field : Iso ({&} & a) (b & c))
  (table:n=>a) : Type = todo

but I don't know how to write elementsOfColumn to create a new index set.

@google-cla google-cla bot added the cla: yes label Dec 29, 2020
examples/relational_algebra.dx Outdated Show resolved Hide resolved
examples/relational_algebra.dx Show resolved Hide resolved
examples/relational_algebra.dx Outdated Show resolved Hide resolved
@danieldjohnson
Copy link
Collaborator

danieldjohnson commented Dec 29, 2020

Yeah, the current syntax doesn't support record flattening. I think it's actually a more fundamental issue than that, too, because the concatenation of two records can't be unified with a concrete record type in a unique way (since the fields could come from either l or r). That means that a type like {& ...l ...r} probably couldn't directly be used as a type in the core IR during type inference (under the current design of type inference), and would have to be handled with some other machinery (like typeclass dictionary synthesis, perhaps).

(I think @dougalm and I had a conversation a few months ago where we were wondering if there were any practical use cases for record concatenation. Looks like this may be one!)

My not-fully-thought-through hypothesis is that record concatenation and record equality are both special cases of making it possible to iterate over the fields of arbitrary records in user-space, which could be accomplished by somehow transforming records into a generic representation. But I'm still not sure how much work that would be; it's something I'm considering playing around with in the future but I don't know when I'll get around to it.

Similar to the manual renaming isomorphism, it should be possible right now to manually write out a concatenation for two tables by providing an isomorphism that lists all of the fields from l and moves them one by one into r. But admittedly that's a pretty annoying thing to have to do.

@duvenaud
Copy link
Contributor Author

duvenaud commented Dec 29, 2020

Thanks for your thoughts, @danieldjohnson !

Yeah, the current syntax doesn't support record flattening. I think it's actually a more fundamental issue than that, too, because the concatenation of two records can't be unified with a concrete record type in a unique way (since the fields could come from either l or r). That means that a type like {& ...l ...r} probably couldn't directly be used as a type in the core IR during type inference (under the current design of type inference), and would have to be handled with some other machinery (like typeclass dictionary synthesis, perhaps).

I suspected as much, or you would have included an example of it in isomorphisms.dx :)

(I think @dougalm and I had a conversation a few months ago where we were wondering if there were any practical use cases for record concatenation. Looks like this may be one!)

Yes, although we should keep in mind that the main use case here, join, might often just be an awkward way of composing indexed tables. A long time ago @dougalm sketched out for me some examples where complicated lookups could be simply (and type-safely) expressed through nested indexing. I think the next step should be to come up with a few more concrete use cases we'd like to be able to support. It might turn out that the relational algebra can be usefully refactored!

My not-fully-thought-through hypothesis is that record concatenation and record equality are both special cases of making it possible to iterate over the fields of arbitrary records in user-space, which could be accomplished by somehow transforming records into a generic representation. But I'm still not sure how much work that would be; it's something I'm considering playing around with in the future but I don't know when I'll get around to it.

I also am getting the sense that moving things to a user-manipulable representation is going to be necessary for usability. However, I've been surprised by how much you've managed to be able to do with the current representation, so I'm not sure. Part of the motivation for this example was for me to see how far I could push this isomorphism framework, and in this case I was able to push it a little further than I thought, but it looks like I'm hitting some real limitations.

Similar to the manual renaming isomorphism, it should be possible right now to manually write out a concatenation for two tables by providing an isomorphism that lists all of the fields from l and moves them one by one into r. But admittedly that's a pretty annoying thing to have to do.

Right, I found that example in isomorphisms.dx. But yes, if anyone should be able to avoid boilerplate, it's languages with dependent types!

roughly following [Wikipedia](https://en.wikipedia.org/wiki/Relational_algebra).

'Right now there are two major differences from the standard relational algebra:
1. We're not using sets of records, but rather tables. I'm not sure which is better.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine. SQL also doesn't have set semantics, because every table entry has a distinct primary key (which corresponds to table's index set in Dex!).


def cartesian_product (r:n=>a) (s:m=>b) : (n & m)=>(a & b) =
-- Follows the set theory defintion, not the relational
-- algebra definition, which would flatten a & b.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But... it does flatten to a & b

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote that poorly, I said it "would flatten a & b", i.e. the relational algebra definition would flatten a & b to (a_1, a_2, ... b_1, b_2...), but this implementation doesn't do that.

(left_iso : Iso ({&} & a1) (b & c1))
(right_iso: Iso ({&} & a2) (b & c2))
(left_tab:n=>a1) (right_tab:m=>a2) : List (b & c1 & c2) =
concat for (i, j).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should just write filter : (n=>(Maybe a)) -> List a. Or maybe call it concatMaybes to match catMaybes from Haskell.

@duvenaud duvenaud changed the base branch from main to safe-names-dev February 8, 2022 02:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants