Declarative Graphing + JSON/DataFrames #482

Merged: 12 commits, Feb 1, 2021

srush (Contributor) commented Jan 23, 2021:

This example code implements:

(This is not meant to overlap or intersect with the Dex plotting. I just got interested in how clean this Vega-Lite library was compared to Plotly and wanted to try it out.)

Here's what the rendered output looks like: https://srush.github.io/dex-lang/examples/json.html

Neat parts:

  • Dex is really good as a backbone for this style of dataframe-based declarative method. I mostly like the functional chart interface. (Besides maybe the use of numbers for variable IDs.)

Questionable Parts:

  • The DataFrame is basically a JSON table. Probably would have been right to do it with records. However I don't know if it is possible to enumerate over records or associate accessors with names?
  • I don't think Dex allows arbitrary javascript injection (nor should it? it seems stateful). So I am outputting iframes.

Nasty parts (most of this I knew already):

  • The json implementation is a pseudo tree. I collapse recursion to strings.
  • String stuff is slow (but works).
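One reading of "pseudo tree" with "recursion collapsed to strings" is that a node never holds child nodes, only children that have already been rendered to JSON text. The sketch below illustrates that idea in Python rather than Dex; the function names are hypothetical and are not taken from the PR.

```python
import json

def render_scalar(value):
    """Render a number, string, bool, or None as JSON text."""
    return json.dumps(value)

def render_array(rendered_items):
    """Join items that are ALREADY JSON text into a JSON array."""
    return "[" + ", ".join(rendered_items) + "]"

def render_object(rendered_fields):
    """Build a JSON object from (key, already-rendered JSON text) pairs."""
    body = ", ".join(f"{json.dumps(k)}: {v}" for k, v in rendered_fields)
    return "{" + body + "}"

# Children are flattened to strings before the parent sees them,
# so no recursive value type is ever needed.
inner = render_array([render_scalar("x"), render_scalar(2)])
doc = render_object([("a", render_scalar(1)), ("b", inner)])
print(doc)
```

The cost of this design is the one noted in the list: every level of nesting is string concatenation, so building large documents is slow.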

@google-cla google-cla bot added the cla: yes label Jan 23, 2021
danieldjohnson (Collaborator) commented:

Quoting srush: "I don't know if it is possible to enumerate over records or associate accessors with names?"

Not at the moment (see #258) but this is on the roadmap!

apaszke (Collaborator) left a comment:

Woohoo, this is amazing! This mostly looks good, but can you please add some more docs? It's a bit difficult to review as it is.

examples/json.dx Outdated


data DataFrame key n value =
  AsDataFrame (n => key => value) (key => String)
Collaborator:

Since value doesn't depend on key this is really a homogeneous data frame instead, right?

Contributor Author:

Good point... it's heterogeneous because of the variant type for value. I guess ideally this would be a table where columns are of different types? I will have to think more if that is possible.

Collaborator:

Ohhh it's heterogeneous because you store all values as JSON! When we initially thought of data frames, we usually considered them as tables-of-records (n=>{field1: Float & field2 : Int}). A nice benefit of that is that those do have an efficient lowering, unlike this huge variant. But it would make it much harder to store the column=>meta association, because the column index set really indexes the fields of the record 🤔

Contributor Author:

Yeah records are definitely the right solution to this. But I couldn't really figure out how to cleanly enumerate over fields.

Although maybe I just need something like a variant over Isos instead of values. I guess the Meta could do that work.

srush (Contributor Author) commented Jan 26, 2021:

Oh actually @danieldjohnson, maybe the cute solution is to have the column metadata indexed by an Iso variant and the columns stored in an Iso record. The metadata needs to provide a function to extract its column (say, into JSON).

Downside: The user would have to manually link them. Same names?

Upside: homogeneous data columns, variant-named columns (not ordering-based), column enumeration.

Contributor Author:

Tried a slightly different strategy. I think maybe flat dataframes are a hack and kind of unneeded in Dex. To make graphing work, you can just construct the flat data representation along with the graph description. This can come from a record or from any other source.

n = Fin 100
df2 : {x1: n => Float &
        x2: n => Float &
        weight: n => Float &
        label : n => Class} 
...

chart2 = (AsVLDescriptor (pure Point) [("title", "Scatter")]
            [{title="X1", encodings=pureLs X, encType=Quantitative, rows=wrapCol #x1 df2},
             {title="X2", encodings=pureLs Y, encType=Quantitative, rows=wrapCol #x2 df2},
             {title="Weight", encodings=pureLs Size, encType=Quantitative, rows=wrapCol #weight df2},
             {title="Label", encodings=toList [pure Color, pure Tooltip], encType=Nominal, rows=wrapCol #label df2}])

:html showVega $ toJSON chart2
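To make the shape of the descriptor concrete, here is a rough Python sketch (not Dex, and not the PR's implementation) of how column descriptors like those in chart2 could be flattened into a Vega-Lite specification. The function name `to_vega_lite`, the dictionary layout, and the sample data are assumptions for illustration; only the Vega-Lite keys (`mark`, `data.values`, `encoding`, `field`, `type`) come from Vega-Lite itself.

```python
import json

# Hypothetical column descriptors; field names mirror chart2, the data is made up.
columns = [
    {"title": "X1", "encodings": ["x"], "encType": "quantitative", "rows": [0.1, 0.4, 0.9]},
    {"title": "X2", "encodings": ["y"], "encType": "quantitative", "rows": [1.0, 0.5, 0.2]},
    {"title": "Weight", "encodings": ["size"], "encType": "quantitative", "rows": [3.0, 1.0, 2.0]},
    {"title": "Label", "encodings": ["color", "tooltip"], "encType": "nominal", "rows": ["a", "b", "a"]},
]

def to_vega_lite(mark, options, columns):
    """Flatten column descriptors into a Vega-Lite spec with inline row data."""
    n_rows = len(columns[0]["rows"])
    if any(len(c["rows"]) != n_rows for c in columns):
        raise ValueError("all columns must have the same number of rows")
    # Vega-Lite inline data is a list of row objects keyed by field name.
    values = [{c["title"]: c["rows"][i] for c in columns} for i in range(n_rows)]
    # Each column may drive several channels (e.g. color and tooltip).
    encoding = {}
    for c in columns:
        for channel in c["encodings"]:
            encoding[channel] = {"field": c["title"], "type": c["encType"]}
    spec = {"mark": mark, "data": {"values": values}, "encoding": encoding}
    spec.update(options)  # top-level options such as the chart title
    return spec

spec = to_vega_lite("point", {"title": "Scatter"}, columns)
print(json.dumps(spec, indent=2))
```

Because each descriptor carries its own rows, the flat row table only exists at serialization time, which matches the point that a standalone flat dataframe type may be unnecessary.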

Collaborator:

Ooh, I like the most recent version!

I think that manipulating records will be much, much easier once we can enumerate over fields in userspace. So hopefully the "manually linking" downsides will no longer be a problem at that point.

@@ -10,6 +10,12 @@
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-AfEj0r4/OFrOo5t7NnNe46zW/tFgW6x/bCJG8FqQCEo3+Aro6EYUG4+cU+KJWu/X" crossorigin="anonymous">
<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-g7c+Jr9ZivxKLnZTDUhnkOnsh30B4H0rpLUpJ4jAIKs4fnJI+sEnkvrMWph2EDg4" crossorigin="anonymous"></script>
<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-mll67QQFJfxn0IYznZYonOWZ644AWYC+Pt2cHqMaRhXVrursRwvLnLaebdGIlYNa" crossorigin="anonymous"></script>

<script>
function resizeIframe(obj) {
Contributor Author:

This is maybe lame, I need to look more into how to sandbox javascript per cell.

Contributor Author:

Moved to dynamic.js

apaszke (Collaborator) left a comment:

Thanks for adding the docs, this is super helpful. I would only like to sort out the resizeIframe situation and I think it might be good to merge. Also, how about we rename the example to vega-plotting.dx or something like that? That part is even cooler than JSON for me 😛

examples/json.dx Outdated
def pureLs (x:a) : List (Opts a) =
  AsList 1 [WithOpts x AsNone]

def mergeOpts [ToJSON a, ToJSON b] (x : a) (y : b) : Value =
Collaborator:

I guess one caveat here is that you probably shouldn't be merging too many overlapping options 😃 But this is fine for now!
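The caveat about overlapping options can be seen in a small Python sketch of merging two JSON objects. This is an illustration, not the Dex `mergeOpts`: the thread does not say which side wins on a key collision there, so the "later value wins" rule and the names `merge_opts` and `overlapping_keys` are assumptions.

```python
def overlapping_keys(x: dict, y: dict) -> set:
    """Keys present in both option objects."""
    return x.keys() & y.keys()

def merge_opts(x: dict, y: dict) -> dict:
    """Shallow merge of two option objects; on a collision, y's value wins (assumed rule)."""
    merged = dict(x)
    merged.update(y)
    return merged

base = {"title": "Scatter", "width": 300}
extra = {"width": 500, "height": 200}

print(overlapping_keys(base, extra))
print(merge_opts(base, extra))
```

With a shallow merge like this, one of the two values for an overlapping key is silently dropped, so checking `overlapping_keys` first is a cheap way to surface conflicts.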

examples/json.dx Outdated


def showVega (x: Value) : String =
"<iframe width=\"100%\" frameborder=\"0\" scrolling=\"no\" onload=\"resizeIframe(this)\" srcdoc='<html>
Collaborator:

Can't this resizeIframe business be done with something like style="height:100%" on a bunch of the tags below?

Contributor Author:

Apparently you can't do this :( Need to resize with js as far as I can tell. But the JS seems to work.
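One detail of the iframe approach is worth sketching: the chart document travels inside the `srcdoc` attribute, so it has to be escaped for that attribute. The Python sketch below (not the Dex `showVega`) shows one way to do that, assuming the inner document is a placeholder, that the scripts loading Vega are omitted, and that a `resizeIframe` function is defined by the host page as in the diff above. The name `show_vega` is hypothetical.

```python
import html
import json

def show_vega(spec: dict) -> str:
    """Wrap a chart spec in an iframe whose content is passed via srcdoc."""
    # "<\/" is equivalent to "</" inside a JS string, and prevents a literal
    # "</script>" in the data from closing the script element early.
    spec_js = json.dumps(spec).replace("</", "<\\/")
    inner = (
        "<html><body><div id=\"vis\"></div>"
        "<script>const spec = " + spec_js + ";</script>"
        "</body></html>"
    )
    # html.escape with quote=True escapes &, <, >, and both quote characters,
    # so the document cannot terminate the srcdoc attribute.
    return (
        "<iframe width=\"100%\" frameborder=\"0\" scrolling=\"no\" "
        "onload=\"resizeIframe(this)\" srcdoc=\""
        + html.escape(inner, quote=True)
        + "\"></iframe>"
    )

out = show_vega({"title": "it's a </script> test"})
print(out)
```

Escaping both quote characters means the spec can contain apostrophes or double quotes without breaking the attribute, regardless of which quote style delimits `srcdoc`.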

srush changed the title from "[Preliminary] Declarative Graphing + JSON/DataFrames" to "Declarative Graphing + JSON/DataFrames" on Jan 26, 2021
apaszke (Collaborator) left a comment:

This LGTM. I'm particularly happy that the data frames themselves don't hold any JSON, because that would make it very hard to transform them from inside Dex.

' Start with a well-typed and useful Dex record

df1 = {a = ["A", "B", "C", "D", "E", "F", "G", "H", "I"],
       b = [28, 55, 43, 91, 81, 53, 19, 87, 52]}
Collaborator:

Any reason to make data frames records-of-tables? I thought that table-of-records is a bit more intuitive, because that's closer to relational algebra

Contributor Author:

Makes sense. Switched to rows.
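The two layouts discussed here carry the same information, and converting between them is mechanical. The Python sketch below (not Dex) shows both directions on a shortened copy of df1; the helper names `to_rows` and `to_columns` are hypothetical.

```python
def to_rows(columns: dict) -> list:
    """Record-of-tables -> table-of-records."""
    lengths = {len(v) for v in columns.values()}
    if len(lengths) > 1:
        raise ValueError("all columns must have the same length")
    keys = list(columns)
    return [dict(zip(keys, vals)) for vals in zip(*columns.values())]

def to_columns(rows: list) -> dict:
    """Table-of-records -> record-of-tables (assumes every row has the same keys)."""
    if not rows:
        return {}
    return {key: [row[key] for row in rows] for key in rows[0]}

# Shortened copy of df1 from the example, in column form.
df1_columns = {"a": ["A", "B", "C"], "b": [28, 55, 43]}
df1_rows = to_rows(df1_columns)
print(df1_rows)
```

The row form is also the shape that Vega-Lite's inline `data.values` expects, which is one practical argument for storing rows.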

apaszke (Collaborator) commented Feb 1, 2021:

I've pushed a small update that avoids modifying our builtin JS sources just for this example. Will merge once the CI confirms that it's all good!

@apaszke apaszke merged commit bd06d8f into google-research:main Feb 1, 2021
apaszke pushed a commit that referenced this pull request Feb 1, 2021