Declarative Graphing + JSON/DataFrames #482

Merged
12 commits merged on Feb 1, 2021
258 changes: 258 additions & 0 deletions examples/json.dx
@@ -0,0 +1,258 @@


' # Declarative Graphing


' ## JSON Implementation

def join (extra: List a) (lists:n=>(List a)) : List a =
  concat $ for i.
    case ordinal i == (size n - 1) of
      True -> lists.i
      False -> lists.i <> extra


data JValue = AsJValue String
-- TODO: once Dex supports recursive ADTs, JValue becomes Value.

data Value =
  AsObject (List (String & JValue))
  AsArray (List JValue)
  AsString String
  AsFloat Float
  AsInt Int

interface ToJSON a
  toJSON : a -> Value


-- These three are private methods. Users should use type classes.
def escape (x:JValue) : String =
  (AsJValue y) = x
  y


def collapse (x:Value) : JValue =
  AsJValue $ case x of
    AsString y -> "\"" <> y <> "\""
    AsFloat y -> show y
    AsInt y -> show y
    AsObject (AsList _ y) ->
      ("{" <> (join ", " $ for i.
        (k, v) = y.i
        "\"" <> k <> "\"" <> ":" <> (escape v)) <> "}")
    AsArray (AsList _ y) -> ("[" <> (join ", " $ for i. escape y.i) <> "]")

def hide [ToJSON a] (x:a) : JValue =
  collapse $ toJSON x

instance Show Value
  show = \x. escape $ collapse x

instance ToJSON String
  toJSON = AsString

instance ToJSON Int
  toJSON = AsInt

instance ToJSON Float
  toJSON = AsFloat

instance ToJSON Value
  toJSON = id

instance [ToJSON v] ToJSON ((Fin n) => v)
  toJSON = \x . AsArray $ AsList _ $ for i. hide x.i

instance [ToJSON v] ToJSON ((Fin n) => (String & v))
  toJSON = \x . AsObject $ AsList _ $ for i. (fst x.i, hide $ snd x.i)

instance [ToJSON v] ToJSON (List (String & v))
  toJSON = \(AsList _ x) . toJSON x


' ## Graph Grammars (Vega-Lite Spec)

Options = List (String & String)

data Mark =
  AsMark String Options

data EncodingType =
  Quantitative
  Nominal
  Ordinal

instance Show EncodingType
  show = (\ x.
    case x of
      Quantitative -> "quantitative"
      Nominal -> "nominal"
      Ordinal -> "ordinal")

data Channel =
  Y
  X
  Color
  Tooltip
  HREF
  Row
  Col
  Size

instance Show Channel
  show = (\ x.
    case x of
      Y -> "y"
      X -> "x"
      Color -> "color"
      Tooltip -> "tooltip"
      HREF -> "href"
      Size -> "size"
      Row -> "row"
      Col -> "col")

data Encoding key =
  AsEncoding Channel key EncodingType Options

def enc (c:Channel) (k:Int) (et: EncodingType) : Encoding key =
  AsEncoding c (k@_) et mempty

def mark (m:String) : Mark =
  AsMark m mempty

def optsList (x:Options) : List (String & Value) =
  (AsList _ tab) = x
  AsList _ $ for i. (fst tab.i, toJSON $ snd tab.i)


data DataFrame key n value =
  AsDataFrame (n => key => value) (key => String)

Collaborator

Since value doesn't depend on key this is really a homogeneous data frame instead, right?

Contributor Author

Good point... it's heterogeneous because of the variant type for value. I guess ideally this would be a table where columns are of different types? I will have to think more about whether that is possible.

Collaborator

Ohhh it's heterogeneous because you store all values as JSON! When we initially thought of data frames, we usually considered them as tables-of-records (n=>{field1: Float & field2 : Int}). A nice benefit of that is that those do have an efficient lowering, unlike this huge variant. But it would make it much harder to store the column=>meta association, because the column index set really indexes the fields of the record 🤔
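
To make the trade-off above concrete, here is a minimal sketch in Python (used purely for illustration, since it is not the PR's language). It contrasts a position-indexed table of tagged values, roughly what `AsDataFrame` stores, with a table of typed records. The `Row` class and the variable names are hypothetical and do not appear in the PR.

```python
from dataclasses import dataclass

a_data = ["A", "B", "C"]
b_data = [28, 55, 43]

# Layout 1 (as in this PR): every cell is wrapped in one tagged "variant"
# value, so the table is homogeneous and columns are indexed by position.
cells = [[("AsString", a), ("AsInt", b)] for a, b in zip(a_data, b_data)]
names = ["a", "b"]  # column index -> title is a plain table lookup

# Layout 2 (table of records): each field keeps its own static type,
# but the set of columns is fixed by the record's field names.
@dataclass
class Row:  # hypothetical record type, for illustration only
    a: str
    b: int

records = [Row(a, b) for a, b in zip(a_data, b_data)]

# Enumerating columns is a simple loop in layout 1 ...
for k, title in enumerate(names):
    print(title, [row[k][1] for row in cells])

# ... while layout 2 needs per-field access (or some way to enumerate fields).
print([r.a for r in records], [r.b for r in records])
```

In the first layout the column metadata is just another table over the column index, which is the association that becomes harder to express once the columns are record fields.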

Contributor Author

Yeah records are definitely the right solution to this. But I couldn't really figure out how to cleanly enumerate over fields.

Although maybe I just need something like a variant over Isos instead of values. I guess the Meta could do that work.

Contributor Author @srush, Jan 26, 2021

Oh actually @danieldjohnson, maybe the cute solution is to have the column metadata indexed by an Iso variant and the columns stored in an Iso record. The metadata needs to provide a function to extract its column (say, into JSON).

Downside: The user would have to manually link them. Same names?

Upside: homogeneous data columns, variant-named columns (not ordering based), column enumeration.

Contributor Author

Tried a slightly different strategy. I think flat dataframes may be a hack and largely unneeded in Dex. To make graphing work, you can just construct the flat data representation along with the graph description. This can come from a record or from any other source.

n = Fin 100
df2 : {x1: n => Float &
        x2: n => Float &
        weight: n => Float &
        label : n => Class} 
...

chart2 = (AsVLDescriptor (pure Point) [("title", "Scatter")]
            [{title="X1", encodings=pureLs X, encType=Quantitative, rows=wrapCol #x1 df2},
             {title="X2", encodings=pureLs Y, encType=Quantitative, rows=wrapCol #x2 df2},
             {title="Weight", encodings=pureLs Size, encType=Quantitative, rows=wrapCol #weight df2},
             {title="Label", encodings=toList [pure Color, pure Tooltip], encType=Nominal, rows=wrapCol #label df2}])

:html showVega $ toJSON chart2

Collaborator

Ooh, I like the most recent version!

I think that manipulating records will be much, much easier once we can enumerate over fields in userspace. So hopefully the "manually linking" downsides will no longer be a problem at that point.
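
The "construct the flat data alongside the graph description" idea from this thread can be sketched in Python as follows. This is an illustration under assumptions, not the PR's implementation: `build_spec` and `wrap_col` are hypothetical stand-ins for `AsVLDescriptor` and `wrapCol`, and the `colN` field names are invented for the example.

```python
import json

def wrap_col(name, df):
    """Hypothetical stand-in for `wrapCol`: pull one column out of a record of columns."""
    return list(df[name])

def build_spec(mark, opts, columns):
    """Flatten the column descriptors into a Vega-Lite-style spec dictionary."""
    n_rows = len(columns[0]["rows"])
    # One flat row object per data point, with one generated field per column.
    values = [
        {f"col{k}": col["rows"][i] for k, col in enumerate(columns)}
        for i in range(n_rows)
    ]
    # A column may feed several channels (e.g. both color and tooltip).
    encoding = {}
    for k, col in enumerate(columns):
        for channel in col["encodings"]:
            encoding[channel] = {
                "field": f"col{k}",
                "type": col["encType"],
                "title": col["title"],
            }
    spec = {"data": {"values": values}, "mark": mark, "encoding": encoding}
    spec.update(opts)
    return spec

# A record of typed columns, analogous to `df2` in the comment above.
df2 = {"x1": [0.1, 0.4], "x2": [0.3, 0.9], "label": ["Apples", "Bananas"]}

spec = build_spec("point", {"title": "Scatter"}, [
    {"title": "X1", "encodings": ["x"], "encType": "quantitative", "rows": wrap_col("x1", df2)},
    {"title": "X2", "encodings": ["y"], "encType": "quantitative", "rows": wrap_col("x2", df2)},
    {"title": "Label", "encodings": ["color", "tooltip"], "encType": "nominal", "rows": wrap_col("label", df2)},
])
print(json.dumps(spec, indent=2))
```

The columns stay typed until they are wrapped, so the only manual step is pairing each descriptor with its column, which is the "manually linking" downside mentioned above.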



def chart [ToJSON v] (x: DataFrame (Fin key) n v)
          (mark: Mark)
          (encs : (Fin m) => Encoding (Fin key))
          (opts : Options)
          : Value =

  (AsMark mtype options) = mark
  jmark = ("mark", toJSON ((AsList _ [("type", mtype)]) <> options))
  (AsDataFrame df names) = x
  finsize = (Fin $ size n)
  jdf = toJSON $ unsafeCastTable finsize $ for i. toJSON $ for k.
    ("field" <> (show $ ordinal k), toJSON df.i.k)
  jdata = ("data", toJSON [("values", jdf)])
  jencodings = toJSON $ for i.
    (AsEncoding channel key type encoptions) = encs.i
    (show channel, toJSON ((AsList _ [
                             ("field", "field" <> (show $ ordinal key)),
                             ("type", show type),
                             ("title", names.key)
                           ])
                           <> encoptions))
  jencode = ("encoding", jencodings)
  toJSON ((AsList _ [jdata, jmark, jencode]) <> optsList opts)


def showVega (x: Value) : String =
  "<iframe style=\"border:0\" height=\"300px\" width=\"700px\" srcdoc='<html> <head><script src=\"https://cdn.jsdelivr.net/npm/[email protected]\"></script> <script src=\"https://cdn.jsdelivr.net/npm/[email protected]\"></script> <script src=\"https://cdn.jsdelivr.net/npm/[email protected]\"></script> </head> <body> <div id=\"vis\"></div><script>vegaEmbed(\"#vis\"," <> (show x) <> ");</script></body></html>'></iframe>"

' ## Example: Bar Chart

a_data = ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
b_data = [28, 55, 43, 91, 81, 53, 19, 87, 52]

df0 = AsDataFrame (for i. [toJSON a_data.i, toJSON b_data.i]) (["a", "b"])

c = (chart df0 (mark "bar")
      [enc X 0 Nominal,
       enc Y 1 Quantitative]
      (toList [("title", "Bar Graph")]))

:html showVega $ c


' ## Example: Scatter

data Class =
  A
  B
  C

instance Show Class
  show = \x . case x of
    A -> "Apples"
    B -> "Bananas"
    C -> "Cucumbers"

keys : (Fin 5) => Key = splitKey $ newKey 1
x1 : (Fin 100) => Float = arb $ keys.(0 @ _)
x2 : (Fin 100) => Float = arb $ keys.(1 @ _)
weight : (Fin 100) => Float = arb $ keys.(2 @ _)
label : (Fin 100) => Class =
  x = arb $ keys.(3 @ _)
  for i. [A, B, C] (x.i)


df = (AsDataFrame
       (for i. [toJSON $ x1.i,
                toJSON $ x2.i,
                toJSON $ weight.i,
                toJSON $ show label.i])
       (["X1", "X2", "Weight", "Label"]))


:html showVega (chart df (mark "point")
                 [enc X 0 Quantitative,
                  enc Y 1 Quantitative,
                  enc Size 2 Quantitative,
                  enc Color 3 Nominal,
                  enc Tooltip 3 Nominal]
                 (toList [("title", "Scatter")]))

' ## Example: Faceted Area plot

y1 : (Fin 3) => (Fin 10) => Float = arb $ keys.(0 @ _)
y = for i. cumSum . for j. select (y1.i.j > 0.0) (-1.0) 1.0

df2 = (AsDataFrame
        (for (i,j). [toJSON $ y.i.j,
                     toJSON $ ["Run 1", "Run 2", "Run 3"].i,
                     toJSON $ ordinal j])
        (["density", "Runs", "Round"]))



:html showVega (chart df2 (mark "area")
                 [enc Y 0 Quantitative,
                  enc Row 1 Nominal,
                  enc X 2 Quantitative]
                 (toList [("title", "Area"), ("height", "100")]))


' ## Example: Heatmap


words = ["the", "dog", "walked", "to", "the", "store"]

z : (Fin 6) => (Fin 6) => Float = arb $ keys.(0 @ _)

df3 = (AsDataFrame
        (for (i,j). [toJSON $ z.i.j,
                     toJSON $ words.i <> " - " <> words.j,
                     toJSON $ ordinal i,
                     toJSON $ ordinal j
                    ])
        (["match", "words", "X", "Y"]))



:html showVega (chart df3 (mark "rect")
                 [enc Color 0 Quantitative,
                  enc X 2 Ordinal,
                  enc Y 3 Ordinal,
                  enc Tooltip 1 Nominal]
                 (toList [("title", "HeatMap"), ("height", "100")]))


2 changes: 2 additions & 0 deletions static/index.html
@@ -10,6 +10,8 @@
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-AfEj0r4/OFrOo5t7NnNe46zW/tFgW6x/bCJG8FqQCEo3+Aro6EYUG4+cU+KJWu/X" crossorigin="anonymous">
<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-g7c+Jr9ZivxKLnZTDUhnkOnsh30B4H0rpLUpJ4jAIKs4fnJI+sEnkvrMWph2EDg4" crossorigin="anonymous"></script>
<script defer src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-mll67QQFJfxn0IYznZYonOWZ644AWYC+Pt2cHqMaRhXVrursRwvLnLaebdGIlYNa" crossorigin="anonymous"></script>


</head>

<body>