
Vizier Concepts

Oliver edited this page Apr 21, 2021 · 1 revision

Notebooks

Vizier is a notebook interface, an interactive coding environment consisting of a series of cells. Each cell performs one task in a larger analytics workflow represented by the notebook. For example, one cell might load a dataset, the next cell might perform a regression based on that dataset, and a third might plot the results.

The killer feature of a notebook interface is that results appear after each cell, rather than after the entire workflow. Each cell in Vizier has an output area, where you'll see plots, tables, and other cell outputs.

Datasets

A dataset is a table of data (also sometimes called a data frame) with rows and columns. Vizier asks you to keep your data tidy, with each column storing one variable and each row storing one record or measurement.

Each dataset has a name and one or more named columns. Each column also has a datatype (e.g., Integer, Real, String, or Date).
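To make the shape of a dataset concrete, here is a minimal Python sketch of a tidy dataset with a name, named columns, and a datatype per column. The `Dataset` and `Column` classes and the `TYPES` mapping are hypothetical illustrations, not Vizier's API; the sketch only checks that every row supplies exactly one value of the declared type for each column.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical mapping from column datatype names to Python types (illustration only).
TYPES = {"Integer": int, "Real": float, "String": str, "Date": date}


@dataclass
class Column:
    name: str
    datatype: str  # one of the keys of TYPES


@dataclass
class Dataset:
    name: str
    columns: list[Column]
    rows: list[tuple] = field(default_factory=list)

    def add_row(self, *values):
        """Append one record, enforcing one value of the declared type per column."""
        if len(values) != len(self.columns):
            raise ValueError(f"expected {len(self.columns)} values, got {len(values)}")
        for col, value in zip(self.columns, values):
            if not isinstance(value, TYPES[col.datatype]):
                raise TypeError(
                    f"column {col.name!r} expects {col.datatype}, got {type(value).__name__}"
                )
        self.rows.append(values)


# Usage: each column stores one variable, each row stores one measurement.
sales = Dataset("sales", [Column("region", "String"),
                          Column("day", "Date"),
                          Column("units", "Integer")])
sales.add_row("north", date(2021, 4, 21), 12)
```

Rejecting a row with a missing value or a mistyped value (for example, a date given as a string) is what keeps the table tidy: every cell in a column holds the same kind of thing.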

You can edit individual datasets in the Spreadsheet view. Every edit you make adds a new "Vizual" script cell to the end of the notebook.

Reproducible Execution

Think of a Vizier notebook as a program that executes in order. Datasets created by one cell are only visible to later cells in the notebook. Similarly, any changes to a dataset (e.g., added rows, altered values) made by one cell only take effect in later cells.

Vizier is all about keeping data science reproducible. To do that, Vizier keeps notebook contents up to date: if a cell changes, all cells that depend on it are re-executed.

Another way to think about it is that datasets are versioned. Cells don't change datasets; they create new versions of them. Each cell sees the versions of the datasets left for it by the previous cell. If the version of a dataset that a cell reads changes, the cell is re-run.
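The versioning model above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Vizier's implementation: the `Cell` class and `execute` function are hypothetical, datasets are modeled as immutable tuples, and a cell counts as unchanged if it is the same cell and the dataset versions it reads compare equal to the ones it saw on the previous run.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Cell:
    reads: tuple   # names of the datasets this cell reads
    writes: str    # name of the dataset it creates a new version of
    fn: Callable   # builds the new version from the versions it reads


def execute(cells, cache):
    """Run cells in notebook order, reusing earlier results where possible.

    `cache` maps a cell's position to (cell, inputs, output) from a previous run.
    A cell is re-run only if it was edited or a dataset version it reads changed.
    Returns (datasets visible after the last cell, positions of the cells that ran).
    """
    visible = {}  # dataset name -> version left by the previous cell
    ran = []
    for i, cell in enumerate(cells):
        # Only datasets written by earlier cells are visible (KeyError otherwise).
        inputs = tuple(visible[name] for name in cell.reads)
        hit = cache.get(i)
        if hit is not None and hit[0] == cell and hit[1] == inputs:
            output = hit[2]              # same cell, same input versions: reuse
        else:
            output = cell.fn(*inputs)    # edited cell or new input versions: re-run
            cache[i] = (cell, inputs, output)
            ran.append(i)
        # Cells never modify a dataset in place; they publish a new version.
        visible = {**visible, cell.writes: output}
    return visible, ran


load = Cell((), "raw", lambda: (3, 1, 2))
sort = Cell(("raw",), "sorted", lambda raw: tuple(sorted(raw)))
total = Cell(("sorted",), "total", lambda s: (sum(s),))

cache = {}
out1, ran1 = execute([load, sort, total], cache)     # first run: every cell runs
out2, ran2 = execute([load, sort, total], cache)     # nothing changed: nothing runs
load_v2 = Cell((), "raw", lambda: (3, 1, 2, 4))
out3, ran3 = execute([load_v2, sort, total], cache)  # new raw data: all three re-run
load_v3 = Cell((), "raw", lambda: (4, 3, 2, 1))
out4, ran4 = execute([load_v3, sort, total], cache)  # "sorted" is unchanged: total is reused
```

The last run shows the rule in action: the edited load cell re-runs, and so does the sort cell that reads its output, but the sort produces the same version of `sorted` as before, so the cell that reads only `sorted` is not re-executed.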

Versioning and Branches

Every change you make to the notebook is automatically tracked by Vizier. You can see a history of all of your edits by opening the History item in the Branches menu. From the history view, you can open a read-only copy of any earlier version of the notebook.

You can also branch the history. This creates a copy of the notebook from the version that you're currently looking at. Edits to this branch are completely separate from all other branches.
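Branching can be pictured as copying a prefix of the history. In this hypothetical Python sketch (the `Branch` class and its methods are illustrative names, not part of Vizier), each edit appends a new immutable version, earlier versions are read-only tuples, and `branch` copies the history up to the version you are viewing, so later edits on either branch never affect the other.

```python
class Branch:
    """Hypothetical model of a branch: an append-only list of notebook versions."""

    def __init__(self, name, history=None):
        self.name = name
        # Each version is an immutable tuple of cell sources; version 0 is empty.
        self.history = list(history) if history is not None else [()]

    def edit(self, cells):
        """Every change is tracked: record the edited notebook as a new version."""
        self.history.append(tuple(cells))

    def version(self, n):
        """Return version n; tuples cannot be modified, so old versions are read-only."""
        return self.history[n]

    def branch(self, name, at=None):
        """Start a new branch from version `at` (default: the latest version)."""
        end = len(self.history) if at is None else at + 1
        return Branch(name, self.history[:end])  # slicing copies, so histories are independent


main = Branch("main")
main.edit(["load data.csv"])
main.edit(["load data.csv", "plot"])
experiment = main.branch("experiment", at=1)  # copy of the notebook as of version 1
experiment.edit(["load data.csv", "regression"])
```

After these edits, `main` still ends with the plot cell while `experiment` ends with the regression cell; both share the same history up to version 1.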

Warnings

Some cells (Load Dataset and the Lenses) annotate your data with warnings when something goes wrong (e.g., when they load a malformed value). Affected cells are shown in red. Click the Warnings tab in the top bar to see a list of the warnings on each dataset.

Warnings stay attached to the data they annotate and follow it into any dataset that later cells derive from it. For example, when you run an aggregate SQL query, the aggregate value is tagged with all of the warnings on the values being aggregated.

These propagated warnings also show up in the Warnings tab of the top bar.
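The aggregate example can be sketched in Python by tagging each value with a set of warnings and taking the union of those sets when aggregating. The `Tagged`, `load_value`, and `tagged_sum` names are hypothetical, and so is the repair policy: here a malformed value is replaced with 0.0 and annotated, an assumption made only for illustration rather than a description of what Load Dataset actually substitutes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tagged:
    """A value together with the warnings attached to it."""
    value: float
    warnings: frozenset = frozenset()


def load_value(raw):
    """Parse a raw string. A malformed value does not stop the load; it is
    replaced by a placeholder (0.0, an arbitrary choice for this sketch) and
    tagged with a warning instead."""
    try:
        return Tagged(float(raw))
    except ValueError:
        return Tagged(0.0, frozenset({f"could not parse {raw!r}; used 0.0"}))


def tagged_sum(values):
    """SUM over tagged values: the result carries every input value's warnings."""
    values = list(values)
    total = sum(v.value for v in values)
    warnings = frozenset().union(*(v.warnings for v in values))
    return Tagged(total, warnings)


column = [load_value(s) for s in ["1.5", "2.5", "n/a"]]
result = tagged_sum(column)
# result.value == 4.0 and result.warnings == {"could not parse 'n/a'; used 0.0"}
```

Because the warning travels with the aggregate, anyone looking at the total can see that one of its inputs was guessed rather than loaded.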