-
Notifications
You must be signed in to change notification settings - Fork 11
Cell Lenses
Lenses are small data manipulation widgets that automate common data transformations.
Annotate cells or rows with comments. (Note: Requires that Vizier be launched with the -X ENABLE-MIMIR
option.
- Dataset [dataset]: The dataset to annotate
-
Comments [list]:
- Column [column, optional]: The column to annotate. If unset, annotate the row.
- Comment [string]: The text to annotate the cell(s) with.
-
Row or Condition [string]:
- If this is a number, it will be treated as a row identifier (this is what the spreadsheet generates)
- If this field is empty, all rows in the column will be annotated
- If this field starts with an
=
, it will be treated as a boolean-valued SparkSQL expression dictating which rows should be annotated (e.g.,=B=1
will annotate rows where the columnB
is1
)
The easiest way to create a comment lens is to use the comment feature in the Spreadsheet view. Select a cell and use the comments button to add a new comment.
Be warned when your dataset changes.
- Dataset [dataset]: The dataset to stabilize
- Reset [checkbox]: Recompute dataset statistics
The first time you run this cell, it will record some basic statistics about the input dataset (what we call 'facets'), including typical values, null frequencies, and more. If these statistics change, the lens will apply caveats to the dataset.
- Dataset: A dataset
-
Schema
- Column Name: A column
- Data Type: The datatype to set the column to
This lens can be used to cast the types of any attribute in your dataset. If no attributes are provided, it will look at every string-typed attribute and make a guess about what types to assign it.
Impute missing values in one or more target columns.
- Dataset [dataset]: The dataset to repair
-
Columns
- Column [column]: The column to impute
- Constraint [string, optional]: A constraint on the column
Eliminate duplicate rows based on a specified key column. The key column will be unique in resulting table. Values for the remaining columns are selected arbitrarily from among the non-null values of rows with matching values of the key column. If more than one such value exists for a given column, the result will be caveatted.
- Dataset [dataset]: The dataset to repair
- Key [column]: The key column to merge rows on.
Pivot the dataset, translating rows into columns. For each pivot cell, pick one column to pivot on and one or more value columns. Assuming that the pivot column is categorical, each value column will be duplicated once for every category. The value of the value column for a given category will be the value of the column on the row where the pivot column's value corresponds to the category. If more than one row has the same category, one value will be chosen arbitrarily. If the -X ENABLE-MIMIR
experimental option is enabled, cells with duplicate values will be caveated. If no rows have the selected value, the cell will be NULL
. Key columns act as group-by columns.
- Dataset [dataset]: The dataset to repair
- Pivot Column [column]: The column to pivot on
-
Value Columns
- Column [column]: Columns that are split on all the distinct value columns
-
Key Columns
- Column [column]: Columns that are passed unchanged through the pivot
- Output [string, optional]: The name of the output column. If empty, the input table will be overwritten.
Given an integer column, assume that the column's values are supposed to contain sequential values and fill in any gaps. The result table will contain all integers, starting from the lowest value in the input, up to the highest.
- Dataset [dataset]: The dataset to repair
- Column [column]: The column to fill the gaps in for.
Translate latitude and longitude values into a Sedona geometry column.
- Dataset: The dataset to tag
- Longitude: The longitude column
- Latitude: The latitude column
- Output Column: The name of the geometry column to add
Note: The geocoding lens will not appearo unless a geocoding service has been configured. See below.
Use an external geocoding service to translate street address information into latitude/longitude coordinates. Currently, only US addresses are supported, and additional configuration is required.
- Dataset [dataset]: The dataset to geocode
- House No [column]: The column containing the house number field of the address
- Street [column]: The column containing the street name field of the address
- City [column]: The column containing the city field of the address
- State [column]: The column containing the state field of the address
- Geocoder [pop-up]: The geocoding service to use.
Vizier supports the following geocoding services:
- OpenStreetMaps: You will need the URL of your geocoding endpoint
- Google Maps: You will need your Google API key
You can configure vizier in one of the following ways:
- Enter the option via the Vizier settings panel. Note, this is a per-project configuration and requires Vizier to be restarted.
- Command-line argument: When launching Vizier, pass either the
--osm-server
or--google-api-key
flags.