Cell Python

Oliver Kennedy edited this page Dec 31, 2023 · 6 revisions

TL;DR

Unlike Jupyter, variables and imports from one cell are not available in later cells by default. To pass information from one cell to another, see the documentation for vizierdb[]= below.

The Python Cell

The Python cell executes a single Python 3 script in a fresh kernel.

If you're used to Jupyter...

There are two major differences in Vizier:

  1. Each Python cell is started in a fresh interpreter. Variables instantiated in one cell will not be visible in any other cells by default. Only datasets created using the vizierdb module (described below), or variables, functions, or classes exported with vizierdb.export_module will be available to later cells.
  2. Cells are always run in order. Only datasets and variables created by cells above in the notebook will be visible.
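
As a rough sketch of what the first difference means in practice, the two functions below stand in for two consecutive cells. They take the vizierdb object as a parameter only so that the sketch can be run outside Vizier; inside a real cell the object is already loaded and the function bodies would be written at the top level. The function names and the artifact name "threshold" are hypothetical.

```python
def first_cell(vizierdb):
    # A plain variable: it is gone once this cell's interpreter exits.
    threshold = 0.75
    # Exporting it creates a pickle artifact that later cells can read.
    vizierdb.export_pickle(threshold, "threshold")


def later_cell(vizierdb):
    # 'threshold' is not defined in this cell; fetch it by artifact name.
    threshold = vizierdb.get_pickle("threshold")
    return threshold * 2
```

The same value should also be reachable as vizierdb["threshold"], which is described below as equivalent to calling get_pickle for pickle artifacts.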

Packages

Python cells run using the same Python interpreter used to launch Vizier. If you just installed Vizier with pip3, anything else you install with pip3 should be available in Vizier's Python cells.

Bokeh

VizierDB has been integrated with the Bokeh plotting library. Using show(plot) as you would normally will display the plot inline in Vizier. No magic is required.

Code Snippets

Clicking on the "Show Code Snippets" button below the Python cell will bring up a list of common Vizier-specific tasks. Click on any task to paste a snippet of code performing it into the cell.

The VizierDB Module

Each Python cell starts with a vizierdb object already loaded, which allows you to access Vizier datasets.

VizierDB Methods
  • vizierdb.show(object: Any, [mime_type:str], [force_to_string:bool]) (Always aliased to just show) Display the provided object as a cell message. Vizier uses the object type to make a best effort to convert the provided object into something that can be formatted nicely. If force_to_string is true (default false), the object will be converted to a string first. If mime_type is set, no post-processing will be done on object. If it is not set, the object will be rendered as follows:

    • DatasetClient: Render the dataset natively in vizier.
    • Bokeh Layouts, Matplotlib Figures, and Matplotlib Axes: Plots are rendered to HTML/SVG before being displayed.
    • list: One list element per line
    • Any object with a _repr_html_ method: The result of calling _repr_html_ is used.
    • ... otherwise: The object will be converted to a string and rendered as plain text (equivalent to print(str(object)))
  • vizierdb.show_html(html: str): Display the provided html text as a cell message.

  • vizierdb[artifact_name:str]: Get a handle to a vizier artifact automatically based on the artifact type:

    • Parameter: The equivalent python primitive value (equivalent to calling vizierdb.get_parameter)
    • Dataset: A Dataset object, as detailed below (equivalent to calling vizierdb.get_dataset)
    • Pickle: The Pickled python object (equivalent to calling vizierdb.get_pickle)
    • Module: The python function or class (equivalent to calling vizierdb.get_module)
    • File: A FileClient object, as detailed below (equivalent to calling vizierdb.get_file)
  • vizierdb.new_dataset() -> DatasetClient: Get a handle to a new, empty Vizier dataset.

  • vizierdb.get_dataset(ds_name:str) -> DatasetClient: Get a handle to the Vizier dataset with the specified name ds_name (case sensitive).

  • vizierdb.update_dataset(ds_name:str, ds:DatasetClient) -> DatasetClient: Replace the existing dataset named ds_name with the data referenced by the handle ds, including any changes applied to ds. Throws an error if ds_name is not a valid dataset name.

  • vizierdb.create_dataset(ds_name:str, ds:DatasetClient): Create a new dataset named ds_name with the data referenced by the handle ds, including any changes applied to ds. Throws an error if ds_name is already a valid dataset name.

  • vizierdb.rename_dataset(ds_name:str, ds_new_name:str): Rename the existing dataset ds_name to ds_new_name.

  • vizierdb.drop_dataset(ds_name:str): Hide the specified dataset from subsequent cells.

  • vizierdb.get_data_frame(ds_name:str): Retrieve the specified dataset as a Pandas dataframe using Apache Arrow. (This function is currently unsupported on JVM versions 1.9 or higher due to a Spark incompatibility.)

  • vizierdb.save_data_frame(ds_name:str, df:pd.DataFrame): Save the provided Pandas dataframe as a Vizier dataset artifact. The dataframe will be serialized to a Parquet dataset before saving.

  • vizierdb.get_parameter(parameter_name:str) -> Any: Get the parameter artifact with the specified name.

  • vizierdb.get_file(name: str, [binary_mode: bool]) -> FileClient: Retrieve the specified file artifact. Optionally open the file as a binary file.

  • vizierdb.create_file(artifact_name:str, [filename: Optional[str]], [mime_type: str], [binary_mode: bool]) -> FileClient: Create a new file artifact and return a writeable FileClient object representing it. Optionally override the filename or mime_type properties for the file, and optionally open the file object in binary mode. If the filename property is not specified, the artifact name will be used instead. If the mime_type property is not specified "text/plain" will be used instead.

  • vizierdb.import_file(path: str, [artifact_name: Optional[str]], [filename: Optional[str]], [mime_type: str], [buffer_size: int]) -> FileClient: Create a new file artifact by importing a file from the local filesystem. The artifact name defaults to the file name portion of the path, and may be overridden. The filename property likewise defaults to the file name portion of the path and may be overridden. If the mime_type of the file is not provided, it defaults to "text/plain".

  • vizierdb.get_pickle(name) -> Any: Retrieve the specified artifact as an object pickled previously by export_pickle.

  • vizierdb.export_pickle(x:Any, parameter_name:str): Export a value using python's pickle library. Later cells can refer to this variable by name.

  • vizierdb.get_module(module_name): Get the module artifact with the specified name.

  • vizierdb.export_module( x[, return_type={int,str,float,bool}] ): Export a variable, class, or function. The variable, class, or function will be visible in subsequent python cells. A dependency will be created to subsequent cells that reference the variable, class, or function, triggering re-execution if this cell is re-run. If the exported element is a function, the function can also be called as a UDF in SQL cells. If used in this way, you can pass a return_type argument to control the output type of the UDF (string is the default).

Note on export_module: Each python cell runs in a fresh interpreter. As a result, unlike Jupyter, export_module is analogous to creating a library file that is imported into all subsequent cells. It is recommended that you not put expensive initialization code in the class, function, or variable definition, as this code will be run anew in every cell.
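
To illustrate the note above, here is a minimal sketch of exporting a function. The function name celsius_to_fahrenheit and the helper export_conversion are hypothetical, and passing return_type as a keyword argument is an assumption based on the signature listed above. As before, vizierdb is taken as a parameter only so the sketch runs outside Vizier; in a real cell the export call would sit at the top level.

```python
def celsius_to_fahrenheit(celsius):
    # Keep the body cheap: per the note above, the exported definition
    # is run anew in every later cell.
    return celsius * 9.0 / 5.0 + 32.0


def export_conversion(vizierdb):
    # return_type only matters when the function is also called as a UDF
    # from a SQL cell; the documented default is string.
    vizierdb.export_module(celsius_to_fahrenheit, return_type=float)
```

Anything expensive, such as loading a large model or file, is probably better computed once and passed along with export_pickle instead of being rebuilt inside the exported definition.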

Dataset Handle Methods

Note: any methods that mutate the dataset (e.g., insert_row or insert_column) will not be persisted until you call update_dataset or create_dataset as described above.

  • ds.rows: All rows in the dataset ds (e.g., for row in ds.rows:; see Row Handle Methods below).
    • Note: del ds.rows[index] will remove the row at the specified index from the dataset.
    • Note: This is a normal Python list; you can access the cell at (i,j) with ds.rows[i][j].
  • ds.columns: All column specifications of a dataset ds (See Column Handle Methods below)
  • ds.insert_row( values = [v1, v2, v3, ...], position = pos): Insert a new row into ds with column values v1, v2, v3, .... Excess column values are ignored and unspecified column values (or an omitted values argument) are treated as None. Optionally provide a (0-based) index pos to insert the row at.
  • ds.insert_column( col_name, position = pos ): Insert a new column into ds with the specified name at the specified position.
  • ds.delete_column( col_name ): Delete the specified column col_name and all associated data.
  • ds.to_bokeh(): Translate the dataset to a bokeh ColumnDataSource for plotting.
  • ds.save( [name = name] ): Save the dataset under its original name, or optionally save it under a new name. Equivalent to calling vizierdb.update_dataset or vizierdb.create_dataset respectively.
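
The edit-then-save pattern described in the note at the top of this list can be sketched as follows. The helper name prepend_row is hypothetical, and vizierdb is passed in as a parameter only so the sketch can be exercised outside Vizier; only get_dataset, insert_row, and update_dataset from the lists above are used.

```python
def prepend_row(vizierdb, ds_name, values):
    # Fetch a handle; edits to it stay local until they are saved.
    ds = vizierdb.get_dataset(ds_name)
    # Insert at index 0 so the new row becomes the first row.
    ds.insert_row(values=values, position=0)
    # Persist the change under the existing dataset name.
    vizierdb.update_dataset(ds_name, ds)
    return len(ds.rows)
```

Skipping the update_dataset call would leave the stored dataset unchanged, which is an easy mistake to make when coming from in-place dataframe edits.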

Column Handle Methods

These are the elements of ds.columns

  • col.name: The name of the column col.
  • col.data_type: The type affinity of the column col.

Row Handle Methods

These are the elements of ds.rows. Note: changes to values (e.g., set_value) will not be persisted until you call update_dataset or create_dataset as described above.

  • row.get_value( col_name ): Retrieve the value of the specified column col_name on the row row.
  • row[col_name] or row[col_index]: Equivalent to calling get_value.
  • row.set_value( col_name, value ): Update the value of col_name on the row row to value.
  • row[col_name] = value: Equivalent to calling set_value.
  • row.identifier: (read-only) A unique identifier for the row
  • row.caveats: (read-only) An array of bools that are true if the column at the corresponding index is caveatted.
  • row.row_caveat: (read-only) A bool that is true if the row is caveatted.