Getting started tutorial

Welcome to the WaveDiff tutorial!

This tutorial serves as a walk-through guide of how to set up runs of WaveDiff with different configuration settings.

Basic Execution

The WaveDiff pipeline is launched and managed by the wf_psf/run.py script.

A list of command-line arguments can be displayed using the --help option:

> python wf_psf/run.py --help
usage: run.py [-h] --conffile CONFFILE --repodir REPODIR --outputdir OUTPUTDIR

optional arguments:
  -h, --help            show this help message and exit
  --conffile CONFFILE, -c CONFFILE
                        a configuration file containing program settings.
  --repodir REPODIR, -r REPODIR
                        the path of the code repository directory.
  --outputdir OUTPUTDIR, -o OUTPUTDIR
                        the path of the output directory.

There are three arguments that the user should specify when launching the pipeline.

The first argument: --conffile CONFFILE specifies the path to the configuration file storing the parameter options for running the pipeline.

The second argument: --repodir REPODIR is the path to the wf-psf repository.

The third argument: --outputdir OUTPUTDIR is used to set the path to the output directory, which stores the WaveDiff results.

To run WaveDiff, use the following command:

> python wf_psf/run.py -c /path/to/config/file -r /path/to/wf-psf -o /path/to/output/dir

You can test this now using the configuration file configs.yaml provided in the config subdirectory of the wf-psf repository. Launching this script will initiate training of the semi-parametric data-driven PSF model. The outputs will be stored in a directory called wf-outputs located in the path specified with the -o argument.
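
For example, assuming the repository was cloned into your home directory (the paths below are purely illustrative; adapt them to your setup):

> python wf_psf/run.py -c $HOME/wf-psf/config/configs.yaml -r $HOME/wf-psf -o $HOME

With this invocation, the results would appear under $HOME/wf-outputs.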

Next, we describe in some detail the structure and content of the configuration files.

Configuration

The WaveDiff pipeline features four main packages for four pipeline tasks: training, metrics, plotting, and simPSF. The training task trains PSF models. The metrics task performs metrics evaluations of the trained PSF models. The plotting task is a utility feature for generating plots of the various metrics. The fourth task, simPSF, simulates stellar PSFs to be used as training and test data for the training procedure. To configure WaveDiff for a particular run or set of runs, the user specifies the processing step(s) by setting the values of the associated configuration variables {pipeline_task}_conf in the master configs.yaml file:

---
  training_conf: config/training_config.yaml
  metrics_conf: config/metrics_config.yaml
  plotting_conf: config/plotting_config.yaml
  

and providing the corresponding configuration files. Each configuration file contains only those parameters required for the specific pipeline task.
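
If you wish to run only a subset of the pipeline tasks, a reasonable approach (an assumption based on the description above, not verified against the parser) is to list only the configuration variables for the tasks you need. For example, a configs.yaml that triggers only the training task:

---
  training_conf: config/training_config.yaml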

The input configuration files for WaveDiff are written in YAML (YAML Ain't Markup Language), while the logging configuration uses the INI file syntax. The complete config directory tree is shown below:

config
├── configs.yaml
├── logging.conf
├── metrics_config.yaml
├── plotting_config.yaml
└── training_config.yaml

As WaveDiff is currently undergoing refactoring, only the training part of the pipeline is functional, and the code will activate that task alone.

Training Configuration

The specifications for training are set in the configuration file training_config.yaml. An example template is already provided in the config folder (click to view). The contents of the YAML file are read in as a nested dictionary of key: value pairs. The top-level key is training:, and its value is another dictionary containing the keys model_params, training_hparams, and data. A shorthand example of this structure is shown below:

training:
  model_params:
     .
     .
     .
  training_hparams:
     .
     .
     .
  data:
     .
     .
     .

As a user, you should not modify the names of the keys; you can modify the value entries. For example, to train a particular PSF model, update training: model_params: model_name with the name of the desired model. The model options are listed in the comment just above the key, as is the case for most of the other keys. Note, however, that for now (due to refactoring) only the poly model is implemented.
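
A minimal excerpt of such a change, showing only the relevant keys (all other entries from the template are omitted here):

training:
  model_params:
    model_name: poly  # the only model implemented during the refactoring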

Similarly, you can modify the training hyperparameters associated with the key training_hparams. To avoid confusing your future self, be consistent when changing variables, as all parameters in the file are recorded in the run log. For example, if you increase or decrease the number of training cycles (total_cycles), you should adjust all of the associated hyperparameter variables that are lists of length total_cycles, such as learning_rate_params.
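
As a sketch of what a consistent change might look like (the values below are purely illustrative and not taken from the template):

training:
  training_hparams:
    total_cycles: 3
    learning_rate_params: [1.0e-2, 1.0e-3, 1.0e-4]  # one entry per training cycle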

For the training and test data, ready-to-use datasets are provided in the data/coherent_euclid_dataset directory. In the near future, we will update this tutorial with instructions for generating new training and test datasets.
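
As a rough sketch of how the provided dataset might be referenced in the data section of training_config.yaml (the key names below are hypothetical; check the template for the actual keys):

training:
  data:
    training:
      data_dir: data/coherent_euclid_dataset/  # hypothetical key name
    test:
      data_dir: data/coherent_euclid_dataset/  # hypothetical key name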

Okay, that wraps up this tutorial. If you have any questions or feedback, please don't hesitate to open a GitHub issue.
