Getting started tutorial
Welcome to the WaveDiff tutorial!
This tutorial serves as a walk-through guide of how to set up runs of WaveDiff with different configuration settings.
The WaveDiff pipeline is launched and managed by the wf_psf/run.py script. A list of command-line arguments can be displayed using the --help option:
> python wf_psf/run.py --help
usage: run.py [-h] --conffile CONFFILE --repodir REPODIR --outputdir OUTPUTDIR
optional arguments:
-h, --help show this help message and exit
--conffile CONFFILE, -c CONFFILE
a configuration file containing program settings.
--repodir REPODIR, -r REPODIR
the path of the code repository directory.
--outputdir OUTPUTDIR, -o OUTPUTDIR
the path of the output directory.
There are three arguments that the user should specify when launching the pipeline.
The first argument, --conffile CONFFILE, specifies the path to the configuration file storing the parameter options for running the pipeline.
The second argument, --repodir REPODIR, is the path to the wf-psf repository.
The third argument, --outputdir OUTPUTDIR, sets the path to the output directory, which stores the WaveDiff results.
To run WaveDiff, use the following command:
> python wf_psf/run.py -c /path/to/config/file -r /path/to/wf-psf -o /path/to/output/dir
You can test this now using the configuration file configs.yaml provided in the subdirectory config inside the wf-psf repository. Launching this script will initiate training of the semi-parametric data-driven PSF model. The outputs will be stored in a directory called wf-outputs located in the path specified with the argument -o.
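For example, assuming you launch from the root of the wf-psf repository and that the chosen output directory already exists (both assumptions; adjust the paths to your setup), the test run could look like:
> python wf_psf/run.py -c config/configs.yaml -r . -o ./my-output
The results would then appear under my-output/wf-outputs.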
Next, we describe in some detail the structure and content of the configuration files.
The WaveDiff pipeline features four main packages for four pipeline tasks: training, metrics, plotting, and simPSF.
The training pipeline task trains PSF models. The metrics pipeline task performs metrics evaluations of the trained PSF models. The plotting pipeline task is a utility feature for generating plots of the various metrics. The fourth pipeline task, simPSF, is used to simulate stellar PSFs to be used as training and test data for the training procedure. To configure WaveDiff for a particular run or a set of runs, the user specifies the processing step(s) by setting the values of the associated configuration variables {pipeline_task}_conf in the master configs.yaml file:
---
training_conf: config/training_config.yaml
metrics_conf: config/metrics_config.yaml
plotting_conf: config/plotting_config.yaml
and providing the corresponding configuration files. Each configuration file contains only those parameters required for the specific pipeline task.
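For instance, here is a minimal sketch of a configs.yaml that requests only the training task, under the assumption that tasks whose entries are commented out or left unset are simply skipped:
---
training_conf: config/training_config.yaml
# metrics_conf: config/metrics_config.yaml
# plotting_conf: config/plotting_config.yaml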
The input configuration files for WaveDiff are written in YAML (YAML Ain't Markup Language), while for logging we use the ini file syntax. The complete config directory tree is shown below:
config
├── configs.yaml
├── logging.conf
├── metrics_config.yaml
├── plotting_config.yaml
└── training_config.yaml
As WaveDiff is currently undergoing refactoring, only the training part of the pipeline is functional; the code will therefore activate only the training task.
The specifications for training are set in the configuration file training_config.yaml. An example template is already provided in the config folder. The contents of the yaml file are read in as a nested dictionary with key:value pairs. The first line contains the key training:, whose value is another dictionary containing the keys model_params, training_hparams, and data. We write below a short-hand example of this:
training:
  model_params:
    ...
  training_hparams:
    ...
  data:
    ...
As a user, you should not modify the names of the keys; you can modify the value entries. For example, if you want to train a particular PSF model, you can update training: model_params: model_name with the name of the desired model. The model options are listed in the comment just above the key, as is the case for most of the other keys. Note, however, that for now (due to refactoring) only the poly model is implemented.
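A minimal sketch of that part of training_config.yaml is shown below; the comment text is illustrative rather than the exact template contents:
training:
  model_params:
    # Options: poly (other models pending the refactoring)
    model_name: poly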
Similarly, you can modify the training hyperparameters associated with the key training_hparams. To avoid confusing your future self, try to be consistent when changing variables, as all parameters in the file will be recorded in the run log. For example, if you increase or decrease the number of training cycles (total_cycles), you should adjust all of the associated training hyperparameter variables that are lists whose length is specified by total_cycles, such as learning_rate_params, etc.
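As an illustration (the values and the exact list format below are assumptions, not the template defaults), each per-cycle list should contain one entry per training cycle:
training:
  training_hparams:
    total_cycles: 2
    # one entry per cycle declared in total_cycles
    learning_rate_params: [1.0e-2, 1.0e-3]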
For the training and test data, we already have datasets you can use located in the data/coherent_euclid_dataset
directory. In the near future, we will update the tutorial with instructions for generating new training and test datasets.
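To point a run at those datasets, the data section of training_config.yaml would reference that directory. The key names below are hypothetical placeholders used only to illustrate the idea, so check the provided template for the exact ones:
training:
  data:
    training:
      data_dir: data/coherent_euclid_dataset/
    test:
      data_dir: data/coherent_euclid_dataset/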
Okay, that wraps up this tutorial. If you have any questions or feedback, please don't hesitate to open a GitHub issue.