Getting started tutorial
Welcome to the WaveDiff tutorial!
This tutorial serves as a walk-through guide of how to set up runs of WaveDiff with different configuration settings.
The WaveDiff pipeline is launched and managed by the wf_psf/run.py script. A list of command-line arguments can be displayed using the --help option:
> python wf_psf/run.py --help
usage: run.py [-h] --conffile CONFFILE --repodir REPODIR --outputdir OUTPUTDIR
optional arguments:
-h, --help show this help message and exit
--conffile CONFFILE, -c CONFFILE
a configuration file containing program settings.
--repodir REPODIR, -r REPODIR
the path of the code repository directory.
--outputdir OUTPUTDIR, -o OUTPUTDIR
the path of the output directory.
There are three arguments that the user should specify when launching the pipeline.
The first argument, --conffile CONFFILE, specifies the path to the configuration file storing the parameter options for running the pipeline.
The second argument, --repodir REPODIR, is the path to the wf-psf repository.
The third argument, --outputdir OUTPUTDIR, sets the path to the output directory, which stores the WaveDiff results.
To run WaveDiff, use the following command:
> python wf_psf/run.py -c /path/to/config/file -r /path/to/wf-psf -o /path/to/output/dir
You can test this now using the configuration file configs.yaml provided in the config subdirectory of the wf-psf repository. Launching this script will initiate training of the semi-parametric, data-driven PSF model. The outputs will be stored in a directory called wf-outputs located in the path specified with the -o argument.
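For example, assuming the repository has been cloned to /path/to/wf-psf, this test run could be launched with the following command (the paths are placeholders to adapt to your system):
> python wf_psf/run.py -c /path/to/wf-psf/config/configs.yaml -r /path/to/wf-psf -o /path/to/output/dir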
Next, we describe in some detail the structure and content of the configuration files.
The WaveDiff pipeline features four main packages for four pipeline tasks: training, metrics, plotting, and simPSF.
The training pipeline task is for training PSF models. The metrics pipeline task performs metrics evaluations of the trained PSF models. The plotting pipeline task is a utility feature for generating plots of the various metrics. The fourth pipeline task, simPSF, is used to simulate stellar PSFs to be used as training and test data for the training procedure.
To configure WaveDiff for a specific pipeline task or a set of pipeline tasks, the user specifies the processing step(s) by setting the values of the associated configuration variables {pipeline_task}_conf in the master configs.yaml file:
---
data_conf: data_config.yaml
training_conf: training_config.yaml
metrics_conf: metrics_config.yaml
plotting_conf: plotting_config.yaml
and providing the corresponding configuration files. Each configuration file contains only those parameters required for the specific pipeline task.
The input configuration files for WaveDiff are written in YAML (YAML Ain't Markup Language), while for logging we use the ini file syntax. The complete config directory tree is shown below:
config/
├── configs.yaml
├── logging.conf
├── data_config.yaml
├── metrics_config.yaml
├── plotting_config.yaml
└── training_config.yaml
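The logging.conf file follows the standard ini-style configuration syntax used by Python's logging module. As a rough, hypothetical sketch of that syntax (the logger, handler, and formatter names used by WaveDiff may differ), such a file looks like:
[loggers]
keys=root

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=INFO
handlers=consoleHandler

[handler_consoleHandler]
class=StreamHandler
level=INFO
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s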
IMPORTANT NOTE: As WaveDiff is currently undergoing refactoring, only the training and metrics evaluation pipeline tasks are functional.
To run WaveDiff in train-only mode, you can simply provide the following configs.yaml:
---
data_conf: data_config.yaml
training_conf: training_config.yaml
metrics_conf: Null
Or, leave metrics_conf: empty. Similarly, to run WaveDiff in metrics-only mode, you can set training_conf: Null or leave it empty in configs.yaml.
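For reference, a metrics-only configs.yaml would then look like the following (assuming the corresponding metrics_config.yaml file is provided):
---
data_conf: data_config.yaml
training_conf: Null
metrics_conf: metrics_config.yaml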
The data_config.yaml file stores the metadata for generating training and test datasets or retrieving existing ones. For the training and test data, we already provide datasets you can use, located in the data/coherent_euclid_dataset directory. New training and test datasets can be produced with the parameters in the file. In the near future, we will provide a tutorial with instructions on how to set these parameters.
# Training and test datasets for training and/or metrics evaluation
data:
  training:
    # Specify directory path to data; default setting is /path/to/repo/data
    data_dir: data/coherent_euclid_dataset/
    file: train_Euclid_res_200_TrainStars_id_001.npy
    # If the training dataset file does not exist, generate a new one by setting the values below
    .
    . <params to generate training dataset>
    .
  test:
    data_dir: data/coherent_euclid_dataset/
    file: test_Euclid_res_id_001.npy
    # If the test dataset file is not provided, produce a new one
    .
    . <params to generate test dataset>
    .
The specifications for training are set in the configuration file training_config.yaml. An example template is already provided in the config folder. The contents of the YAML file are read in as a nested dictionary with key:value pairs. The first line contains the key training:, whose value is another dictionary containing a further set of keys: model_params, training_hparams, and data. A short-hand example of this structure is shown below:
training:
  id_name:
  model_params:
    model_name:
    .
    .
    .
  training_hparams:
    .
    .
    .
As a user, you should not modify the names of the keys; you can modify the value entries. For example, if you want to train a particular PSF model, you can update training: model_params: model_name with the name of the desired model. The model options are listed in the comment just above the key, as with most of the other keys. Note, however, that for now (due to refactoring) only the poly model is implemented.
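For example, to train the poly model, the relevant part of training_config.yaml would contain (other keys omitted):
training:
  model_params:
    model_name: poly
    .
    .
    .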
Similarly, you can modify the training hyperparameters associated with the key training_hparams (this part of the code works 😇). To avoid confusing your future self, try to be consistent when changing variables, as all parameters in the file will be recorded in the run log. For example, if you increase or decrease the total number of training cycles (total_cycles), you should adjust all of the associated training hyperparameter variables that take lists whose length is set by total_cycles, such as learning_rate_params, etc.
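As a purely illustrative sketch (the exact key names and values in your training_config.yaml may differ), a run with two training cycles should provide two entries in each per-cycle list:
training_hparams:
  # Two training cycles in total
  total_cycles: 2
  # One learning rate entry per cycle
  learning_rate_params: [0.01, 0.004]
  .
  .
  .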
The metrics_config.yaml file stores the configuration parameters for the wf-psf pipeline to carry out a set of metric calculations on a trained PSF model. Below we show an example of the types of parameters contained in the file; above each parameter we provide a definition. The user can adjust the various flags for assessing either a fully trained PSF model or the weights of a given checkpoint cycle (if saved during training). To evaluate a previously trained model, the user should provide the trained model's id_name, the path to its output directory in trained_model_path, and the config file used during training in trained_model_config. The user can choose which metrics to evaluate by setting the Boolean flags eval_mono_metric_rmse, eval_opd_metric_rmse, and eval_train_shape_sr_metric_rmse. To compute the errors of the trained PSF model, the metrics package generates a ground truth model based on the parameters defined in the ground_truth_model object in the metrics_config.yaml file. The metrics package is run using TensorFlow, and the hyperparameters are defined in the metrics_hparams object.
metrics:
  # Set flag to evaluate model weights; if False, then the final updated model will be evaluated
  use_callback: False
  # Choose the training cycle for the evaluation. Can be: 1, 2, ...
  saved_training_cycle: 2
  # Provide path to saved model checkpoints.
  chkp_save_path: checkpoint
  # Fill model_params if computing metrics_only on a pre-trained model
  # Specify the ID name of the trained PSF model
  id_name: -coherent_euclid_200stars
  # Path to the trained PSF model
  trained_model_path: /Users/jenniferpollack/Projects/wf-outputs/wf-outputs-202305262344/
  # Name of the trained PSF model config file
  trained_model_config: config/training_config.yaml
  # Flag to evaluate the monochromatic RMSE metric.
  eval_mono_metric_rmse: True
  # Flag to evaluate the OPD RMSE metric.
  eval_opd_metric_rmse: True
  # Flag to evaluate the super-resolution and the shape RMSE metrics for the train dataset.
  eval_train_shape_sr_metric_rmse: True
  ground_truth_model:
    model_params:
      .
      .
      .
  metrics_hparams:
    .
    .
    .
Okay, that wraps up this tutorial. Now you can run WaveDiff to train your PSF model and evaluate the metrics. If you have any questions or feedback, please don't hesitate to open a GitHub issue.