
Add a train_test_split.py module for dataset creation #124

Open
jeipollack opened this issue Mar 1, 2024 · 2 comments
Labels: enhancement (New feature or request)

@jeipollack (Contributor)

WaveDiff v2.0.x is missing a module to create simulated datasets for training and validation. This issue is to discuss the development of such a module. It may require refactoring the data_config.yaml file.

jeipollack added the enhancement (New feature or request) label Mar 1, 2024
jeipollack self-assigned this Mar 1, 2024
@jeipollack (Contributor, Author) commented Mar 1, 2024

@tobias-liaudat do the training and test datasets require different values for the parameters (e.g. SEDs, Zernike coeffs, etc.), as is currently the case in the data_config.yaml file? Note, I am asking specifically about the duplicated entries.

If so, that would imply that different SEDs, Zernikes, spatial variations, etc. could be used for each dataset. If not, then I am wondering whether a single set of parameters is specified to generate a single dataset, which is then split by some fraction defined in the config file (something like the sketch below). There could be other parameters as well, like adding noise, etc.
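
To make that second option concrete, this is roughly what I am picturing (just a sketch; the function name and the dict-of-arrays layout are my assumptions, not existing WaveDiff code):

import numpy as np

def split_dataset(dataset, test_fraction=0.2, seed=0):
    """Split a dict of per-star arrays into train/test subsets by a fraction.

    `dataset` maps field names (e.g. "stars", "positions", "SEDs") to arrays
    that share the same first dimension (one entry per star).
    """
    rng = np.random.default_rng(seed)
    n_stars = len(next(iter(dataset.values())))
    idx = rng.permutation(n_stars)
    n_test = int(round(test_fraction * n_stars))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train = {key: val[train_idx] for key, val in dataset.items()}
    test = {key: val[test_idx] for key, val in dataset.items()}
    return train, test

The test fraction (and the seed, for reproducibility) would be read from the config file.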

Btw, what does SR mean in the following?

# Gaussian noise for training stars
SNR_range = [10, 110]
# Parameters for the SR in the test dataset
SR_output_dim = 64
SR_output_Q = 1.0

Also, what is the purpose of defining the following?

    stars: null
    noisy_stars: null
    positions: null
    zernike_coeffs: null
    polynomial_coeffs: null

Maybe I added them as a reminder to myself, but I don't know what values would go there. Would it be the name(s) of the corresponding file(s)?

@tobias-liaudat (Member)

@jeipollack I don't recall how the parameters set to null are handled in the new code. Are they paths to .npy files?
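
If they are indeed paths, I would expect the handling to be something like this (just guessing at the intent, not how the new code actually works):

import numpy as np

def load_dataset_fields(config):
    """Load the fields given as .npy paths; leave the null ones as None,
    meaning they should be simulated from the other parameters."""
    return {
        key: (np.load(path) if path is not None else None)
        for key, path in config.items()
    }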

SR stands for super-resolved; those parameters configure the PSF simulator to generate super-resolved stars.

In the original code, the generation of super-resolved (SR) stars was done on the fly, as it was fast on the GPU with the parameters I was using. This allowed us to have lightweight train/test .npy files. However, depending on the parameters, generating the SR stars may take a long time, and you may want to generate them only once and load the stars from the .npy file afterwards (see the sketch below).
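
In other words, something like this generate-once-then-load pattern (only a sketch; `get_sr_stars` and its callable argument are placeholders, not actual WaveDiff functions):

import os
from typing import Callable

import numpy as np

def get_sr_stars(simulate: Callable[[], np.ndarray],
                 cache_path: str = "sr_test_stars.npy") -> np.ndarray:
    """Return the super-resolved (SR) stars, simulating them only once.

    `simulate` stands in for the PSF simulator call configured with
    SR_output_dim / SR_output_Q; `cache_path` is where the SR stars are
    stored as a .npy file after the first run.
    """
    if os.path.exists(cache_path):
        # Reuse the stars generated on a previous run.
        return np.load(cache_path)
    sr_stars = simulate()          # potentially slow, depending on the parameters
    np.save(cache_path, sr_stars)  # cache so later runs can just load
    return sr_stars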

In the usual usage of WaveDiff, we are not interested in having different parameters for the train/test stars. However, to carry out sensitivity tests, i.e. to study how errors in the input affect the PSF model after training, we will need different parameters for train and test (roughly along the lines of the sketch below).
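
Concretely, for that sensitivity-testing case I would keep duplicated parameter blocks, roughly like this (only a sketch of the idea; apart from SNR_range, SR_output_dim and SR_output_Q, which come from the current file, the key names are made up):

# Separate parameter blocks so the test set can be perturbed
# independently of the training set.
data_params = {
    "train": {
        "n_stars": 2000,
        "SNR_range": [10, 110],    # Gaussian noise for training stars
    },
    "test": {
        "n_stars": 400,
        "SNR_range": None,         # e.g. noiseless test stars
        "SR_output_dim": 64,       # super-resolved test stars
        "SR_output_Q": 1.0,
    },
}

A train_test_split.py module could then either split one simulated dataset by a fraction or generate the two sets from their own blocks, depending on the use case.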
