
Add a train_test_split.py module for dataset creation #124

Open
jeipollack opened this issue Mar 1, 2024 · 2 comments
Labels: enhancement (New feature or request)

@jeipollack (Contributor)

WaveDiff v2.0.x is missing a module to create simulated datasets for training and validation. This issue is to discuss the development of such a module. It may require refactoring the data_config.yaml file.

jeipollack added the enhancement (New feature or request) label Mar 1, 2024
jeipollack self-assigned this Mar 1, 2024
@jeipollack (Contributor, Author) commented Mar 1, 2024

@tobias-liaudat do the training and test datasets require different values for the parameters (e.g. SEDs, Zernike coeffs, etc.), as is currently the case in the data_config.yaml file? Note, I am asking specifically about the duplicated entries.

If so, that would imply that different SEDs, Zernikes, spatial variations, etc. could be used for each dataset. If not, then I am wondering whether a single set of parameters is specified to generate a single dataset, which is then split by some fraction defined in the config file (something like the sketch below). There could be other parameters as well, like adding noise, etc.
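
To make that second option concrete, this is roughly what I am picturing (just a sketch; the function name and the dict-of-arrays layout are my assumptions, not existing WaveDiff code):

import numpy as np

def split_dataset(dataset, test_fraction=0.2, seed=0):
    """Split a dict of per-star arrays into train/test subsets by a fraction.

    `dataset` maps field names (e.g. "stars", "positions", "SEDs") to arrays
    that share the same first dimension (one entry per star).
    """
    rng = np.random.default_rng(seed)
    n_stars = len(next(iter(dataset.values())))
    idx = rng.permutation(n_stars)
    n_test = int(round(test_fraction * n_stars))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    train = {key: val[train_idx] for key, val in dataset.items()}
    test = {key: val[test_idx] for key, val in dataset.items()}
    return train, test

The test fraction (and the seed, for reproducibility) would be read from the config file.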

Btw, what does SR mean in the following?

# Gaussian noise for training stars
SNR_range = [10, 110]
# Parameters for the SR in the test dataset
SR_output_dim = 64
SR_output_Q = 1.0

Also, what is the purpose of defining the following?

    stars: null
    noisy_stars: null
    positions: null
    zernike_coeffs: null
    polynomial_coeffs: null

Maybe I added them as a reminder to myself, but I don't know what values would go there. Would it be the name(s) of the corresponding file(s)?

@tobias-liaudat (Member)

@jeipollack I don't recall how the parameters set to null are handled in the new code. Are they paths to .npy files?
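
If they are indeed paths, I would expect the handling to be something like this (just guessing at the intent, not how the new code actually works):

import numpy as np

def load_dataset_fields(config):
    """Load the fields given as .npy paths; leave the null ones as None,
    meaning they should be simulated from the other parameters."""
    return {
        key: (np.load(path) if path is not None else None)
        for key, path in config.items()
    }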

SR stands for super-resolved; those parameters configure the PSF simulator to generate super-resolved stars.

In the original code, the generation of super-resolved (SR) stars was done on the fly, as it was fast on the GPU with the parameters I was using. This allowed us to have lightweight train/test .npy files. However, depending on the parameters, generating the SR stars may take a long time, and you may want to generate them only once and load the stars from the .npy file afterwards (see the sketch below).
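
In other words, something like this generate-once-then-load pattern (only a sketch; `get_sr_stars` and its callable argument are placeholders, not actual WaveDiff functions):

import os
from typing import Callable

import numpy as np

def get_sr_stars(simulate: Callable[[], np.ndarray],
                 cache_path: str = "sr_test_stars.npy") -> np.ndarray:
    """Return the super-resolved (SR) stars, simulating them only once.

    `simulate` stands in for the PSF simulator call configured with
    SR_output_dim / SR_output_Q; `cache_path` is where the SR stars are
    stored as a .npy file after the first run.
    """
    if os.path.exists(cache_path):
        # Reuse the stars generated on a previous run.
        return np.load(cache_path)
    sr_stars = simulate()          # potentially slow, depending on the parameters
    np.save(cache_path, sr_stars)  # cache so later runs can just load
    return sr_stars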

In the usual usage of WaveDiff, we are not interested in having different parameters for the train/test stars. However, to carry out sensitivity tests, i.e. to study how errors in the input affect the PSF model after training, we will need different parameters for train and test (roughly along the lines of the sketch below).
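
Concretely, for that sensitivity-testing case I would keep duplicated parameter blocks, roughly like this (only a sketch of the idea; apart from SNR_range, SR_output_dim and SR_output_Q, which come from the current file, the key names are made up):

# Separate parameter blocks so the test set can be perturbed
# independently of the training set.
data_params = {
    "train": {
        "n_stars": 2000,
        "SNR_range": [10, 110],    # Gaussian noise for training stars
    },
    "test": {
        "n_stars": 400,
        "SNR_range": None,         # e.g. noiseless test stars
        "SR_output_dim": 64,       # super-resolved test stars
        "SR_output_Q": 1.0,
    },
}

A train_test_split.py module could then either split one simulated dataset by a fraction or generate the two sets from their own blocks, depending on the use case.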
