
PROTEUS grid-search or forward model #204

Open
timlichtenberg opened this issue Oct 7, 2024 · 7 comments
Labels
ensembles Relating to grids or forward models Priority 4: tbd Priority level 4: nice to have features and/or has some time

Comments

@timlichtenberg
Collaborator

timlichtenberg commented Oct 7, 2024

For moving towards an inverse method of PROTEUS sometime down the road, we need to consider a computationally feasible approach to run many models to fit a given set of observations.

To give an example of the problem: let's assume a given exoplanet has the following known/observed parameters with uncertainties: stellar age, orbital distance, planet radius, planet mass, and a transmission/emission spectrum. Given these parameters, we would like to compute the best-fitting PROTEUS models over a set of input parameters, and then compute a goodness-of-fit metric. This is essentially the description of an atmospheric retrieval, except that PROTEUS simulations are far too computationally expensive to perform 100k+ of them.

I am not yet certain what the best strategy is to approach this problem. Here are a few options, each with opportunities and drawbacks:

  • A modified chi-squared or r-squared algorithm to compute some measure of goodness-of-fit for an arbitrary grid. E.g. Madhusudhan & Seager (2009).
  • Train a machine learning model on the simulation data and use the machine learning model for the retrieval, e.g., Ardevol Martinez et al. (2024).
  • Brute-force retrieval approach, e.g., nested sampling, MCMC, or some other random sampler, ignoring the computational cost and likely yielding a low-confidence result.
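The first option in the list above is straightforward to sketch. Here is a minimal, hypothetical example of a chi-squared goodness-of-fit over a pre-computed model grid, in the spirit of Madhusudhan & Seager (2009); all data and variable names are illustrative stand-ins, not part of PROTEUS:

```python
import numpy as np

rng = np.random.default_rng(0)

# Observed spectrum: 10 wavelength bins with 1-sigma uncertainties (mock data).
obs = np.array([1.0, 1.2, 0.9, 1.1, 1.3, 0.8, 1.0, 1.05, 0.95, 1.15])
sigma = np.full_like(obs, 0.05)

# A grid of 100 forward-model spectra (rows), e.g. one per parameter combination.
# In practice each row would come from a PROTEUS run.
grid = obs + rng.normal(0.0, 0.2, size=(100, obs.size))

# Reduced chi-squared for each grid point (no fitted parameters subtracted here).
chi2 = np.sum(((grid - obs) / sigma) ** 2, axis=1)
chi2_red = chi2 / obs.size

best = np.argmin(chi2_red)
print(f"best-fit grid index: {best}, reduced chi2: {chi2_red[best]:.2f}")
```

The same `chi2_red` array can then feed a goodness-of-fit map over the grid's input parameters, or a crude confidence region via delta-chi-squared thresholds.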
@timlichtenberg timlichtenberg converted this from a draft issue Oct 7, 2024
@nichollsh
Contributor

I agree that this would be incredibly powerful. I can imagine that running an MCMC (or similar method) would be tricky because of the slow runtimes. When we are ready to look into this, maybe we could involve someone who has experience doing retrievals with large models?

@nichollsh
Contributor

The ML paper you cited is interesting - they ran 50k simulations to train the model. I am finding that a grid of 22 simulations takes about 14 hours to run (on 22 threads, i.e. one simulation per thread). Scaling this to 50k simulations on 256 threads would take 50000 × 14 / 256 ≈ 2734 hours ≈ 114 days. We could of course speed this up by reducing the resolution, etc.

@timlichtenberg
Collaborator Author

timlichtenberg commented Oct 8, 2024

I believe they need fewer simulations than a "normal" Bayesian model, which is one of their selling points. Even so, 100k simulations are not out of reach on a large-scale computing facility, and we can and should build a large simulation grid sometime in the next year, once the current plans with aragog and zephyrus are done. Cosmology solves this problem by running updated large-scale forward models every few years with high-performance codes (e.g. the TNG project) and then training machine-learning models on them. That is one way to go, but if we can find an algorithm that runs highly specialised simulations to compute the Bayesian evidence directly for a single planet on a ~week(s) timescale, I think that would be preferable.

@nichollsh
Contributor

A simpler option might be to run a grid of models (>~2000 points) and use a clustering algorithm on their binned spectra (or bulk density) to identify groups. This could pick out features that let us infer the parameters of an observed planet by identifying the group into which it best fits.

https://hdbscan.readthedocs.io/en/stable/advanced_hdbscan.html

@timlichtenberg
Collaborator Author

This is pretty cool! I like it a lot; it seems like a good solution for analysing our outputs. However, it still requires us to set the parameter space by hand, which means we run many models that are not particularly useful/necessary and do not add valuable information. Bayesian model selection or something similar would save computation time until a statistically robust answer is achieved.

@lsoucasse lsoucasse added the ensembles Relating to grids or forward models label Nov 11, 2024
@timlichtenberg timlichtenberg added the Priority 4: tbd Priority level 4: nice to have features and/or has some time label Nov 19, 2024
@stefsmeets
Contributor

This tool came up today: https://wandb.ai/

It's meant for hyperparameter optimization in machine learning. Via a YAML file you define the parameters to search; you just have to write the interface in Python.
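For reference, a W&B sweep config for a PROTEUS-like search might look roughly like the following; the program name and parameter names here are hypothetical placeholders, not an existing interface:

```yaml
program: run_proteus_point.py   # hypothetical wrapper that runs one model
method: bayes                   # or: grid, random
metric:
  name: chi2_red                # hypothetical metric logged by the wrapper
  goal: minimize
parameters:
  orbital_distance_au:
    min: 0.01
    max: 0.5
  planet_mass_mearth:
    values: [0.5, 1.0, 2.0, 5.0]
```

The `bayes` method would let the sweep concentrate runs in promising regions rather than covering the whole grid, which speaks to the hand-set-parameter-space concern above.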

@nichollsh nichollsh moved this from TBD to JOSS Publication in PROTEUS Development Roadmap Nov 27, 2024
@nichollsh nichollsh changed the title from PROTEUS grid-search to PROTEUS grid-search or forward model Nov 27, 2024
@nichollsh
Contributor

Also on the theme of optimisation, I think we should consider emcee, since it's well established within the astronomy community.

https://emcee.readthedocs.io/en/stable/
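What emcee's `EnsembleSampler` would automate (with affine-invariant proposals and parallel walkers) can be illustrated with a dependency-free, single-chain Metropolis sketch. The "forward model" below is a deliberately cheap stand-in; in practice each likelihood call would be a full PROTEUS run, which is exactly why the runtime concern above matters:

```python
import numpy as np

rng = np.random.default_rng(2)

obs, sigma = 1.5, 0.1                 # one mock observable, e.g. planet radius

def log_likelihood(theta):
    model = theta                     # stand-in forward model: identity
    return -0.5 * ((model - obs) / sigma) ** 2

theta, logl = 0.0, log_likelihood(0.0)
samples = []
for _ in range(5000):
    prop = theta + rng.normal(0.0, 0.3)           # random-walk proposal
    logl_prop = log_likelihood(prop)
    if np.log(rng.uniform()) < logl_prop - logl:  # Metropolis acceptance rule
        theta, logl = prop, logl_prop
    samples.append(theta)

posterior = np.array(samples[1000:])              # drop burn-in
print(f"posterior mean: {posterior.mean():.2f} +/- {posterior.std():.2f}")
```

With emcee the loop would be replaced by `EnsembleSampler(nwalkers, ndim, log_prob_fn).run_mcmc(...)`, but the cost accounting is the same: total runtime scales with the number of likelihood evaluations.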
