PROTEUS grid-search or forward model #204
Comments
I agree that this would be incredibly powerful. I can imagine that running an MCMC (or similar method) would be tricky because of the slow runtimes. When we are ready to look into this, maybe we could involve someone who has experience doing retrievals with large models?
The ML paper you cited is interesting - they ran 50k simulations to train the model. I am finding that a grid of 22 simulations takes about 14 hours to run (on 22 threads). If we scaled this to 50k simulations on 256 threads, it would take 50000 × 14 / 256 ≈ 2700 hours ≈ 114 days. We could of course speed this up by reducing the resolution, etc.
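For reference, a back-of-the-envelope version of that scaling estimate, assuming the 22-simulation / 14-hour benchmark above and perfect parallel scaling across threads:

```python
# Back-of-the-envelope runtime estimate (assumes the 22-simulation / 14-hour
# benchmark above and perfect scaling with thread count).
sims_benchmark = 22        # simulations in the benchmark grid
hours_benchmark = 14.0     # wall-clock hours for that grid on 22 threads
threads_benchmark = 22
threads_target = 256
sims_target = 50_000

# Each simulation costs ~14 thread-hours in the benchmark
thread_hours_per_sim = hours_benchmark * threads_benchmark / sims_benchmark
wall_clock_days = thread_hours_per_sim * sims_target / threads_target / 24
print(f"{wall_clock_days:.0f} days")   # ~114 days
```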
I believe they need fewer simulations than a "normal" Bayesian model, which is one of their selling points. Nevertheless, even 100k simulations are not impossible when using a large-scale computing facility. We can and should do this sometime in the next year to build a large simulation grid, once the current plans with aragog and zephyrus are done. Cosmology solves this problem by running updated large-scale forward models every few years with high-performance codes (e.g. the TNG project) and then training machine learning models on them. That is one way to go, but if we can find an algorithm that runs highly specialised simulations to compute the Bayesian evidence directly for a single planet on a ~week(s) timescale, that would be preferable, I think.
A simpler option might be to run a grid of models (>~2000 points) and use a clustering algorithm on their binned spectra (or bulk density) to identify particular groups. This can pick out features that may allow us to infer the parameters of an observed planet by identifying the group into which it best fits (see the sketch below). https://hdbscan.readthedocs.io/en/stable/advanced_hdbscan.html
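A minimal sketch of that clustering idea, assuming the binned spectra of the grid models are already collected into a 2-D array (the file name, array shape, and `min_cluster_size` are placeholders to be tuned):

```python
# Sketch: cluster binned spectra from a PROTEUS grid with HDBSCAN.
import numpy as np
import hdbscan
from sklearn.preprocessing import StandardScaler

# Hypothetical array of binned model spectra, shape (n_models, n_bins)
spectra = np.load("grid_binned_spectra.npy")

# Standardise each bin so clustering is not dominated by absolute flux levels
X = StandardScaler().fit_transform(spectra)

clusterer = hdbscan.HDBSCAN(min_cluster_size=25)  # tune for a ~2000-point grid
labels = clusterer.fit_predict(X)                 # -1 marks unclustered "noise" models

print(f"found {labels.max() + 1} clusters, "
      f"{np.sum(labels == -1)} models left unclustered")
```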
This is pretty cool! I like it a lot; this seems like a good solution for analysing our outputs. However, it still requires us to set the parameter space by hand, which means we run many models that are not particularly useful/necessary and do not add valuable information. A Bayesian model selection or something similar would save computation time by running only until a statistically robust answer is achieved.
This tool came up today: https://wandb.ai/ It's meant for hyperparameter optimization in machine learning. You define the parameters to search via a YAML file; you just have to write the interface in Python.
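Roughly how that could look (a sketch only; `run_proteus` and the parameter names are placeholders, not real PROTEUS options, and the sweep settings are illustrative):

```python
# Sketch of a W&B sweep wrapping a hypothetical PROTEUS driver function.
import wandb

def run_proteus(fo2_shift, initial_ch_ratio):
    ...  # call PROTEUS with these inputs and return a misfit to the observations
    return 0.0

sweep_config = {
    "method": "bayes",                                    # W&B's Bayesian search
    "metric": {"name": "misfit", "goal": "minimize"},
    "parameters": {
        "fo2_shift": {"min": -4.0, "max": 4.0},           # placeholder ranges
        "initial_ch_ratio": {"min": 0.1, "max": 2.0},
    },
}

def objective():
    run = wandb.init()
    misfit = run_proteus(run.config.fo2_shift, run.config.initial_ch_ratio)
    wandb.log({"misfit": misfit})

sweep_id = wandb.sweep(sweep_config, project="proteus-sweeps")
wandb.agent(sweep_id, function=objective, count=50)
```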
Also on the theme of optimisation, I think we should consider Emcee, since it's well established within the astronomy community. |
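A minimal emcee sketch for context, assuming a (hypothetical) log-likelihood that wraps a full PROTEUS run; the parameter names, prior bounds, and walker/step counts are placeholders, and in practice each likelihood evaluation would be as slow as one full simulation:

```python
# Sketch: ensemble MCMC over PROTEUS inputs with emcee.
import numpy as np
import emcee

def log_likelihood(orbital_distance, planet_mass):
    ...  # run PROTEUS for these inputs and compare to the observations
    return 0.0

def log_probability(theta):
    orbital_distance, planet_mass = theta                    # placeholder parameters
    if not (0.01 < orbital_distance < 1.0 and 0.5 < planet_mass < 10.0):
        return -np.inf                                       # flat priors via hard bounds
    return log_likelihood(orbital_distance, planet_mass)

ndim, nwalkers = 2, 16
p0 = np.random.uniform([0.01, 0.5], [1.0, 10.0], size=(nwalkers, ndim))

sampler = emcee.EnsembleSampler(nwalkers, ndim, log_probability)
sampler.run_mcmc(p0, 1000, progress=True)
samples = sampler.get_chain(discard=200, flat=True)
```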
To move towards an inverse mode of PROTEUS sometime down the road, we need a computationally feasible approach for running many models to fit a given set of observations.
To give an example of the problem: let's assume a given exoplanet has the following known/observed parameters with uncertainties: stellar age, orbital distance, planet radius, planet mass, transmission/emission spectrum. Given these parameters, we would like to compute the best-fitting PROTEUS models over a set of input parameters, and then compute a goodness-of-fit metric. This is essentially the description of an atmospheric retrieval, except that PROTEUS simulations are far too computationally expensive to perform 100k+ of them.
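As an illustration, the goodness-of-fit metric could be a simple chi-squared over the observed quantities listed above; all observed values, uncertainties, and dictionary keys below are placeholders:

```python
# Sketch: chi-squared misfit of one PROTEUS model against observed quantities.
import numpy as np

def chi_squared(model, observed, sigma):
    """Sum of uncertainty-weighted squared residuals over the observed keys."""
    return sum(((model[k] - observed[k]) / sigma[k]) ** 2 for k in observed)

observed = {"radius_rearth": 1.7,  "mass_mearth": 4.2}   # hypothetical planet
sigma    = {"radius_rearth": 0.1,  "mass_mearth": 0.5}
model    = {"radius_rearth": 1.65, "mass_mearth": 4.6}   # from one PROTEUS run

# A transmission/emission spectrum can be folded in the same way, bin by bin:
obs_spec, obs_err = np.ones(30), 0.05 * np.ones(30)      # placeholder spectrum
mod_spec = 1.02 * np.ones(30)

chi2 = chi_squared(model, observed, sigma) \
     + np.sum(((mod_spec - obs_spec) / obs_err) ** 2)
print(chi2)
```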
I am not yet certain what the best strategy is to approach this problem. Here are a few options, each with opportunities and drawbacks: