Skip to content

Commit

Permalink
split paragraphs
Browse files Browse the repository at this point in the history
  • Loading branch information
mastoffel committed Dec 2, 2024
1 parent 72b7e9d commit c31d1d8
Showing 1 changed file with 7 additions and 3 deletions.
10 changes: 7 additions & 3 deletions paper/paper.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,15 +42,19 @@ date: 28 November 2024
bibliography: paper.bib
---

# Summary <img src="../misc/AE_logo_final.png" style="width:100px; float:right; margin-left:10px;">
# Summary

Simulations are ubiquitous in research and application, but are often too slow and computationally expensive to deeply explore the underlying system. One solution is to create efficient emulators (also surrogate- or meta-models) to approximate simulations, but this requires substantial expertise. Here, we present AutoEmulate, a low-code, AutoML-style python package for emulation. AutoEmulate makes it easy to fit and compare emulators, abstracting away the need for extensive machine learning (ML) experimentation. The package includes a range of emulators, from Gaussian Processes, Support Vector Machines and Gradient Boosting Models to novel, experimental deep learning emulators such as Neural Processes [@garnelo_conditional_2018]. It also implements global sensitivity analysis as a common emulator application, which quantifies the relative contribution of different inputs to the output variance. In the future, with user feedback and contributions, we aim to organically grow AutoEmulate into an end-to-end tool for most emulation problems.

# Statement of need

To understand complex real-world systems, researchers and engineers often construct computer simulations. These can be computationally expensive and take minutes, hours or even days to run. For tasks like optimisation, sensitivity analysis or uncertainty quantification where thousands or even millions of runs are needed, a solution has long been to approximate simulations with efficient emulators, which can be orders of magnitudes faster [@forrester_recent_2009; @kudela_recent_2022]. Emulation is becoming increasingly widespread, ranging from engineering [@yondo_review_2018], architecture [@westermann_surrogate_2019], biomedical [@strocchi_cell_2023] and climate science [@bounceur_global_2015], to agent-based models [@angione_using_2022]. A typical emulation pipeline involves three steps: 1. Evaluating the simulation at a small, strategically chosen set of inputs using techniques such as Latin Hypercube Sampling [@mckay_comparison_1979] to create a representative dataset, 2. constructing a high-accuracy emulator using that dataset, which involves model selection, hyperparameter optimisation and evaluation and 3. applying the emulator to tasks such as prediction, sensitivity analysis, or optimisation. Building an emulator in particular requires substantial machine learning experience and knowledge of an ever increasing ecosystem of models and packages. This puts a substantial burden on practitioners whose main focus is to explore the underlying system, not building the emulator.
To understand complex real-world systems, researchers and engineers often construct computer simulations. These can be computationally expensive and take minutes, hours or even days to run. For tasks like optimisation, sensitivity analysis or uncertainty quantification where thousands or even millions of runs are needed, a solution has long been to approximate simulations with efficient emulators, which can be orders of magnitudes faster [@forrester_recent_2009; @kudela_recent_2022]. Emulation is becoming increasingly widespread, ranging from engineering [@yondo_review_2018], architecture [@westermann_surrogate_2019], biomedical [@strocchi_cell_2023] and climate science [@bounceur_global_2015], to agent-based models [@angione_using_2022].

AutoEmulate automates emulator building, with the goal to eventually streamline the whole emulation pipeline. For people new to ML, AutoEmulate compares, optimises and evaluates a range of models to create an efficient emulator for their simulation in just a few lines of code. For experienced surrogate modellers, AutoEmulate provides a reference set of cutting-edge emulators to quickly benchmark new models against. The package includes classic emulators such as Radial Basis Functions and Gaussian Processes, established ML models like Gradient Boosting and Support Vector Machines, as well as experimental deep learning emulators such as [Conditional Neural Processes](https://yanndubs.github.io/Neural-Process-Family/text/Intro.html) [@garnelo_conditional_2018]. AutoEmulate is built to be extensible. Emulators follow the popular [scikit-learn estimator template](https://scikit-learn.org/1.5/developers/develop.html#rolling-your-own-estimator) and deep learning models are supported with little overhead using PyTorch [@paszke_pytorch_2019] with a skorch [@tietz_skorch_2017] interface. AutoEmulate fills a gap in the current landscape of surrogate modeling tools as it’s both highly accessible for newcomers while providing cutting-edge emulators for experienced surrogate modelers. In contrast, existing libraries either focus on lower level implementations of specific models, like GPflow [@matthews_gpflow_2017] and GPytorch [@gardner_gpytorch_2018], or provide multiple emulators and applications but require to manually pre-process data, compare emulators and optimise hyperparameters like SMT in Python [@saves_smt_2024] or [Surrogates.jl](https://docs.sciml.ai/Surrogates/latest/) in Julia.
A typical emulation pipeline involves three steps: 1. Evaluating the simulation at a small, strategically chosen set of inputs using techniques such as Latin Hypercube Sampling [@mckay_comparison_1979] to create a representative dataset, 2. constructing a high-accuracy emulator using that dataset, which involves model selection, hyperparameter optimisation and evaluation and 3. applying the emulator to tasks such as prediction, sensitivity analysis, or optimisation. Building an emulator in particular requires substantial machine learning experience and knowledge of an ever increasing ecosystem of models and packages. This puts a substantial burden on practitioners whose main focus is to explore the underlying system, not building the emulator.

AutoEmulate automates emulator building, with the goal to eventually streamline the whole emulation pipeline. For people new to ML, AutoEmulate compares, optimises and evaluates a range of models to create an efficient emulator for their simulation in just a few lines of code. For experienced surrogate modellers, AutoEmulate provides a reference set of cutting-edge emulators to quickly benchmark new models against. The package includes classic emulators such as Radial Basis Functions and Gaussian Processes, established ML models like Gradient Boosting and Support Vector Machines, as well as experimental deep learning emulators such as [Conditional Neural Processes](https://yanndubs.github.io/Neural-Process-Family/text/Intro.html) [@garnelo_conditional_2018]. AutoEmulate is built to be extensible. Emulators follow the popular [scikit-learn estimator template](https://scikit-learn.org/1.5/developers/develop.html#rolling-your-own-estimator) and deep learning models are supported with little overhead using PyTorch [@paszke_pytorch_2019] with a skorch [@tietz_skorch_2017] interface.

AutoEmulate fills a gap in the current landscape of surrogate modeling tools as it’s both highly accessible for newcomers while providing cutting-edge emulators for experienced surrogate modelers. In contrast, existing libraries either focus on lower level implementations of specific models, like GPflow [@matthews_gpflow_2017] and GPytorch [@gardner_gpytorch_2018], or provide multiple emulators and applications but require to manually pre-process data, compare emulators and optimise hyperparameters like SMT in Python [@saves_smt_2024] or [Surrogates.jl](https://docs.sciml.ai/Surrogates/latest/) in Julia.

# Pipeline

Expand Down

0 comments on commit c31d1d8

Please sign in to comment.