---
id: acquisition
title: Acquisition Functions
---
Acquisition functions are heuristics employed to evaluate the usefulness of one or more design points for achieving the objective of maximizing the underlying black box function.
BoTorch supports both analytic and (quasi-) Monte-Carlo based acquisition functions. It provides a generic `AcquisitionFunction` API that abstracts away from the particular type, so that optimization can be performed on the same objects.
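As a minimal sketch of this shared API, the snippet below fits a toy `SingleTaskGP` on random data and optimizes both an analytic and an MC acquisition function through the same `optimize_acqf` call. The toy data and hyperparameters are illustrative only, and the model-fitting helper is named `fit_gpytorch_mll` in recent BoTorch releases.

```python
import torch
from botorch.acquisition import ExpectedImprovement, qExpectedImprovement
from botorch.fit import fit_gpytorch_mll
from botorch.models import SingleTaskGP
from botorch.optim import optimize_acqf
from gpytorch.mlls import ExactMarginalLogLikelihood

# toy observations on the 2-d unit cube
train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = train_X.sin().sum(dim=-1, keepdim=True)

model = SingleTaskGP(train_X, train_Y)
fit_gpytorch_mll(ExactMarginalLogLikelihood(model.likelihood, model))

bounds = torch.stack([torch.zeros(2), torch.ones(2)]).double()  # 2 x d box bounds

analytic_ei = ExpectedImprovement(model=model, best_f=train_Y.max())  # q = 1 only
mc_qei = qExpectedImprovement(model=model, best_f=train_Y.max())      # arbitrary q

# the same optimization routine accepts either acquisition function object
for acqf, q in [(analytic_ei, 1), (mc_qei, 3)]:
    candidates, value = optimize_acqf(
        acq_function=acqf, bounds=bounds, q=q, num_restarts=10, raw_samples=256
    )
```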
Many common acquisition functions can be expressed as the expectation of some real-valued function of the model output(s) at the design point(s):
$$ \alpha(X) = \mathbb{E}\bigl[ a(\xi) \mid \xi \sim \mathbb{P}(f(X) \mid \mathcal{D}) \bigr] $$

where $X = (x_1, \ldots, x_q)$, and $\mathbb{P}(f(X) \mid \mathcal{D})$ is the posterior distribution of the function $f$ at $X$ given the data $\mathcal{D}$ observed so far.
Evaluating the acquisition function thus requires evaluating an integral over
the posterior distribution. In most cases, this is analytically intractable. In
particular, analytic expressions generally do not exist for batch acquisition
functions that consider multiple design points jointly (i.e. $q > 1$).
An alternative is to use Monte-Carlo (MC) sampling to approximate the integrals.
An MC approximation of $\alpha$ at $X$ using $N$ MC samples is

$$ \alpha(X) \approx \frac{1}{N} \sum_{i=1}^N a(\xi_i) $$

where $\xi_i \sim \mathbb{P}(f(X) \mid \mathcal{D})$.

For instance, for q-Expected Improvement (qEI), we have:

$$ \text{qEI}(X) \approx \frac{1}{N} \sum_{i=1}^N \max_{j=1,\ldots,q} \bigl\{ \max(\xi_{ij} - f^*, 0) \bigr\}, \qquad \xi_i \sim \mathbb{P}(f(X) \mid \mathcal{D}), $$

where $f^*$ is the best function value observed so far (assuming noiseless observations). Using the reparameterization trick [1, 2], this can be written as

$$ \text{qEI}(X) \approx \frac{1}{N} \sum_{i=1}^N \max_{j=1,\ldots,q} \bigl\{ \max\bigl( \mu(X)_j + (L(X) \epsilon_i)_j - f^*, 0 \bigr) \bigr\}, \qquad \epsilon_i \sim \mathcal{N}(0, I), $$

where $\mu(X)$ is the posterior mean of $f$ at $X$, and $L(X) L(X)^T = \Sigma(X)$ is a root decomposition of the posterior covariance matrix.
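To make the estimator concrete, the sketch below computes this qEI estimate directly from posterior samples of a toy (unfitted) `SingleTaskGP`; `qExpectedImprovement` performs essentially this computation internally, with additional handling of objectives, pending points, and sampler caching. The toy data and sample count are illustrative only.

```python
import torch
from botorch.models import SingleTaskGP

# toy single-output model on the 2-d unit cube (hyperparameters left at defaults)
train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = train_X.sin().sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)

X = torch.rand(3, 2, dtype=torch.double)  # q=3 candidate points
best_f = train_Y.max()

posterior = model.posterior(X)
samples = posterior.rsample(torch.Size([1024]))             # N x q x 1 draws of f(X)
improvement = (samples.squeeze(-1) - best_f).clamp_min(0)   # max(xi_ij - f*, 0)
qei_estimate = improvement.max(dim=-1).values.mean()        # 1/N sum_i max_j {...}
```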
All MC-based acquisition functions in BoTorch are derived from `MCAcquisitionFunction`.
Acquisition functions expect input tensors $X$ of shape $\textit{batch\_shape} \times q \times d$, where $d$ is the dimension of the feature space, $q$ is the number of points considered jointly, and $\textit{batch\_shape}$ is the batch-shape of the input tensors. The output $\alpha(X)$ will have shape $\textit{batch\_shape}$, with each element corresponding to the respective $q \times d$ batch tensor in the input $X$. Note that for analytic acquisition functions, it is required that $q=1$.
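For example (using a toy model with default hyperparameters, for illustration only), a batch of 5 candidate sets, each with $q=3$ points in $d=2$ dimensions, evaluates to 5 acquisition values:

```python
import torch
from botorch.acquisition import qExpectedImprovement
from botorch.models import SingleTaskGP

train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = train_X.sin().sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)  # toy model for illustration

qEI = qExpectedImprovement(model=model, best_f=train_Y.max())

X = torch.rand(5, 3, 2, dtype=torch.double)  # batch_shape=5, q=3, d=2
print(qEI(X).shape)                          # torch.Size([5]): one value per q x d batch
```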
BoTorch relies on the re-parameterization trick and (quasi)-Monte-Carlo sampling for optimization and estimation of the batch acquisition functions [3]. The results below show the reduced variance when estimating an expected improvement (EI) acquisition function using base samples obtained via quasi-MC sampling versus standard MC sampling.
In the plots above, the base samples used to estimate each point are resampled. As discussed in the Overview, a single set of base samples can be used for optimization when the re-parameterization trick is employed. What are the trade-offs between using a fixed set of base samples versus re-sampling on every MC evaluation of the acquisition function? Below, we show that fixing base samples produces functions that are potentially much easier to optimize, without resorting to stochastic optimization methods.
If the base samples are fixed, the problem of optimizing the acquisition function is deterministic, allowing for conventional quasi-second order methods such as L-BFGS or sequential least-squares programming (SLSQP) to be used. These have faster convergence rates than first-order methods and can speed up acquisition function optimization significantly.
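The sketch below illustrates this determinism. Note that the exact sampler constructor has changed across BoTorch versions; the `sample_shape`/`seed` signature used here matches recent releases, and the toy model is for illustration only.

```python
import torch
from botorch.acquisition import qExpectedImprovement
from botorch.models import SingleTaskGP
from botorch.sampling import SobolQMCNormalSampler

train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = train_X.sin().sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)  # toy model for illustration

# the sampler draws its quasi-MC base samples once and caches them, so repeated
# evaluations of the acquisition function are deterministic
sampler = SobolQMCNormalSampler(sample_shape=torch.Size([512]), seed=0)
qEI = qExpectedImprovement(model=model, best_f=train_Y.max(), sampler=sampler)

X = torch.rand(1, 3, 2, dtype=torch.double)
assert torch.equal(qEI(X), qEI(X))  # identical values on repeated evaluation
```

It is this determinism that lets BoTorch's default optimization path hand the acquisition surface to scipy-based quasi-second order routines such as L-BFGS-B or SLSQP.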
One concern is that the approximated acquisition function is biased for any
fixed set of base samples, which may adversely affect the solution. However, we
find that in practice, both the optimal value and the optimal solution of these
biased problems for standard acquisition functions converge quite rapidly to
their true counterparts as more samples are used. Note that for evaluation of
the acquisition function we integrate over a $q \cdot o$-dimensional space (where $q$ is the number of points in the q-batch and $o$ is the number of outputs included in the objective), which is typically rather low-dimensional.
On the other hand, when re-sampling is used in conjunction with a stochastic optimization algorithm, the kind of bias noted above is no longer a concern. The trade-off here is that the optimization may be less effective, as discussed above.
BoTorch also provides implementations of analytic acquisition functions that
do not depend on MC sampling. These acquisition functions are subclasses of `AnalyticAcquisitionFunction` and only exist for the case of a single candidate point ($q = 1$). An example comparing `ExpectedImprovement`, the analytic version of EI, to its MC counterpart `qExpectedImprovement` can be found in this tutorial.
Analytic acquisition functions allow for an explicit expression in terms of the summary statistics of the posterior distribution at the evaluated point(s). A popular acquisition function is Expected Improvement of a single point for a Gaussian posterior, given by

$$ \text{EI}(x) = \mathbb{E}\bigl[ \max(f(x) - f^*, 0) \mid f(x) \sim \mathbb{P}(f(x) \mid \mathcal{D}) \bigr], $$

where $f^*$ is again the best function value observed so far (assuming noiseless observations). Expected Improvement can be expressed in closed form:

$$ \text{EI}(x) = \sigma(x) \bigl( z \Phi(z) + \varphi(z) \bigr), $$

where $z = \frac{\mu(x) - f^*}{\sigma(x)}$, $\mu(x)$ and $\sigma(x)$ are the posterior mean and standard deviation at $x$, and $\Phi$ and $\varphi$ are the cdf and pdf of the standard normal distribution, respectively.
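The closed-form expression maps directly to code; here is a minimal sketch in plain PyTorch (the function name and the example numbers are illustrative only):

```python
import torch
from torch.distributions import Normal


def expected_improvement(mu: torch.Tensor, sigma: torch.Tensor, best_f: float) -> torch.Tensor:
    """EI(x) = sigma(x) * (z * Phi(z) + phi(z)), with z = (mu(x) - f*) / sigma(x)."""
    standard_normal = Normal(torch.zeros_like(mu), torch.ones_like(mu))
    z = (mu - best_f) / sigma
    return sigma * (z * standard_normal.cdf(z) + standard_normal.log_prob(z).exp())


# example: mu = 0.5, sigma = 0.2, best_f = 0.3 gives EI ~ 0.217
print(expected_improvement(torch.tensor([0.5]), torch.tensor([0.2]), 0.3))
```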
With some additional work, it is also possible to express the gradient of
the Expected Improvement with respect to the design $x$. Classic Bayesian optimization software implements this gradient function explicitly, so that it can be used for numerically optimizing the acquisition function.
BoTorch, in contrast, harnesses PyTorch's automatic differentiation feature ("autograd") in order to obtain gradients of acquisition functions. This makes implementing new acquisition functions much less cumbersome, as it does not require analytically deriving gradients. All that is required is that the operations performed in the acquisition function computation allow for the back-propagation of gradient information through the posterior and the model.
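For example, gradients of a toy `qExpectedImprovement` with respect to the candidate set are available via a plain `backward()` call (the toy model below is for illustration only):

```python
import torch
from botorch.acquisition import qExpectedImprovement
from botorch.models import SingleTaskGP

train_X = torch.rand(20, 2, dtype=torch.double)
train_Y = train_X.sin().sum(dim=-1, keepdim=True)
model = SingleTaskGP(train_X, train_Y)  # toy model for illustration
qEI = qExpectedImprovement(model=model, best_f=train_Y.max())

X = torch.rand(1, 3, 2, dtype=torch.double, requires_grad=True)
qEI(X).sum().backward()  # gradients flow through the sampler, posterior, and model
print(X.grad.shape)      # torch.Size([1, 3, 2]): d qEI / d X
```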
Footnotes

1. D. P. Kingma, M. Welling. Auto-Encoding Variational Bayes. ICLR, 2013.
2. D. J. Rezende, S. Mohamed, D. Wierstra. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. ICML, 2014.
3. J. T. Wilson, R. Moriconi, F. Hutter, M. P. Deisenroth. The Reparameterization Trick for Acquisition Functions. NeurIPS Workshop on Bayesian Optimization, 2017.