bayesian optimization
An interactive demo of Bayesian optimization implemented from scratch using only NumPy math operations. This uses Gaussian process regression and an acquisition function to balance sampling high-value vs. high-uncertainty regions.
controls
what is bayesian optimization?
Bayesian optimization is a strategy for finding the maximum of an unknown function that is expensive to evaluate, using as few evaluations as possible. Instead of brute-force sampling, it builds a probabilistic model of what the function may be and uses an acquisition function to decide where to sample next.
place a prior
Start with a Gaussian process, a probability distribution over functions encoding our prior beliefs about smoothness and scale.
observe
Evaluate the true function at a proposed point. The GP updates its beliefs (posterior) to reflect the new data.
acquire
An acquisition function scores every candidate point by trading off high predicted mean (exploitation) against high uncertainty (exploration).
repeat 2-3
Repeat as many times as we want by sampling at the acquisition function's maximum, then updating the GP for the new point.
gaussian process regression
Gaussian Process regression uses a kernel with given parameters to establish the prior (the set of functions we are considering, maybe lumpier or smoother depending on the parameters) and narrows it down based on new data. Given two points, the kernel aims to express how correlated they are with each other. For this demo, I used a radial basis function (RBF) kernel, AKA a Gaussian kernel.
RBF kernel
Controls how correlation decays as distance increases.
Higher ℓ makes our prior a set of smoother, more slowly varying functions.
posterior equations
We get the mean and variance of the posterior at a given (testing) point x* as follows, with X as our
existing (training) data.
I implemented matrix inversion via Cholesky decomposition for more efficiency.
acquisition functions
Our acquisition function balances sampling points that we know have high values (exploitation) with sampling points with lots of uncertainty (exploration). For instance, say we're looking for oil: drilling is costly, but we want to find areas with lots of oil, so we have to make tradeoffs between drilling in places we know have a lot of oil and trying places where we don't yet have enough information to know whether there's oil (high-uncertainty areas). For this demo, I used two different acquisition functions for the user to choose from.
upper confidence bound (UCB)
This acquisition function is just a weighted sum. κ directly controls the tradeoff between exploration and exploitation; higher κ favors exploration.
expected improvement (EI)
This function measures how much improvement over the current best value f⁺ we expect. Φ and φ are standard normal CDF and PDF; ξ is an exploration bonus.