Importance sampling¶

  • Let's try to integrate a generic function $f(x)$. We work in 1D for simplicity and consider the arbitrary example:

    \begin{equation*} f(x) = \big( 2 + \sin(x) + 0.8 \sin(5x) \big) \, \exp \left(-\frac{x^2}{2 \sigma^2} \right) \qquad \text{with} \quad \sigma = 1.5 \end{equation*}

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import matplotlib.pyplot as plt
import numpy as np
import scipy.integrate
plt.style.use('style.mpl')
def f(x):
    return (2 + np.sin(x) + 0.8*np.sin(5*x)) * np.exp(-0.5*x**2/1.5**2)

Limitations of uniform sampling¶

  • We can use uniformly distributed points $x_i$ and compute

    \begin{equation*} I = \int_a^b f(x) dx = \int_{-\infty}^\infty \pi(x) \frac{f(x)}{\pi(x)} dx \simeq \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{\pi(x_i)} \qquad \pi(x) = \begin{cases} 1/(b-a) & \text{if} \quad a \le x \le b\\ 0 & \text{otherwise} \end{cases} \end{equation*}

  • Is this a good strategy?

# uniform sampling over [a, b]; the interval [-6, 6] (so b - a = 12,
# matching the factor below) comfortably covers the support of f
a, b = -6, 6
n_points = 2**5
points = np.random.uniform(a, b, n_points)
res = (b - a) * np.sum(f(points)) / n_points
print(f"Estimate = {res}")
Estimate = 8.147761440036945
  • The drawback is that many points sample regions where $f(x)$ is negligible
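The spread of this estimator can be checked empirically. A minimal sketch, repeating the uniform estimate many times (the interval $[-6, 6]$ and the repeat count are illustrative choices, and $f$ is redefined so the cell is self-contained):

```python
import numpy as np

def f(x):
    return (2 + np.sin(x) + 0.8*np.sin(5*x)) * np.exp(-0.5*x**2/1.5**2)

rng = np.random.default_rng(0)
a, b = -6, 6                # illustrative integration interval
n_points = 2**5
# repeat the uniform-sampling estimate many times and look at its spread
estimates = [(b - a) * f(rng.uniform(a, b, n_points)).mean()
             for _ in range(10_000)]
print(f"mean of estimates = {np.mean(estimates):.3f}")
print(f"std of estimates  = {np.std(estimates):.3f}")
```

The standard deviation is sizeable compared with the integral itself, which is exactly what importance sampling aims to reduce.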

Sampling with other distributions¶

  • We are free to use other distributions $\pi$ to sample the points $x_i$

    \begin{equation*} I = \int_{-\infty}^\infty f(x) dx = \int_{-\infty}^\infty \pi(x) \frac{f(x)}{\pi(x)} dx \simeq \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{\pi(x_i)} \end{equation*}

  • It seems natural to use one that would favour areas where $f(x)$ is large

  • Let's try with a normal distribution of width $\sigma$

# importance sampling with a normal distribution of width σ
n_points = 2**5
σ = 1
gauss = lambda x: np.exp(-0.5 * x**2 / σ**2) / σ / np.sqrt(2*np.pi)
points = np.random.normal(scale=σ, size=n_points)
res = np.sum(f(points) / gauss(points)) / n_points
print(f"Estimate = {res}")
Estimate = 6.928510600960699

Optimal importance sampling¶

  • The width $\sigma$ of the normal distribution is a free parameter

  • Let's see how different choices for $\sigma$ change the standard deviation of the estimate

  • For the optimal $\sigma$ the distribution best captures the areas where the function is large

  • Ultimately we would want to have a distribution as close to $f(x)$ as possible
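These points can be checked numerically. A minimal sketch, assuming a few illustrative values of $\sigma$ and repeating each estimate to measure its spread ($f$ is redefined so the cell is self-contained):

```python
import numpy as np

def f(x):
    return (2 + np.sin(x) + 0.8*np.sin(5*x)) * np.exp(-0.5*x**2/1.5**2)

rng = np.random.default_rng(0)
n_points, n_repeat = 2**5, 2000    # illustrative choices
stds = {}
for σ in [1.0, 1.5, 2.0, 3.0]:
    def gauss(x):
        # normal sampling density of width σ
        return np.exp(-0.5 * x**2 / σ**2) / (σ * np.sqrt(2*np.pi))
    estimates = []
    for _ in range(n_repeat):
        x = rng.normal(scale=σ, size=n_points)
        estimates.append(np.mean(f(x) / gauss(x)))
    stds[σ] = np.std(estimates)
    print(f"σ = {σ}: std of estimate = {stds[σ]:.3f}")
```

The spread should be smallest near $\sigma = 1.5$, the width of the Gaussian envelope of $f$. For $\sigma$ noticeably smaller than that, the weights $f(x_i)/\pi(x_i)$ grow in the tails and the measured spread becomes unreliable (the variance of the estimator can even diverge).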

Summary¶

  • Importance sampling can be seen as a variance reduction technique.

  • Better results are obtained when points are sampled according to a distribution that is close to the function $f(x)$.

  • We need to be able to sample arbitrary probability distributions.
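As a preview of the direct sampling methods in the outline below, one standard approach is inverse-transform sampling; a minimal sketch for the exponential distribution, whose CDF can be inverted in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)   # uniform samples on [0, 1)
# invert the CDF F(x) = 1 - exp(-x) of Exp(1): F⁻¹(u) = -ln(1 - u)
samples = -np.log(1 - u)
print(f"sample mean = {samples.mean():.3f}  (the mean of Exp(1) is 1)")
```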

Outline¶

  • Introduction to Monte Carlo

  • Newton–Cotes quadrature

  • Importance sampling

  • Direct sampling methods

  • Markov chain sampling and balance condition

  • Metropolis-Hastings algorithm

  • The two-dimensional Ising model

  • Error bar analysis

  • References