Direct sampling
Imagine I have a uniform random number generator giving numbers distributed according to
\begin{equation*} \pi_\mathrm{uni}(x) = \begin{cases} 1 & \text{for} \quad 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \end{equation*}
numpy.random.rand() generates such numbers. How do I generate numbers that follow other distributions? In some cases, this can be done directly and independent random variables are obtained.
Tower sampling
Tower sampling allows one to sample a discrete distribution.
We want to pick a number from $\{1, 2, \ldots, n\}$ with corresponding probabilities $\{ \pi_1, \pi_2, \ldots, \pi_n \}$.
Construct the cumulative distribution $\Pi_\ell = \sum_{i=1}^\ell \pi_i$ for $0 \le \ell \le n$; the empty sum gives $\Pi_0 = 0$, and normalization gives $\Pi_n = 1$.
Draw a uniform random number $\mu$ in $[0, 1[$.
Find $k$ such that $\Pi_{k-1} \le \mu < \Pi_k$.
The $k$'s are distributed according to $\pi_k$.
Below is a simple implementation of tower sampling.
Its average cost can be reduced by sorting the $\pi_i$ in decreasing order, so that the linear search stops earlier on average (see the variant after the example below).
import numpy as np

def tower_sampling(π):
    Π = np.cumsum(π)       # cumulative sums Π_1, ..., Π_n (with Π_n = 1)
    μ = np.random.rand()   # uniform random number in [0, 1[
    k = 0
    while Π[k] <= μ:       # find the first entry of Π that exceeds μ
        k = k + 1
    return k + 1           # 1-based index distributed according to π
π = [0.2, 0.1, 0.4, 0.15, 0.15]
n_samples = 2**14
samples = [tower_sampling(π) for i in range(n_samples)]
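As mentioned above, the linear search stops earlier on average if the probabilities are visited in decreasing order. A possible variant (a sketch; the function name is illustrative, and in practice the sorting and the cumulative sum would be computed once and reused for all samples):

def tower_sampling_sorted(π):
    order = np.argsort(π)[::-1]          # original indices, largest probability first
    Π = np.cumsum(np.asarray(π)[order])  # cumulative sums of the sorted probabilities
    μ = np.random.rand()
    k = 0
    while Π[k] <= μ:                     # large π_i are reached first, so fewer iterations on average
        k = k + 1
    return order[k] + 1                  # map back to the original 1-based index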
Continuous distribution
Let's try to extend tower sampling to continuous distributions. Discrete indices become continuous variables
\begin{equation*} \{k, \pi_k\} \quad \to \quad \{x,\pi(x)\} \end{equation*}
The cumulative distribution function becomes
\begin{equation*} \Pi_k \quad \to \quad \Pi(x) = \int_{-\infty}^x \pi(y) dy \end{equation*}
Draw a uniform random number $\mu \in [0, 1[$.
The condition that determined $k$, namely $\Pi_{k-1} \le \mu < \Pi_k$, now becomes finding $x$ such that $\Pi(x) = \mu$, i.e.
\begin{equation*} x = \Pi^{-1}(\mu) \end{equation*}
The sampled $x$'s are distributed according to $\pi(x)$.
Unfortunately, it is usually very difficult to find $\Pi^{-1}$ in closed form.
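When the inverse is available, though, the whole recipe fits in a couple of lines. A minimal sketch, where the function name and its Π_inv argument are illustrative (Π_inv stands for any callable evaluating $\Pi^{-1}$):

def inverse_transform_sampling(Π_inv, n_samples):
    μ = np.random.rand(n_samples)  # uniform random numbers in [0, 1[
    return Π_inv(μ)                # x = Π⁻¹(μ) is distributed according to π(x)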
Example (trivial)
Let's start from a very simple example, a uniform distribution over $[a, b[$
\begin{equation*} \pi(x) = \begin{cases} \frac{1}{b-a} & \text{for} \quad a \le x < b \\ 0 & \text{otherwise} \end{cases} \end{equation*}
The cumulative distribution function is
\begin{equation*} \Pi(x) = \int_a^x \pi(y) dy = \frac{x - a}{b - a} \qquad \Pi^{-1}(\mu) = a + (b-a) \mu \end{equation*}
$x$ will be sampled according to $\pi(x)$ if it is computed from a uniform $\mu \in [0,1[$ with
\begin{equation*} x = a + (b-a) \mu \end{equation*}
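As a quick check, here is this formula in code; the endpoints $a$ and $b$ below are arbitrary illustrative values:

a, b = 2.0, 5.0                                    # illustrative interval [a, b[
n_samples = 2**14
samples = a + (b - a) * np.random.rand(n_samples)  # x = Π⁻¹(μ) = a + (b - a) μ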
Example
Let us try to sample the exponential distribution for $x \ge 0$
\begin{equation*} \pi(x) = \lambda \mathrm{e}^{-\lambda x} \end{equation*}
The cumulative distribution function is
\begin{equation*} \Pi(x) =\int_0^x \pi(y) dy = \left.-\mathrm{e}^{-\lambda y} \right \rvert_0^x = 1 - \mathrm{e}^{-\lambda x} \qquad \Pi^{-1}(\mu) = -\frac{1}{\lambda} \ln(1 - \mu) \end{equation*}
The distributions of $1-\mu$ and $\mu$ are the same, so we can sample $x$ using
\begin{equation*} x = -\frac{1}{\lambda} \ln(\mu) \end{equation*}
λ = 0.5
n_samples = 2**14
samples = -np.log(np.random.rand(n_samples)) / λ   # x = -ln(μ) / λ
Summary
Similar tricks allow one to generate most of the common simple distributions, such as the normal distribution.
Generators for these distributions are available in libraries, e.g. numpy.random. The values are all independent and the generation is often fast.
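For instance, the exponential and normal distributions can be drawn from directly with the standard numpy.random functions (λ and n_samples are reused from the example above):

exp_samples = np.random.exponential(scale=1/λ, size=n_samples)        # π(x) = λ e^{-λx}, same λ as above
norm_samples = np.random.normal(loc=0.0, scale=1.0, size=n_samples)   # standard normal distribution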
However, one must be able to invert the cumulative distribution function, which is in general very difficult.