Direct sampling¶
- Imagine I have a uniform random number generator giving number according to - \begin{equation*} \pi_\mathrm{uni}(x) = \begin{cases} 1 & \text{for} \quad 0 \le x < 1 \\ 0 & \text{otherwise} \end{cases} \end{equation*} 
- numpy.random.rand()generates such numbers.
- How do I generate new distribution functions? 
- In some cases, this can be done and independent random variable are obtained. 
Tower sampling¶
- Tower sampling allows to sample a discrete distribution. 
- We want to pick a number $\{1, 2, \ldots, n\}$ with corresponding probabilities $\{ \pi_1, \pi_2, \ldots, \pi_n \}$. 
- Construct the cumulative distribution $\Pi_\ell = \sum_{i=1}^\ell \pi_i$ for $0 \le \ell \le n$. We have $\Pi_n = 1$. 
- Draw a uniform random number $\mu$ in $[0, 1[$. 
- Find $k$ such that $\Pi_{k-1} \le \mu < \Pi_k$. 
- The $k$'s are distributed according to $\pi_k$. 
- Below is a simple implementation of tower sampling. 
- It can be improved by sorting the $\pi_i$. 
def tower_sampling(π):
    Π = np.cumsum(π) # cumulative sum
    μ = np.random.rand() # random number in [0,1[
    k = 0
    while True: # find k
        if Π[k] > μ: break
        k = k+1
    return k+1
π = [0.2, 0.1, 0.4, 0.15, 0.15]
n_samples = 2**14
samples = [tower_sampling(π) for i in range(n_samples)]
Continuous distribution¶
- Let's try to extend the tower sampling to continuous distributions. Discrete indices become continuous varialbles - \begin{equation*} \{k, \pi_k\} \quad \to \quad \{x,\pi(x)\} \end{equation*} 
- Cumulative distribution function - \begin{equation*} \Pi_k \quad \to \quad \Pi(x) = \int_{-\infty}^x \pi(y) dy \end{equation*} 
- Draw a uniform random number $\mu \in [0, 1[$. 
- The original condition to find $k$ such that $\Pi_{k-1} \le \mu < \Pi_k$ now becomes finding $x$ such that $\Pi(x) = \mu$, i.e. - \begin{equation*} x = \Pi^{-1}(\mu) \end{equation*} 
- The sampled $x$'s are distributed according to $\pi(x)$. 
- Unfortunately it is usually very difficult to find $\Pi^{-1}$. 
Example (trivial)¶
- Let's start from a very simple example, a uniform distribution over $[a, b[$ - \begin{equation*} \pi(x) = \begin{cases} \frac{1}{b-a} & \text{for} \quad a \le x < b \\ 0 & \text{otherwise} \end{cases} \end{equation*} 
- The cumulative distribution function is - \begin{equation*} \Pi(x) = \int_a^x \pi(y) dy = \frac{x - a}{b - a} \qquad \Pi^{-1}(\mu) = a + (b-a) \mu \end{equation*} 
- $x$ will be sampled according to $\pi(x)$ if it is computed from a uniform $\mu \in [0,1[$ with - \begin{equation*} x = a + (b-a) \mu \end{equation*} 
Example¶
- Let us try to sample the exponential distribution for $x \ge 0$ - \begin{equation*} \pi(x) = \lambda \mathrm{e}^{-\lambda x} \end{equation*} 
- The cumulative distribution function is - \begin{equation*} \Pi(x) =\int_0^x \pi(y) dy = \left.-\mathrm{e}^{-\lambda y} \right \rvert_0^x = 1 - \mathrm{e}^{-\lambda x} \qquad \Pi^{-1}(\mu) = -\frac{1}{\lambda} \ln(1 - \mu) \end{equation*} 
- The distributions of $1-\mu$ and $\mu$ are the same, so we can sample $x$ using - \begin{equation*} x = -\frac{1}{\lambda} \ln(\mu) \end{equation*} 
λ = 0.5
n_samples = 2**14
samples = -np.log(np.random.rand(n_samples)) / λ
Summary¶
- Similar tricks allow to generate the most common simple distributions, such as the normal distribution, etc. 
- Generators for these distributions are available in libraries, e.g. - numpy.random.
- The values are all independent and often the generation is fast. 
- However, one must be able to invert the cumulative distribution function, which is in general very difficult.