Let's try to integrate a generic function $f(x)$. We work in 1D for simplicity and consider the arbitrary example:
\begin{equation*} f(x) = \big( 2 + \sin(x) + 0.8 \sin(5x) \big) \, \exp \left(-\frac{x^2}{2 \sigma^2} \right) \qquad \text{with} \quad \sigma = 1.5 \end{equation*}
We can use uniformly distributed points $x_i$ and compute
\begin{equation*} I = \int_a^b f(x) dx = \int_{-\infty}^\infty \pi(x) \frac{f(x)}{\pi(x)} dx \simeq \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{\pi(x_i)} \qquad \pi(x) = \begin{cases} 1/(b-a) & \text{if} \quad a \le x \le b\\ 0 & \text{otherwise} \end{cases} \end{equation*}
Is this a good strategy?
We are free to use other distributions $\pi$ to sample the points $x_i$
\begin{equation*} I = \int_{-\infty}^\infty f(x) dx = \int_{-\infty}^\infty \pi(x) \frac{f(x)}{\pi(x)} dx \simeq \frac{1}{N} \sum_{i=1}^N \frac{f(x_i)}{\pi(x_i)} \end{equation*}
It seems natural to use one that would favour areas where $f(x)$ is large
The width $\sigma$ of the normal distribution is a free parameter
Let's see how different choices for $\sigma$ change the standard deviation
For the optimal $\sigma$ the distribution best captures the areas where the function is large
Ultimately we would want to have a distribution as close to $f(x)$ as possible
Importance sampling can be seen as a variance reduction technique.
Better results are obtained when points are sampled according to a distribution that is close to the function $f(x)$.
We need to be able to sample arbitrary probability distributions.