As seen before, random processes that generate observations under a certain probability distribution can be characterized in multiple ways. This section discusses some common discrete and continuous probability distributions and explains in which context they are used. Their probability distributions are given together with their expected value and variance. The derivations of the cumulative distribution functions, expected values and variances are all provided. It is important to understand and follow the full derivations before blindly using their outcomes.
In order to indicate that a random variable $X$ is distributed according to a certain distribution, e.g., the univariate standard normal distribution, we may write $X \sim \mathcal{N}(0,1)$. In this notation, the letter $\mathcal{N}$ indicates the normal distribution, while the numbers in parentheses indicate the parameters controlling the distribution. In the case of a normal distribution, these are the mean and the variance. Thus, $X \sim \mathcal{N}(0,1)$ reads “the random variable $X$ is normally distributed with zero mean and unit variance”.
Families of discrete random variables
Screencast video [⯈]
The Bernoulli(p) distribution
The Bernoulli distribution is a discrete probability distribution that models an experiment in which only two outcomes are possible. The probability distribution of flipping a coin is an example of a Bernoulli distribution. These outcomes are mapped to 0 and 1, whose probabilities are $1-p$ and $p$ respectively. The distribution is fully characterized by the parameter $p$, which is the probability of success ($\Pr[X = 1]$).
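As a small illustration (an addition to this text, with the arbitrary choice $p = 0.3$), the following Python sketch draws Bernoulli(p) samples with NumPy and compares the empirical frequencies with the probabilities $1-p$ and $p$.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
p = 0.3                                    # arbitrary success probability
x = (rng.random(100_000) < p).astype(int)  # 1 with probability p, 0 otherwise

# The empirical frequencies should be close to 1-p and p respectively
print("Pr[X=0] ≈", np.mean(x == 0))        # ~0.7
print("Pr[X=1] ≈", np.mean(x == 1))        # ~0.3
```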
Probability mass function
The probability mass function of the discrete Bernoulli(p) distribution is given as \begin{equation} p_X(x) = \begin{cases} 1-p, & \text{for } x=0 \newline p, & \text{for } x=1 \newline 0, & \text{otherwise} \end{cases} \end{equation} where $p$ is in the range $ 0 < p < 1 $.

Cumulative distribution function
The cumulative distribution function of the discrete Bernoulli(p) distribution can be determined as \begin{equation*} \begin{split} P_X(x) &= \begin{cases} 0, &\text{for } x<0 \\ 1-p, &\text{for } 0\leq x<1 \\ 1, &\text{for } x\geq 1 \end{cases} \end{split} \end{equation*}

Expected value
The expected value of the discrete Bernoulli(p) distribution can be determined as \begin{equation} \begin{split} \mathbb{E}[X] &= p. \end{split} \end{equation}

Variance
The variance of the discrete Bernoulli(p) distribution can be determined as \begin{equation} \begin{split} \text{Var}[X] &= p(1-p). \end{split} \end{equation}

The Geometric(p) distribution
The Geometric distribution is a discrete probability distribution that models an experiment with probability of success $p$. The Geometric distribution gives the probability that the first success is observed at the $x^{th}$ independent trial. The distribution is fully characterized by the parameter $p$, which is the probability of success.
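To make the interpretation “first success at trial $x$” concrete, the sketch below (an added illustration with the arbitrary choice $p = 0.25$) simulates the experiment with NumPy and compares the empirical frequencies with the PMF $p(1-p)^{x-1}$ given next.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
p = 0.25                                  # arbitrary success probability

# rng.geometric returns the trial index of the first success (support 1, 2, ...)
samples = rng.geometric(p, size=100_000)

for x in range(1, 6):
    empirical = np.mean(samples == x)
    theoretical = p * (1 - p) ** (x - 1)
    print(f"x={x}: empirical {empirical:.4f}, pmf {theoretical:.4f}")
```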
Probability mass function
The probability mass function of the discrete Geometric(p) distribution is given as \begin{equation} p_X(x) = \begin{cases} p(1-p)^{x-1}, & \text{for } x=1,2,\ldots \\ 0, & \text{otherwise} \end{cases} \end{equation} where $p$ is in the range $ 0 < p < 1 $.

Cumulative distribution function
The cumulative distribution function of the discrete Geometric(p) distribution can be determined as \begin{equation} \begin{split} P_X(x) &= \begin{cases} 0, &\text{for } x < 1\\ 1-(1-p)^x, &\text{for } x\geq 1 \end{cases} \end{split} \end{equation}

Expected value
The expected value of the discrete Geometric(p) distribution can be determined as \begin{equation} \begin{split} \mathbb{E}[X] &= \frac{1}{p}. \end{split} \end{equation}

Variance
The variance of the discrete Geometric(p) distribution can be determined as \begin{equation} \begin{split} \text{Var}[X] &= \frac{1-p}{p^2}. \end{split} \end{equation}

The Binomial(n,p) distribution
The Binomial distribution is a discrete probability distribution that models an experiment with probability of success $p$. The Binomial distribution gives the probability of observing $x$ successes in $n$ independent trials. The distribution is fully characterized by the parameters $n$ and $p$. The parameter $n$ denotes the number of independent trials and the parameter $p$ denotes the probability of observing a success per trial.
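The PMF below can be evaluated directly; the following sketch (an illustrative addition with the arbitrary choices $n = 10$ and $p = 0.5$) computes it with math.comb and checks that the mean of simulated samples is close to $np$.

```python
import math
import numpy as np

n, p = 10, 0.5                     # arbitrary parameters

def binomial_pmf(x, n, p):
    """Probability of observing x successes in n independent trials."""
    return math.comb(n, x) * p**x * (1 - p)**(n - x)

print([round(binomial_pmf(x, n, p), 4) for x in range(n + 1)])

rng = np.random.default_rng(1)
samples = rng.binomial(n, p, size=100_000)
print("sample mean:", samples.mean(), " np =", n * p)
```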
Probability mass function
The probability mass function of the discrete Binomial(n,p) distribution is given as \begin{equation} p_X(x) \overset{\href{./#fn:4}{4}}{=} \begin{pmatrix} n \\ x\end{pmatrix}p^x(1-p)^{n-x}, \qquad \text{for } x = 0,1,\ldots,n, \end{equation} where $0 < p < 1$ and $n$ is an integer such that $n\geq 1$.

Cumulative distribution function
The cumulative distribution function of the discrete Binomial(n,p) distribution can be determined as \begin{equation} \begin{split} P_X(x) &= \sum_{m=0}^x \begin{pmatrix} n \\ m\end{pmatrix}p^m(1-p)^{n-m}. \end{split} \end{equation}

Expected value
The expected value of the discrete Binomial(n,p) distribution can be determined as \begin{equation} \begin{split} \mathbb{E}[X] &= np. \end{split} \end{equation}

Variance
The variance of the discrete Binomial(n,p) distribution can be determined as \begin{equation} \begin{split} \text{Var}[X] &= np(1-p). \end{split} \end{equation}

The Pascal(k,p) distribution
The Pascal distribution is a discrete probability distribution that is also known as the negative Binomial distribution. The Pascal distribution gives the probability of observing the $k^{th}$ success at the $x^{th}$ trial. The distribution is fully characterized by the parameters $k$ and $p$. The parameter $k$ denotes the desired number of successes and the parameter $p$ denotes the probability of success in an individual trial.
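Since library routines such as scipy.stats.nbinom count the number of failures rather than the total number of trials, the sketch below (an added illustration with the arbitrary choices $k = 3$ and $p = 0.4$) writes out the Pascal PMF defined next explicitly.

```python
import math

def pascal_pmf(x, k, p):
    """Probability that the k-th success occurs exactly at trial x (x >= k)."""
    if x < k:
        return 0.0
    return math.comb(x - 1, k - 1) * p**k * (1 - p)**(x - k)

k, p = 3, 0.4                               # arbitrary parameters
print([round(pascal_pmf(x, k, p), 4) for x in range(1, 11)])
print("mean k/p =", k / p)                  # expected value derived below
```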
Probability mass function
The probability mass function of the discrete Pascal(k,p) distribution is given as \begin{equation} p_X(x) = \begin{pmatrix} x-1 \\ k-1\end{pmatrix}p^k(1-p)^{x-k}, \qquad \text{for } x = k, k+1, \ldots, \end{equation} where $0 < p < 1$ and $k$ is an integer such that $k\geq 1$.

Cumulative distribution function
The cumulative distribution function of the discrete Pascal(k,p) distribution can be determined as \begin{equation} \begin{split} P_X(x) &= \sum_{n=k}^x\begin{pmatrix} n-1 \\ k-1\end{pmatrix}p^k(1-p)^{n-k}. \end{split} \end{equation}

Expected value
The expected value of the discrete Pascal(k,p) distribution can be determined as \begin{equation} \begin{split} \mathbb{E}[X] &= \frac{k}{p}. \end{split} \end{equation}

Variance
The variance of the discrete Pascal(k,p) distribution can be determined as \begin{equation} \begin{split} \text{Var}[X] &= \frac{k(1-p)}{p^2}. \end{split} \end{equation}

The discrete Uniform(k,l) distribution
The discrete Uniform distribution is a discrete probability distribution that models an experiment where the outcomes are mapped only to the integers from $k$ up to and including $l$, each occurring with equal probability. The distribution is fully characterized by the parameters $k$ and $l$, which are the discrete lower and upper bound of the interval respectively.
Probability mass function
The probability mass function of the discrete Uniform(k,l) distribution is given as \begin{equation} p_X(x) = \begin{cases} \frac{1}{l-k+1}, &\text{for }x = k, k+1, k+2, \ldots,l\\ 0, &\text{otherwise} \end{cases} \end{equation} where $k$ and $l$ are integers such that $k < l$.

Cumulative distribution function
The cumulative distribution function of the discrete Uniform(k,l) distribution can be determined as \begin{equation} \begin{split} P_X(x) = \begin{cases} 0, &\text{for } x < k \\ \frac{x-k+1}{l-k+1}, &\text{for } k \leq x < l \\ 1, &\text{for } x \geq l \end{cases} \end{split} \end{equation}

Expected value
The expected value of the discrete Uniform(k,l) distribution can be determined as \begin{equation} \begin{split} \mathbb{E}[X] &= \frac{k+l}{2}. \end{split} \end{equation}

Variance
The variance of the discrete Uniform(k,l) distribution can be determined as \begin{equation*} \begin{split} \text{Var}[X] &= \frac{(l-k+1)^2-1}{12}. \end{split} \end{equation*}

The Poisson($\alpha$) distribution
The Poisson distribution is a discrete probability distribution that models the number of events occurring within a certain interval of time, in which the events occur independently from each other at a constant rate. The exact moments at which the events occur are unknown; however, the average number of events occurring within the interval is known and is denoted by the parameter $\alpha$. An example of a process where the number of events within an interval can be described by a Poisson distribution is the number of phone calls over a network. For optimal allocation of resources, a service provider needs to know the chance that the allocated capacity is insufficient, in order to limit the number of dropped calls. The callers can be regarded as independent entities (i.e., everyone makes a phone call whenever it suits them), while their calling habits determine the average number of calls within an interval.
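To connect this to the capacity example, the short sketch below (an added illustration; the numbers are made up) computes the probability that more calls arrive in an interval than a hypothetical capacity can handle, using the PMF given next.

```python
import math

alpha = 20          # hypothetical average number of calls per interval
capacity = 30       # hypothetical number of simultaneous calls the network can handle

# Pr[dropped calls] = Pr[X > capacity] = 1 - Pr[X <= capacity]
p_within = sum(math.exp(-alpha) * alpha**n / math.factorial(n)
               for n in range(capacity + 1))
print("Pr[dropped calls] ≈", 1 - p_within)
```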
Probability mass function
The probability mass function of the discrete Poisson($\alpha$) distribution is given as \begin{equation} p_X(x) = \begin{cases} \frac{\alpha^xe^{-\alpha}}{x!}, &\text{for }x = 0,1,2,\ldots\\ 0, &\text{otherwise} \end{cases} \end{equation} where $\alpha$ is in the range $\alpha > 0$.

Cumulative distribution function
The cumulative distribution function of the discrete Poisson($\alpha$) distribution can be determined as \begin{equation} \begin{split} P_X(x) &= \begin{cases} 0, &\text{for } x<0 \\ e^{-\alpha} \sum_{n=0}^x \frac{\alpha^n}{n!}, &\text{for } x\geq 0 \end{cases} \end{split} \end{equation}

Expected value
The expected value of the discrete Poisson($\alpha$) distribution can be determined as \begin{equation} \begin{split} \mathbb{E}[X] = \alpha. \end{split} \end{equation}

Variance
The variance of the discrete Poisson($\alpha$) distribution can be determined as \begin{equation} \begin{split} \text{Var}[X] &= \alpha. \end{split} \end{equation}

Families of continuous random variables
Screencast video [⯈]
The Exponential($\lambda$) distribution
The Exponential distribution is a continuous probability distribution whose probability density function follows a decaying exponential curve. It is commonly used to model waiting times between events that occur independently at a constant rate. The distribution is fully characterized by the rate parameter $\lambda$.
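As a quick check (an added sketch with the arbitrary choice $\lambda = 2$), the sample mean of exponential draws approaches $1/\lambda$, the expected value derived below, and the empirical tail probability matches $e^{-\lambda x}$.

```python
import numpy as np

lam = 2.0                                                # arbitrary rate parameter
rng = np.random.default_rng(3)
samples = rng.exponential(scale=1 / lam, size=100_000)   # NumPy uses scale = 1/lambda

print("sample mean:", samples.mean(), " 1/lambda =", 1 / lam)
print("Pr[X > 1] empirical:", np.mean(samples > 1), " exact:", np.exp(-lam * 1.0))
```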
Probability density function
The probability density function of the continuous Exponential($\lambda$) distribution is given as \begin{equation} p_X(x) = \begin{cases} \lambda e^{-\lambda x}, &\text{for }x\geq 0 \\ 0, &\text{for }x<0 \end{cases} \end{equation} where $\lambda > 0$.

Cumulative distribution function
The cumulative distribution function of the continuous Exponential($\lambda$) distribution can be determined as \begin{equation} \begin{split} P_X(x) &= \begin{cases} 1-e^{-\lambda x}, &\text{for } x\geq 0\\ 0, &\text{for } x<0 \end{cases} \end{split} \end{equation}

Expected value
The expected value of the continuous Exponential($\lambda$) distribution can be determined as \begin{equation} \begin{split} \mathbb{E}[X] &= \frac{1}{\lambda}. \end{split} \end{equation}

Variance
The variance of the continuous Exponential($\lambda$) distribution can be determined as \begin{equation} \begin{split} \text{Var}[X] = \frac{1}{\lambda^2}. \end{split} \end{equation}

The continuous Uniform(a,b) distribution
The continuous Uniform distribution is a continuous probability distribution that models an experiment where the outcomes are mapped to the interval from $a$ up to and including $b$, with constant probability density over this range. The distribution is fully characterized by the parameters $a$ and $b$, which are the continuous lower and upper bound of the interval respectively.
Probability density function
The probability density function of the continuous Uniform(a,b) distribution is given as \begin{equation} p_X(x) = \begin{cases} \frac{1}{b-a}, &\text{for }a \leq x \leq b \\ 0, &\text{otherwise} \end{cases} \end{equation} where $b > a$.

Cumulative distribution function
The cumulative distribution function of the continuous Uniform(a,b) distribution can be determined as \begin{equation} \begin{split} P_X(x) &= \begin{cases} 0, &\text{for } x\leq a \\ \frac{x-a}{b-a}, &\text{for } a < x < b \\ 1, &\text{for } x\geq b \end{cases} \end{split} \end{equation}

Expected value
The expected value of the continuous Uniform(a,b) distribution can be determined as \begin{equation} \begin{split} \mathbb{E}[X] = \frac{a+b}{2}. \end{split} \end{equation}

Variance
The variance of the continuous Uniform(a,b) distribution can be determined as \begin{equation} \begin{split} \text{Var}[X] = \frac{1}{12}(b-a)^2. \end{split} \end{equation}

The Normal or Gaussian $\mathcal{N}(\mu, \sigma^2)$ distribution
The Normal or Gaussian distribution is probably the most commonly used continuous probability distribution. The distribution is bell-shaped and symmetric around its mean. It is fully characterized by its mean $\mu$ and its variance $\sigma^2$.
The Standard normal $\mathcal{N}(0,1)$ distribution
The Standard normal distribution is a specific case of the Normal or Gaussian distribution, where the mean equals $\mu = 0$ and the variance equals $\sigma^2=1$. This distribution can be regarded as the normalized Gaussian distribution. Any random variable $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ can be transformed into a random variable $X$ under the Standard normal distribution by subtracting its mean and dividing by its standard deviation, as $X = \frac{Y-\mu_Y}{\sigma_Y}$.
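This normalization step can be illustrated in a few lines of Python (an added sketch with the arbitrary choices $\mu_Y = 5$ and $\sigma_Y = 2$): after subtracting the mean and dividing by the standard deviation, the samples have approximately zero mean and unit variance.

```python
import numpy as np

rng = np.random.default_rng(7)
mu_y, sigma_y = 5.0, 2.0                             # arbitrary parameters
y = rng.normal(loc=mu_y, scale=sigma_y, size=100_000)

x = (y - mu_y) / sigma_y                             # standardization
print("mean ≈", x.mean(), " variance ≈", x.var())    # close to 0 and 1
```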
The $Q$-function
The $Q$-function is a commonly used function in statistics, which calculates the probability of a Standard normal distributed random variable $X$ exceeding a certain threshold $x$. It is also known as the right-tail probability of the Gaussian distribution, since it is calculated by integrating the right side of the Gaussian PDF from the threshold $x$ up to $\infty$. The $Q$-function is defined as \begin{equation} Q(x) = \Pr[X>x] = \frac{1}{\sqrt{2\pi}} \int_x^\infty e^{-\frac{u^2}{2}}\mathrm{d}u. \end{equation} The function can be used for all Gaussian distributed random variables; however, the random variable and the corresponding threshold should be normalized first. Additionally, by symmetry it follows that $Q(x) = 1-Q(-x)$, where $Q(-x)$ is equal to the cumulative distribution function $P_X(x)$.
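Since the $Q$-function has no closed-form expression, it is typically evaluated numerically, for instance through the complementary error function via $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}\!\left(x/\sqrt{2}\right)$. The sketch below (an added illustration, with arbitrary parameter values) uses scipy.special.erfc and shows the normalization step for a non-standard Gaussian random variable.

```python
import numpy as np
from scipy.special import erfc

def Q(x):
    """Right-tail probability of the standard normal distribution."""
    return 0.5 * erfc(x / np.sqrt(2))

print(Q(0.0))     # 0.5
print(Q(1.645))   # ~0.05

# For Y ~ N(mu, sigma^2): Pr[Y > y] = Q((y - mu) / sigma)
mu, sigma, y = 3.0, 2.0, 5.0                  # arbitrary values
print("Pr[Y > 5] =", Q((y - mu) / sigma))
```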
Probability density function
The probability density function of the continuous Gaussian $\mathcal{N}$($\mu, \sigma^2$) distribution is given as \begin{equation} p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \end{equation} where $\sigma > 0$.

Cumulative distribution function
The cumulative distribution function of the continuous Gaussian $\mathcal{N}$($\mu, \sigma^2$) distribution can be determined as \begin{equation} \begin{split} P_X(x) &= Q\left(-\frac{x-\mu}{\sigma}\right). \end{split} \end{equation}

Expected value
The expected value of the continuous Gaussian $\mathcal{N}$($\mu, \sigma^2$) distribution can be determined as \begin{equation} \begin{split} \mathbb{E}[X] = \mu. \end{split} \end{equation}

Variance
The variance of the continuous Gaussian $\mathcal{N}$($\mu, \sigma^2$) distribution can be determined as \begin{equation} \begin{split} \text{Var}[X] = \sigma^2. \end{split} \end{equation}

As is implied by the central limit theorem (CLT, explained in the section Functions and pairs of random variables), the Gaussian distribution is extremely important. The Gaussian distribution is often used to model measurements in practice and, thanks to the CLT, its use can often be extended to other distributions. A Gaussian distribution is also often used to model the thermal noise of a band-limited system. This section will generalize the definition of the Gaussian distribution given above and extend it to the multivariate case.
Univariate distribution
In the case of a single random variable $X$ that is generated according to a Gaussian distribution, defined by its mean $\mu$ and variance $\sigma^2$, where the subscript $\cdot_X$ is now omitted for simplicity, the probability density function is defined as \begin{equation} p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \end{equation} The left side of the figure below shows an example of a univariate Gaussian distribution.
Multivariate distribution
The definition of the univariate Gaussian distribution can be extended to a multivariate distribution. To better understand the multivariate Gaussian distribution, it might be useful to first read the sections Functions and pairs of random variables and Random vectors. To define the Gaussian distribution, its position and spread are required. These quantities are represented by the mean vector $\bf\mu$ and the covariance matrix $\bf\Sigma$. Whereas the covariance matrix was defined as $\bf\Gamma$ earlier in this reader, the literature has adopted the $\bf\Sigma$ notation when discussing multivariate Gaussian distributions, as $\Sigma$ is the capital Greek form of $\sigma$.
To indicate that a $k$-dimensional random vector $\bf{X}$ is Gaussian distributed, we can write ${\bf{X}} \sim \mathcal{N}_k(\bf{\mu},\bf{\Sigma})$. The probability density function of such a multivariate Gaussian distribution is defined as \begin{equation} p_{\bf{X}}({\bf{x}}) = \frac{1}{\sqrt{(2\pi)^k|\bf{\Sigma}|}}\exp \left\{-\frac{1}{2} ({\bf{x}}-\bf{\mu})^\top \bf{\Sigma}^{-1}({\bf{x}}-\bf{\mu})\right\}, \end{equation} where $|\bf{\Sigma}|$ is the determinant of the covariance matrix. Please note the similarities between the univariate Gaussian distribution and the multivariate distribution. The inverse covariance matrix $\bf{\Sigma}^{-1}$ is often also called the precision matrix and is denoted by $\bf{\Lambda}$, because a low variance (i.e. low spread) relates to high precision and vice versa.
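The density above can be evaluated directly from its definition; the following sketch (an added illustration with an arbitrary mean vector and covariance matrix) implements the formula with NumPy and additionally verifies that the sample covariance of draws from the distribution approaches $\bf{\Sigma}$.

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Evaluate the k-dimensional Gaussian density at the point x."""
    k = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

mu = np.array([1.0, -1.0])                      # arbitrary mean vector
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])      # arbitrary covariance matrix

print(mvn_pdf(np.array([0.0, 0.0]), mu, Sigma))

# The sample covariance of draws should approach Sigma
rng = np.random.default_rng(11)
samples = rng.multivariate_normal(mu, Sigma, size=100_000)
print(np.cov(samples, rowvar=False))
```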
The covariance matrix of a multivariate Gaussian distribution
The probability density function of a Gaussian distribution is fully determined by its mean $\bf{\mu}$ and its covariance matrix $\bf{\Sigma}$. In order to give some intuition on how the mean and covariance matrix structure influence the final distribution, we jump forward to Fig. 2 in the next section where three multivariate distributions have been plotted. The covariance matrices that were used to plot these distributions in the figure are from left to right: \begin{equation} \bf{\Sigma}_1 = \begin{bmatrix}1 & -0.5 \newline -0.5 & 1\end{bmatrix} \qquad \bf{\Sigma}_2 = \begin{bmatrix}1 & 0 \newline 0 & 1\end{bmatrix} \qquad \bf{\Sigma}_3 = \begin{bmatrix}1 & 0.5 \newline 0.5 & 1\end{bmatrix} \end{equation} Please note how the off-diagonal entries, referring to Cov$[X_1,X_2]$ and Cov$[X_2,X_1]$, influence the shape of the distribution.
In order to understand how the covariance matrix is related to the tilt and the shape of the distribution, we need to first introduce the so-called rotation matrix and the eigenvalue decomposition. The rotation matrix $R_\theta$ rotates a coordinate counter-clockwise over an angle $\theta$ with respect to the origin. This rotation matrix is defined as \begin{equation}\label{eq:rot_mat} R_\theta = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \newline \sin(\theta) & \cos(\theta) \end{bmatrix} \end{equation} and a rotation over $\theta$ from the coordinates $(x,y)$ to the coordinates $(x', y')$ can be represented by \begin{equation} \begin{bmatrix} x' \newline y' \end{bmatrix} = R_\theta \begin{bmatrix} x \newline y \end{bmatrix} = \begin{bmatrix} x\cos(\theta) - y\sin(\theta) \newline x\sin(\theta) + y\cos(\theta) \end{bmatrix}. \end{equation} One of the properties of a rotation matrix is that it is orthogonal. This means that $R_\theta R_\theta^\top = I$, where $I$ is the identity matrix. Using the fact that $R_\theta^{-1}= R_{-\theta} = R_\theta^\top$, which follows from its definition, the orthogonality property makes complete sense: rotating a coordinate over $\theta$ and subsequently over $-\theta$ does not change anything.
Besides the rotation matrices, we need to introduce the eigenvalue decomposition in order to better understand the covariance matrix structure. The eigenvalue decomposition states that a square invertible symmetric matrix $A$ can be written as \begin{equation} A = Q\Lambda Q^{-1}, \end{equation} where the orthogonal matrix $Q$ contains the eigenvectors of $A$ and $\Lambda$ is a diagonal matrix containing the eigenvalues of $A$.
Now that the rotation matrix and the eigenvalue decomposition have been defined, we can show that any covariance matrix can be written as a rotation of a diagonal covariance matrix. This point is very important to understand. To start off, a diagonal covariance matrix can be represented as \begin{equation} \bf{\Sigma}_d = \begin{bmatrix} a & 0 \newline 0 & b \end{bmatrix}. \end{equation} The entries $a$ and $b$ correspond to the individual variances of $X_1$ and $X_2$ according to the definitions and are at the same time the eigenvalues of $\bf{\Sigma}_d$. An example of a Gaussian distribution that corresponds to a diagonal covariance matrix where $a = 25$ and $b=4$ is shown on the left in the figure below. Please note that the ratio of $\sqrt{a}$ and $\sqrt{b}$ also represents the ratio of the length (the major axis) and the width (the minor axis) of the distribution.

If we were to apply the eigenvalue decomposition to a covariance matrix $\bf{\Sigma}$, we would interestingly enough find that \begin{equation} \bf{\Sigma} = R_\theta \bf{\Sigma}_d R_\theta^\top. \end{equation} The right of the figure below shows an example of a multivariate Gaussian distribution whose covariance matrix is a rotated version of the diagonal covariance matrix corresponding to the left side of the same figure. From this we can see that the ratio of eigenvalues of $\bf{\Sigma}$ corresponds to the ratio of the lengths of the major and minor axes. Furthermore, we can conclude that the matrix containing the eigenvectors of $\bf{\Sigma}$ is at the same time a rotation matrix, implicitly defining the rotation angle.
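The decomposition $\bf{\Sigma} = R_\theta \bf{\Sigma}_d R_\theta^\top$ can be verified numerically; the sketch below (an added illustration with the arbitrary choices $a = 25$, $b = 4$ and $\theta = 30^\circ$) constructs a covariance matrix by rotating a diagonal one and recovers the eigenvalues and rotation angle with an eigenvalue decomposition.

```python
import numpy as np

a, b, theta = 25.0, 4.0, np.deg2rad(30)        # arbitrary variances and rotation angle
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Sigma_d = np.diag([a, b])

Sigma = R @ Sigma_d @ R.T                      # rotated (full) covariance matrix

eigvals, eigvecs = np.linalg.eigh(Sigma)
print("eigenvalues:", eigvals)                 # {4, 25}: the diagonal entries of Sigma_d
angle = np.rad2deg(np.arctan2(eigvecs[1, -1], eigvecs[0, -1]))
print("recovered angle (deg):", angle)         # ~30 (up to a 180-degree sign ambiguity)
```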
Sampling random variables
Most statistical packages in computing software provide a so-called pseudorandom number generator, which is an algorithm to randomly sample a number between 0 and 1 with equal probability. Basically, this means generating random samples from a continuous random variable $U$ which follows a uniform distribution, $U\sim\mathcal{U}(0,1)$. More generally, sampling a random variable means generating values $x$ of $X$ in such a way that the probability of generating $x$ is in accordance with the probability density function $p_X(x)$, or equivalently the cumulative distribution function $P_X(x)$, associated with $X$.
Assuming that we have a pseudorandom number generator, how can we generate samples of any random variable $X$ if we know its probability distribution? We need to find a transformation $T:[0,1]\rightarrow \mathbb{R}$ such that $T(U)=X$.
For continuous random variables, the following theorem can help us with this task.
Theorem
Let $X$ be a continuous random variable with CDF $P_X(x)$ which possesses an inverse $P_X^{-1}$. Let $U\sim\mathcal{U}(0,1)$ and $Y = P_X^{-1}(U)$, then $P_X(x)$ is the CDF for $Y$. In other words, $Y$ has the same distribution as $X$.
According to this theorem, the transformation $T$ we were looking for is simply given by $P_X^{-1}$. Then, to sample $x$, it is sufficient to follow these steps:
- Generate a random number $u$ from uniform distribution $U\sim\mathcal{U}(0,1)$;
- Find the inverse of the CDF of $X$, $P_X^{-1}$;
- Compute $x$ as $x = P_X^{-1}(u)$.
This method of sampling a random variable is known as the inverse transform technique.
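As a concrete example of the inverse transform technique (an added sketch, with the arbitrary choice $\lambda = 1.5$), the CDF of the Exponential($\lambda$) distribution introduced above, $P_X(x) = 1-e^{-\lambda x}$, can be inverted in closed form, giving $P_X^{-1}(u) = -\ln(1-u)/\lambda$.

```python
import numpy as np

lam = 1.5                                 # arbitrary rate parameter
rng = np.random.default_rng(5)
u = rng.random(100_000)                   # samples of U ~ Uniform(0, 1)

# Inverse of the exponential CDF: P_X^{-1}(u) = -ln(1 - u) / lambda
x = -np.log(1 - u) / lam

print("sample mean:", x.mean(), " expected 1/lambda =", 1 / lam)
```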
For discrete random variables, however, this technique cannot be applied directly, because when $X$ is discrete its CDF is a step function, which does not possess an inverse.
More formally, let $X$ be a discrete random variable taking values in $\{x_1, \ldots, x_n\}$, where $x_1 \leq \ldots \leq x_n$, with probability mass function $p_X(x)$. Let us define each value of the CDF of $X$ as
\begin{equation} q_i = \text{Pr}[X \leq x_i] = \sum_{j=1}^{i} p_X(x_j). \end{equation}
The sampling formula for $X$ then becomes \begin{equation} x = x_i \quad \text{if } q_{i-1} < u \leq q_i, \end{equation} where $u$ is a sample drawn from $U\sim\mathcal{U}(0,1)$ and $q_0 = 0$.
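A minimal Python sketch of this discrete sampling rule (an added illustration with an arbitrary three-point distribution): the cumulative sums $q_i$ are computed once and each uniform draw $u$ is mapped to the first $x_i$ for which $u \leq q_i$.

```python
import numpy as np

values = np.array([1, 2, 5])           # arbitrary support x_1 <= x_2 <= x_3
pmf = np.array([0.2, 0.5, 0.3])        # arbitrary probability mass function
q = np.cumsum(pmf)                     # q_i = Pr[X <= x_i]

rng = np.random.default_rng(9)
u = rng.random(100_000)
# np.searchsorted returns the first index i for which u <= q_i
samples = values[np.searchsorted(q, u)]

for v, prob in zip(values, pmf):
    print(f"x={v}: empirical {np.mean(samples == v):.4f}, pmf {prob}")
```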
Footnotes

1. Sum of a geometric series: $$ \sum_{k=0}^{n-1}ar^k = a\left(\frac{1-r^n}{1-r}\right) \quad \text{for }|r|<1$$
2. Derivative of the sum of a geometric series: $$ \sum_{k=1}^{\infty}kr^{k-1} = \frac{1}{(1-r)^2}\quad \text{for }|r|<1$$
3. Second derivative of the sum of a geometric series: $$ \sum_{k=1}^{\infty}k^2r^{k-1} = -\frac{r+1}{(r-1)^3}\quad \text{for }|r|<1$$
4. The binomial coefficient, which returns the total number of possible unsorted combinations of $k$ distinct elements from a set of $n$ distinct elements, defined as: $$ \binom{n}{k} = \frac{n!}{k!(n-k)!}, $$ where $!$ is the factorial operator (e.g. $4!=4\cdot 3\cdot 2\cdot 1$), which denotes the total number of sorted permutations.
5. Factors of the binomial coefficient: $$ k \binom n k = k\frac{n!}{k!(n-k)!} = n\frac{(n-1)!}{(k-1)!((n-1)-(k-1))!} = n \binom {n - 1} {k - 1} $$
6. Use the substitutions $l = n-1$, $k=m-1$, $i=l-1$ and $j=k-1$.
7. Binomial theorem: $$ (x+y)^n = \sum_{k = 0}^n \binom{n}{k} x^k y^{n - k} $$
8. Ordinary generating function: $$ \sum_{n=k}^\infty \binom{n}{k}y^n = \frac{y^k}{(1-y)^{k+1}} $$
9. Recurrence relationship: $$ \binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k} $$
10. Variation of the factors of the binomial coefficient: $$ n \binom{n-1}{k} = (k+1)\binom{n}{k+1} $$
11. Sum of consecutive integers: \begin{equation*} \begin{split} \sum_{n=k}^l n &= \frac{1}{2} \left[ \sum_{n=k}^l n + {\color{Gray} \sum_{n=k}^l n} \right] \newline &= \frac{1}{2}\left[k + (k+1) + \ldots + (l-1) + l + {\color{Gray} k + (k+1) + \ldots + (l-1) + l}\right] \newline &= \frac{1}{2} \underbrace{\left[ (k + {\color{Gray} l}) + (k + {\color{Gray} l}) + \ldots + (k + {\color{Gray} l}) + (k + {\color{Gray} l}) \right]}_{l-k+1 \text{ terms}} \newline &= \frac{1}{2} (k+l)(l-k+1) \end{split} \end{equation*}
12. Substitute $m = n-k+1$.
13. Square pyramidal number, or the sum of squared integers: $$ \sum_{n=1}^k n^2 = \frac{k(k+1)(2k+1)}{6}$$
14. Definition of the Poisson distribution and the total probability axiom: $$ \sum_{n=-\infty}^{\infty} p_X(n) = 1 $$
15. Integration by parts: $$ \int_a^b f(x)g'(x) \mathrm{d}x = \left[ f(x)g(x)\right]_a^b - \int_a^b f'(x)g(x) \mathrm{d}x$$
16. Growth property of exponentials: $$ \lim_{x\to \infty} x^ae^{-x} = 0 $$
17. Substitute $w=\frac{x-\mu}{\sqrt{2\sigma^2}}$.
18. Gaussian integral: $$ \int_{-\infty}^\infty e^{-x^2} \mathrm{d}x =\sqrt{\pi} $$