Random processes that generate observations under a certain probability distribution can be characterized in multiple ways, as seen before. This section discusses some common discrete and continuous probability distributions and explains in which contexts they are used. Each probability distribution is given together with its expected value and variance. The derivations of the cumulative distribution functions, expected values and variances are all provided. It is important to understand and follow these derivations in full before using their outcomes blindly.
In order to indicate that a random variable $X$ is distributed according to a certain distribution, e.g., the univariate standard normal distribution, we may write $X \sim \mathcal{N}(0, 1)$. In this notation, the letter $\mathcal{N}$ indicates the normal distribution, while the numbers in parentheses indicate the parameters controlling the distribution; in the case of a normal distribution, these are the mean and the variance. Thus, $X \sim \mathcal{N}(0, 1)$ reads “the random variable $X$ is normally distributed with zero mean and unit variance”.
Families of discrete random variables
The Bernoulli(p) distribution
The Bernoulli distribution is a discrete probability distribution that models an experiment in which only two outcomes are possible. The probability distribution of flipping a coin is an example of a Bernoulli distribution. These outcomes are mapped to 0 and 1, with probabilities $1-p$ and $p$ respectively. The distribution is fully characterized by the parameter $p$, which is the probability of success ($\Pr[X=1]$).
Probability mass function
The probability mass function of the discrete Bernoulli(p) distribution is given as
$$p_X(x) = \begin{cases} 1-p, & \text{for } x = 0 \\ p, & \text{for } x = 1 \\ 0, & \text{otherwise} \end{cases}$$
where $p$ is in the range $0 < p < 1$.
Cumulative distribution function
The cumulative distribution function of the discrete Bernoulli(p) distribution can be determined as
$$P_X(x) = \begin{cases} 0, & \text{for } x < 0 \\ 1-p, & \text{for } 0 \le x < 1 \\ 1, & \text{for } x \ge 1 \end{cases}$$
Expected value
The expected value of the discrete Bernoulli(p) distribution can be determined as
$$\mathrm{E}[X] = p.$$
Variance
The variance of the discrete Bernoulli(p) distribution can be determined as
$$\mathrm{Var}[X] = p(1-p).$$
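As a quick numerical check, the sketch below (a minimal example assuming NumPy; the value $p = 0.3$ is purely illustrative) draws Bernoulli(p) samples by thresholding uniform random numbers and compares the empirical mean and variance with $p$ and $p(1-p)$.

```python
import numpy as np

# Minimal sketch: draw Bernoulli(p) samples and compare the empirical
# moments with the theoretical values E[X] = p and Var[X] = p(1 - p).
rng = np.random.default_rng(seed=0)
p = 0.3                                    # illustrative success probability
x = (rng.random(100_000) < p).astype(int)  # 1 with probability p, else 0

print(f"mean:     {x.mean():.4f}  (theory: {p})")
print(f"variance: {x.var():.4f}  (theory: {p * (1 - p)})")
```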
The Geometric(p) distribution
The Geometric distribution is a discrete probability distribution that models an experiment with probability of success $p$. The Geometric distribution gives the probability that the first success is observed at the $x$th independent trial. The distribution is fully characterized by the parameter $p$, which is the probability of success.
Probability mass function
The probability mass function of the discrete Geometric(p) distribution is given as
$$p_X(x) = \begin{cases} p(1-p)^{x-1}, & \text{for } x = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
where $p$ is in the range $0 < p < 1$.
Cumulative distribution function
The cumulative distribution function of the discrete Geometric(p) distribution can be determined as
$$P_X(x) = \begin{cases} 0, & \text{for } x < 1 \\ 1 - (1-p)^x, & \text{for } x \ge 1 \end{cases}$$
Expected value
The expected value of the discrete Geometric(p) distribution can be determined as
$$\mathrm{E}[X] = \frac{1}{p}.$$
Variance
The variance of the discrete Geometric(p) distribution can be determined as
$$\mathrm{Var}[X] = \frac{1-p}{p^2}.$$
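The Geometric distribution can also be simulated directly from its definition, as in the minimal sketch below (assuming NumPy; $p = 0.2$ is arbitrary): count independent Bernoulli trials until the first success and compare the empirical mean with $1/p$.

```python
import numpy as np

# Minimal sketch: simulate Geometric(p) by counting independent
# Bernoulli(p) trials until the first success.
rng = np.random.default_rng(seed=0)
p = 0.2  # illustrative success probability

def trials_until_first_success() -> int:
    x = 1
    while rng.random() >= p:  # failure occurs with probability 1 - p
        x += 1
    return x

samples = [trials_until_first_success() for _ in range(50_000)]
print(f"mean: {np.mean(samples):.3f}  (theory: {1 / p})")
```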
The Binomial(n,p) distribution
The Binomial distribution is a discrete probability distribution that models an experiment with probability of success $p$. The Binomial distribution gives the probability of observing $x$ successes in $n$ independent trials. The distribution is fully characterized by the parameters $n$ and $p$. The parameter $n$ denotes the number of independent trials and the parameter $p$ denotes the probability of observing a success per trial.
Probability mass function
The probability mass function of the discrete Binomial(n,p) distribution is given as
$$p_X(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad \text{for } x = 0, 1, \ldots, n,$$
where $0 < p < 1$ and $n$ is an integer such that $n \ge 1$.
Cumulative distribution function
The cumulative distribution function of the discrete Binomial(n,p) distribution can be determined as
$$P_X(x) = \sum_{m=0}^{x} \binom{n}{m} p^m (1-p)^{n-m}.$$
Expected value
The expected value of the discrete Binomial(n,p) distribution can be determined as
$$\mathrm{E}[X] = np.$$
Variance
The variance of the discrete Binomial(n,p) distribution can be determined as
$$\mathrm{Var}[X] = np(1-p).$$
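Since a Binomial(n,p) random variable counts the successes in $n$ independent Bernoulli(p) trials, it can be simulated by summing Bernoulli draws. The sketch below assumes NumPy; $n = 20$ and $p = 0.4$ are arbitrary.

```python
import numpy as np

# Minimal sketch: a Binomial(n, p) draw is the number of successes in n
# independent Bernoulli(p) trials, i.e. a row-wise sum of Bernoulli draws.
rng = np.random.default_rng(seed=0)
n, p = 20, 0.4  # illustrative parameters

x = (rng.random((100_000, n)) < p).sum(axis=1)  # one Binomial draw per row
print(f"mean:     {x.mean():.3f}  (theory: {n * p})")
print(f"variance: {x.var():.3f}  (theory: {n * p * (1 - p)})")
```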
The Pascal(k,p) distribution
The Pascal distribution is a probability distribution that is also known as the negative Binomial distribution. The Pascal distribution gives the probability of observing the $k$th success at the $x$th trial. The distribution is fully characterized by the parameters $k$ and $p$. The parameter $k$ denotes the desired number of successes and the parameter $p$ denotes the probability of success in an individual trial.
Probability mass function
The probability mass function of the discrete Pascal(k,p) distribution is given as
$$p_X(x) = \binom{x-1}{k-1} p^k (1-p)^{x-k}, \quad \text{for } x = k, k+1, \ldots,$$
where $0 < p < 1$ and $k$ is an integer such that $k \ge 1$.
Cumulative distribution function
The cumulative distribution function of the discrete Pascal(k,p) distribution can be determined as
$$P_X(x) = \sum_{n=-\infty}^{x} \binom{n-1}{k-1} p^k (1-p)^{n-k}.$$
Expected value
The expected value of the discrete Pascal(k,p) distribution can be determined as
$$\mathrm{E}[X] = \frac{k}{p}.$$
Variance
The variance of the discrete Pascal(k,p) distribution can be determined as
$$\mathrm{Var}[X] = \frac{k(1-p)}{p^2}.$$
The discrete Uniform(k,l) distribution
The discrete uniform distribution is a discrete probability distribution that models an experiment in which the outcomes are mapped only to discrete points on the interval from $k$ up to and including $l$. The distribution is fully characterized by the parameters $k$ and $l$, which are the discrete lower and upper bounds of the interval respectively.
Probability mass function
The probability mass function of the discrete Uniform(k,l) distribution is given as
$$p_X(x) = \begin{cases} \frac{1}{l-k+1}, & \text{for } x = k, k+1, k+2, \ldots, l \\ 0, & \text{otherwise} \end{cases}$$
where $k$ and $l$ are integers such that $k < l$.
Cumulative distribution function
The cumulative distribution function of the discrete Uniform(k,l) distribution can be determined as
$$P_X(x) = \begin{cases} 0, & \text{for } x < k \\ \frac{x-k+1}{l-k+1}, & \text{for } k \le x < l \\ 1, & \text{for } x \ge l \end{cases}$$
Expected value
The expected value of the discrete Uniform(k,l) distribution can be determined as
$$\mathrm{E}[X] = \frac{k+l}{2}.$$
Variance
The variance of the discrete Uniform(k,l) distribution can be determined as
$$\mathrm{Var}[X] = \frac{(l-k+1)^2 - 1}{12}.$$
The Poisson($\alpha$) distribution
The Poisson distribution is a discrete probability distribution that models the number of events occurring within a certain interval of time, in which the events occur independently of each other at a constant rate. The exact moments at which the events occur are unknown; however, the average number of events occurring within the interval is known and is denoted by the parameter $\alpha$. An example of a process in which the number of events within an interval can be described by a Poisson distribution is the number of phone calls over a network. For optimal allocation of resources, a service provider needs to know the probability that the allocated capacity is insufficient, in order to limit the number of dropped calls. The callers can be described as independent entities, each with their own calling habits: everyone makes a phone call whenever it suits them.
Probability mass function
The probability mass function of the discrete Poisson($\alpha$) distribution is given as
$$p_X(x) = \begin{cases} \frac{\alpha^x e^{-\alpha}}{x!}, & \text{for } x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases}$$
where $\alpha$ is in the range $\alpha > 0$.
Cumulative distribution function
The cumulative distribution function of the discrete Poisson($\alpha$) distribution can be determined as
$$P_X(x) = \begin{cases} 0, & \text{for } x < 0 \\ e^{-\alpha} \sum_{n=0}^{x} \frac{\alpha^n}{n!}, & \text{for } x \ge 0 \end{cases}$$
Expected value
The expected value of the discrete Poisson($\alpha$) distribution can be determined as
$$\mathrm{E}[X] = \alpha.$$
Variance
The variance of the discrete Poisson($\alpha$) distribution can be determined as
$$\mathrm{Var}[X] = \alpha.$$
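Returning to the phone-call example, the probability that an allocated capacity of $c$ simultaneous calls is insufficient equals $\Pr[X > c] = 1 - P_X(c)$. The minimal sketch below evaluates this directly from the CDF above; the values of $\alpha$ and $c$ are purely illustrative.

```python
import math

# Minimal sketch: probability that more than c calls arrive in an interval
# when the average number of calls is alpha, i.e. Pr[X > c] = 1 - P_X(c).
alpha, c = 10.0, 15  # illustrative average rate and capacity

cdf = math.exp(-alpha) * sum(alpha**n / math.factorial(n) for n in range(c + 1))
print(f"Pr[X > {c}] = {1 - cdf:.4f}")  # chance that calls must be dropped
```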
Families of continuous random variables
The Exponential($\lambda$) distribution
The exponential distribution is a continuous probability distribution whose density follows a decaying exponential curve; it is commonly used to model the waiting time between independent events that occur at a constant rate. The curve is fully characterized by the rate parameter $\lambda$.
Probability density function
The probability density function of the continuous Exponential($\lambda$) distribution is given as
$$p_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{for } x \ge 0 \\ 0, & \text{for } x < 0 \end{cases}$$
where $\lambda > 0$.
Cumulative distribution function
The cumulative distribution function of the continuous Exponential($\lambda$) distribution can be determined as
$$P_X(x) = \begin{cases} 1 - e^{-\lambda x}, & \text{for } x \ge 0 \\ 0, & \text{for } x < 0 \end{cases}$$
Expected value
The expected value of the continuous Exponential($\lambda$) distribution can be determined as
$$\mathrm{E}[X] = \frac{1}{\lambda}.$$
Variance
The variance of the continuous Exponential($\lambda$) distribution can be determined as
$$\mathrm{Var}[X] = \frac{1}{\lambda^2}.$$
The continuous Uniform(a,b) distribution
The continuous Uniform distribution is a continuous probability distribution that models an experiment in which the outcomes are mapped to the interval from $a$ up to and including $b$, with equal probability density over this entire range. The distribution is fully characterized by the parameters $a$ and $b$, which are the continuous lower and upper bounds of the interval respectively.
Probability density function
The probability density function of the continuous Uniform(a,b) distribution is given as
$$p_X(x) = \begin{cases} \frac{1}{b-a}, & \text{for } a \le x \le b \\ 0, & \text{otherwise} \end{cases}$$
where $b > a$.
Cumulative distribution function
The cumulative distribution function of the continuous Uniform(a,b) distribution can be determined as
$$P_X(x) = \begin{cases} 0, & \text{for } x \le a \\ \frac{x-a}{b-a}, & \text{for } a < x < b \\ 1, & \text{for } x \ge b \end{cases}$$
Expected value
The expected value of the continuous Uniform(a,b) distribution can be determined as
$$\mathrm{E}[X] = \frac{a+b}{2}.$$
Variance
The variance of the continuous Uniform(a,b) distribution can be determined as
$$\mathrm{Var}[X] = \frac{(b-a)^2}{12}.$$
The Normal or Gaussian $\mathcal{N}(\mu, \sigma^2)$ distribution
The Normal or Gaussian distribution is probably the most commonly used continuous probability distribution. The distribution is bell-shaped and symmetric. The function is characterized by its mean $\mu$ and its variance $\sigma^2$.
The Standard normal $\mathcal{N}(0, 1)$ distribution
The Standard normal distribution is a specific case of the Normal or Gaussian distribution, where the mean equals $\mu = 0$ and the variance equals $\sigma^2 = 1$. This function can be regarded as the normalized Gaussian distribution. Any random variable $Y \sim \mathcal{N}(\mu_Y, \sigma_Y^2)$ can be transformed into a random variable $X$ under the Standard normal distribution by subtracting its mean and dividing by the standard deviation as
$$X = \frac{Y - \mu_Y}{\sigma_Y}.$$
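A small sketch of this normalization (assuming NumPy; $\mu_Y = 3$ and $\sigma_Y = 2$ are arbitrary): standardizing Gaussian samples should yield approximately zero mean and unit variance.

```python
import numpy as np

# Minimal sketch: standardize Gaussian samples, X = (Y - mu_Y) / sigma_Y.
rng = np.random.default_rng(seed=0)
mu_Y, sigma_Y = 3.0, 2.0  # illustrative mean and standard deviation

y = rng.normal(mu_Y, sigma_Y, size=100_000)
x = (y - mu_Y) / sigma_Y

print(f"mean: {x.mean():.4f}, variance: {x.var():.4f}")  # approx. 0 and 1
```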
The $Q$-function
The $Q$-function is a commonly used function in statistics, which calculates the probability of a Standard normal distributed random variable $X$ exceeding a certain threshold $x$. It is also known as the right-tail probability of the Gaussian distribution, since it is calculated by integrating the right side of the Gaussian PDF from the threshold $x$ up to $\infty$. The $Q$-function is defined as
$$Q(x) = \Pr[X > x] = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\frac{u^2}{2}} \, du.$$
The function can be used for all Gaussian distributed random variables; however, the random variable and the corresponding threshold should be normalized first. Additionally, from symmetry it follows that $Q(x) = 1 - Q(-x)$, where $Q(-x)$ is equal to the cumulative distribution function $P_X(x)$.
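In practice the $Q$-function is usually evaluated through the complementary error function: substituting $u = \sqrt{2}\,t$ in the defining integral gives $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}\!\left(x/\sqrt{2}\right)$. A minimal sketch using Python's standard library:

```python
import math

# Minimal sketch: Q(x) = 0.5 * erfc(x / sqrt(2)).
def Q(x: float) -> float:
    return 0.5 * math.erfc(x / math.sqrt(2.0))

print(Q(0.0))              # 0.5, by symmetry of the Gaussian PDF
print(Q(1.64) + Q(-1.64))  # 1.0, since Q(x) = 1 - Q(-x)
# For a general Gaussian Y ~ N(mu, sigma^2): Pr[Y > y] = Q((y - mu) / sigma).
```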
Probability density function
The probability density function of the continuous Gaussian $\mathcal{N}(\mu, \sigma^2)$ distribution is given as
$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}},$$
where $\sigma > 0$.
Cumulative distribution function
The cumulative distribution function of the continuous Gaussian $\mathcal{N}(\mu, \sigma^2)$ distribution can be determined as
$$P_X(x) = Q\left(-\frac{x-\mu}{\sigma}\right).$$
Expected value
The expected value of the continuous Gaussian $\mathcal{N}(\mu, \sigma^2)$ distribution can be determined as
$$\mathrm{E}[X] = \mu.$$
Variance
The variance of the continuous Gaussian $\mathcal{N}(\mu, \sigma^2)$ distribution can be determined as
$$\mathrm{Var}[X] = \sigma^2.$$
As is implied by the central limit theorem (explained in the section Function and pairs of random variables), the Gaussian distribution is extremely important. The Gaussian distribution is often used to model measurements in practice and, thanks to the CLT, its use can often be extended to other distributions. A Gaussian distribution is also often used to model the thermal noise of a band-limited system. This section will generalize the definition of the Gaussian distribution given in the previous reader and extend it to the multivariate case.
Univariate distribution
In the case of a single random variable $X$ that is generated according to a Gaussian distribution, defined by its mean $\mu$ and variance $\sigma^2$ (the subscript $\cdot_X$ is now omitted for simplicity), the probability density function is defined as
$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}.$$
The left side of the figure below shows an example of a univariate Gaussian distribution.
Multivariate distribution
The definition of the univariate Gaussian distribution can be extended to a multivariate distribution. To better understand the multivariate Gaussian distribution, it might be useful to first read the sections Function and pairs of random variables and Random vectors. To define the Gaussian distribution, its position and its spread are required. These quantities are represented by the mean vector $\mu$ and the covariance matrix $\Sigma$. Whereas the covariance matrix was denoted by $\Gamma$ before, literature has adopted the $\Sigma$ notation when discussing multivariate Gaussian distributions, as $\Sigma$ is the capital form of the Greek letter $\sigma$.
To indicate that a $k$-dimensional random vector $X$ is Gaussian distributed, we can write $X \sim \mathcal{N}_k(\mu, \Sigma)$. The probability density function of such a multivariate Gaussian distribution is defined as
$$p_X(x) = \frac{1}{\sqrt{(2\pi)^k |\Sigma|}} \exp\left\{ -\frac{1}{2} (x-\mu)^\top \Sigma^{-1} (x-\mu) \right\},$$
where $|\Sigma|$ is the determinant of the covariance matrix. Please note the similarities between the univariate Gaussian distribution and the multivariate distribution. The inverse covariance matrix $\Sigma^{-1}$ is often also called the precision matrix and is denoted by $\Lambda$, because a low variance (i.e. low spread) relates to high precision and vice versa.
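As an illustration, the sketch below (assuming NumPy; the mean vector and covariance matrix are arbitrary) evaluates this density directly from the formula; scipy.stats.multivariate_normal should give the same values under the same assumptions.

```python
import numpy as np

# Minimal sketch: evaluate the k-dimensional Gaussian density from the formula.
def mvn_pdf(x: np.ndarray, mu: np.ndarray, Sigma: np.ndarray) -> float:
    k = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm)

mu = np.array([0.0, 0.0])       # illustrative mean vector
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])  # illustrative covariance matrix
print(mvn_pdf(np.array([0.0, 0.0]), mu, Sigma))
```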
The covariance matrix of a multivariate Gaussian distribution
The probability density function of a Gaussian distribution is fully determined by its mean $\mu$ and its covariance matrix $\Sigma$. In order to give some intuition on how the mean and the covariance matrix structure influence the final distribution, we jump forward to Fig. 2 in the next section, where three multivariate distributions have been plotted. The covariance matrices that were used to plot these distributions in the figure are, from left to right:
$$\Sigma_1 = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix}, \qquad \Sigma_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \Sigma_3 = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}.$$
Please note how the off-diagonal entries, referring to $\mathrm{Cov}[X_1, X_2]$ and $\mathrm{Cov}[X_2, X_1]$, influence the shape of the distribution.
In order to understand how the covariance matrix is related to the tilt and the shape of the distribution, we first need to introduce the so-called rotation matrix and the eigenvalue decomposition. The rotation matrix $R_\theta$ rotates a coordinate counter-clockwise over an angle $\theta$ with respect to the origin. This rotation matrix is defined as
$$R_\theta = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix}$$
and a rotation by $\theta$ from the coordinates $(x, y)$ to the coordinates $(x', y')$ can be represented by
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = R_\theta \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x\cos(\theta) - y\sin(\theta) \\ x\sin(\theta) + y\cos(\theta) \end{bmatrix}.$$
One of the properties of a rotation matrix is that it is orthogonal. This means that $R_\theta R_\theta^\top = I$, where $I$ is the identity matrix. Using the fact that $R_\theta^{-1} = R_{-\theta} = R_\theta^\top$, which follows from its definition, the orthogonality property makes complete sense, because rotating a coordinate by the angle $\theta$ and then by $-\theta$ does not change anything.
Besides the rotation matrices, we need to introduce the eigenvalue decomposition in order to better understand the covariance matrix structure. The eigenvalue decomposition states that a square invertible symmetric matrix $A$ can be written as
$$A = Q \Lambda Q^{-1},$$
where the orthogonal matrix $Q$ contains the eigenvectors of $A$ and $\Lambda$ is a diagonal matrix containing the eigenvalues of $A$.
Now that the general representation of the rotation matrix as well as the eigenvalue decomposition have been defined, we can show that any covariance matrix can be written as a rotation of a diagonal covariance matrix. This point is very important to understand. To start off, a diagonal covariance matrix can be represented as
$$\Sigma_d = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}.$$
The entries $a$ and $b$ correspond to the individual variances of $X_1$ and $X_2$ according to the definitions, and are at the same time the eigenvalues of $\Sigma_d$. An example of a Gaussian distribution that corresponds to a diagonal covariance matrix where $a = 25$ and $b = 4$ is shown on the left in the figure below. Please note that the ratio of $\sqrt{a}$ and $\sqrt{b}$ also represents the ratio of the length (the major axis) and the width (the minor axis) of the distribution.

If we were to apply the eigenvalue decomposition to a covariance matrix $\Sigma$, we would interestingly enough find that
$$\Sigma = R_\theta \Sigma_d R_\theta^\top.$$
The right side of the figure below shows an example of a multivariate Gaussian distribution whose covariance matrix is a rotated version of the diagonal covariance matrix corresponding to the left side of the same figure. From this we can see that the ratio of the eigenvalues of $\Sigma$ corresponds to the ratio of the lengths of the major and minor axes. Furthermore, we can conclude that the matrix containing the eigenvectors of $\Sigma$ is at the same time a rotation matrix, implicitly defining the rotation angle.
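A minimal sketch of this decomposition (assuming NumPy; $a = 25$, $b = 4$ and $\theta = \pi/6$ are arbitrary): build $\Sigma = R_\theta \Sigma_d R_\theta^\top$ and verify that its eigenvalues are $a$ and $b$, and that its eigenvectors form a rotation matrix.

```python
import numpy as np

# Minimal sketch: a covariance matrix as a rotated diagonal matrix,
# Sigma = R_theta @ Sigma_d @ R_theta.T, checked via the eigendecomposition.
a, b, theta = 25.0, 4.0, np.pi / 6  # illustrative variances and angle

R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Sigma_d = np.diag([a, b])
Sigma = R @ Sigma_d @ R.T

eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)  # [ 4. 25.] -> the diagonal entries of Sigma_d (ascending)
print(eigvecs)  # columns are (up to sign) the rotated principal axes
```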
Sampling random variables
Most statistical packages in computing software provide a so-called pseudorandom number generator, which is an algorithm to randomly sample a number between 0 and 1 with equal probability. Basically, this means generating random samples from a continuous random variable $U$ which follows a uniform distribution, $U \sim U(0, 1)$. More generally, sampling a random variable means generating values $x \in X$ in such a way that the probability of generating $x$ is in accordance with the probability density function $p_X(x)$, or equivalently the cumulative distribution function $P_X(x)$, associated with $X$.
Assuming that we have a pseudorandom number generator, how can we generate samples of any random variable $X$ if we know its probability distribution? We need to find a transformation $T : [0, 1] \to \mathbb{R}$ such that $T(U) = X$.
For continuous random variables, the following theorem can help us with this task.
Theorem
Let $X$ be a continuous random variable with CDF $P_X(x)$ which possesses an inverse $P_X^{-1}$. Let $U \sim U(0, 1)$ and $Y = P_X^{-1}(U)$; then $P_X(x)$ is the CDF of $Y$. In other words, $Y$ has the same distribution as $X$.
According to this theorem, the transformation $T$ we were looking for is simply given by $P_X^{-1}$. Then, to sample $x$, it is sufficient to follow these steps:
- Generate a random number $u$ from the uniform distribution $U \sim U(0, 1)$;
- Find the inverse of the CDF of $X$, $P_X^{-1}$;
- Compute $x$ as $x = P_X^{-1}(u)$.
This method of sampling a random variable is known as the inverse transform technique.
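For instance, the CDF of the Exponential($\lambda$) distribution, $P_X(x) = 1 - e^{-\lambda x}$, inverts to $P_X^{-1}(u) = -\ln(1-u)/\lambda$. The sketch below (assuming NumPy; $\lambda = 2$ is arbitrary) applies the three steps above.

```python
import numpy as np

# Minimal sketch: inverse transform sampling for Exponential(lambda).
rng = np.random.default_rng(seed=0)
lam = 2.0                    # illustrative rate parameter

u = rng.random(100_000)      # step 1: uniform samples from U(0, 1)
x = -np.log(1.0 - u) / lam   # steps 2 and 3: apply the inverse CDF

print(f"mean: {x.mean():.4f}  (theory: {1 / lam})")
```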
For discrete random variables, however, this technique cannot be applied directly, because when $X$ is discrete its CDF is a step function, which does not possess an inverse.
More formally, let $X \in \{x_1, \ldots, x_n\}$ be a discrete random variable with probability mass function $p_X(x)$, where $x_1 \le \ldots \le x_n$. Let us define each value of the CDF of $X$ as
$$q_i = \Pr[X \le x_i] = \sum_{j=1}^{i} p_X(x_j).$$
The sampling formula for $X$ then becomes: generate $u$ from $U \sim U(0, 1)$ and set $x = x_i$ for the smallest $i$ such that $u \le q_i$ (equivalently, $q_{i-1} < u \le q_i$, with $q_0 = 0$).
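A minimal sketch of this discrete procedure (assuming NumPy; the support and PMF values are purely illustrative): tabulate the cumulative probabilities $q_i$ and, for each uniform draw $u$, return the first $x_i$ with $u \le q_i$.

```python
import numpy as np

# Minimal sketch: discrete sampling via the tabulated CDF values q_i.
rng = np.random.default_rng(seed=0)
xs = np.array([1, 2, 5])         # illustrative support x_1 <= x_2 <= x_3
pmf = np.array([0.2, 0.5, 0.3])  # illustrative p_X(x_i)
q = np.cumsum(pmf)               # q_i = Pr[X <= x_i]

u = rng.random(100_000)
samples = xs[np.searchsorted(q, u)]  # smallest i such that u <= q_i

# Empirical relative frequencies, which should approximate the PMF.
print(np.bincount(samples, minlength=6)[xs] / len(samples))
```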
Footnotes
The following identities are used in the derivations referenced above.
1. Sum of a geometric series: $\sum_{k=0}^{n-1} ar^k = a\left(\frac{1-r^n}{1-r}\right)$ for $r \neq 1$.
2. Derivative of the sum of a geometric series: $\sum_{k=1}^{\infty} kr^{k-1} = \frac{1}{(1-r)^2}$ for $|r| < 1$.
3. Second derivative of the sum of a geometric series: $\sum_{k=1}^{\infty} k^2 r^{k-1} = -\frac{r+1}{(r-1)^3}$ for $|r| < 1$.
4. The binomial coefficient, which returns the total number of possible unsorted combinations of $k$ distinct elements from a set of $n$ distinct elements, defined as $\binom{n}{k} = \frac{n!}{k!(n-k)!}$, where $!$ is the factorial operator (e.g. $4! = 4 \cdot 3 \cdot 2 \cdot 1$), which denotes the total number of sorted permutations.
5. Factors of the binomial coefficient: $k\binom{n}{k} = k\frac{n!}{k!(n-k)!} = n\frac{(n-1)!}{(k-1)!\,((n-1)-(k-1))!} = n\binom{n-1}{k-1}$.
6. Use the substitutions $l = n-1$, $k = m-1$, $i = l-1$ and $j = k-1$.
7. Binomial theorem: $(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$.
8. Ordinary generating function: $\sum_{n=k}^{\infty} \binom{n}{k} y^n = \frac{y^k}{(1-y)^{k+1}}$.
9. Recurrence relationship: $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$.
10. Variation of the factors of the binomial coefficient: $n\binom{n-1}{k} = (k+1)\binom{n}{k+1}$.
11. Sum of consecutive integers: $\sum_{n=k}^{l} n = \frac{1}{2}\left[(k + (k+1) + \ldots + l) + (l + (l-1) + \ldots + k)\right] = \frac{1}{2}\underbrace{\left[(k+l) + (k+l) + \ldots + (k+l)\right]}_{l-k+1 \text{ terms}} = \frac{1}{2}(k+l)(l-k+1)$.
12. Substitute $m = n-k+1$.
13. Square pyramidal number, or the sum of squared integers: $\sum_{n=1}^{k} n^2 = \frac{k(k+1)(2k+1)}{6}$.
14. Definition of the Poisson distribution and the total probability axiom: $\sum_{n=-\infty}^{\infty} p_X(n) = 1$.
15. Integration by parts: $\int_a^b f(x)g'(x)\,dx = \left[f(x)g(x)\right]_a^b - \int_a^b f'(x)g(x)\,dx$.
16. Growth property of exponentials: $\lim_{x\to\infty} x^a e^{-x} = 0$.
17. Substitute $w = \frac{x-\mu}{\sqrt{2\sigma^2}}$.
18. Gaussian integral: $\int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}$.