Families of random variables

The random processes that generate observations under a certain probability distribution can be characterized in multiple ways, as seen before. This section discusses some common discrete and continuous probability distributions and explains in which contexts they are used. The probability distributions are given together with their expected values and variances. The derivations of the cumulative distribution functions, expected values and variances are all provided. It is important to understand and follow these derivations in full before blindly using their outcomes.

In order to indicate that a random variable $X$ is distributed according to a certain distribution, e.g., the univariate standard normal distribution, we may write $X \sim N(0,1)$. In this notation, the letter $N$ indicates the normal distribution, while the numbers in parentheses indicate the parameters controlling the distribution. In the case of a normal distribution, these are the mean and the variance. Thus, $X \sim N(0,1)$ reads "the random variable $X$ is normally distributed with zero mean and unit variance".



Families of discrete random variables

Screencast video [⯈]

The Bernoulli(p) distribution

The Bernoulli distribution is a discrete probability distribution that models an experiment in which only two outcomes are possible. The probability distribution of a coin flip is an example of a Bernoulli distribution. The outcomes are mapped to 0 and 1, occurring with probabilities $1-p$ and $p$ respectively. The distribution is fully characterized by the parameter $p$, which is the probability of success ($\Pr[X=1]$).

Example plot of the (a) probability mass function and (b) cumulative distribution function of the Bernoulli(p) distribution.

Probability mass function

The probability mass function of the discrete Bernoulli(p) distribution is given as
$$p_X(x) = \begin{cases} 1-p, & \text{for } x=0 \\ p, & \text{for } x=1 \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
where $p$ is in the range $0<p<1$.

Cumulative distribution function

The cumulative distribution function of the discrete Bernoulli(p) distribution can be determined as
$$P_X(x) = \begin{cases} 0, & \text{for } x<0 \\ 1-p, & \text{for } 0 \leq x < 1 \\ 1, & \text{for } x \geq 1 \end{cases}$$
$$P_X(x) = \sum_{n=-\infty}^{x} p_X(n) = \begin{cases} 0, & \text{for } x<0 \\ \sum_{n=-\infty}^{0} p_X(n), & \text{for } 0 \leq x < 1 \\ 1, & \text{for } x \geq 1 \end{cases} = \begin{cases} 0, & \text{for } x<0 \\ 1-p, & \text{for } 0 \leq x < 1 \\ 1. & \text{for } x \geq 1 \end{cases} \tag{2}$$

Expected value

The expected value of the discrete Bernoulli(p) distribution can be determined as
$$E[X] = p. \tag{3}$$
$$E[X] = \sum_{n=-\infty}^{\infty} n\,p_X(n) = \sum_{n=0}^{1} n\,p_X(n) = 0\cdot(1-p) + 1\cdot p = p. \tag{4}$$

Variance

The variance of the discrete Bernoulli(p) distribution can be determined as
$$\text{Var}[X] = p(1-p). \tag{5}$$
$$\begin{aligned} \text{Var}[X] &= E\left[(X - E[X])^2\right] \\ &= E[X^2] - 2E[X]E[X] + E[X]^2 \\ &= E[X^2] - E[X]^2 \\ &= \sum_{n=-\infty}^{\infty} n^2 p_X(n) - p^2 \\ &= \sum_{n=0}^{1} n^2 p_X(n) - p^2 \\ &= 0^2\cdot(1-p) + 1^2\cdot p - p^2 = p(1-p). \end{aligned} \tag{6}$$
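These results are easy to verify numerically. Below is a minimal sketch (assuming NumPy is available; the seed and the value $p=0.3$ are arbitrary examples) that draws Bernoulli(p) samples and compares the empirical mean and variance with $p$ and $p(1-p)$.

```python
# Minimal sketch: empirical check of E[X] = p and Var[X] = p(1 - p)
# for Bernoulli(p) samples. The seed and p = 0.3 are arbitrary examples.
import numpy as np

rng = np.random.default_rng(0)
p = 0.3
samples = (rng.random(100_000) < p).astype(float)  # 1 with probability p, else 0

print(samples.mean())  # close to p = 0.3
print(samples.var())   # close to p * (1 - p) = 0.21
```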



The Geometric(p) distribution

The Geometric distribution is a discrete probability distribution that models an experiment with probability of success $p$. The Geometric distribution gives the probability that the first success is observed at the $x$-th independent trial. The distribution is fully characterized by the parameter $p$, which is the probability of success.

Example plot of the (a) probability mass function and (b) cumulative distribution function of the Geometric(p) distribution.

Probability mass function

The probability mass function of the discrete Geometric(p) distribution is given as
$$p_X(x) = \begin{cases} p(1-p)^{x-1}, & \text{for } x = 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases} \tag{7}$$
where $p$ is in the range $0<p<1$.

Cumulative distribution function

The cumulative distribution function of the discrete Geometric(p) distribution can be determined as
$$P_X(x) = \begin{cases} 0, & \text{for } x<1 \\ 1-(1-p)^x. & \text{for } x \geq 1 \end{cases} \tag{8}$$
$$\begin{aligned} P_X(x) &= \sum_{n=-\infty}^{x} p_X(n) \\ &= \begin{cases} 0, & \text{for } x<1 \\ \sum_{n=1}^{x} p(1-p)^{n-1}, & \text{for } x \geq 1 \end{cases} \\ &= \begin{cases} 0, & \text{for } x<1 \\ p \sum_{m=0}^{x-1} (1-p)^m, & \text{for } x \geq 1 \end{cases} \\ &\overset{[1]}{=} \begin{cases} 0, & \text{for } x<1 \\ p\,\dfrac{1-(1-p)^x}{1-(1-p)}, & \text{for } x \geq 1 \end{cases} \\ &= \begin{cases} 0, & \text{for } x<1 \\ 1-(1-p)^x. & \text{for } x \geq 1 \end{cases} \end{aligned} \tag{9}$$


Expected value

The expected value of the discrete Geometric(p) distribution can be determined as
$$E[X] = \frac{1}{p}. \tag{10}$$
$$E[X] = \sum_{n=-\infty}^{\infty} n\,p_X(n) = \sum_{n=1}^{\infty} n\,p(1-p)^{n-1} = p \sum_{n=1}^{\infty} n(1-p)^{n-1} \overset{[2]}{=} \frac{p}{\left(1-(1-p)\right)^2} = \frac{1}{p}. \tag{11}$$


Variance

The variance of the discrete Geometric(p) distribution can be determined as
$$\text{Var}[X] = \frac{1-p}{p^2}. \tag{12}$$
$$\begin{aligned} \text{Var}[X] &= E\left[(X-E[X])^2\right] \\ &= E[X^2] - 2E[X]E[X] + E[X]^2 \\ &= E[X^2] - E[X]^2 \\ &= \sum_{n=1}^{\infty} n^2\,p(1-p)^{n-1} - \frac{1}{p^2} \\ &= p \sum_{n=1}^{\infty} n^2 (1-p)^{n-1} - \frac{1}{p^2} \\ &\overset{[3]}{=} \frac{p\left((1-p)+1\right)}{\left(1-(1-p)\right)^3} - \frac{1}{p^2} \\ &= \frac{2p - p^2}{p^3} - \frac{p}{p^3} = \frac{p - p^2}{p^3} = \frac{1-p}{p^2}. \end{aligned}$$
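The closed-form CDF of Eq. (8) and the two moments can be cross-checked numerically against the PMF of Eq. (7). A small sketch, assuming NumPy and truncating the infinite sums at an arbitrary cut-off of 200 trials:

```python
# Verify the Geometric(p) CDF 1 - (1 - p)^x against the cumulative PMF sum,
# and the truncated moment sums against 1/p and (1 - p)/p^2.
import numpy as np

p = 0.25
x = np.arange(1, 201)             # trial indices 1, 2, ..., 200 (truncated)
pmf = p * (1 - p) ** (x - 1)

print(np.allclose(np.cumsum(pmf), 1 - (1 - p) ** x))  # True
mean = np.sum(x * pmf)                                # ~ 1/p = 4
var = np.sum(x**2 * pmf) - mean**2                    # ~ (1 - p)/p^2 = 12
print(mean, var)
```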




Binomial(n,p) distribution

The Binomial distribution is a discrete probability distribution that models an experiment with probability of success $p$. The Binomial distribution gives the probability of observing $x$ successes in $n$ independent trials. The distribution is fully characterized by the parameters $n$ and $p$. The parameter $n$ denotes the number of independent trials and the parameter $p$ denotes the probability of observing a success per trial.

Example plot of the (a) probability mass function and (b) cumulative distribution function of the Binomial(n,p) distribution.

Probability mass function

The probability mass function of the discrete Binomial(n,p) distribution is given as
$$p_X(x) \overset{[4]}{=} \binom{n}{x} p^x (1-p)^{n-x}, \tag{13}$$
where $0<p<1$ and $n$ is an integer such that $n \geq 1$.


Cumulative distribution function

The cumulative distribution function of the discrete Binomial(n,p) distribution can be determined as
$$P_X(x) = \sum_{m=0}^{x} \binom{n}{m} p^m (1-p)^{n-m}. \tag{14}$$
$$P_X(x) = \sum_{m=0}^{x} p_X(m) = \sum_{m=0}^{x} \binom{n}{m} p^m (1-p)^{n-m}. \tag{15}$$

Expected value

The expected value of the discrete Binomial(n,p) distribution can be determined as
$$E[X] = np. \tag{16}$$
$$\begin{aligned} E[X] &= \sum_{m=-\infty}^{\infty} m\,p_X(m) = \sum_{m=0}^{n} m \binom{n}{m} p^m (1-p)^{n-m} \\ &= \sum_{m=1}^{n} m \binom{n}{m} p^m (1-p)^{n-m} \\ &\overset{[5]}{=} \sum_{m=1}^{n} n \binom{n-1}{m-1} p^m (1-p)^{n-m} \\ &= np \sum_{m=1}^{n} \binom{n-1}{m-1} p^{m-1} (1-p)^{(n-1)-(m-1)} \\ &\overset{[6]}{=} np \sum_{k=0}^{l} \binom{l}{k} p^k (1-p)^{l-k} \\ &\overset{[7]}{=} np\left(p + (1-p)\right)^l = np. \end{aligned} \tag{17}$$


Variance

The variance of the discrete Binomial(n,p) distribution can be determined as
$$\text{Var}[X] = np(1-p). \tag{18}$$
$$\begin{aligned} \text{Var}[X] &= E\left[(X-E[X])^2\right] = E[X^2] - E[X]^2 \\ &= -(np)^2 + \sum_{m=-\infty}^{\infty} m^2 p_X(m) \\ &= -(np)^2 + \sum_{m=0}^{n} m^2 \binom{n}{m} p^m (1-p)^{n-m} \\ &\overset{[5]}{=} -(np)^2 + \sum_{m=1}^{n} n\,m \binom{n-1}{m-1} p^m (1-p)^{n-m} \\ &= -(np)^2 + np \sum_{m=1}^{n} m \binom{n-1}{m-1} p^{m-1} (1-p)^{(n-1)-(m-1)} \\ &\overset{[6]}{=} -(np)^2 + np \sum_{k=0}^{l} (k+1) \binom{l}{k} p^k (1-p)^{l-k} \\ &= -(np)^2 + np \sum_{k=1}^{l} k \binom{l}{k} p^k (1-p)^{l-k} + np \sum_{k=0}^{l} \binom{l}{k} p^k (1-p)^{l-k} \\ &\overset{[5]}{=} -(np)^2 + np \sum_{k=1}^{l} l \binom{l-1}{k-1} p^k (1-p)^{l-k} + np \sum_{k=0}^{l} \binom{l}{k} p^k (1-p)^{l-k} \\ &= -(np)^2 + n l p^2 \sum_{k=1}^{l} \binom{l-1}{k-1} p^{k-1} (1-p)^{(l-1)-(k-1)} + np \sum_{k=0}^{l} \binom{l}{k} p^k (1-p)^{l-k} \\ &\overset{[6]}{=} -(np)^2 + n l p^2 \sum_{j=0}^{i} \binom{i}{j} p^j (1-p)^{i-j} + np \sum_{k=0}^{l} \binom{l}{k} p^k (1-p)^{l-k} \\ &\overset{[7]}{=} -(np)^2 + n l p^2 \left(p+(1-p)\right)^i + np\left(p+(1-p)\right)^l \\ &= -n^2p^2 + n(n-1)p^2 + np = -np^2 + np = np(1-p). \end{aligned} \tag{19}$$
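Because the Binomial PMF has finite support, both moments can be computed exactly from Eq. (13) and compared with $np$ and $np(1-p)$. A minimal sketch, assuming NumPy and the standard library; $n=10$ and $p=0.4$ are arbitrary examples:

```python
# Compute E[X] and Var[X] of Binomial(n, p) directly from the PMF
# and compare with the closed forms np and np(1 - p).
from math import comb
import numpy as np

n, p = 10, 0.4
x = np.arange(n + 1)
pmf = np.array([comb(n, k) * p**k * (1 - p) ** (n - k) for k in x])

mean = np.sum(x * pmf)              # n * p = 4.0
var = np.sum(x**2 * pmf) - mean**2  # n * p * (1 - p) = 2.4
print(mean, var)
```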



The Pascal(k,p) distribution

The Pascal distribution is a probability distribution that is also known as the negative Binomial distribution. The Pascal distribution gives the probability of observing the $k$-th success at the $x$-th trial. The distribution is fully characterized by the parameters $k$ and $p$. The parameter $k$ denotes the desired number of successes and the parameter $p$ denotes the probability of success in an individual trial.

Example plot of the (a) probability mass function and (b) cumulative distribution function of the Pascal(k,p) distribution.

Probability mass function

The probability mass function of the discrete Pascal(k,p) distribution is given as
$$p_X(x) = \binom{x-1}{k-1} p^k (1-p)^{x-k}, \tag{20}$$
where $0<p<1$ and $k$ is an integer such that $k \geq 1$.

Cumulative distribution function

The cumulative distribution function of the discrete Pascal(k,p) distribution can be determined as
$$P_X(x) = \sum_{n=-\infty}^{x} \binom{n-1}{k-1} p^k (1-p)^{n-k}. \tag{21}$$
$$P_X(x) = \sum_{n=-\infty}^{x} p_X(n) = \sum_{n=-\infty}^{x} \binom{n-1}{k-1} p^k (1-p)^{n-k}. \tag{22}$$

Expected value

The expected value of the discrete Pascal(k,p) distribution can be determined as
$$E[X] = \frac{k}{p}. \tag{23}$$
$$\begin{aligned} E[X] &= \sum_{n=-\infty}^{\infty} n\,p_X(n) = \sum_{n=k}^{\infty} n \binom{n-1}{k-1} p^k (1-p)^{n-k} \\ &\overset{[5]}{=} \sum_{n=k}^{\infty} k \binom{n}{k} p^k (1-p)^{n-k} \\ &= k\,\frac{p^k}{(1-p)^k} \sum_{n=k}^{\infty} \binom{n}{k} (1-p)^n \\ &\overset{[8]}{=} k\,\frac{p^k}{(1-p)^k}\,\frac{(1-p)^k}{p^{k+1}} = \frac{k}{p}. \end{aligned} \tag{24}$$


Variance

The variance of the discrete Pascal(k,p) distribution can be determined as
$$\text{Var}[X] = \frac{k(1-p)}{p^2}. \tag{25}$$
$$\begin{aligned} \text{Var}[X] &= E[X^2] - E[X]^2 \\ &= -\frac{k^2}{p^2} + \sum_{n=-\infty}^{\infty} n^2\,p_X(n) \\ &= -\frac{k^2}{p^2} + \sum_{n=k}^{\infty} n^2 \binom{n-1}{k-1} p^k (1-p)^{n-k} \\ &\overset{[5]}{=} -\frac{k^2}{p^2} + \sum_{n=k}^{\infty} n\,k \binom{n}{k} p^k (1-p)^{n-k} \\ &= -\frac{k^2}{p^2} + k\,\frac{p^k}{(1-p)^k} \sum_{n=k}^{\infty} n \binom{n}{k} (1-p)^n \\ &\overset{[9]}{=} -\frac{k^2}{p^2} + k\,\frac{p^k}{(1-p)^k} \left( \sum_{n=k}^{\infty} n \binom{n-1}{k-1} (1-p)^n + \sum_{n=k}^{\infty} n \binom{n-1}{k} (1-p)^n \right) \\ &\overset{[5],[10]}{=} -\frac{k^2}{p^2} + k\,\frac{p^k}{(1-p)^k} \left( k \sum_{n=k}^{\infty} \binom{n}{k} (1-p)^n + (k+1) \sum_{n=k+1}^{\infty} \binom{n}{k+1} (1-p)^n \right) \\ &\overset{[8]}{=} -\frac{k^2}{p^2} + k\,\frac{p^k}{(1-p)^k} \left( k\,\frac{(1-p)^k}{p^{k+1}} + (k+1)\,\frac{(1-p)^{k+1}}{p^{k+2}} \right) \\ &= -\frac{k^2}{p^2} + k\,\frac{p^k}{(1-p)^k}\,\frac{(1-p)^k}{p^{k+1}} \left( k + (k+1)\,\frac{1-p}{p} \right) \\ &= -\frac{k^2}{p^2} + \frac{k}{p}\left( k + (k+1)\,\frac{1-p}{p} \right) \\ &= -\frac{k^2}{p^2} + \frac{k^2 + k - kp}{p^2} = \frac{k(1-p)}{p^2}. \end{aligned}$$
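A sampling check is possible with NumPy, with one caveat: NumPy's `negative_binomial` counts the number of *failures* before the $k$-th success, so the trial index of the $k$-th success is obtained by adding $k$. A minimal sketch with arbitrary example values $k=3$, $p=0.5$:

```python
# Empirical check of E[X] = k/p and Var[X] = k(1 - p)/p^2 for Pascal(k, p).
# numpy's negative_binomial returns failures before the k-th success,
# so we add k to obtain the trial index of the k-th success.
import numpy as np

rng = np.random.default_rng(0)
k, p = 3, 0.5
samples = k + rng.negative_binomial(k, p, size=100_000)

print(samples.mean())  # close to k/p = 6
print(samples.var())   # close to k * (1 - p) / p^2 = 6
```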




The discrete Uniform(k,l) distribution

The discrete uniform distribution is a discrete probability distribution that models an experiment where the outcomes are mapped only to discrete points on the interval from $k$ up to and including $l$. The distribution is fully characterized by the parameters $k$ and $l$, which are the discrete lower and upper bounds of the interval respectively.

Example plot of the (a) probability mass function and (b) cumulative distribution function of the discrete Uniform(k,l) distribution.

Probability mass function

The probability mass function of the discrete Uniform(k,l) distribution is given as
$$p_X(x) = \begin{cases} \dfrac{1}{l-k+1}, & \text{for } x = k, k+1, k+2, \ldots, l \\ 0, & \text{otherwise} \end{cases} \tag{26}$$
where $k$ and $l$ are integers such that $k<l$.

Cumulative distribution function

The cumulative distribution function of the discrete Uniform(k,l) distribution can be determined as
$$P_X(x) = \begin{cases} 0, & \text{for } x<k \\ \dfrac{x-k+1}{l-k+1}, & \text{for } k \leq x < l \\ 1, & \text{for } x \geq l \end{cases} \tag{27}$$
$$P_X(x) = \sum_{m=-\infty}^{x} p_X(m) = \begin{cases} 0, & \text{for } x<k \\ \sum_{m=k}^{x} \dfrac{1}{l-k+1}, & \text{for } k \leq x < l \\ 1, & \text{for } x \geq l \end{cases} = \begin{cases} 0, & \text{for } x<k \\ \dfrac{x-k+1}{l-k+1}, & \text{for } k \leq x < l \\ 1. & \text{for } x \geq l \end{cases} \tag{28}$$

Expected value

The expected value of the discrete Uniform(k,l) distribution can be determined as
$$E[X] = \frac{k+l}{2}. \tag{29}$$
$$E[X] = \sum_{n=-\infty}^{\infty} n\,p_X(n) = \sum_{n=k}^{l} \frac{n}{l-k+1} = \frac{1}{l-k+1} \sum_{n=k}^{l} n \overset{[11]}{=} \frac{1}{l-k+1}\cdot\frac{1}{2}(k+l)(l-k+1) = \frac{k+l}{2}. \tag{30}$$


Variance

The variance of the discrete Uniform(k,l) distribution can be determined as
$$\text{Var}[X] = \frac{(l-k+1)^2 - 1}{12}.$$
$$\begin{aligned} \text{Var}[X] &= E[X^2] - E[X]^2 \\ &= -\frac{(k+l)^2}{4} + \sum_{n=k}^{l} \frac{n^2}{l-k+1} \\ &= -\frac{(k+l)^2}{4} + \frac{1}{l-k+1} \sum_{n=k}^{l} n^2 \\ &\overset{[12]}{=} -\frac{(k+l)^2}{4} + \frac{1}{l-k+1} \sum_{m=1}^{l-k+1} (m+k-1)^2 \\ &= -\frac{(k+l)^2}{4} + \frac{1}{l-k+1} \sum_{m=1}^{l-k+1} \left(m^2 + 2mk + k^2 + 1 - 2m - 2k\right) \\ &= -\frac{(k+l)^2}{4} + \frac{1}{l-k+1} \left( \sum_{m=1}^{l-k+1} m^2 + (2k-2)\sum_{m=1}^{l-k+1} m + (k^2+1-2k)\sum_{m=1}^{l-k+1} 1 \right) \\ &\overset{[11],[13]}{=} -\frac{(k+l)^2}{4} + \frac{1}{l-k+1} \left( \frac{(l-k+1)(l-k+2)(2l-2k+3)}{6} + (2k-2)\,\frac{(l-k+1)(l-k+2)}{2} + (k^2+1-2k)(l-k+1) \right) \\ &= -\frac{(k+l)^2}{4} + \frac{(l-k+2)(2l-2k+3)}{6} + (k-1)(l-k+2) + k^2 + 1 - 2k \\ &= \frac{k^2 + l^2 - 2kl - 2k + 2l}{12} = \frac{(l-k+1)^2 - 1}{12}. \end{aligned}$$
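Since the support is finite, both moments follow exactly from the PMF. A short check, assuming NumPy, with the arbitrary example interval $k=2$, $l=11$:

```python
# Exact check of E[X] = (k + l)/2 and Var[X] = ((l - k + 1)^2 - 1)/12
# for the discrete Uniform(k, l) distribution.
import numpy as np

k, l = 2, 11
outcomes = np.arange(k, l + 1)                 # k, k+1, ..., l
pmf = np.full(outcomes.size, 1 / (l - k + 1))  # equal probabilities

mean = np.sum(outcomes * pmf)              # (k + l)/2 = 6.5
var = np.sum(outcomes**2 * pmf) - mean**2  # ((l - k + 1)^2 - 1)/12 = 8.25
print(mean, var)
```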




The Poisson($\alpha$) distribution

The Poisson distribution is a discrete probability distribution that models the number of events occurring within a certain interval of time, in which the events occur independently of each other at a constant rate. The exact moments at which the events occur are unknown; however, the average number of events occurring within the interval is known and is denoted by the parameter $\alpha$. An example of a process where the number of events within an interval can be described by a Poisson distribution is the number of phone calls over a network. For optimal allocation of resources, a service provider needs to know the chance that the allocated capacity is insufficient, in order to limit the number of dropped calls. The callers can be described as independent entities (i.e., everyone makes a phone call whenever it suits them), each with their own habits of making phone calls.

Example plot of the (a) probability mass function and (b) cumulative distribution function of the Poisson($\alpha$) distribution.

Probability mass function

The probability mass function of the discrete Poisson($\alpha$) distribution is given as
$$p_X(x) = \begin{cases} \dfrac{\alpha^x e^{-\alpha}}{x!}, & \text{for } x = 0, 1, 2, \ldots \\ 0, & \text{otherwise} \end{cases} \tag{31}$$
where $\alpha$ is in the range $\alpha>0$.

Cumulative distribution function

The cumulative distribution function of the discrete Poisson($\alpha$) distribution can be determined as
$$P_X(x) = \begin{cases} 0, & \text{for } x<0 \\ e^{-\alpha} \sum_{n=0}^{x} \dfrac{\alpha^n}{n!}. & \text{for } x \geq 0 \end{cases} \tag{32}$$
$$P_X(x) = \sum_{n=-\infty}^{x} p_X(n) = \begin{cases} 0, & \text{for } x<0 \\ \sum_{n=0}^{x} \dfrac{\alpha^n e^{-\alpha}}{n!}, & \text{for } x \geq 0 \end{cases} = \begin{cases} 0, & \text{for } x<0 \\ e^{-\alpha} \sum_{n=0}^{x} \dfrac{\alpha^n}{n!}. & \text{for } x \geq 0 \end{cases} \tag{33}$$

Expected value

The expected value of the discrete Poisson($\alpha$) distribution can be determined as
$$E[X] = \alpha. \tag{34}$$
$$\begin{aligned} E[X] &= \sum_{n=-\infty}^{\infty} n\,p_X(n) = \sum_{n=0}^{\infty} n\,\frac{\alpha^n e^{-\alpha}}{n!} = \sum_{n=1}^{\infty} n\,\frac{\alpha^n e^{-\alpha}}{n!} \\ &= \alpha \sum_{n=1}^{\infty} \frac{\alpha^{n-1} e^{-\alpha}}{(n-1)!} \overset{[6]}{=} \alpha \sum_{l=0}^{\infty} \frac{\alpha^l e^{-\alpha}}{l!} \overset{[14]}{=} \alpha. \end{aligned} \tag{35}$$


Variance

The variance of the discrete Poisson($\alpha$) distribution can be determined as
$$\text{Var}[X] = \alpha. \tag{36}$$
$$\begin{aligned} \text{Var}[X] &= E[X^2] - E[X]^2 = -\alpha^2 + \sum_{n=-\infty}^{\infty} n^2\,p_X(n) \\ &= -\alpha^2 + \sum_{n=0}^{\infty} n^2\,\frac{\alpha^n e^{-\alpha}}{n!} = -\alpha^2 + \sum_{n=1}^{\infty} n^2\,\frac{\alpha^n e^{-\alpha}}{n!} \\ &= -\alpha^2 + \alpha \sum_{n=1}^{\infty} n\,\frac{\alpha^{n-1} e^{-\alpha}}{(n-1)!} \\ &\overset{[6]}{=} -\alpha^2 + \alpha \sum_{l=0}^{\infty} (l+1)\,\frac{\alpha^l e^{-\alpha}}{l!} \\ &= -\alpha^2 + \alpha \sum_{l=0}^{\infty} \frac{\alpha^l e^{-\alpha}}{l!} + \alpha \sum_{l=1}^{\infty} l\,\frac{\alpha^l e^{-\alpha}}{l!} \\ &= -\alpha^2 + \alpha \sum_{l=0}^{\infty} \frac{\alpha^l e^{-\alpha}}{l!} + \alpha^2 \sum_{l=1}^{\infty} \frac{\alpha^{l-1} e^{-\alpha}}{(l-1)!} \\ &\overset{[6],[14]}{=} -\alpha^2 + \alpha + \alpha^2 = \alpha. \end{aligned}$$
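The defining property that the mean and variance coincide is easy to observe empirically. A minimal sketch assuming NumPy, with the arbitrary example rate $\alpha = 4$:

```python
# Empirical check that Poisson(alpha) samples have mean and variance
# both close to alpha.
import numpy as np

rng = np.random.default_rng(0)
alpha = 4.0
samples = rng.poisson(alpha, size=100_000)

print(samples.mean())  # close to alpha = 4
print(samples.var())   # also close to alpha = 4
```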



Families of continuous random variables

Screencast video [⯈]



The Exponential($\lambda$) distribution

The exponential distribution is a continuous probability distribution whose probability density function follows an exponential curve. The curve is fully characterized by the rate parameter $\lambda$.

Example plot of the (a) probability density function and (b) cumulative distribution function of the Exponential($\lambda$) distribution.

Probability density function

The probability density function of the continuous Exponential($\lambda$) distribution is given as
$$p_X(x) = \begin{cases} \lambda e^{-\lambda x}, & \text{for } x \geq 0 \\ 0, & \text{for } x < 0 \end{cases} \tag{37}$$
where $\lambda>0$.

Cumulative distribution function

The cumulative distribution function of the continuous Exponential($\lambda$) distribution can be determined as
$$P_X(x) = \begin{cases} 1 - e^{-\lambda x}, & \text{for } x \geq 0 \\ 0, & \text{for } x < 0 \end{cases} \tag{38}$$
$$\begin{aligned} P_X(x) &= \int_{-\infty}^{x} p_X(n)\,\mathrm{d}n = \begin{cases} \int_{0}^{x} \lambda e^{-\lambda n}\,\mathrm{d}n, & \text{for } x \geq 0 \\ 0, & \text{for } x < 0 \end{cases} \\ &= \begin{cases} \lambda \left[ -\frac{1}{\lambda} e^{-\lambda n} \right]_0^x, & \text{for } x \geq 0 \\ 0, & \text{for } x < 0 \end{cases} = \begin{cases} \left[ -e^{-\lambda n} \right]_0^x, & \text{for } x \geq 0 \\ 0, & \text{for } x < 0 \end{cases} \\ &= \begin{cases} 1 - e^{-\lambda x}, & \text{for } x \geq 0 \\ 0. & \text{for } x < 0 \end{cases} \end{aligned} \tag{39}$$

Expected value

The expected value of the continuous Exponential($\lambda$) distribution can be determined as
$$E[X] = \frac{1}{\lambda}. \tag{40}$$
$$\begin{aligned} E[X] &= \int_{-\infty}^{\infty} x\,p_X(x)\,\mathrm{d}x = \lambda \int_0^{\infty} x e^{-\lambda x}\,\mathrm{d}x \\ &\overset{[15]}{=} \lambda \left[ -\frac{1}{\lambda}\,x e^{-\lambda x} \right]_0^{\infty} - \lambda \int_0^{\infty} -\frac{1}{\lambda}\,e^{-\lambda x}\,\mathrm{d}x \\ &= \left[ -x e^{-\lambda x} \right]_0^{\infty} - \frac{1}{\lambda}\left[ e^{-\lambda x} \right]_0^{\infty} \\ &\overset{[16]}{=} (0-0) - \frac{1}{\lambda}(0-1) = \frac{1}{\lambda}. \end{aligned} \tag{41}$$


Variance

The variance of the continuous Exponential($\lambda$) distribution can be determined as
$$\text{Var}[X] = \frac{1}{\lambda^2}. \tag{42}$$
$$\begin{aligned} \text{Var}[X] &= E[X^2] - E[X]^2 = \int_{-\infty}^{\infty} x^2 p_X(x)\,\mathrm{d}x - \frac{1}{\lambda^2} \\ &= -\frac{1}{\lambda^2} + \lambda \int_0^{\infty} x^2 e^{-\lambda x}\,\mathrm{d}x \\ &\overset{[15]}{=} -\frac{1}{\lambda^2} + \lambda \left[ -\frac{1}{\lambda}\,x^2 e^{-\lambda x} \right]_0^{\infty} - \lambda \int_0^{\infty} -\frac{2}{\lambda}\,x e^{-\lambda x}\,\mathrm{d}x \\ &= -\frac{1}{\lambda^2} - \left[ x^2 e^{-\lambda x} \right]_0^{\infty} + 2 \int_0^{\infty} x e^{-\lambda x}\,\mathrm{d}x \\ &\overset{[15],[16]}{=} -\frac{1}{\lambda^2} - (0-0) + 2\left[ -\frac{1}{\lambda}\,x e^{-\lambda x} \right]_0^{\infty} - 2\int_0^{\infty} -\frac{1}{\lambda}\,e^{-\lambda x}\,\mathrm{d}x \\ &= -\frac{1}{\lambda^2} - \frac{2}{\lambda}\left[ x e^{-\lambda x} \right]_0^{\infty} + \frac{2}{\lambda} \int_0^{\infty} e^{-\lambda x}\,\mathrm{d}x \\ &\overset{[16]}{=} -\frac{1}{\lambda^2} - \frac{2}{\lambda}(0-0) + \frac{2}{\lambda}\left[ -\frac{1}{\lambda}\,e^{-\lambda x} \right]_0^{\infty} \\ &= -\frac{1}{\lambda^2} - \frac{2}{\lambda^2}(0-1) = \frac{1}{\lambda^2}. \end{aligned} \tag{43}$$
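The integrals above can also be approximated numerically by truncating the infinite integration range. A sketch assuming NumPy, using a plain Riemann sum and the arbitrary example rate $\lambda = 2$:

```python
# Approximate E[X] and Var[X] of the Exponential(lambda) density by a
# Riemann sum over a truncated range, and compare with 1/lambda, 1/lambda^2.
import numpy as np

lam = 2.0
x, dx = np.linspace(0.0, 50.0 / lam, 200_000, retstep=True)
pdf = lam * np.exp(-lam * x)

mean = np.sum(x * pdf) * dx              # ~ 1/lam = 0.5
var = np.sum(x**2 * pdf) * dx - mean**2  # ~ 1/lam^2 = 0.25
print(mean, var)
```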



The continuous Uniform(a,b) distribution

The continuous Uniform distribution is a continuous probability distribution that models an experiment where the outcomes are mapped only to the interval from $a$ up to and including $b$, with the same probability over the whole range. The distribution is fully characterized by the parameters $a$ and $b$, which are the continuous lower and upper bounds of the interval respectively.

Example plot of the (a) probability density function and (b) cumulative distribution function of the continuous Uniform(a,b) distribution.

Probability density function

The probability density function of the continuous Uniform(a,b) distribution is given as
$$p_X(x) = \begin{cases} \dfrac{1}{b-a}, & \text{for } a \leq x \leq b \\ 0, & \text{otherwise} \end{cases} \tag{44}$$
where $b>a$.

Cumulative distribution function

The cumulative distribution function of the continuous Uniform(a,b) distribution can be determined as
$$P_X(x) = \begin{cases} 0, & \text{for } x \leq a \\ \dfrac{x-a}{b-a}, & \text{for } a < x < b \\ 1, & \text{for } x \geq b \end{cases} \tag{45}$$
$$P_X(x) = \int_{-\infty}^{x} p_X(n)\,\mathrm{d}n = \begin{cases} 0, & \text{for } x \leq a \\ \int_a^x \frac{1}{b-a}\,\mathrm{d}n, & \text{for } a<x<b \\ 1, & \text{for } x \geq b \end{cases} = \begin{cases} 0, & \text{for } x \leq a \\ \dfrac{x-a}{b-a}, & \text{for } a<x<b \\ 1. & \text{for } x \geq b \end{cases}$$

Expected value

The expected value of the continuous Uniform(a,b) distribution can be determined as
$$E[X] = \frac{a+b}{2}. \tag{46}$$
$$\begin{aligned} E[X] &= \int_{-\infty}^{\infty} x\,p_X(x)\,\mathrm{d}x = \int_a^b \frac{x}{b-a}\,\mathrm{d}x = \frac{1}{b-a}\left[\frac{1}{2}x^2\right]_a^b \\ &= \frac{1}{b-a}\left(\frac{1}{2}b^2 - \frac{1}{2}a^2\right) = \frac{(b-a)(b+a)}{2(b-a)} = \frac{a+b}{2}. \end{aligned} \tag{47}$$

Variance

The variance of the continuous Uniform(a,b) distribution can be determined as
$$\text{Var}[X] = \frac{1}{12}(b-a)^2. \tag{48}$$
$$\begin{aligned} \text{Var}[X] &= E[X^2] - E[X]^2 = \int_{-\infty}^{\infty} x^2 p_X(x)\,\mathrm{d}x - \frac{(a+b)^2}{4} \\ &= \int_a^b \frac{x^2}{b-a}\,\mathrm{d}x - \frac{(a+b)^2}{4} = \frac{1}{b-a}\left[\frac{1}{3}x^3\right]_a^b - \frac{(a+b)^2}{4} \\ &= \frac{b^3-a^3}{3(b-a)} - \frac{(a+b)^2}{4} = \frac{4(b^3-a^3) - 3(a^2+2ab+b^2)(b-a)}{12(b-a)} \\ &= \frac{b^3 - a^3 + 3a^2 b - 3ab^2}{12(b-a)} = \frac{(b-a)^3}{12(b-a)} = \frac{1}{12}(b-a)^2. \end{aligned} \tag{49}$$

The Normal or Gaussian $N(\mu,\sigma^2)$ distribution

The Normal or Gaussian distribution is probably the most commonly used continuous probability distribution. The distribution is bell-shaped and symmetric. The function is characterized by its mean $\mu$ and its variance $\sigma^2$.

Example plot of the (a) probability density function and (b) cumulative distribution function of the Gaussian($\mu$, $\sigma^2$) distribution.

The Standard normal $N(0,1)$ distribution

The Standard normal distribution is a specific case of the Normal or Gaussian distribution, where the mean equals $\mu=0$ and the variance equals $\sigma^2=1$. This function can be regarded as the normalized Gaussian distribution. Any random variable $Y \sim N(\mu_Y, \sigma_Y^2)$ can be transformed to a random variable $X$ under the Standard normal distribution by subtracting its mean and dividing by the standard deviation as
$$X = \frac{Y - \mu_Y}{\sigma_Y}.$$

The QQ-function

The $Q$-function is a commonly used function in statistics, which calculates the probability of a Standard normal distributed random variable $X$ exceeding a certain threshold $x$. It is also known as the right-tail probability of the Gaussian distribution, since it is calculated by integrating the right side of the Gaussian PDF from the threshold $x$ up to $\infty$. The $Q$-function is defined as
$$Q(x) = \Pr[X > x] = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\frac{u^2}{2}}\,\mathrm{d}u. \tag{50}$$
The function can be used for all Gaussian distributed random variables; however, the random variable and the corresponding threshold should be normalized first. Additionally, from symmetry it follows that $Q(-x) = 1 - Q(x)$, where $Q(-x)$ is equal to the cumulative distribution function $P_X(x)$.
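In code, the $Q$-function can be expressed through the complementary error function via the identity $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}\!\left(x/\sqrt{2}\right)$. A minimal sketch using only the Python standard library, which also checks the symmetry $Q(-x) = 1 - Q(x)$:

```python
# Right-tail probability of the standard normal distribution,
# implemented via the complementary error function.
from math import erfc, sqrt

def Q(x: float) -> float:
    """Q(x) = Pr[X > x] for X ~ N(0, 1)."""
    return 0.5 * erfc(x / sqrt(2.0))

print(Q(0.0))               # 0.5
print(Q(1.96))              # ~0.025
print(Q(-1.5), 1 - Q(1.5))  # equal, by symmetry
```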

Probability density function

The probability density function of the continuous Gaussian $N(\mu,\sigma^2)$ distribution is given as
$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \tag{51}$$
where $\sigma>0$.

Cumulative distribution function

The cumulative distribution function of the continuous Gaussian $N(\mu,\sigma^2)$ distribution can be determined as
$$P_X(x) = Q\left(-\frac{x-\mu}{\sigma}\right) \tag{52}$$
$$P_X(x) = \int_{-\infty}^{x} p_X(n)\,\mathrm{d}n = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{x} e^{-\frac{(n-\mu)^2}{2\sigma^2}}\,\mathrm{d}n = Q\left(-\frac{x-\mu}{\sigma}\right) \tag{53}$$

Expected value

The expected value of the continuous Gaussian $N(\mu,\sigma^2)$ distribution can be determined as
$$E[X] = \mu. \tag{54}$$
$$\begin{aligned} E[X] &= \int_{-\infty}^{\infty} x\,p_X(x)\,\mathrm{d}x = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} x\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,\mathrm{d}x \\ &= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} \left((x-\mu) + \mu\right) e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,\mathrm{d}x \\ &= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} (x-\mu)\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,\mathrm{d}x + \mu\,\frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,\mathrm{d}x \\ &= -\frac{\sigma^2}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} \frac{-2(x-\mu)}{2\sigma^2}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,\mathrm{d}x + \mu \\ &= -\frac{\sigma^2}{\sqrt{2\pi\sigma^2}} \left[ e^{-\frac{(x-\mu)^2}{2\sigma^2}} \right]_{-\infty}^{\infty} + \mu \\ &= -\frac{\sigma^2}{\sqrt{2\pi\sigma^2}}\,(0-0) + \mu = \mu. \end{aligned} \tag{55}$$

Variance

The variance of the continuous Gaussian $N(\mu,\sigma^2)$ distribution can be determined as
$$\text{Var}[X] = \sigma^2. \tag{56}$$
$$\begin{aligned} \text{Var}[X] &= E\left[(X-\mu)^2\right] = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} (x-\mu)^2\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,\mathrm{d}x \\ &\overset{[17]}{=} \frac{2\sigma^2}{\sqrt{\pi}} \int_{-\infty}^{\infty} w^2 e^{-w^2}\,\mathrm{d}w \\ &\overset{[15]}{=} -\frac{\sigma^2}{\sqrt{\pi}}\left[ w\,e^{-w^2} \right]_{-\infty}^{\infty} + \frac{\sigma^2}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-w^2}\,\mathrm{d}w \\ &\overset{[16],[18]}{=} -\frac{\sigma^2}{\sqrt{\pi}}\,(0-0) + \frac{\sigma^2}{\sqrt{\pi}}\,\sqrt{\pi} = \sigma^2. \end{aligned} \tag{57}$$


As implied by the central limit theorem (explained in the section Functions and pairs of random variables), the Gaussian distribution is extremely important. The Gaussian distribution is often used to model measurements in practice and, thanks to the CLT, its use can often be extended to other distributions. A Gaussian distribution is also often used to model the thermal noise of a band-limited system. This section generalizes the definition of the Gaussian distribution given in the previous reader and extends it to the multivariate case.

Univariate distribution

In the case of a single random variable $X$ that is generated according to a Gaussian distribution, defined by its mean $\mu$ and variance $\sigma^2$, where the subscript $X$ is now omitted for simplification, the probability density function is defined as
$$p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}}. \tag{58}$$
The left side of the figure below shows an example of a univariate Gaussian distribution.

Example of a univariate and a multivariate Gaussian probability density function.

Multivariate distribution

The definition of the univariate Gaussian distribution can be extended to a multivariate distribution. To better understand the multivariate Gaussian distribution, it might be useful to first read the sections Functions and pairs of random variables and Random vectors. To define the Gaussian distribution, its position and spread are required. These quantities are represented by the mean vector $\boldsymbol{\mu}$ and the covariance matrix $\boldsymbol{\Sigma}$. Whereas the covariance matrix was denoted by $\boldsymbol{\Gamma}$ earlier, the literature has adopted the $\boldsymbol{\Sigma}$ notation when discussing multivariate Gaussian distributions, as $\Sigma$ is the Greek capital letter of $\sigma$.

To indicate that a $k$-dimensional random vector $\mathbf{X}$ is Gaussian distributed, we can write $\mathbf{X} \sim N_k(\boldsymbol{\mu}, \boldsymbol{\Sigma})$. The probability density function of such a multivariate Gaussian distribution is defined as
$$p_{\mathbf{X}}(\mathbf{x}) = \frac{1}{\sqrt{(2\pi)^k |\boldsymbol{\Sigma}|}} \exp\left\{ -\frac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^\top \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right\},$$
where $|\boldsymbol{\Sigma}|$ is the determinant of the covariance matrix. Please note the similarities between the univariate Gaussian distribution and the multivariate distribution. The inverse covariance matrix $\boldsymbol{\Sigma}^{-1}$ is often also called the precision matrix and is denoted by $\boldsymbol{\Lambda}$, because a low variance (i.e. a low spread) relates to a high precision and vice versa.
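The density formula translates directly into code. A minimal sketch assuming NumPy; the mean vector, covariance matrix and evaluation point are arbitrary examples (for large $k$ or repeated evaluations, solving a linear system instead of inverting $\boldsymbol{\Sigma}$ would be preferable):

```python
# Evaluate the multivariate Gaussian density N_k(mu, Sigma) at a point x.
import numpy as np

def mvn_pdf(x, mu, Sigma):
    k = mu.size
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** k * np.linalg.det(Sigma))
    return np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / norm

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
print(mvn_pdf(np.array([0.0, 0.0]), mu, Sigma))  # density at the mean
```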

The covariance matrix of a multivariate Gaussian distribution

The probability density function of a Gaussian distribution is fully determined by its mean $\boldsymbol{\mu}$ and its covariance matrix $\boldsymbol{\Sigma}$. In order to give some intuition on how the mean and the covariance matrix structure influence the final distribution, we jump forward to Fig. 2 in the next section, where three multivariate distributions have been plotted. The covariance matrices that were used to plot these distributions in the figure are, from left to right:
$$\boldsymbol{\Sigma}_1 = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix}, \qquad \boldsymbol{\Sigma}_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \boldsymbol{\Sigma}_3 = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}.$$
Please note how the off-diagonal entries, referring to $\text{Cov}[X_1,X_2]$ and $\text{Cov}[X_2,X_1]$, influence the shape of the distribution.

In order to understand how the covariance matrix is related to the tilt and the shape of the distribution, we need to first introduce the so-called rotation matrix and the eigenvalue decomposition. The rotation matrix $\mathbf{R}_\theta$ rotates a coordinate counter-clockwise over an angle $\theta$ with respect to the origin. This rotation matrix is defined as
$$\mathbf{R}_\theta = \begin{bmatrix} \cos(\theta) & -\sin(\theta) \\ \sin(\theta) & \cos(\theta) \end{bmatrix},$$
and a rotation of $\theta$ from the coordinates $(x,y)$ to the coordinates $(x', y')$ can be represented by
$$\begin{bmatrix} x' \\ y' \end{bmatrix} = \mathbf{R}_\theta \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x\cos(\theta) - y\sin(\theta) \\ x\sin(\theta) + y\cos(\theta) \end{bmatrix}.$$
One of the properties of a rotation matrix is that it is orthogonal. This means that $\mathbf{R}_\theta \mathbf{R}_\theta^\top = \mathbf{I}$, where $\mathbf{I}$ is the identity matrix. Using the fact that $\mathbf{R}_\theta^{-1} = \mathbf{R}_\theta^\top = \mathbf{R}_{-\theta}$ from its definition, the orthogonality property makes complete sense, because rotating a coordinate over the angle $\theta$ and then over $-\theta$ does not change anything.

Besides the rotation matrices, we need to introduce the eigenvalue decomposition in order to better understand the covariance matrix structure. The eigenvalue decomposition states that a square invertible symmetric matrix $\mathbf{A}$ can be written as
$$\mathbf{A} = \mathbf{Q}\boldsymbol{\Lambda}\mathbf{Q}^{-1},$$
where the orthogonal matrix $\mathbf{Q}$ contains the eigenvectors of $\mathbf{A}$ and $\boldsymbol{\Lambda}$ is a diagonal matrix containing the eigenvalues of $\mathbf{A}$.

Now that the general representation of the rotation matrix as well as the eigenvalue decomposition have been defined, we can show that any covariance matrix can be written as a rotation of a diagonal covariance matrix. This point is very important to understand. To start off, a diagonal covariance matrix can be represented as
$$\boldsymbol{\Sigma}_d = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}.$$
The entries $a$ and $b$ correspond to the individual variances of $X_1$ and $X_2$ according to the definitions, and are at the same time the eigenvalues of $\boldsymbol{\Sigma}_d$. An example of a Gaussian distribution that corresponds to a diagonal covariance matrix with $a=25$ and $b=4$ is shown on the left in the figure below. Please note that the ratio of $a$ and $b$ also represents the ratio of the length (the major axis) and the width (the minor axis) of the distribution.

If we were to apply the eigenvalue decomposition to a covariance matrix $\boldsymbol{\Sigma}$, we would interestingly enough find that
$$\boldsymbol{\Sigma} = \mathbf{R}_\theta \boldsymbol{\Sigma}_d \mathbf{R}_\theta^\top.$$
The right of the figure below shows an example of a multivariate Gaussian distribution whose covariance matrix is a rotated version of the diagonal covariance matrix corresponding to the left side of the same figure. From this we can see that the ratio of the eigenvalues of $\boldsymbol{\Sigma}$ corresponds to the ratio of the lengths of the major and minor axes. Furthermore, we can conclude that the matrix containing the eigenvectors of $\boldsymbol{\Sigma}$ is at the same time a rotation matrix, implicitly defining the rotation angle. A numerical sketch of this relation is given after the figure below.

Two multivariate Gaussian distributions, whose covariance matrices are related through the rotation matrix corresponding to a counter-clockwise rotation of π/4 radians.
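The relation $\boldsymbol{\Sigma} = \mathbf{R}_\theta \boldsymbol{\Sigma}_d \mathbf{R}_\theta^\top$ can be verified numerically. A sketch assuming NumPy, using the example values $a=25$, $b=4$ and $\theta = \pi/4$ from above; the eigenvalue decomposition recovers the diagonal entries:

```python
# Rotate a diagonal covariance matrix and recover its eigenvalues.
import numpy as np

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
Sigma_d = np.diag([25.0, 4.0])     # a = 25, b = 4

Sigma = R @ Sigma_d @ R.T          # rotated covariance matrix
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)                     # [4., 25.]: the entries of Sigma_d return
```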



Sampling random variables

Most statistical packages in computing software provide a so-called pseudorandom number generator, which is an algorithm to randomly sample a number between 0 and 1 with equal probability. Basically, this means generating random samples from a continuous random variable $U$ which follows a uniform distribution, $U \sim U(0,1)$. More generally, sampling a random variable means generating values $x \in X$ in such a way that the probability of generating $x$ is in accordance with the probability density function $p_X(x)$, or equivalently the cumulative distribution function $P_X(x)$, associated with $X$.

Assuming that we have a pseudorandom number generator, how can we generate samples of any random variable $X$ if we know its probability distribution? We need to find a transformation $T: [0,1] \to \mathbb{R}$ such that $T(U) = X$.

For continuous random variables, the following theorem can help us with this task.

Theorem

Let $X$ be a continuous random variable with CDF $P_X(x)$ which possesses an inverse $P_X^{-1}$. Let $U \sim U(0,1)$ and $Y = P_X^{-1}(U)$; then $P_X(x)$ is the CDF of $Y$. In other words, $Y$ has the same distribution as $X$.

According to this theorem, the transformation $T$ we were looking for is simply given by $P_X^{-1}$. Then, to sample $x$, it is sufficient to follow these steps:

  1. Generate a random number $u$ from the uniform distribution $U \sim U(0,1)$;
  2. Find the inverse of the CDF of $X$, $P_X^{-1}$;
  3. Compute $x$ as $x = P_X^{-1}(u)$.

This method of sampling a random variable is known as the inverse transform technique.
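For the Exponential($\lambda$) distribution, for instance, $P_X(x) = 1 - e^{-\lambda x}$ inverts to $P_X^{-1}(u) = -\ln(1-u)/\lambda$. A minimal sketch of the inverse transform technique, assuming NumPy and the arbitrary example rate $\lambda = 2$:

```python
# Inverse transform sampling for the Exponential(lambda) distribution.
import numpy as np

rng = np.random.default_rng(0)
lam = 2.0
u = rng.random(100_000)      # step 1: uniform samples on [0, 1)
x = -np.log(1.0 - u) / lam   # steps 2-3: apply the inverse CDF

print(x.mean())              # close to E[X] = 1/lam = 0.5
```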

For discrete random variables, however, this technique cannot be applied directly, because when $X$ is discrete its CDF is a step function, so the inverse $P_X^{-1}(U)$ is not defined for every value of $U$.

More formally, let $X \in \{x_1, \ldots, x_n\}$ be a discrete random variable with probability mass function $p_X(x)$, where $x_1 < \ldots < x_n$. Let us define each value of the CDF of $X$ as

$$q_i = \Pr[X \leq x_i] = \sum_{j=1}^{i} p_X(x_j).$$

The sampling formula for $X$ becomes:

$$x = \begin{cases} x_1, & \text{if } U < q_1 \\ x_2, & \text{if } q_1 \leq U < q_2 \\ \;\vdots \\ x_{n-1}, & \text{if } q_{n-2} \leq U < q_{n-1} \\ x_n, & \text{otherwise} \end{cases}$$
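This rule amounts to locating $U$ among the cumulative probabilities $q_1, \ldots, q_n$, which can be done with a binary search. A sketch assuming NumPy; the outcomes and probabilities are arbitrary examples:

```python
# Discrete inverse transform sampling: locate U among the cumulative
# probabilities q_i with a binary search (np.searchsorted).
import numpy as np

rng = np.random.default_rng(0)
values = np.array([1, 2, 5, 10])       # example outcomes x_1, ..., x_n
pmf = np.array([0.1, 0.4, 0.3, 0.2])   # example probabilities
q = np.cumsum(pmf)                     # q_1, ..., q_n (q_n = 1)

u = rng.random(100_000)
idx = np.searchsorted(q, u, side='right')
samples = values[idx]

print(np.bincount(idx) / u.size)       # empirical frequencies ~ pmf
```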

  1. Sum of a geometric series: $\sum_{k=0}^{n-1} a r^k = a\left(\dfrac{1-r^n}{1-r}\right)$ for $|r|<1$ ↩︎

  2. Derivative of the sum of a geometric series: $\sum_{k=1}^{\infty} k r^{k-1} = \dfrac{1}{(1-r)^2}$ for $|r|<1$ ↩︎

  3. Second derivative of the sum of a geometric series: $\sum_{k=1}^{\infty} k^2 r^{k-1} = \dfrac{1+r}{(1-r)^3}$ for $|r|<1$ ↩︎

  4. The binomial coefficient, which returns the total number of possible unsorted combinations of $k$ distinct elements drawn from a set of $n$ distinct elements, is defined as $\binom{n}{k} = \dfrac{n!}{k!(n-k)!}$, where $!$ is the factorial operator (e.g. $4! = 4\cdot3\cdot2\cdot1$), which denotes the total number of sorted permutations. ↩︎

  5. Factors of the binomial coefficient: $k\binom{n}{k} = k\,\dfrac{n!}{k!(n-k)!} = n\,\dfrac{(n-1)!}{(k-1)!\left((n-1)-(k-1)\right)!} = n\binom{n-1}{k-1}$ ↩︎

  6. Use the substitutions $l=n-1$, $k=m-1$, $i=l-1$ and $j=k-1$. ↩︎

  7. Binomial theorem: $(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$ ↩︎

  8. Ordinary generating function: $\sum_{n=k}^{\infty} \binom{n}{k} y^n = \dfrac{y^k}{(1-y)^{k+1}}$ ↩︎

  9. Recurrence relationship: $\binom{n}{k} = \binom{n-1}{k-1} + \binom{n-1}{k}$ ↩︎

  10. Variation of the factors of the binomial coefficient: $n\binom{n-1}{k} = (k+1)\binom{n}{k+1}$ ↩︎

  11. Sum of consecutive integers: $\sum_{n=k}^{l} n = \frac{1}{2}\left[\sum_{n=k}^{l} n + \sum_{n=k}^{l} n\right] = \frac{1}{2}\big[k + (k+1) + \ldots + (l-1) + l + l + (l-1) + \ldots + (k+1) + k\big] = \frac{1}{2}\underbrace{\left[(k+l) + (k+l) + \ldots + (k+l)\right]}_{l-k+1 \text{ terms}} = \frac{1}{2}(k+l)(l-k+1)$ ↩︎

  12. Substitute $m = n-k+1$. ↩︎

  13. Square pyramidal number, or the sum of squared integers: $\sum_{n=1}^{k} n^2 = \dfrac{k(k+1)(2k+1)}{6}$ ↩︎

  14. Definition of the Poisson distribution and the total probability axiom: $\sum_{n=-\infty}^{\infty} p_X(n) = 1$ ↩︎

  15. Integration by parts: $\int_a^b f(x)g'(x)\,\mathrm{d}x = \left[f(x)g(x)\right]_a^b - \int_a^b f'(x)g(x)\,\mathrm{d}x$ ↩︎

  16. Growth property of exponentials: $\lim_{x\to\infty} x^a e^{-x} = 0$ ↩︎

  17. Substitute $w = \dfrac{x-\mu}{\sqrt{2\sigma^2}}$. ↩︎

  18. Gaussian integral: $\int_{-\infty}^{\infty} e^{-x^2}\,\mathrm{d}x = \sqrt{\pi}$ ↩︎