Random vectors

Introduction

Previously, we saw how the notion of uncertainty can be captured in probability functions and what happens when an outcome is conditioned on an event. Finally, a brief discussion was provided for the case when multiple random variables are involved; however, only the situation with 2 random variables was discussed. This module generalizes this theory to multiple random variables. In this case an outcome of an experiment comprises $N$ observed quantities. An example of such an observation is a noisy electroencephalography (EEG) measurement, which represents the electrical activity of the brain and is measured over several channels.



Screencast video



Multivariate joint probability distributions

Let us denote each of these measured quantities by the random variable $X_n$, where $n$ ranges from $1$ up to $N$. Using this definition, the multivariate (meaning that multiple variables are involved) joint probability functions can be introduced. For notation purposes, all random variables $X_n$ can be grouped in a random vector $\mathbf{X} = [X_1, X_2, \ldots, X_N]^\top$, where the operator $\top$ denotes the transpose, turning this row vector into a column vector. The bold capital letter distinguishes the random vector containing multiple random variables from a single random variable. Similarly, a specific realization of this random vector can be written in lower-case as $\mathbf{x} = [x_1, x_2, \ldots, x_N]^\top$.

Multivariate joint cumulative distribution function

The multivariate joint cumulative distribution function of the random vector $\mathbf{X}$ containing random variables $X_1, X_2, \ldots, X_N$ is defined as
$$P_\mathbf{X}(\mathbf{x}) = P_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \Pr[X_1 \leq x_1, \ldots, X_N \leq x_N]. \tag{1}$$
This definition holds for both discrete and continuous random variables.

Multivariate joint probability mass function

The multivariate joint probability mass function of the random vector $\mathbf{X}$ containing discrete random variables $X_1, X_2, \ldots, X_N$ is similarly defined as
$$p_\mathbf{X}(\mathbf{x}) = p_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \Pr[X_1 = x_1, \ldots, X_N = x_N]. \tag{2}$$

Multivariate joint probability density function

The multivariate joint probability density function of the random vector $\mathbf{X}$ containing continuous random variables $X_1, X_2, \ldots, X_N$ is defined from the multivariate joint cumulative distribution function as
$$p_\mathbf{X}(\mathbf{x}) = p_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \frac{\partial^N P_{X_1,\ldots,X_N}(x_1,\ldots,x_N)}{\partial x_1 \cdots \partial x_N}. \tag{3}$$

Generalized probability axioms for multivariate joint distributions

From these definitions several multivariate joint probability axioms can be determined, which are similar to the case of two random variables as discussed in the last reader.

  1. It holds that $p_\mathbf{X}(\mathbf{x}) \geq 0$, where $\mathbf{X}$ is a continuous or discrete random vector.
  2. From the multivariate joint probability density function it follows that $P_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = \int_{-\infty}^{x_1}\cdots\int_{-\infty}^{x_N} p_{X_1,\ldots,X_N}(u_1,\ldots,u_N)\,\mathrm{d}u_1\cdots\mathrm{d}u_N$ holds for continuous random vectors.
  3. Through the law of total probability it holds that
    1. $\sum_{x_1\in S_{X_1}}\cdots\sum_{x_N\in S_{X_N}} p_{X_1,\ldots,X_N}(x_1,\ldots,x_N) = 1$ for discrete random vectors and
    2. $\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} p_{X_1,\ldots,X_N}(x_1,\ldots,x_N)\,\mathrm{d}x_1\cdots\mathrm{d}x_N = 1$ for continuous random vectors.
  4. The probability of an event $A$ can be determined as
    1. $\Pr[A] = \sum_{\mathbf{x}\in A} p_{X_1,\ldots,X_N}(x_1,\ldots,x_N)$ for discrete random variables and
    2. $\Pr[A] = \int\cdots\int_A p_{X_1,\ldots,X_N}(x_1,\ldots,x_N)\,\mathrm{d}x_1\cdots\mathrm{d}x_N$ for continuous random variables.

Axiom 1 simply states that a probability (density) cannot be smaller than 0, since negative probabilities do not exist by definition. The second axiom is a direct consequence of integrating both sides of the definition of the multivariate joint probability density function, allowing us to determine the multivariate joint cumulative distribution function from the multivariate joint probability density function. The third axiom is a direct consequence of the law of total probability, where the probabilities of all possible outcomes together equal 1. The final axiom tells us to sum or integrate over all possible outcomes belonging to an event $A$ in order to calculate its probability.
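To make these axioms concrete, the short sketch below checks them numerically for a small, hypothetical joint PMF of two discrete random variables; the array `p` and the event `A` are made up purely for illustration.

```python
import numpy as np

# Hypothetical joint PMF of two discrete random variables X1 in {0,1,2} and X2 in {0,1},
# stored as a 2-D array with p[i, j] = Pr[X1 = i, X2 = j].
p = np.array([[0.10, 0.05],
              [0.30, 0.15],
              [0.25, 0.15]])

# Axiom 1: all probabilities are non-negative.
assert np.all(p >= 0)

# Axiom 3 (discrete case): summing over all outcomes yields 1.
assert np.isclose(p.sum(), 1.0)

# Axiom 4 (discrete case): Pr[A] for the event A = {X1 + X2 <= 1}
# is the sum of the PMF over all outcomes contained in A.
x1, x2 = np.meshgrid(np.arange(3), np.arange(2), indexing="ij")
prob_A = p[(x1 + x2) <= 1].sum()
print(prob_A)  # 0.10 + 0.05 + 0.30 = 0.45
```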



Probability distributions of multiple random vectors

The notation of a random vector allows us to easily include multiple random variables in a single vector. Suppose now that our random vector $\mathbf{Z}$ contains 2 different types of random variables, where for example each random variable corresponds to a different type of measurement. If we were to distinguish between these types of random variables using two generalized random variables $X_i$ and $Y_i$, the random vector $\mathbf{Z}$ could be written as $\mathbf{Z} = [X_1, X_2, \ldots, X_N, Y_1, Y_2, \ldots, Y_M]^\top$. If we now were to define the random vectors $\mathbf{X} = [X_1, X_2, \ldots, X_N]^\top$ and $\mathbf{Y} = [Y_1, Y_2, \ldots, Y_M]^\top$, it becomes evident that we could simplify the random vector $\mathbf{Z}$ as $\mathbf{Z} = [\mathbf{X}^\top, \mathbf{Y}^\top]^\top$.

This shows that it is also possible for joint probability distributions to depend on multiple random vectors, each of which can be regarded as a subset of all random variables. This notation can prove useful when there is a clear distinction between the subsets, but it is purely a matter of notation. A probability distribution depending on multiple random vectors can be regarded in all aspects as a probability distribution depending on a single random vector (which is the concatenation of all random variables), and all calculations can be performed accordingly. A probability distribution involving multiple random vectors can for example be written as $p_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})$.



Conditional probabilities

Similarly to the previous reader, the conditional probability can be determined by normalizing the joint probability by the probability of the conditioning event through

$$p_{\mathbf{X}|B}(\mathbf{x}) = \begin{cases} \dfrac{p_\mathbf{X}(\mathbf{x})}{\Pr[B]}, & \text{when } \mathbf{x} \in B, \\ 0, & \text{otherwise.} \end{cases} \tag{4}$$
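As an illustration, the sketch below conditions a made-up discrete joint PMF on an event $B$ by keeping only the outcomes inside $B$ and renormalizing by $\Pr[B]$; the numbers are hypothetical.

```python
import numpy as np

# Hypothetical joint PMF p[i, j] = Pr[X1 = i, X2 = j] (same made-up example as before).
p = np.array([[0.10, 0.05],
              [0.30, 0.15],
              [0.25, 0.15]])

# Event B = {X1 >= 1}: a Boolean mask over the outcomes.
x1, x2 = np.meshgrid(np.arange(3), np.arange(2), indexing="ij")
B = x1 >= 1

prob_B = p[B].sum()                    # Pr[B] = 0.85
p_cond = np.where(B, p / prob_B, 0.0)  # p_{X|B}(x): renormalize inside B, zero outside

assert np.isclose(p_cond.sum(), 1.0)   # the conditional PMF is again a valid PMF
```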



Marginal probabilities

Because the notation of a random vector $\mathbf{X}$ is just a shorter notation for the set of random variables $X_1, X_2, \ldots, X_N$, it is possible to calculate the marginalized probability distribution of a subset of random variables. This subset can also consist of just a single random variable. Again this operation is performed through marginalization as discussed in the previous reader. For the case that we are given the probability distribution $p_\mathbf{X}(\mathbf{x})$ and we would like to know the marginalized probability distribution $p_{X_2,X_3}(x_2,x_3)$, this can be calculated as
$$p_{X_2,X_3}(x_2,x_3) = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} p_\mathbf{X}(\mathbf{x})\,\mathrm{d}x_1\,\mathrm{d}x_4\cdots\mathrm{d}x_N \tag{5}$$
for continuous random variables and as
$$p_{X_2,X_3}(x_2,x_3) = \sum_{x_1\in S_{X_1}}\sum_{x_4\in S_{X_4}}\cdots\sum_{x_N\in S_{X_N}} p_\mathbf{X}(\mathbf{x}) \tag{6}$$
for discrete random variables. Here we have integrated or summed over all possible values of all random variables except for the ones that we are interested in.
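For discrete random vectors stored as a multidimensional array, marginalization amounts to summing over the axes of the random variables that are removed. The sketch below illustrates this for a randomly generated, hypothetical four-dimensional PMF.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical joint PMF of four discrete random variables X1, X2, X3, X4,
# stored as a 4-D array indexed as p[x1, x2, x3, x4].
p = rng.random((2, 3, 4, 2))
p /= p.sum()                  # normalize so that the PMF sums to 1

# Marginalize out X1 and X4 (axes 0 and 3) to obtain p_{X2,X3}(x2, x3).
p_x2x3 = p.sum(axis=(0, 3))

assert p_x2x3.shape == (3, 4)
assert np.isclose(p_x2x3.sum(), 1.0)   # a marginal PMF is again a valid PMF
```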



Independence

Independence is a term in probability theory which reflects that the probability of an event $A$ is not changed after observing an event $B$, meaning that $\Pr[A|B] = \Pr[A]$. In other words, the occurrence of an event $B$ has no influence on the probability of an event $A$. Keep in mind that this does not mean that the physical occurrences of events $A$ and $B$ are unrelated; it just means that the probability of the occurrence of event $A$ is unrelated to whether event $B$ occurs or not.

Independent random variables

This notion of independence can be extended to probability functions. The random variables $X_1, X_2, \ldots, X_N$ can be regarded as independent if and only if the following factorization holds:
$$p_{X_1,X_2,\ldots,X_N}(x_1,x_2,\ldots,x_N) = p_{X_1}(x_1)\, p_{X_2}(x_2)\cdots p_{X_N}(x_N). \tag{7}$$
This equation states that the joint probability can be written as a multiplication of the individual probabilities of the random variables. From a probability point of view (not a physical one) we can conclude that the random variables are independent, because the joint probability solely depends on the individual contributions of the random variables. Random variables that satisfy the independence equation and are distributed according to the same probability density function are regarded as independent and identically distributed (IID or i.i.d.) random variables. This notion will become important later on when discussing random signals.
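The sketch below illustrates the factorization numerically: the joint PMF of two independent random variables is the outer product of their marginals, and the empirical pair frequencies of IID samples approach this product. The marginal PMF `p_marginal` is a made-up example.

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical marginal PMF on {0, 1, 2}, used for every X_n (illustration only).
p_marginal = np.array([0.2, 0.5, 0.3])

# For independent random variables, the joint PMF is the outer product of the
# marginals: p_{X1,X2}(x1, x2) = p_{X1}(x1) * p_{X2}(x2).
p_joint_indep = np.outer(p_marginal, p_marginal)
assert np.isclose(p_joint_indep.sum(), 1.0)

# IID samples: every column is drawn independently from the same marginal.
N, num_samples = 4, 100_000
samples = rng.choice(3, size=(num_samples, N), p=p_marginal)

# Empirical check of the factorization for (X1, X2): the relative frequency of
# each pair should be close to the product of the marginals.
counts = np.zeros((3, 3))
np.add.at(counts, (samples[:, 0], samples[:, 1]), 1)
print(np.max(np.abs(counts / num_samples - p_joint_indep)))  # small, e.g. < 0.01
```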

Independent random vectors

It is also possible to extend the definition of independence to random vectors. Two random vectors $\mathbf{X}$ and $\mathbf{Y}$ can be regarded as independent if and only if the probability function can be written as
$$p_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y}) = p_\mathbf{X}(\mathbf{x})\, p_\mathbf{Y}(\mathbf{y}). \tag{8}$$



Statistical characterization of random vectors

In the previous reader on random variables, several characteristics were discussed for probability distributions depending on a single random variable, such as the mean, the variance and the moments of a random variable. This section will extend these characterizations to random vectors.

Expected value

The expected value of a random vector $\mathbf{X}$ is defined as the vector containing the expected values of the individual random variables $X_1, X_2, \ldots, X_N$ as
$$\mathrm{E}[\mathbf{X}] = \boldsymbol{\mu}_\mathbf{X} = [\mu_1, \mu_2, \ldots, \mu_N]^\top = \big[\mathrm{E}[X_1], \mathrm{E}[X_2], \ldots, \mathrm{E}[X_N]\big]^\top. \tag{9}$$

Expected value of a function

When we are interested in the expected value of a certain function $g(\mathbf{X})$, which accepts a random vector as argument and transforms it into a single value, this can be determined by multiplying the function's result with its corresponding probability and summing or integrating over all possible realizations of $\mathbf{X}$. For a discrete random vector $\mathbf{X}$ consisting of random variables $X_1, X_2, \ldots, X_N$ the expected value of a function $g(\mathbf{X})$ can be determined as
$$\mathrm{E}[g(\mathbf{X})] = \sum_{x_1\in S_{X_1}}\cdots\sum_{x_N\in S_{X_N}} g(\mathbf{x})\, p_\mathbf{X}(\mathbf{x}) \tag{10}$$
and for a continuous random vector as
$$\mathrm{E}[g(\mathbf{X})] = \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} g(\mathbf{x})\, p_\mathbf{X}(\mathbf{x})\,\mathrm{d}x_1\cdots\mathrm{d}x_N. \tag{11}$$
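The sketch below evaluates equation (10) exactly for a small, hypothetical joint PMF and compares it with a Monte Carlo estimate obtained by averaging $g$ over samples drawn from that PMF; the PMF and the choice $g(\mathbf{X}) = (X_1 - X_2)^2$ are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical joint PMF of (X1, X2) on {0,1,2} x {0,1} (illustration only).
p = np.array([[0.10, 0.05],
              [0.30, 0.15],
              [0.25, 0.15]])
x1, x2 = np.meshgrid(np.arange(3), np.arange(2), indexing="ij")

# Exact expectation of g(X) = (X1 - X2)^2 via equation (10): sum g(x) p(x) over all outcomes.
g = (x1 - x2) ** 2
exact = np.sum(g * p)

# Monte Carlo estimate: draw outcomes according to p and average g over the samples.
flat_idx = rng.choice(p.size, size=200_000, p=p.ravel())
s1, s2 = np.unravel_index(flat_idx, p.shape)
estimate = np.mean((s1 - s2) ** 2)

print(exact, estimate)  # the two values should be close
```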

Covariance

Previously, we introduced the second central moment of a univariate random variable as the variance. While the variance denotes the spread in the univariate case, it does not tell the whole story in the multivariate case. Have a look at Fig. 1, where two contour plots are shown for two distinct multivariate Gaussian probability density functions. The exact mathematical description of a multivariate Gaussian probability density function is introduced here.

Figure 1: Visualisation of the concept of covariance. Two different multivariate Gaussian probability density functions are shown, from each of which 10000 random two-dimensional samples are generated. The individual distributions of the sample elements $x_1$ and $x_2$ are approximated using a histogram and a fitted Gaussian distribution. The variances of both $x_1$ and $x_2$ are identical; however, the underlying multivariate distributions are definitely not identical. This phenomenon is caused by the covariance between $x_1$ and $x_2$.

It can be noted that the distributions in Fig. 1 are different, since the first contour plot shows clear circles and the second contour plot shows tilted ellipses. For both distributions, 10000 random realizations $\mathbf{x} = [x_1, x_2]^\top$ are generated and the marginal distributions of both $X_1$ and $X_2$ are shown using a histogram and a fitted Gaussian distribution. It can be seen that the individual distributions of $X_1$ and $X_2$ are exactly equal for both multivariate distributions, but still the multivariate distributions are different. This difference can be explained by the covariance between the random variables $X_1$ and $X_2$ of the random vector $\mathbf{X}$.

The covariance is a measure of the relationship between 2 random variables. The second distribution in Fig. 1 has a negative covariance between $X_1$ and $X_2$, because if $X_1$ increases, $X_2$ decreases. No such thing can be said about the first distribution in Fig. 1, where $X_1$ and $X_2$ seem to have no relationship and behave independently of each other.

The formal definition of the covariance between two random variables $X_1$ and $X_2$ is given by
$$\mathrm{Cov}[X_1, X_2] = \mathrm{E}\big[(X_1 - \mu_{X_1})(X_2 - \mu_{X_2})\big], \tag{12}$$
which is very similar to the definition of the variance, even to such an extent that it actually reduces to the variance if $X_1 = X_2$. Intuitively, one might regard the covariance as the expected value of the multiplication of $X_1$ and $X_2$ after their means have been subtracted. If the centered $X_1$ and $X_2$ have the same sign, their multiplication is positive, and if they have different signs, their multiplication is negative. The covariance may therefore be regarded as a measure that indicates how $X_2$ behaves if $X_1$ increases or decreases.
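The sketch below estimates the covariance of equation (12) from samples of a hypothetical bivariate Gaussian with a negative covariance, similar in spirit to the second distribution in Fig. 1, and compares it with numpy's built-in estimator.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical bivariate Gaussian with a negative covariance (illustration only).
mean = np.array([0.0, 0.0])
cov_true = np.array([[1.0, -0.7],
                     [-0.7, 1.0]])
samples = rng.multivariate_normal(mean, cov_true, size=100_000)

# Sample covariance following equation (12): average of the product of the
# mean-subtracted (centered) variables.
x1, x2 = samples[:, 0], samples[:, 1]
cov_hat = np.mean((x1 - x1.mean()) * (x2 - x2.mean()))

print(cov_hat)               # close to -0.7
print(np.cov(x1, x2)[0, 1])  # numpy's (unbiased) estimate agrees closely
```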



Exercise


Recalling the exercise from the previous section, let random variables $X$ and $Y$ have joint PDF
$$f_{X,Y}(x,y) = \begin{cases} 5x^2/2, & -1 \leq x \leq 1,\ 0 \leq y \leq x^2, \\ 0, & \text{otherwise.} \end{cases}$$
In the previous section, we found that:
  • $\mathrm{E}[X] = 0$ and $\mathrm{Var}[X] = 10/14$,
  • $\mathrm{E}[Y] = 5/14$ and $\mathrm{Var}[Y] = 5/27 - (5/14)^2 = 0.0576$.
With this knowledge, can you compute $\mathrm{Var}[X+Y]$?

The variance of $X+Y$ is$^1$
$$\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{E}[(X-\mu_X)(Y-\mu_Y)].$$
Since $\mathrm{E}[X] = 0$, the cross term equals $\mathrm{E}[XY] = \int_{-1}^{1}\int_{0}^{x^2} xy\,\frac{5x^2}{2}\,\mathrm{d}y\,\mathrm{d}x = \int_{-1}^{1}\frac{5x^7}{4}\,\mathrm{d}x = 0$, as the integrand is an odd function of $x$. Therefore
$$\mathrm{Var}[X+Y] = 5/7 + 0.0576 = 0.7719.$$

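Assuming the joint PDF as stated above, the result can also be sanity-checked numerically. The sketch below samples $X$ from its marginal PDF $f_X(x) = 5x^4/2$ by inverse-transform sampling and, given $X = x$, draws $Y$ uniformly on $[0, x^2]$ (the joint PDF is constant in $y$ on that interval).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1_000_000

# Marginal PDF of X: f_X(x) = 5x^4 / 2 on [-1, 1], with CDF F_X(x) = (x^5 + 1) / 2.
# Inverse-transform sampling: X = (2U - 1)^(1/5) for U uniform on [0, 1].
u = rng.uniform(0.0, 1.0, n)
v = 2.0 * u - 1.0
x = np.sign(v) * np.abs(v) ** (1 / 5)

# Given X = x, Y is uniform on [0, x^2].
y = rng.uniform(0.0, 1.0, n) * x ** 2

print(np.var(x + y))  # close to 5/7 + 0.0576 = 0.7719
```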

Correlation

The definition of the covariance can be rewritten as
$$\begin{aligned} \mathrm{Cov}[X_1,X_2] &= \mathrm{E}\big[(X_1-\mu_{X_1})(X_2-\mu_{X_2})\big], \\ &= \mathrm{E}\big[X_1 X_2 - \mu_{X_1} X_2 - \mu_{X_2} X_1 + \mu_{X_1}\mu_{X_2}\big], \\ &= \mathrm{E}[X_1X_2] - \mu_{X_1}\mathrm{E}[X_2] - \mu_{X_2}\mathrm{E}[X_1] + \mu_{X_1}\mu_{X_2}, \\ &= \mathrm{E}[X_1X_2] - \mu_{X_1}\mu_{X_2} - \mu_{X_1}\mu_{X_2} + \mu_{X_1}\mu_{X_2}, \\ &= \mathrm{E}[X_1X_2] - \mu_{X_1}\mu_{X_2}. \end{aligned} \tag{13}$$
The term $\mathrm{E}[X_1X_2]$ is called the correlation $r_{X_1,X_2}$ of $X_1$ and $X_2$ and is defined as
$$r_{X_1,X_2} = \mathrm{E}[X_1X_2]. \tag{14}$$
This correlation can be regarded as a non-centralized version of the covariance. These two terms are related through
$$\mathrm{Cov}[X_1,X_2] = r_{X_1,X_2} - \mu_{X_1}\mu_{X_2}. \tag{15}$$
It can be noted that the correlation and covariance of two random variables are equal if the mean values of both random variables are $0$.

Uncorrelated random variables

Two random variables are called uncorrelated if the covariance between both random variables equals $0$, i.e.
$$\mathrm{Cov}[X_1,X_2] = 0. \tag{16}$$
Although the term suggests that it is related to the correlation between two random variables, uncorrelatedness is defined through a zero covariance; the correlation itself has a different definition.

Orthogonality

Two random variables are called orthogonal if the correlation between both random variables equals $0$, i.e.
$$r_{X_1,X_2} = 0. \tag{17}$$

Correlation coefficient

The value of the covariance depends significantly on the variances of both random variables and is therefore unbounded. In order to express the relationship between two random variables without a dependence on their variances, the covariance first has to be normalized. Therefore the correlation coefficient is introduced as
$$\rho_{X_1,X_2} = \frac{\mathrm{Cov}[X_1,X_2]}{\sqrt{\mathrm{Var}[X_1]\,\mathrm{Var}[X_2]}} = \frac{\mathrm{Cov}[X_1,X_2]}{\sigma_{X_1}\sigma_{X_2}}. \tag{18}$$
Please note that this represents the normalized covariance and not the normalized correlation between two random variables, although the name suggests otherwise. Because of this normalization, the correlation coefficient is bounded between $-1$ and $1$ as
$$-1 \leq \rho_{X_1,X_2} \leq 1. \tag{19}$$
Fig. 2 shows realizations of three different probability distributions with negative, zero and positive correlation coefficients.

Figure 2: Visualisation of the concept of the correlation coefficient. Scatter plots of random realizations of random variables $X_1$ and $X_2$ with negative, zero and positive correlation coefficients.
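The sketch below mimics Fig. 2: for hypothetical bivariate Gaussians with negative, zero and positive correlation, the correlation coefficient is computed as in equation (18) and compared with numpy's `np.corrcoef`.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical bivariate Gaussians with negative, zero and positive correlation,
# mirroring the three scatter plots of Fig. 2.
for rho in (-0.8, 0.0, 0.8):
    cov = np.array([[2.0, rho * np.sqrt(2.0 * 0.5)],
                    [rho * np.sqrt(2.0 * 0.5), 0.5]])
    x1, x2 = rng.multivariate_normal([0, 0], cov, size=100_000).T

    # Correlation coefficient following equation (18): covariance normalized
    # by the product of the standard deviations.
    rho_hat = np.cov(x1, x2)[0, 1] / (np.std(x1) * np.std(x2))
    print(rho, rho_hat, np.corrcoef(x1, x2)[0, 1])  # all three values agree closely
```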

Exercise


$X$ and $Y$ are identically distributed random variables with $\mathrm{E}[X] = \mathrm{E}[Y] = 0$, covariance $\mathrm{Cov}[X,Y] = 3$ and correlation coefficient $\rho_{X,Y} = 1/2$. For nonzero constants $a$ and $b$, $U = aX$ and $V = bY$.
  1. Find $\mathrm{Cov}[U,V]$.
  2. Find the correlation coefficient $\rho_{U,V}$.
  3. Let $W = U + V$. For what values of $a$ and $b$ are $X$ and $W$ uncorrelated?
  1. Since $X$ and $Y$ have zero expected value, $\mathrm{Cov}[X,Y] = \mathrm{E}[XY] = 3$, $\mathrm{E}[U] = a\,\mathrm{E}[X] = 0$ and $\mathrm{E}[V] = b\,\mathrm{E}[Y] = 0$. It follows that $$\mathrm{Cov}[U,V] = \mathrm{E}[UV] = \mathrm{E}[abXY] = ab\,\mathrm{E}[XY] = ab\,\mathrm{Cov}[X,Y] = 3ab.$$
  2. We start by observing that $\mathrm{Var}[U] = a^2\,\mathrm{Var}[X]$ and $\mathrm{Var}[V] = b^2\,\mathrm{Var}[Y]$. It follows that $$\rho_{U,V} = \frac{\mathrm{Cov}[U,V]}{\sqrt{\mathrm{Var}[U]\,\mathrm{Var}[V]}} = \frac{ab\,\mathrm{Cov}[X,Y]}{\sqrt{a^2\,\mathrm{Var}[X]\,b^2\,\mathrm{Var}[Y]}} = \frac{ab}{\sqrt{a^2b^2}}\,\rho_{X,Y} = \frac{1}{2}\frac{ab}{|ab|}.$$ Note that $ab/|ab|$ is $1$ if $a$ and $b$ have the same sign and $-1$ if they have opposite signs.
  3. Since $\mathrm{E}[X] = 0$, $$\mathrm{Cov}[X,W] = \mathrm{E}[XW] - \mathrm{E}[X]\,\mathrm{E}[W] = \mathrm{E}[XW] = \mathrm{E}[X(aX + bY)] = a\,\mathrm{E}[X^2] + b\,\mathrm{E}[XY] = a\,\mathrm{Var}[X] + b\,\mathrm{Cov}[X,Y].$$ Since $X$ and $Y$ are identically distributed, $\mathrm{Var}[X] = \mathrm{Var}[Y]$ and $$\frac{1}{2} = \rho_{X,Y} = \frac{\mathrm{Cov}[X,Y]}{\sqrt{\mathrm{Var}[X]\,\mathrm{Var}[Y]}} = \frac{\mathrm{Cov}[X,Y]}{\mathrm{Var}[X]} = \frac{3}{\mathrm{Var}[X]}.$$ This implies $\mathrm{Var}[X] = 6$, so that $\mathrm{Cov}[X,W] = 6a + 3b = 0$, or $b = -2a$.
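A quick numerical sanity check of this exercise is sketched below, assuming (purely for illustration) that $X$ and $Y$ are jointly Gaussian with the stated moments, i.e. $\mathrm{Var}[X] = \mathrm{Var}[Y] = 6$ and $\mathrm{Cov}[X,Y] = 3$.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical zero-mean Gaussian (X, Y) matching the exercise: Cov[X, Y] = 3 and
# rho = 1/2 imply Var[X] = Var[Y] = 6.
cov = np.array([[6.0, 3.0],
                [3.0, 6.0]])
x, y = rng.multivariate_normal([0, 0], cov, size=500_000).T

a, b = 1.5, -3.0           # b = -2a, so X and W should be uncorrelated
u, v = a * x, b * y
w = u + v

print(np.cov(u, v)[0, 1])  # close to 3ab = -13.5
print(np.cov(x, w)[0, 1])  # close to 0
```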

Cross-covariance matrix

We previously discussed how we could determine the covariance of two random variables. Let us now turn to the covariance of two random vectors $\mathbf{X} = [X_1, X_2, \ldots, X_N]^\top$ and $\mathbf{Y} = [Y_1, Y_2, \ldots, Y_N]^\top$. Intuitively, one might say that this covariance cannot be described by a single number, because there is more than one combination of random variables of which we want to calculate the covariance. As an example, we could determine the covariances of $X_1$ and $Y_1$, of $X_1$ and $Y_2$, and of $X_N$ and $Y_1$. In order to accommodate all these possible combinations, we introduce the cross-covariance matrix $\mathbf{\Gamma}_{\mathbf{XY}}$, which contains the covariances of all possible combinations of the random variables in the random vectors $\mathbf{X}$ and $\mathbf{Y}$.

The cross-covariance matrix is formally defined as
$$\mathbf{\Gamma}_{\mathbf{XY}} = \mathrm{E}\big[(\mathbf{X} - \boldsymbol{\mu}_\mathbf{X})(\mathbf{Y} - \boldsymbol{\mu}_\mathbf{Y})^\top\big] = \begin{bmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1N} \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ \gamma_{N1} & \gamma_{N2} & \cdots & \gamma_{NN} \end{bmatrix}, \tag{20}$$
where the individual coefficients correspond to
$$\gamma_{nm} = \mathrm{Cov}[X_n, Y_m] = \mathrm{E}\big[(X_n - \mu_{X_n})(Y_m - \mu_{Y_m})\big]. \tag{21}$$
The transpose operator in the first equation creates a matrix from the two column vectors, filled with the covariances of all possible combinations of random variables. For each of these covariances, the correlation coefficient $\rho_{nm}$ can be calculated similarly using the definition of the correlation coefficient. Two random vectors are called uncorrelated if $\mathbf{\Gamma}_{\mathbf{XY}} = \mathbf{0}$.
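The sketch below estimates a cross-covariance matrix from samples. The construction of Y as a noisy linear mixture of X is a made-up example; for that particular construction the cross-covariance matrix should be approximately $\mathbf{M}^\top$.

```python
import numpy as np

rng = np.random.default_rng(7)
n, N = 200_000, 3

# Hypothetical example: Y is a noisy linear mixture of X, so X and Y are correlated.
X = rng.normal(size=(n, N))
M = rng.normal(size=(N, N))
Y = X @ M.T + 0.1 * rng.normal(size=(n, N))

# Cross-covariance matrix following equation (20): E[(X - mu_X)(Y - mu_Y)^T],
# estimated here by averaging the outer products over the samples.
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)
Gamma_XY = (Xc.T @ Yc) / n

print(Gamma_XY.shape)                   # (3, 3); entry (n, m) estimates Cov[X_n, Y_m]
print(np.abs(Gamma_XY - M.T).max())     # small: for this construction, Gamma_XY ≈ M^T
```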

Auto-covariance matrix of a random vector

For the special case that $\mathbf{X} = \mathbf{Y}$, the cross-covariance matrix is called the auto-covariance matrix, which contains the covariances between all random variables in $\mathbf{X}$. The definition is the same as the definition of the cross-covariance matrix, where $\mathbf{\Gamma}_{\mathbf{XX}}$ is often simplified to $\mathbf{\Gamma}_\mathbf{X}$.

Exercise


An $n$-dimensional Gaussian vector $\mathbf{W}$ has a block diagonal covariance matrix
$$\mathbf{C}_\mathbf{W} = \begin{bmatrix} \mathbf{C}_\mathbf{X} & \mathbf{0} \\ \mathbf{0} & \mathbf{C}_\mathbf{Y} \end{bmatrix}, \tag{22}$$
where $\mathbf{C}_\mathbf{X}$ is $m \times m$ and $\mathbf{C}_\mathbf{Y}$ is $(n-m) \times (n-m)$. Show that $\mathbf{W}$ can be written in terms of component vectors $\mathbf{X}$ and $\mathbf{Y}$ in the form
$$\mathbf{W} = \begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix}, \tag{23}$$
such that $\mathbf{X}$ and $\mathbf{Y}$ are independent Gaussian random vectors.
As given in the problem statement, we define the $m$-dimensional vector $\mathbf{X}$, the $(n-m)$-dimensional vector $\mathbf{Y}$ and $\mathbf{W} = [\mathbf{X}^\top, \mathbf{Y}^\top]^\top$. Note that $\mathbf{W}$ has expected value
$$\boldsymbol{\mu}_\mathbf{W} = \mathrm{E}[\mathbf{W}] = \mathrm{E}\begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix} = \begin{bmatrix} \mathrm{E}[\mathbf{X}] \\ \mathrm{E}[\mathbf{Y}] \end{bmatrix} = \begin{bmatrix} \boldsymbol{\mu}_\mathbf{X} \\ \boldsymbol{\mu}_\mathbf{Y} \end{bmatrix}.$$
The covariance matrix of $\mathbf{W}$ is
$$\begin{aligned} \mathbf{C}_\mathbf{W} &= \mathrm{E}\big[(\mathbf{W}-\boldsymbol{\mu}_\mathbf{W})(\mathbf{W}-\boldsymbol{\mu}_\mathbf{W})^\top\big] \\ &= \mathrm{E}\left[\begin{bmatrix} \mathbf{X}-\boldsymbol{\mu}_\mathbf{X} \\ \mathbf{Y}-\boldsymbol{\mu}_\mathbf{Y} \end{bmatrix} \begin{bmatrix} (\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})^\top & (\mathbf{Y}-\boldsymbol{\mu}_\mathbf{Y})^\top \end{bmatrix}\right] \\ &= \begin{bmatrix} \mathrm{E}[(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})^\top] & \mathrm{E}[(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})(\mathbf{Y}-\boldsymbol{\mu}_\mathbf{Y})^\top] \\ \mathrm{E}[(\mathbf{Y}-\boldsymbol{\mu}_\mathbf{Y})(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})^\top] & \mathrm{E}[(\mathbf{Y}-\boldsymbol{\mu}_\mathbf{Y})(\mathbf{Y}-\boldsymbol{\mu}_\mathbf{Y})^\top] \end{bmatrix} = \begin{bmatrix} \mathbf{C}_\mathbf{X} & \mathbf{C}_{\mathbf{XY}} \\ \mathbf{C}_{\mathbf{YX}} & \mathbf{C}_\mathbf{Y} \end{bmatrix}. \end{aligned}$$
The assumption that $\mathbf{X}$ and $\mathbf{Y}$ are independent implies that
$$\mathbf{C}_{\mathbf{XY}} = \mathrm{E}\big[(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})(\mathbf{Y}-\boldsymbol{\mu}_\mathbf{Y})^\top\big] = \mathrm{E}\big[\mathbf{X}-\boldsymbol{\mu}_\mathbf{X}\big]\,\mathrm{E}\big[(\mathbf{Y}-\boldsymbol{\mu}_\mathbf{Y})^\top\big] = \mathbf{0}.$$
This also implies that $\mathbf{C}_{\mathbf{YX}} = \mathbf{C}_{\mathbf{XY}}^\top = \mathbf{0}$. Thus
$$\mathbf{C}_\mathbf{W} = \begin{bmatrix} \mathbf{C}_\mathbf{X} & \mathbf{0} \\ \mathbf{0} & \mathbf{C}_\mathbf{Y} \end{bmatrix}.$$

Cross-correlation matrix

Similarly to the cross-covariance matrix, the cross-correlation matrix can be defined, containing the correlations of all combinations of the random variables in $\mathbf{X}$ and $\mathbf{Y}$. The cross-correlation matrix of random vectors $\mathbf{X}$ and $\mathbf{Y}$ is denoted by $\mathbf{R}_{\mathbf{XY}}$ and is defined as
$$\mathbf{R}_{\mathbf{XY}} = \mathrm{E}[\mathbf{X}\mathbf{Y}^\top] = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1N} \\ r_{21} & r_{22} & \cdots & r_{2N} \\ \vdots & \vdots & \ddots & \vdots \\ r_{N1} & r_{N2} & \cdots & r_{NN} \end{bmatrix},$$
where the individual coefficients correspond to the individual correlations $r_{nm} = \mathrm{E}[X_n Y_m]$. Two random vectors are called orthogonal if $\mathbf{R}_{\mathbf{XY}} = \mathbf{0}$. Furthermore, it can be proven that the cross-covariance matrix and the cross-correlation matrix are related through
$$\mathbf{\Gamma}_{\mathbf{XY}} = \mathbf{R}_{\mathbf{XY}} - \boldsymbol{\mu}_\mathbf{X}\boldsymbol{\mu}_\mathbf{Y}^\top.$$

Auto-correlation matrix of a random vector

For the special case that $\mathbf{X} = \mathbf{Y}$, the cross-correlation matrix is called the auto-correlation matrix, which contains the correlations between all random variables in $\mathbf{X}$. The definition is the same as the definition of the cross-correlation matrix, where $\mathbf{R}_{\mathbf{XX}}$ is often simplified to $\mathbf{R}_\mathbf{X}$.

Linear transformations of random vectors

In the previous reader some calculation rules were determined for the mean and variance of a linearly transformed random variable. This subsection will continue this line of thought, but now for random vectors. We define an invertible transformation matrix $\mathbf{A}$, with dimensions $(N \times N)$, which linearly maps a random vector $\mathbf{X}$ of length $N$ to a random vector $\mathbf{Y}$, again of length $N$, after adding an equally long column vector $\mathbf{b}$, through
$$\mathbf{Y} = g(\mathbf{X}) = \mathbf{A}\mathbf{X} + \mathbf{b}.$$

Probability density function

From the initial multivariate probability density function of $\mathbf{X}$, $p_\mathbf{X}(\mathbf{x})$, the new probability density function of $\mathbf{Y}$ can be determined as
$$p_\mathbf{Y}(\mathbf{y}) = \frac{p_\mathbf{X}\big(g^{-1}(\mathbf{y})\big)}{|\det \mathbf{A}|} = \frac{p_\mathbf{X}\big(\mathbf{A}^{-1}(\mathbf{y}-\mathbf{b})\big)}{|\det \mathbf{A}|},$$
where $|\det \mathbf{A}|$ is the absolute value of the determinant of $\mathbf{A}$.

Mean vector

The new mean vector of the random vector $\mathbf{Y}$ can be determined as
$$\boldsymbol{\mu}_\mathbf{Y} = \mathrm{E}[\mathbf{Y}] = \mathrm{E}[\mathbf{A}\mathbf{X} + \mathbf{b}] = \mathbf{A}\,\mathrm{E}[\mathbf{X}] + \mathbf{b} = \mathbf{A}\boldsymbol{\mu}_\mathbf{X} + \mathbf{b}.$$

Cross-covariance and cross-correlation matrix

By the definition of the cross-covariance matrix, the cross-covariance matrices $\mathbf{\Gamma}_{\mathbf{XY}}$ and $\mathbf{\Gamma}_{\mathbf{YX}}$ can be determined from the original auto-covariance matrix $\mathbf{\Gamma}_\mathbf{X}$ of $\mathbf{X}$ through
$$\begin{aligned} \mathbf{\Gamma}_{\mathbf{XY}} &= \mathrm{E}\big[(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})(\mathbf{Y}-\boldsymbol{\mu}_\mathbf{Y})^\top\big], \\ &= \mathrm{E}\big[(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})\big(\mathbf{A}\mathbf{X}+\mathbf{b}-(\mathbf{A}\boldsymbol{\mu}_\mathbf{X}+\mathbf{b})\big)^\top\big], \\ &= \mathrm{E}\big[(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})\big(\mathbf{A}(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})\big)^\top\big], \\ &= \mathrm{E}\big[(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})^\top\mathbf{A}^\top\big], \\ &= \mathrm{E}\big[(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})^\top\big]\mathbf{A}^\top = \mathbf{\Gamma}_\mathbf{X}\mathbf{A}^\top \end{aligned}$$
and similarly we can find the result $\mathbf{\Gamma}_{\mathbf{YX}} = \mathbf{A}\mathbf{\Gamma}_\mathbf{X}$. The new cross-correlation matrices $\mathbf{R}_{\mathbf{XY}}$ and $\mathbf{R}_{\mathbf{YX}}$ can be determined as
$$\begin{aligned} \mathbf{R}_{\mathbf{XY}} &= \mathrm{E}[\mathbf{X}\mathbf{Y}^\top], \\ &= \mathrm{E}\big[\mathbf{X}(\mathbf{A}\mathbf{X}+\mathbf{b})^\top\big], \\ &= \mathrm{E}\big[\mathbf{X}(\mathbf{A}\mathbf{X})^\top + \mathbf{X}\mathbf{b}^\top\big], \\ &= \mathrm{E}\big[\mathbf{X}\mathbf{X}^\top\mathbf{A}^\top\big] + \mathrm{E}[\mathbf{X}]\mathbf{b}^\top, \\ &= \mathbf{R}_\mathbf{X}\mathbf{A}^\top + \boldsymbol{\mu}_\mathbf{X}\mathbf{b}^\top \end{aligned}$$
and similarly as $\mathbf{R}_{\mathbf{YX}} = \mathbf{A}\mathbf{R}_\mathbf{X} + \mathbf{b}\boldsymbol{\mu}_\mathbf{X}^\top$.

Auto-covariance and auto-correlation matrix

The auto-covariance matrix of $\mathbf{Y}$ can be determined through
$$\begin{aligned} \mathbf{\Gamma}_\mathbf{Y} &= \mathrm{E}\big[(\mathbf{Y}-\boldsymbol{\mu}_\mathbf{Y})(\mathbf{Y}-\boldsymbol{\mu}_\mathbf{Y})^\top\big], \\ &= \mathrm{E}\big[\big(\mathbf{A}\mathbf{X}+\mathbf{b}-(\mathbf{A}\boldsymbol{\mu}_\mathbf{X}+\mathbf{b})\big)\big(\mathbf{A}\mathbf{X}+\mathbf{b}-(\mathbf{A}\boldsymbol{\mu}_\mathbf{X}+\mathbf{b})\big)^\top\big], \\ &= \mathrm{E}\big[\big(\mathbf{A}(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})\big)\big(\mathbf{A}(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})\big)^\top\big], \\ &= \mathrm{E}\big[\mathbf{A}(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})^\top\mathbf{A}^\top\big], \\ &= \mathbf{A}\,\mathrm{E}\big[(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})(\mathbf{X}-\boldsymbol{\mu}_\mathbf{X})^\top\big]\mathbf{A}^\top = \mathbf{A}\mathbf{\Gamma}_\mathbf{X}\mathbf{A}^\top. \end{aligned}$$
In a similar fashion the new auto-correlation matrix of $\mathbf{Y}$ can be calculated as
$$\begin{aligned} \mathbf{R}_\mathbf{Y} &= \mathrm{E}[\mathbf{Y}\mathbf{Y}^\top], \\ &= \mathrm{E}\big[(\mathbf{A}\mathbf{X}+\mathbf{b})(\mathbf{A}\mathbf{X}+\mathbf{b})^\top\big], \\ &= \mathrm{E}\big[(\mathbf{A}\mathbf{X}+\mathbf{b})(\mathbf{X}^\top\mathbf{A}^\top+\mathbf{b}^\top)\big], \\ &= \mathrm{E}\big[\mathbf{A}\mathbf{X}\mathbf{X}^\top\mathbf{A}^\top + \mathbf{A}\mathbf{X}\mathbf{b}^\top + \mathbf{b}\mathbf{X}^\top\mathbf{A}^\top + \mathbf{b}\mathbf{b}^\top\big], \\ &= \mathbf{A}\,\mathrm{E}[\mathbf{X}\mathbf{X}^\top]\mathbf{A}^\top + \mathbf{A}\,\mathrm{E}[\mathbf{X}]\mathbf{b}^\top + \mathbf{b}\,\mathrm{E}[\mathbf{X}]^\top\mathbf{A}^\top + \mathbf{b}\mathbf{b}^\top, \\ &= \mathbf{A}\mathbf{R}_\mathbf{X}\mathbf{A}^\top + \mathbf{A}\boldsymbol{\mu}_\mathbf{X}\mathbf{b}^\top + \mathbf{b}\boldsymbol{\mu}_\mathbf{X}^\top\mathbf{A}^\top + \mathbf{b}\mathbf{b}^\top. \end{aligned}$$
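The sketch below verifies the transformation rules $\boldsymbol{\mu}_\mathbf{Y} = \mathbf{A}\boldsymbol{\mu}_\mathbf{X} + \mathbf{b}$ and $\mathbf{\Gamma}_\mathbf{Y} = \mathbf{A}\mathbf{\Gamma}_\mathbf{X}\mathbf{A}^\top$ empirically for a hypothetical choice of $\mathbf{A}$, $\mathbf{b}$ and the distribution of $\mathbf{X}$.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500_000

# Hypothetical invertible A, offset b, and a Gaussian X with known mean and covariance.
A = np.array([[2.0, 0.5, 0.0],
              [0.0, 1.0, -1.0],
              [1.0, 0.0, 3.0]])
b = np.array([1.0, -2.0, 0.5])
mu_X = np.array([0.5, 1.0, -1.5])
Gamma_X = np.array([[2.0, 0.3, 0.0],
                    [0.3, 1.0, 0.2],
                    [0.0, 0.2, 0.5]])

X = rng.multivariate_normal(mu_X, Gamma_X, size=n)
Y = X @ A.T + b                # Y = A X + b, applied sample-wise

print(Y.mean(axis=0))          # close to A @ mu_X + b
print(A @ mu_X + b)
print(np.cov(Y.T))             # close to A @ Gamma_X @ A.T
print(A @ Gamma_X @ A.T)
```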


  1. Proof that $\mathrm{Var}[X+Y] = \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{E}[(X-\mu_X)(Y-\mu_Y)]$:
$$\begin{aligned} \mathrm{Var}[X+Y] &= \mathrm{E}\big[(X+Y-(\mu_X+\mu_Y))^2\big] \\ &= \mathrm{E}\big[((X-\mu_X)+(Y-\mu_Y))^2\big] \\ &= \mathrm{E}\big[(X-\mu_X)^2 + 2(X-\mu_X)(Y-\mu_Y) + (Y-\mu_Y)^2\big] \\ &= \mathrm{Var}[X] + \mathrm{Var}[Y] + 2\,\mathrm{E}[(X-\mu_X)(Y-\mu_Y)]. \end{aligned}$$ ↩︎