MVUE for Linear Models

Introduction

The minimum variance unbiased estimator (MVUE) has the lowest variance among all unbiased estimators. In general, there is no systematic method to find the MVUE; for a special class of estimation problems, however, it can be found directly. This module concerns that class: estimation problems with a linear signal model and additive Gaussian noise.

The module begins by introducing the linear signal model and derives the MVUE for colored Gaussian noise. Next, we treat white Gaussian noise, which can be considered a special case of colored noise. When knowledge of the noise distribution is incomplete, an interesting approach is to use only the expected value and covariance of the noise, as if it were Gaussian. The resulting estimator is the best linear unbiased estimator (BLUE), covered at the end of the module.

Linear Signal Model

A linear signal model is any model for which the signal $s[n;\theta]$ can be expressed as a weighted sum of the parameters, i.e.,

$$s[n;\theta] = \sum_{k=0}^{K-1} h_k[n]\,\theta_k. \tag{1}$$

For multiple signal values, the model can be expressed as a matrix-vector product of the form

$$\mathbf{s}(\boldsymbol{\theta}) = \mathbf{H}\boldsymbol{\theta}, \tag{2}$$

where $\mathbf{s}(\boldsymbol{\theta})$ is the signal vector of length $N$, $\mathbf{H}$ is the so-called observation matrix of size $N \times K$, and $\boldsymbol{\theta}$ is the parameter vector of length $K$.


Example:

Assume our signal model is a polynomial of degree $K-1$ with coefficients $\theta_0, \theta_1, \ldots, \theta_{K-1}$, i.e.,

$$s[n;\theta] = \theta_0 n^0 + \theta_1 n^1 + \cdots + \theta_{K-1} n^{K-1}, \tag{3}$$

and suppose observations for $n = 0, 1, \ldots, N-1$ are given. The corresponding observation matrix $\mathbf{H}$ is

$$\mathbf{H} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 1 & 1 & 1 & \cdots & 1 \\ 1 & 2 & 2^2 & \cdots & 2^{K-1} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & N-1 & (N-1)^2 & \cdots & (N-1)^{K-1} \end{bmatrix}. \tag{4}$$
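
As a quick illustration, the observation matrix in (4) can be generated numerically. The sketch below assumes NumPy and uses arbitrary example values for N, K, and the parameter vector.

```python
import numpy as np

N, K = 8, 3  # number of observations and parameters (example values)

# Observation matrix for the polynomial model (4): H[n, k] = n**k
n = np.arange(N).reshape(-1, 1)      # column of sample indices 0..N-1
k = np.arange(K).reshape(1, -1)      # row of polynomial exponents 0..K-1
H = n ** k                           # Vandermonde-type matrix of size N x K

# Equivalent built-in: np.vander(np.arange(N), K, increasing=True)
theta = np.array([1.0, -2.0, 0.5])   # example parameter vector
s = H @ theta                        # noiseless signal s(theta) = H theta
```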


MVUE for Linear Models and Additive Gaussian Noise

To find the MVUE for linear signals in Gaussian noise, we take advantage of the equality condition of the CRLB. We therefore need to show that the derivative of the log-likelihood function can be expressed in the form

$$\frac{\partial}{\partial\boldsymbol{\theta}}\, l(\mathbf{x};\boldsymbol{\theta}) = \mathbf{I}(\boldsymbol{\theta})\big(g(\mathbf{x}) - \boldsymbol{\theta}\big), \tag{5}$$

see the module on the CRLB. To obtain the derivative of the log-likelihood function, we first need to determine the probability density function of the observations. For the linear signal model embedded in additive Gaussian noise, the observations are modeled as

$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}, \tag{6}$$

where $\mathbf{w}$ is zero-mean noise with covariance matrix $\mathbf{C}$, i.e., $\mathbf{w} \sim \mathcal{N}(\mathbf{0}, \mathbf{C})$. Thus, the PDF of the observation vector is

$$p(\mathbf{x};\boldsymbol{\theta}) = \frac{1}{|2\pi\mathbf{C}|^{1/2}} \exp\!\left(-\frac{1}{2}(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T \mathbf{C}^{-1} (\mathbf{x}-\mathbf{H}\boldsymbol{\theta})\right). \tag{7}$$
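
To make the model concrete, the following sketch simulates observations according to (6) with colored Gaussian noise; the exponential-correlation covariance used here is a hypothetical choice made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N, K = 8, 3
H = np.vander(np.arange(N), K, increasing=True).astype(float)  # polynomial model (4)
theta_true = np.array([1.0, -2.0, 0.5])

# Hypothetical colored-noise covariance: C[i, j] = sigma^2 * rho^|i-j|
sigma2, rho = 0.5, 0.8
idx = np.arange(N)
C = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

# Draw w ~ N(0, C) and form the observations x = H theta + w, cf. (6)
w = rng.multivariate_normal(np.zeros(N), C)
x = H @ theta_true + w
```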

The corresponding log-likelihood function is

$$l(\mathbf{x};\boldsymbol{\theta}) = -\frac{1}{2}\ln|2\pi\mathbf{C}| - \frac{1}{2}(\mathbf{x}-\mathbf{H}\boldsymbol{\theta})^T \mathbf{C}^{-1} (\mathbf{x}-\mathbf{H}\boldsymbol{\theta}), \tag{8}$$

which after expanding the second term becomes

$$l(\mathbf{x};\boldsymbol{\theta}) = -\frac{1}{2}\ln|2\pi\mathbf{C}| - \frac{1}{2}\left(\mathbf{x}^T\mathbf{C}^{-1}\mathbf{x} - \mathbf{x}^T\mathbf{C}^{-1}\mathbf{H}\boldsymbol{\theta} - \boldsymbol{\theta}^T\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x} + \boldsymbol{\theta}^T\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\boldsymbol{\theta}\right). \tag{9}$$

The gradient of the log-likelihood function is

$$\frac{\partial}{\partial\boldsymbol{\theta}}\, l(\mathbf{x};\boldsymbol{\theta}) = \mathbf{H}^T\mathbf{C}^{-1}\mathbf{x} - \mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\boldsymbol{\theta}. \tag{10}$$

How to compute the gradient.

The gradient of the log-likelihood function is

$$\frac{\partial}{\partial\boldsymbol{\theta}}\, l(\mathbf{x};\boldsymbol{\theta}) = \left[\frac{\partial l(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_0}\ \ \frac{\partial l(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_1}\ \ \cdots\ \ \frac{\partial l(\mathbf{x};\boldsymbol{\theta})}{\partial\theta_{K-1}}\right]^T,$$

i.e., it is a vector containing all the partial derivatives of the log-likelihood function. To evaluate the partial derivatives, we first use the linearity of the differential operator. The individual terms of the log-likelihood function are of the form $\mathbf{a}^T\mathbf{x}$, $\mathbf{x}^T\mathbf{a}$, and $\mathbf{x}^T\mathbf{A}\mathbf{x}$. The partial derivatives and the gradients of these expressions can be obtained as follows:

  1. The expression $\mathbf{a}^T\mathbf{x}$ can be written out as $\mathbf{a}^T\mathbf{x} = \sum_{n=0}^{N-1} a_n x_n = a_0 x_0 + a_1 x_1 + \cdots + a_{N-1} x_{N-1}$. The derivative with respect to one specific element $x_i$ is $\frac{\partial\, \mathbf{a}^T\mathbf{x}}{\partial x_i} = a_i$, since only the addend $a_i x_i$ is a function of $x_i$, with derivative $a_i$; all other addends of the sum are constants and vanish.
  2. Since $\mathbf{x}^T\mathbf{a} = \mathbf{a}^T\mathbf{x}$ holds for real-valued vectors, we also have $\frac{\partial\, \mathbf{x}^T\mathbf{a}}{\partial x_i} = a_i$.
  3. The expression $\mathbf{x}^T\mathbf{A}\mathbf{x}$ can be written out as $\mathbf{x}^T\mathbf{A}\mathbf{x} = \sum_{n=0}^{N-1}\sum_{m=0}^{N-1} x_n A_{n,m} x_m$, which after separating all terms containing $x_i$ becomes
     $$\sum_{n=0}^{N-1}\sum_{m=0}^{N-1} x_n A_{n,m} x_m = A_{i,i} x_i^2 + x_i \sum_{\substack{m=0 \\ m\neq i}}^{N-1} A_{i,m} x_m + x_i \sum_{\substack{n=0 \\ n\neq i}}^{N-1} x_n A_{n,i} + \sum_{\substack{n=0 \\ n\neq i}}^{N-1} \sum_{\substack{m=0 \\ m\neq i}}^{N-1} x_n A_{n,m} x_m.$$
     Taking the derivative with respect to $x_i$, we obtain
     $$\frac{\partial\, \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial x_i} = 2 A_{i,i} x_i + \sum_{\substack{m=0 \\ m\neq i}}^{N-1} A_{i,m} x_m + \sum_{\substack{n=0 \\ n\neq i}}^{N-1} x_n A_{n,i}.$$
     Note that the two summations miss the summation indices $m=i$ and $n=i$, respectively. However, in addition to the two sums we have the term $2 A_{i,i} x_i$, which supplies exactly the missing addend of each sum. Thus, we can distribute $2 A_{i,i} x_i$ over the two sums and get
     $$\frac{\partial\, \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial x_i} = \sum_{m=0}^{N-1} A_{i,m} x_m + \sum_{n=0}^{N-1} x_n A_{n,i}.$$
     The first sum is the inner product between the $i$th row of $\mathbf{A}$ and $\mathbf{x}$, and the second sum is the inner product between the $i$th column of $\mathbf{A}$ and $\mathbf{x}$, i.e.,
     $$\frac{\partial\, \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial x_i} = [\mathbf{A}\mathbf{x}]_i + [\mathbf{A}^T\mathbf{x}]_i.$$

Combining the above results, we obtain

$$\frac{\partial\, \mathbf{a}^T\mathbf{x}}{\partial \mathbf{x}} = \frac{\partial\, \mathbf{x}^T\mathbf{a}}{\partial \mathbf{x}} = \mathbf{a} \tag{11}$$

and

$$\frac{\partial\, \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = (\mathbf{A} + \mathbf{A}^T)\,\mathbf{x}. \tag{12}$$

For symmetric matrices, i.e., $\mathbf{A} = \mathbf{A}^T$, the latter expression simplifies to

$$\frac{\partial\, \mathbf{x}^T\mathbf{A}\mathbf{x}}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x}. \tag{13}$$
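
The identities (11)–(13) are easy to sanity-check numerically. The sketch below compares (12) against a finite-difference gradient for a random matrix; the array names and sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
A = rng.standard_normal((N, N))   # generic (non-symmetric) matrix
x = rng.standard_normal(N)

# Analytic gradient of x^T A x according to (12)
grad_analytic = (A + A.T) @ x

# Central finite-difference approximation of the same gradient
eps = 1e-6
grad_numeric = np.array([
    ((x + eps * e) @ A @ (x + eps * e) - (x - eps * e) @ A @ (x - eps * e)) / (2 * eps)
    for e in np.eye(N)
])

print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # expected: True
```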

Note that the covariance matrix $\mathbf{C}$ is symmetric and so is its inverse. Using $\mathbf{a} = \mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}$, $\mathbf{a}^T = \mathbf{x}^T\mathbf{C}^{-1}\mathbf{H}$, and $\mathbf{A} = \mathbf{A}^T = \mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}$ in (11) and (13), we obtain (10).

If $\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}$ is invertible, we further have

$$\frac{\partial}{\partial\boldsymbol{\theta}}\, l(\mathbf{x};\boldsymbol{\theta}) = \mathbf{H}^T\mathbf{C}^{-1}\mathbf{H}\left((\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x} - \boldsymbol{\theta}\right). \tag{14}$$

By comparing (14) with (5), we recognize the Fisher information matrix

$$\mathbf{I}(\boldsymbol{\theta}) = \mathbf{H}^T\mathbf{C}^{-1}\mathbf{H} \tag{15}$$

and the estimator

$$g(\mathbf{x}) = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}. \tag{16}$$

We have found an efficient estimator and, thus, an MVUE. In the module on the MLE, we also stated that if an efficient estimator exists, it is the MLE, which is easily verified by equating (10) with zero and solving for $\boldsymbol{\theta}$.
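
Continuing the simulated colored-noise example from above, the estimator (16) can be evaluated directly; solving linear systems is preferable to forming explicit inverses, but the formula is the same. This is a minimal sketch with the hypothetical example data used earlier.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 8, 3
H = np.vander(np.arange(N), K, increasing=True).astype(float)
idx = np.arange(N)
C = 0.5 * 0.8 ** np.abs(idx[:, None] - idx[None, :])   # hypothetical colored noise
x = H @ np.array([1.0, -2.0, 0.5]) + rng.multivariate_normal(np.zeros(N), C)

# MVUE for the linear model in colored Gaussian noise, cf. (16)
Cinv_H = np.linalg.solve(C, H)                          # C^{-1} H
Cinv_x = np.linalg.solve(C, x)                          # C^{-1} x
theta_hat = np.linalg.solve(H.T @ Cinv_H, H.T @ Cinv_x)

# Fisher information matrix (15); its inverse is the CRLB / estimator covariance
I_theta = H.T @ Cinv_H
cov_theta_hat = np.linalg.inv(I_theta)
```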

MVUE for Linear Models in Additive White Gaussian Noise

So far, we have assumed that our noise is colored, i.e., that the noise samples are correlated. For the case of additive white Gaussian noise, we can reuse the above result. To this end, we assume that the individual noise samples are IID with zero mean and variance $\sigma^2$. In this case, the covariance matrix $\mathbf{C}$ is a diagonal matrix with the elements $\sigma^2$ on the main diagonal, i.e.,

$$\mathbf{C} = \sigma^2 \mathbf{I}. \tag{17}$$

Substituting (17) in (14) yields

$$\frac{\partial}{\partial\boldsymbol{\theta}}\, l(\mathbf{x};\boldsymbol{\theta}) = \frac{\mathbf{H}^T\mathbf{H}}{\sigma^2}\left((\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x} - \boldsymbol{\theta}\right), \tag{18}$$

where we recognize the estimator

$$g(\mathbf{x}) = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x} \tag{19}$$

and the Fisher information matrix

$$\mathbf{I}(\boldsymbol{\theta}) = \frac{\mathbf{H}^T\mathbf{H}}{\sigma^2}. \tag{20}$$
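
The estimator (19) is the ordinary least-squares solution, so in practice it can be computed with a standard least-squares routine rather than by forming $(\mathbf{H}^T\mathbf{H})^{-1}$ explicitly. A minimal sketch with the polynomial model and hypothetical example values:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K, sigma = 50, 3, 0.3
H = np.vander(np.arange(N), K, increasing=True).astype(float)
theta_true = np.array([1.0, -2.0, 0.5])
x = H @ theta_true + sigma * rng.standard_normal(N)    # white Gaussian noise

# MVUE (19) for white noise: equivalent to ordinary least squares
theta_hat, *_ = np.linalg.lstsq(H, x, rcond=None)

# Estimator covariance: inverse of the Fisher information matrix (20)
cov_theta_hat = sigma**2 * np.linalg.inv(H.T @ H)
```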


Example

Assume we have a signal composed of $P$ sinusoids embedded in white Gaussian noise with zero mean and variance $\sigma^2$, i.e.,

$$x[n] = \sum_{p=0}^{P-1} c_p \cos\!\left(\frac{2\pi k_p}{N} n - \varphi_p\right) + w[n], \tag{21}$$

where $c_p$ and $\varphi_p$ are the unknown amplitude and phase, respectively. The frequency parameters $k_0$ to $k_{P-1}$ are assumed to be known, distinct, and between $0$ and $N-1$. The signal is linear in the amplitudes $c_p$, but it is not linear in the unknown phases $\varphi_p$. However, we can use the trigonometric relations

$$\cos(\alpha)\cos(\beta) = \frac{1}{2}\big(\cos(\alpha-\beta) + \cos(\alpha+\beta)\big) \tag{22}$$

and

$$\sin(\alpha)\sin(\beta) = \frac{1}{2}\big(\cos(\alpha-\beta) - \cos(\alpha+\beta)\big) \tag{23}$$

to express (21) as

$$x[n] = \sum_{p=0}^{P-1} \underbrace{c_p\cos(\varphi_p)}_{a_p} \cos\!\left(\frac{2\pi k_p}{N} n\right) + \underbrace{c_p\sin(\varphi_p)}_{b_p} \sin\!\left(\frac{2\pi k_p}{N} n\right) + w[n]. \tag{24}$$

The transformed model is now linear in the unknown parameters $a_p$ and $b_p$. Due to the invariance of the MLE, we can find the maximum likelihood estimates of the amplitudes $a_p$ and $b_p$ and obtain the maximum likelihood estimates of $c_p$ and $\varphi_p$ using the relations

$$c_p = \sqrt{a_p^2 + b_p^2} \tag{25}$$

and

$$\varphi_p = \operatorname{arctan2}(b_p, a_p). \tag{26}$$

The signal model in (24) for multiple observations can be expressed as

$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w}. \tag{27}$$

The observation matrix $\mathbf{H}$ is

$$\mathbf{H} = \begin{bmatrix} \mathbf{c}_{k_0} & \cdots & \mathbf{c}_{k_{P-1}} & \mathbf{s}_{k_0} & \cdots & \mathbf{s}_{k_{P-1}} \end{bmatrix} \tag{28}$$

with columns

$$\mathbf{c}_{k_p} = \left[\cos\!\left(\frac{2\pi k_p}{N}\cdot 0\right)\ \ \cos\!\left(\frac{2\pi k_p}{N}\cdot 1\right)\ \ \cdots\ \ \cos\!\left(\frac{2\pi k_p}{N}(N-1)\right)\right]^T \tag{29}$$

and

$$\mathbf{s}_{k_p} = \left[\sin\!\left(\frac{2\pi k_p}{N}\cdot 0\right)\ \ \sin\!\left(\frac{2\pi k_p}{N}\cdot 1\right)\ \ \cdots\ \ \sin\!\left(\frac{2\pi k_p}{N}(N-1)\right)\right]^T, \tag{30}$$

respectively. The parameter vector $\boldsymbol{\theta}$ for this model is

$$\boldsymbol{\theta} = \begin{bmatrix} a_0 & \cdots & a_{P-1} & b_0 & \cdots & b_{P-1} \end{bmatrix}^T. \tag{31}$$

Note that the frequencies $k_p$ are multiples of the fundamental frequency $1/N$, and thus, the columns of the observation matrix are orthogonal. The matrix $\mathbf{H}^T\mathbf{H}$ is

$$\mathbf{H}^T\mathbf{H} = \frac{N}{2}\mathbf{I}, \tag{32}$$

i.e., a diagonal matrix with the entries $N/2$ on the main diagonal. The maximum likelihood estimate is

$$\hat{\boldsymbol{\theta}}_{ML} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x} = \frac{2}{N}\begin{bmatrix} \mathbf{c}_{k_0}^T \\ \vdots \\ \mathbf{c}_{k_{P-1}}^T \\ \mathbf{s}_{k_0}^T \\ \vdots \\ \mathbf{s}_{k_{P-1}}^T \end{bmatrix}\mathbf{x}. \tag{33}$$

The estimates of the parameters $a_p$ and $b_p$ are

$$\hat{a}_p = \frac{2}{N}\sum_{n=0}^{N-1} x[n]\cos\!\left(\frac{2\pi k_p}{N} n\right) \tag{34}$$

and

$$\hat{b}_p = \frac{2}{N}\sum_{n=0}^{N-1} x[n]\sin\!\left(\frac{2\pi k_p}{N} n\right). \tag{35}$$

If the frequencies $k_p$ take all values from $0$ to $N-1$, then (34) and (35) correspond to the discrete Fourier transform of a real signal.
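
A minimal sketch of this estimator, assuming a single sinusoid (P = 1) and arbitrary example values for N, k, c, and φ:

```python
import numpy as np

rng = np.random.default_rng(3)
N, k = 64, 5                       # number of samples and (known) frequency index
c_true, phi_true, sigma = 2.0, 0.7, 0.5
n = np.arange(N)
x = c_true * np.cos(2 * np.pi * k / N * n - phi_true) + sigma * rng.standard_normal(N)

# ML estimates of a and b via (34) and (35)
a_hat = 2.0 / N * np.sum(x * np.cos(2 * np.pi * k / N * n))
b_hat = 2.0 / N * np.sum(x * np.sin(2 * np.pi * k / N * n))

# Recover amplitude and phase via (25) and (26)
c_hat = np.hypot(a_hat, b_hat)
phi_hat = np.arctan2(b_hat, a_hat)
print(c_hat, phi_hat)              # should be close to c_true and phi_true
```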


Best Linear Unbiased Estimator (BLUE)

So far, we have derived the MVUE for linear signal models with additive Gaussian noise. In many applications, it is not possible to find an MVUE using the techniques presented up to this point. A popular choice is then to restrict the estimator to be linear in the observations $\mathbf{x}$, i.e., of the form

$$\hat{\boldsymbol{\theta}} = \mathbf{A}\mathbf{x}. \tag{36}$$

Here, we assume that $N$ observations are available and that we seek to estimate $K$ unknown parameters with $N > K$. Thus, $\mathbf{A}$ is of size $K \times N$. Depending on the choice of the coefficients of $\mathbf{A}$, different estimators are obtained. Among all linear estimators, we seek the one that is unbiased and has minimum variance. The estimator that fulfills these properties is called the best linear unbiased estimator (BLUE). As we will see, the BLUE is also applicable when only limited knowledge about the PDF of the observations is available.

To find the BLUE, we start from the two requirements at hand: that the estimator is unbiased and that it has minimum variance. The zero-bias condition is formulated as

$$E[\hat{\boldsymbol{\theta}}] = E[\mathbf{A}\mathbf{x}] = \mathbf{A}\,E[\mathbf{x}] = \boldsymbol{\theta}, \tag{37}$$

where $\mathbf{A}$ is the $K \times N$ matrix consisting of the coefficients $a_{k,n}$. The constraint of unbiasedness requires a linear relation between the parameter vector $\boldsymbol{\theta}$ and the expected value of the data such that

$$E[\mathbf{x}] = \mathbf{H}\boldsymbol{\theta}. \tag{38}$$

Substituting (38) in (37) yields

$$E[\hat{\boldsymbol{\theta}}] = \mathbf{A}\mathbf{H}\boldsymbol{\theta}. \tag{39}$$

Thus, for the linear estimator to be unbiased, we require that

$$\mathbf{A}\mathbf{H} = \mathbf{I}, \tag{40}$$

where $\mathbf{I}$ is the $K \times K$ identity matrix.

The variance of the $k$th estimate $\hat{\theta}_k$ is

$$\mathrm{Var}[\hat{\theta}_k] = E\!\left[\big(\mathbf{a}_k^T\mathbf{x} - E[\mathbf{a}_k^T\mathbf{x}]\big)^2\right], \tag{41}$$

where $\mathbf{a}_k^T$ is the $k$th row of $\mathbf{A}$. This can be equivalently expressed as

$$\mathrm{Var}[\hat{\theta}_k] = E\!\left[\big(\mathbf{a}_k^T(\mathbf{x} - E[\mathbf{x}])\big)^2\right] \tag{42}$$
$$= E\!\left[\mathbf{a}_k^T(\mathbf{x} - E[\mathbf{x}])(\mathbf{x} - E[\mathbf{x}])^T\mathbf{a}_k\right] \tag{43}$$
$$= \mathbf{a}_k^T\mathbf{C}\,\mathbf{a}_k. \tag{44}$$

Here, $\mathbf{C}$ is the covariance matrix of the data, which equals the covariance matrix of the zero-mean noise due to the linear signal model.

The BLUE is found by minimizing the variance of each estimate subject to the constraint $\mathbf{A}\mathbf{H} = \mathbf{I}$. Thus, the optimization problem is

$$\min_{\mathbf{a}_k}\ \mathbf{a}_k^T\mathbf{C}\,\mathbf{a}_k \quad \text{s.t.} \quad \mathbf{a}_k^T\mathbf{h}_l = \delta_{k,l}, \tag{45}$$

where $\mathbf{h}_l$ is the $l$th column of $\mathbf{H}$. The minimum can be found using Lagrange multipliers, which results in

$$\hat{\boldsymbol{\theta}} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}\mathbf{x}. \tag{46}$$

The covariance matrix of the estimate $\hat{\boldsymbol{\theta}}$ is

$$\mathbf{C}_{\hat{\boldsymbol{\theta}}} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}. \tag{47}$$
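
As a quick numerical check of the weights implied by (46), the following sketch forms $\mathbf{A} = (\mathbf{H}^T\mathbf{C}^{-1}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{C}^{-1}$ and verifies the unbiasedness constraint (40) and the covariance (47); the matrices H and C are the hypothetical example values used earlier.

```python
import numpy as np

N, K = 8, 3
H = np.vander(np.arange(N), K, increasing=True).astype(float)
idx = np.arange(N)
C = 0.5 * 0.8 ** np.abs(idx[:, None] - idx[None, :])    # hypothetical noise covariance

# BLUE weight matrix A such that theta_hat = A x, cf. (46)
Cinv_H = np.linalg.solve(C, H)                           # C^{-1} H
A = np.linalg.solve(H.T @ Cinv_H, Cinv_H.T)              # (H^T C^-1 H)^-1 H^T C^-1

print(np.allclose(A @ H, np.eye(K)))                     # unbiasedness constraint (40)
cov_blue = np.linalg.inv(H.T @ Cinv_H)                   # estimator covariance (47)
```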

Note that the BLUE coincides with the MVUE for the linear model in (6), cf. (16) and (46). This follows directly from the unbiasedness condition (38), which the linear model satisfies since

$$E[\mathbf{x}] = E[\mathbf{H}\boldsymbol{\theta} + \mathbf{w}] = \mathbf{H}\boldsymbol{\theta}. \tag{48}$$