Minimum Variance Unbiased Estimator
For a particular problem, the efficient estimator may not exist; that is, the derivative of the log-density cannot be written in the form $\partial \ln f(\mathbf{x};\theta)/\partial\theta = I(\theta)\,(\hat{\theta}_{\mathrm{eff}} - \theta)$. A minimum variance unbiased estimator (MVUE) may still exist, although its variance is larger than the one an efficient estimator would attain if it existed. In that case, the Rao-Blackwell-Lehmann-Scheffé procedure can be used to build that kind of estimator. The procedure is based on the concept of a sufficient statistic. A function of the data $T(\mathbf{x})$ is called a sufficient statistic when the density of the data conditioned on $T(\mathbf{x})$ does not depend on the parameter to be estimated. Seen this way, the sufficient statistic captures all the dependence of the data on the parameter, so knowing $T(\mathbf{x})$ alone (and forgetting the value taken by each individual datum of the observed sample) is sufficient information to estimate the parameter. Equivalently, $T(\mathbf{x})$ is a sufficient statistic for the parameter $\theta$ if the density function can be factorized as

$$f(\mathbf{x};\theta) = g\bigl(T(\mathbf{x}),\theta\bigr)\,h(\mathbf{x}),$$

where $g$ depends on the data only through $T(\mathbf{x})$ and $h$ does not depend on $\theta$.
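As a hedged illustration of the factorization criterion, assume (this example is not part of the original text) N independent Gaussian samples $x[n]$ with unknown mean $\theta$ and known variance $\sigma^2$. A statistic $T(\mathbf{x})$ is sufficient when the density splits into one factor that depends on the data only through $T(\mathbf{x})$ and another factor that is free of $\theta$. Expanding the square in the exponent gives:

```latex
\begin{align*}
f(\mathbf{x};\theta)
  &= \frac{1}{(2\pi\sigma^2)^{N/2}}
     \exp\!\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}\bigl(x[n]-\theta\bigr)^2\right] \\
  &= \underbrace{\exp\!\left[-\frac{1}{2\sigma^2}
       \Bigl(N\theta^2 - 2\theta\sum_{n=0}^{N-1}x[n]\Bigr)\right]}_{\text{depends on the data only through } T(\mathbf{x})}
     \;
     \underbrace{\frac{1}{(2\pi\sigma^2)^{N/2}}
       \exp\!\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}x^2[n]\right]}_{\text{free of } \theta}
\end{align*}
```

Under these assumptions $T(\mathbf{x}) = \sum_{n=0}^{N-1} x[n]$ is a sufficient statistic: the sample sum (or, equivalently, the sample mean) carries all the information the data hold about $\theta$.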

Best Linear Unbiased Estimator

The two preceding approaches are very illustrative and define the optimum to be used as a reference. In most practical cases, however, the complexity of such estimators, mainly the second one, can be so great that they become analytically intractable. In these cases, suboptimal but analytically tractable schemes can be used instead. Such schemes are the linear approaches, in which the estimator is constrained to be a linear function of the data.

The question is then which linear function is the optimum one. The answer is the one that makes the estimator the best linear unbiased estimator (BLUE), that is, the estimator that, besides being a linear function of the data and unbiased, has minimum variance.

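Hence, the estimator will have the form

$$\hat{\boldsymbol{\theta}} = \mathbf{A}\,\mathbf{x},$$

with $\mathbf{A}$ a matrix of weights to be determined. For this estimator to be unbiased, we have to suppose that the expected value of the data is a linear function of the unknown parameters,

$$E[\mathbf{x}] = \mathbf{H}\,\boldsymbol{\theta},$$

with $\mathbf{H}$ a known matrix (a column vector when there is a single parameter). Then, it can be proven that

$$\hat{\boldsymbol{\theta}}_{\mathrm{BLUE}} = \left(\mathbf{H}^T \mathbf{C}^{-1} \mathbf{H}\right)^{-1} \mathbf{H}^T \mathbf{C}^{-1}\,\mathbf{x},$$

where $\mathbf{C}$ is the covariance matrix of the data. As can be observed, the linear estimator depends only on the first- and second-order statistics of the data, regardless of the shape of the data distribution.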
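A minimal sketch of the BLUE for the simplest case may help. It assumes a single parameter with $E[x[n]] = h[n]\,\theta$ and uncorrelated samples with known variances (a diagonal covariance matrix), in which case the standard BLUE expression reduces to an inverse-variance weighted combination of the data. The function name `blue_scalar` and its arguments are hypothetical, chosen only for this illustration.

```python
def blue_scalar(x, h, variances):
    """BLUE of one parameter theta, assuming E[x[n]] = h[n] * theta and
    uncorrelated samples with known variances (diagonal covariance).

    Returns the estimate and its variance.
    """
    if not (len(x) == len(h) == len(variances)) or len(x) == 0:
        raise ValueError("x, h and variances must be non-empty and equally long")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")

    # h^T C^{-1} x and h^T C^{-1} h for a diagonal C
    numerator = sum(hn * xn / vn for xn, hn, vn in zip(x, h, variances))
    denominator = sum(hn * hn / vn for hn, vn in zip(h, variances))
    if denominator == 0:
        raise ValueError("h must not be identically zero")

    estimate = numerator / denominator
    estimate_variance = 1.0 / denominator
    return estimate, estimate_variance


# Two noisy readings of the same level; the second is three times noisier,
# so it receives one third of the weight of the first.
estimate, estimate_variance = blue_scalar([1.0, 3.0], [1.0, 1.0], [1.0, 3.0])
```

Only the means (through `h`) and the variances enter the computation; no assumption about the shape of the distribution is needed, which is the point made in the text.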

Maximum Likelihood Estimator

The maximum likelihood estimator (MLE) is probably the parameter estimation method most used in practice. The reason is that its computational complexity is not as great as that of the non-linear estimators explained above, while it still has asymptotic optimality properties that give it great practical interest.

The MLE is based on the principle of maximum likelihood. Specifically, consider a data sample $\mathbf{x}$ (think of a vector with N independent and identically distributed components). If we know the marginal density of each component, we can write the joint density function $f(\mathbf{x};\theta)$ of the N components as the product of the marginals,

$$f(\mathbf{x};\theta) = \prod_{n=0}^{N-1} f(x[n];\theta).$$

If we now consider that expression as a function of the parameter $\theta$ (with the data sample $\mathbf{x}$ held fixed), it is referred to as the likelihood function. Basically, it is proportional to the probability that the random vector takes values in an infinitesimal volume centered at the point $\mathbf{x}$, seen as a function of $\theta$. Hence, once the data have been observed, it seems reasonable to take as the value of the unknown parameter the one that makes the observed sample $\mathbf{x}$ most probable. Thus, the MLE is defined by

$$\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} f(\mathbf{x};\theta).$$

The MLE enjoys some properties which make it very attractive in practice. In particular:
  • If the efficient estimator exists, the MLE will produce it. Indeed, if the efficient estimator $\hat{\theta}_{\mathrm{eff}}$ exists, the derivative of the logarithm of the density function (likelihood) can be written as $\partial \ln f(\mathbf{x};\theta)/\partial\theta = I(\theta)\,(\hat{\theta}_{\mathrm{eff}} - \theta)$, where $I(\theta)$ is the Fisher information. Since maximizing a function is equivalent to maximizing its logarithm, setting this derivative to zero shows that the estimator sought is $\hat{\theta}_{\mathrm{ML}} = \hat{\theta}_{\mathrm{eff}}$.
  • Likewise, under very mild conditions, the MLE is asymptotically Gaussian (as the sample size N grows), in particular

    $$\hat{\theta}_{\mathrm{ML}} \overset{a}{\sim} \mathcal{N}\!\left(\theta,\; I^{-1}(\theta)\right).$$

    As a result, even when the efficient estimator does not exist for finite sample sizes, the MLE is asymptotically efficient. In addition, this property makes it possible to calculate, under the Gaussian approximation, confidence intervals that contain the true value of the parameter with a given probability.
  • Finally, the MLE also satisfies the invariance property. If we wish to estimate $\alpha = \phi(\theta)$, it can be proven that $\hat{\alpha}_{\mathrm{ML}} = \phi(\hat{\theta}_{\mathrm{ML}})$. This property can noticeably simplify the calculations when the likelihood function of the transformed parameter is difficult to obtain.
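The properties above can be sketched on a concrete case. The example below assumes i.i.d. exponential samples with unknown rate $\lambda$ (a model chosen only for illustration, not taken from the text). For that model the likelihood is maximized at $\hat{\lambda} = N / \sum_n x[n]$ and the Fisher information of the sample is $N/\lambda^2$, so the Gaussian approximation gives a standard deviation of roughly $\hat{\lambda}/\sqrt{N}$. The function names are hypothetical.

```python
import math


def exponential_rate_mle(samples):
    """MLE of the rate of an exponential density lambda * exp(-lambda * x),
    assuming i.i.d. non-negative samples."""
    if len(samples) == 0 or any(s < 0 for s in samples):
        raise ValueError("samples must be non-empty and non-negative")
    total = sum(samples)
    if total == 0:
        raise ValueError("at least one sample must be positive")
    return len(samples) / total


def asymptotic_interval(rate_hat, n, z=1.96):
    """Approximate confidence interval from the asymptotic Gaussian law.

    Uses the Fisher information N / lambda^2 evaluated at the estimate;
    z = 1.96 corresponds to roughly 95 % coverage for large N.
    """
    std = rate_hat / math.sqrt(n)
    return rate_hat - z * std, rate_hat + z * std


def exponential_mean_mle(samples):
    """MLE of the mean 1 / lambda, obtained through the invariance property."""
    return 1.0 / exponential_rate_mle(samples)


samples = [0.5, 1.5, 1.0, 1.0]
rate_hat = exponential_rate_mle(samples)
low, high = asymptotic_interval(rate_hat, len(samples))
mean_hat = exponential_mean_mle(samples)
```

With only four samples the interval is very wide and the Gaussian approximation is crude; the sketch is meant to show the mechanics, which become meaningful as N grows.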

Least Squares Estimation

Least squares estimation (LSE) is used whenever probabilistic information about the data is not available. It is therefore an entirely deterministic approach, in general without any optimality property.

The LSE approach supposes that the observations follow the model

$$x[n] = s[n;\boldsymbol{\theta}] + w[n], \qquad n = 0, \ldots, N-1,$$

that is, a signal which is a function of the parameters to be estimated, on which a perturbation $w[n]$ is superimposed. The latter can be understood as the component of the observation that cannot be explained by the signal generating model. The objective is, from a sample of N observations, to calculate the parameter vector that makes the signal model explain the collected observations as well as possible, that is, the one that minimizes the term not explained by the model. So the estimator will be

$$\hat{\boldsymbol{\theta}}_{\mathrm{LS}} = \arg\min_{\boldsymbol{\theta}} \sum_{n=0}^{N-1} \bigl(x[n] - s[n;\boldsymbol{\theta}]\bigr)^2.$$

This kind of estimation presents an operational difficulty which depends heavily on the signal generating model. In particular, if the signal is a linear function of the parameters, the LSE has a closed-form solution. Otherwise, the optimization problem stated in the last equation is non-linear and the way to obtain the solution has to be found case by case. Let us refine the linear model: if we arrange the values $s[n;\boldsymbol{\theta}]$ as a column vector $\mathbf{s}$ with N components, and the parameter vector $\boldsymbol{\theta}$ is formed by d parameters, we can write

$$\mathbf{s} = \mathbf{H}\,\boldsymbol{\theta},$$

with $\mathbf{H}$ a matrix of dimensions N × d. For the problem to be solvable we suppose that N > d and that $\mathbf{H}$ is full rank, that is, of rank d. Under these assumptions, it is easy to see that

$$\hat{\boldsymbol{\theta}}_{\mathrm{LS}} = \left(\mathbf{H}^T \mathbf{H}\right)^{-1} \mathbf{H}^T \mathbf{x},$$

a solution that would be equal to the BLUE if the data used for the latter were uncorrelated with equal variance. However, as can be seen, in the LSE case we have made no assumption in that respect. Finally, to emphasize the similarities further, note that the criterion to be minimized can be written in matrix form as

$$J(\boldsymbol{\theta}) = \left(\mathbf{x} - \mathbf{H}\boldsymbol{\theta}\right)^T \left(\mathbf{x} - \mathbf{H}\boldsymbol{\theta}\right).$$
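The closed-form linear solution, the standard normal-equations result $(\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}$, can be sketched for a small case. The example assumes a straight-line signal model $s[n;\boldsymbol{\theta}] = a + b\,t[n]$ with d = 2 parameters, so that row n of $\mathbf{H}$ is $[1,\; t[n]]$ and the 2 × 2 system can be inverted by hand. The function name and the sampling instants `t` are hypothetical.

```python
def least_squares_line(t, x):
    """Least squares fit of x[n] ~ a + b * t[n].

    Solves the normal equations (H^T H) theta = H^T x, where row n of H
    is [1, t[n]] and theta = [a, b].
    """
    n = len(t)
    if n != len(x) or n < 2:
        raise ValueError("t and x must have the same length, at least 2")

    # Entries of H^T H = [[n, s_t], [s_t, s_tt]] and H^T x = [s_x, s_tx]
    s_t = sum(t)
    s_tt = sum(ti * ti for ti in t)
    s_x = sum(x)
    s_tx = sum(ti * xi for ti, xi in zip(t, x))

    det = n * s_tt - s_t * s_t
    if det == 0:
        raise ValueError("H is not full rank (all t[n] are equal)")

    # Explicit inverse of the 2 x 2 matrix H^T H applied to H^T x
    a = (s_tt * s_x - s_t * s_tx) / det
    b = (n * s_tx - s_t * s_x) / det
    return a, b


# Observations that lie exactly on x = 1 + 2 t are recovered without error.
intercept, slope = least_squares_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

The zero-determinant check plays the role of the full-rank assumption on $\mathbf{H}$: if every `t[n]` is the same, the two columns of $\mathbf{H}$ are proportional and the parameters cannot be separated.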

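If, for any reason, it were of interest to give more importance to some errors than to others, we could introduce a positive definite matrix $\mathbf{W}$ in the previous expression to be minimized,

$$J(\boldsymbol{\theta}) = \left(\mathbf{x} - \mathbf{H}\boldsymbol{\theta}\right)^T \mathbf{W} \left(\mathbf{x} - \mathbf{H}\boldsymbol{\theta}\right),$$

which would give rise to the solution

$$\hat{\boldsymbol{\theta}}_{\mathrm{LS}} = \left(\mathbf{H}^T \mathbf{W} \mathbf{H}\right)^{-1} \mathbf{H}^T \mathbf{W}\,\mathbf{x},$$

which would increase the similarity with the corresponding expression for the BLUE (with $\mathbf{W}$ playing the role of the inverse covariance matrix of the data). Nonetheless, once more, we arrive at formal similarities using wholly different methodologies. The LSE has interesting geometrical properties and, by means of them, a recursive formulation can be developed that progressively increases the accuracy of the estimator as more data become available.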
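The idea of a recursive formulation can be sketched in its simplest instance. The example assumes a constant-level signal model $s[n;\theta] = \theta$ (an assumption made only for illustration), for which the least squares estimate is the sample mean. The estimate is then updated with each new observation by a correction proportional to the prediction error, without storing past data. The class name `SequentialLevelEstimator` is hypothetical.

```python
class SequentialLevelEstimator:
    """Sequential least squares for a constant level, s[n; theta] = theta.

    After n samples the estimate equals the sample mean of those n samples,
    but it is obtained by updating the previous estimate.
    """

    def __init__(self):
        self.count = 0
        self.estimate = 0.0

    def update(self, sample):
        self.count += 1
        gain = 1.0 / self.count            # gain shrinks as data accumulate
        error = sample - self.estimate     # what the old estimate fails to explain
        self.estimate += gain * error
        return self.estimate


estimator = SequentialLevelEstimator()
history = [estimator.update(sample) for sample in [2.0, 4.0, 6.0]]
```

The decreasing gain reflects that each new observation carries less weight relative to the data already absorbed. For general linear models the same structure holds, with a gain vector in place of the scalar `1 / count`.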
 
 
