Minimum Variance Unbiased Estimator
For a particular problem, the efficient estimator may not exist. That is, the derivative of the log-density with respect to θ cannot be written in the form required to attain the Cramér-Rao bound. A minimum variance unbiased estimator (MVUE) may still exist, although its variance is then larger than the bound an efficient estimator would attain. When the MVUE exists, the Rao-Blackwell-Lehmann-Scheffé (RBLS) procedure can be used to build it. The procedure is based on the concept of a sufficient statistic. A data function T(x) is called a sufficient statistic whenever the density of the data conditioned on T(x) does not depend on the parameter θ to be estimated. In this sense, the sufficient statistic captures all of the dependence of the data on θ. Knowing the value of T(x) alone, without the individual values taken by each datum of the observed sample, is sufficient information for estimating the parameter. Formally (the Neyman-Fisher factorization theorem), T(x) is a sufficient statistic for θ if and only if the density function can be factorized as

$$f(\mathbf{x};\theta) = g\big(T(\mathbf{x}),\theta\big)\,h(\mathbf{x}),$$

where g depends on the data only through T(x) and h does not depend on θ. If T(x) is also complete, any unbiased estimator that is a function of T(x) alone is the MVUE.
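As a small illustration, consider N i.i.d. samples uniform on [0, θ]. This model is an assumption, not taken from the text, and is a textbook case where the Cramér-Rao bound does not apply. The density factorizes as f(x; θ) = θ^(-N) · 1{max x[n] ≤ θ}, so T(x) = max x[n] is sufficient. It is also complete, and the RBLS procedure gives the MVUE θ̂ = (N+1)/N · max x[n]. The sketch below compares, by Monte Carlo, its variance θ²/(N(N+2)) with that of the simpler unbiased estimator 2 · mean, whose variance is θ²/(3N). The function names and simulation settings are hypothetical.

```python
import random
import statistics


def mvue_uniform(x):
    """MVUE of theta for i.i.d. Uniform(0, theta) data: (N+1)/N * max(x)."""
    n = len(x)
    return (n + 1) / n * max(x)


def moment_estimator(x):
    """Unbiased but not minimum variance: twice the sample mean."""
    return 2 * statistics.fmean(x)


def monte_carlo(theta=1.0, n=10, trials=20000, seed=0):
    """Return ((mean, var) of the MVUE, (mean, var) of 2*mean) over repeated trials."""
    rng = random.Random(seed)
    mvue, mom = [], []
    for _ in range(trials):
        x = [rng.uniform(0.0, theta) for _ in range(n)]
        mvue.append(mvue_uniform(x))
        mom.append(moment_estimator(x))
    return (
        (statistics.fmean(mvue), statistics.pvariance(mvue)),
        (statistics.fmean(mom), statistics.pvariance(mom)),
    )


if __name__ == "__main__":
    (m1, v1), (m2, v2) = monte_carlo()
    # For theta = 1 and N = 10 the theoretical variances are 1/120 and 1/30.
    print(f"MVUE:   mean={m1:.4f}  var={v1:.5f}")
    print(f"2*mean: mean={m2:.4f}  var={v2:.5f}")
```

Both estimators are unbiased, but only the one built from the sufficient statistic reaches the minimum variance.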
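Best Linear Unbiased Estimator

The question here is which linear function of the data is optimal. The natural answer is the function that makes the estimator the best linear unbiased estimator (BLUE). Among all estimators that are linear in the data and unbiased, the BLUE is the one with minimum variance. Hence, the estimator has the form

$$\hat{\theta} = \sum_{n=0}^{N-1} a_n x[n] = \mathbf{a}^T\mathbf{x}.$$

For this estimator to be unbiased, the expected value of the data must be a linear function of the unknown parameter, $E[\mathbf{x}] = \mathbf{s}\,\theta$ with $\mathbf{s}$ known, so that unbiasedness requires $\mathbf{a}^T\mathbf{s} = 1$. Then, if $\mathbf{C}$ is the covariance matrix of the data, it can be proven that

$$\hat{\theta} = \frac{\mathbf{s}^T\mathbf{C}^{-1}\mathbf{x}}{\mathbf{s}^T\mathbf{C}^{-1}\mathbf{s}}, \qquad \operatorname{var}(\hat{\theta}) = \frac{1}{\mathbf{s}^T\mathbf{C}^{-1}\mathbf{s}}.$$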
As can be seen, the BLUE depends only on the first- and second-order statistics of the data (its mean and covariance), regardless of the shape of the data distribution.
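A minimal numerical sketch of the scalar BLUE follows. It assumes the data satisfy E[x] = sθ with a known vector s and a known covariance matrix C. It uses the standard result θ̂ = sᵀC⁻¹x / (sᵀC⁻¹s) with variance 1/(sᵀC⁻¹s). The function name `blue_scalar` and the example values are hypothetical. Note that only s and C enter the computation, never the full distribution of the data.

```python
import numpy as np


def blue_scalar(x, s, C):
    """BLUE of a scalar theta when E[x] = s * theta and Cov(x) = C.

    Returns (estimate, variance). Only the mean vector s and the
    covariance C are used, not the shape of the data distribution.
    """
    x, s, C = (np.asarray(v, dtype=float) for v in (x, s, C))
    Ci_s = np.linalg.solve(C, s)   # C^{-1} s without forming the inverse
    denom = s @ Ci_s               # s^T C^{-1} s
    a = Ci_s / denom               # optimal weights; a @ s equals 1 (unbiasedness)
    return a @ x, 1.0 / denom


if __name__ == "__main__":
    # Hypothetical example: a DC level (s = 1) observed with unequal noise variances.
    s = np.ones(3)
    C = np.diag([1.0, 4.0, 0.25])
    x = np.array([1.2, 0.4, 0.9])
    theta_hat, var = blue_scalar(x, s, C)
    print(theta_hat, var)
```

With white noise (C = σ²I) and s = 1, the weights become equal and the BLUE reduces to the sample mean. Solving with `np.linalg.solve` rather than inverting C explicitly is the usual numerically safer choice.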
Maximum Likelihood Estimator

The MLE is based on the principle of maximum likelihood. Specifically, consider a data sample x (think of a vector with N independent and identically distributed components) whose marginal density is known. The joint density function f(x; θ) of the N components can then be written as the product of the marginal densities:

$$f(\mathbf{x};\theta) = \prod_{n=0}^{N-1} f(x[n];\theta).$$

If this expression is now considered as a function of the parameter θ, with the data sample x held fixed, it is called the likelihood function. Multiplied by an infinitesimal volume dx, it gives the probability that the random vector takes a value within that volume centered at the observed point x, viewed as a function of θ. It is therefore reasonable, once the data have been observed, to choose as the estimate the value of the unknown parameter that makes the observed sample x most probable. Thus, the MLE is defined by

$$\hat{\theta}_{ML} = \arg\max_{\theta} f(\mathbf{x};\theta).$$
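To make the definition concrete, the sketch below assumes i.i.d. exponential data with density f(x; λ) = λe^(−λx). This model is an illustrative choice, not taken from the text. Because the joint density is a product of marginals, the log-likelihood is a sum of log marginal densities, ln L(λ) = N ln λ − λ Σ x[n], which is maximized at λ̂ = N / Σ x[n]. A brute-force grid search over λ recovers the same value. The function names and grid bounds are hypothetical.

```python
import math


def exp_log_likelihood(lam, x):
    """Log-likelihood of i.i.d. exponential data: sum of log marginal densities."""
    return sum(math.log(lam) - lam * xi for xi in x)


def mle_grid(x, lo=0.001, hi=5.0, step=0.001):
    """Maximize the log-likelihood by exhaustive search over a grid of lambda values."""
    n_points = int(round((hi - lo) / step)) + 1
    grid = (lo + k * step for k in range(n_points))
    return max(grid, key=lambda lam: exp_log_likelihood(lam, x))


def mle_closed_form(x):
    """Setting the derivative of the log-likelihood to zero gives N / sum(x)."""
    return len(x) / sum(x)


if __name__ == "__main__":
    data = [0.5, 1.2, 0.3, 2.0, 0.8]
    print(mle_grid(data), mle_closed_form(data))
```

The grid search only illustrates the arg max definition. In practice, a closed form or a numerical optimizer is preferred whenever one is available.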
The MLE enjoys several properties that make it very attractive in practice. In particular:
Least Squares Estimation

The LSE approach assumes that the observations follow the model

$$x[n] = s[n;\boldsymbol{\theta}] + w[n], \qquad n = 0, 1, \ldots, N-1,$$

that is, a signal that depends on the parameters to be estimated, with a perturbation w[n] superimposed on it. The perturbation can be understood as the component of the observation that the signal generating model cannot explain. The objective is to use the N observations to compute the parameter vector for which the signal model best explains the collected observations, that is, the parameter vector that minimizes the part not explained by the model. So the estimator is

$$\hat{\boldsymbol{\theta}}_{LS} = \arg\min_{\boldsymbol{\theta}} J(\boldsymbol{\theta}), \qquad J(\boldsymbol{\theta}) = \sum_{n=0}^{N-1}\big(x[n] - s[n;\boldsymbol{\theta}]\big)^2.$$

The practical difficulty of this kind of estimation depends heavily on the signal generating model. In particular, if the signal is a linear function of the parameters, the LSE has a closed-form solution. Otherwise, the minimization above is a nonlinear optimization problem, and a way to obtain the solution has to be found case by case.

Consider the linear model in more detail. If the values s[n; θ] are arranged as a column vector with N components, and the parameter vector θ is formed by d parameters, we can write

$$\mathbf{s}(\boldsymbol{\theta}) = \mathbf{H}\boldsymbol{\theta},$$

with H a matrix of dimensions N × d. For the problem to be solvable, we assume that N > d and that H has full rank, that is, rank d. Under these assumptions, it is easy to see that

$$\hat{\boldsymbol{\theta}}_{LS} = (\mathbf{H}^T\mathbf{H})^{-1}\mathbf{H}^T\mathbf{x}.$$

This solution would be equal to the BLUE if the data used for the latter were uncorrelated with equal variances (covariance matrix σ²I). Note, however, that no such assumption has been made about the perturbation in the LSE case. Indeed, no probabilistic assumption has been made at all.
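A sketch of the linear case follows. For illustration only, it assumes a straight-line signal s[n; θ] = A + Bn, so that H has columns [1, n] and d = 2. It computes θ̂ = (HᵀH)⁻¹Hᵀx by solving the normal equations and checks the result against NumPy's `np.linalg.lstsq`, which solves the same minimization through a more stable factorization. The helper names are hypothetical.

```python
import numpy as np


def line_design_matrix(N):
    """Hypothetical model s[n] = A + B*n: H is N x 2 with columns [1, n]."""
    n = np.arange(N, dtype=float)
    return np.column_stack([np.ones(N), n])


def ls_normal_equations(H, x):
    """Linear LSE: solve (H^T H) theta = H^T x (assumes N > d and rank(H) = d)."""
    return np.linalg.solve(H.T @ H, H.T @ x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, theta_true = 50, np.array([2.0, 0.5])
    H = line_design_matrix(N)
    x = H @ theta_true + rng.normal(scale=1.0, size=N)  # signal plus perturbation
    theta_ne = ls_normal_equations(H, x)
    theta_np, *_ = np.linalg.lstsq(H, x, rcond=None)
    print(theta_ne, theta_np)
```

Forming HᵀH squares the condition number of H. For ill-conditioned models, a QR- or SVD-based solver such as `lstsq` is the safer choice, even though both approaches minimize the same criterion.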
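Finally, to emphasize the similarities further, note that for the linear model the LS criterion can be written in matrix form as

$$J(\boldsymbol{\theta}) = (\mathbf{x} - \mathbf{H}\boldsymbol{\theta})^T(\mathbf{x} - \mathbf{H}\boldsymbol{\theta}).$$

If some errors should be given more importance than others, a positive definite weighting matrix W can be introduced in the expression to be minimized,

$$J_W(\boldsymbol{\theta}) = (\mathbf{x} - \mathbf{H}\boldsymbol{\theta})^T\mathbf{W}(\mathbf{x} - \mathbf{H}\boldsymbol{\theta}),$$

which gives rise to the solution

$$\hat{\boldsymbol{\theta}}_{WLS} = (\mathbf{H}^T\mathbf{W}\mathbf{H})^{-1}\mathbf{H}^T\mathbf{W}\mathbf{x}.$$

This increases the similarity with the corresponding expression for the BLUE. Choosing W equal to the inverse of the data covariance matrix yields exactly the BLUE of θ under the linear model. Once again, however, formally similar expressions are reached through entirely different methodologies. The LSE also has interesting geometric properties. For instance, the fitted signal of the unweighted LSE, Hθ̂, is the orthogonal projection of x onto the column space of H. These properties can be used to develop a recursive formulation that progressively improves the accuracy of the estimate as more data become available.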
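The sketch below assumes uncorrelated perturbations with known, unequal variances σ_n². In that case W = diag(1/σ_n²) is the inverse covariance matrix, and the weighted LSE coincides with the BLUE. For the recursive formulation mentioned in the text, it uses the standard sequential LS update, which is an addition here rather than something derived above. The recursion starts from the batch estimate on the first d samples and then folds in one observation at a time. After the last sample, it reproduces the batch weighted solution. Function names and the line model used for testing are hypothetical.

```python
import numpy as np


def weighted_ls(H, x, W):
    """Weighted LSE: minimize (x - H theta)^T W (x - H theta), W positive definite."""
    return np.linalg.solve(H.T @ W @ H, H.T @ W @ x)


def sequential_ls(H, x, noise_var):
    """Recursive weighted LS for uncorrelated noise, i.e. W = diag(1 / noise_var).

    Initializes with the batch solution on the first d rows, then updates the
    estimate one observation at a time. Returns (estimate, covariance).
    """
    H, x, noise_var = (np.asarray(v, dtype=float) for v in (H, x, noise_var))
    d = H.shape[1]
    W0 = np.diag(1.0 / noise_var[:d])
    Sigma = np.linalg.inv(H[:d].T @ W0 @ H[:d])   # (H^T W H)^{-1} for the rows seen so far
    theta = Sigma @ H[:d].T @ W0 @ x[:d]
    for n in range(d, len(x)):
        h = H[n]
        K = Sigma @ h / (noise_var[n] + h @ Sigma @ h)  # gain vector
        theta = theta + K * (x[n] - h @ theta)          # correct with the new residual
        Sigma = Sigma - np.outer(K, h @ Sigma)          # (I - K h^T) Sigma
    return theta, Sigma


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N = 40
    H = np.column_stack([np.ones(N), np.arange(N, dtype=float)])  # hypothetical line model
    noise_var = rng.uniform(0.1, 2.0, size=N)
    x = H @ np.array([1.0, -0.3]) + rng.normal(scale=np.sqrt(noise_var))
    W = np.diag(1.0 / noise_var)  # W = C^{-1}, so the weighted LSE is the BLUE here
    theta_seq, _ = sequential_ls(H, x, noise_var)
    print(weighted_ls(H, x, W), theta_seq)
```

Each recursive step costs O(d²) instead of re-solving the whole problem, which is what makes the recursion attractive when data arrive one sample at a time.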