Delta method

The delta method is a method for deriving an approximate probability distribution for a function of a statistical estimator from knowledge of the limiting distribution of that estimator. In many cases the limiting distribution of the initial estimator is a Normal distribution with mean zero, so it is sufficient to obtain the variance of the function of this estimator. More broadly, the Delta Method may be considered a fairly general form of the Central Limit Theorem.

Univariate Delta Method

While the Delta Method generalizes easily to a multivariate setting, careful motivation of the technique is more easily demonstrated in univariate terms. Roughly, for a sequence of random variables X_n satisfying

\sqrt{n}[X_n-\theta] \rightarrow_d N(0,\sigma^2),

where θ and σ^2 are finite-valued constants and the subscript d denotes convergence in distribution, it is the case that

\sqrt{n}[g(X_n)-g(\theta)] \rightarrow_d N(0,\sigma^2[g'(\theta)]^2)

for any function g such that g'(θ) exists and is non-zero. (The latter restriction is really only needed for clarity in argument and application; should the first derivative evaluate to zero at θ, the Delta Method may be extended via a second- or higher-order Taylor series expansion.)
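
As a quick numerical check of this statement (an added illustration, not part of the original text), the Python sketch below simulates \sqrt{n}[g(X_n)-g(\theta)] under an assumed setup: X_n is the sample mean of n i.i.d. N(θ, σ^2) observations and g(x) = e^x. The parameter values and the choice of g are purely illustrative.

# Monte Carlo sketch of the univariate Delta Method (illustrative only).
# Assumed setup: X_n is the sample mean of n i.i.d. N(theta, sigma^2) draws,
# so sqrt(n)*(X_n - theta) is exactly N(0, sigma^2), and g(x) = exp(x),
# giving the Delta Method prediction N(0, sigma^2 * exp(theta)**2).
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n, reps = 1.0, 2.0, 5000, 20000

g = np.exp        # the transformation g
g_prime = np.exp  # its derivative g'

# The sample mean of n i.i.d. N(theta, sigma^2) draws is N(theta, sigma^2/n),
# so it can be drawn directly instead of simulating the raw data.
X_n = rng.normal(theta, sigma / np.sqrt(n), size=reps)
scaled = np.sqrt(n) * (g(X_n) - g(theta))

print("empirical variance    :", scaled.var())
print("delta-method variance :", sigma**2 * g_prime(theta)**2)

The empirical variance of the simulated values should be close to σ^2 [g'(θ)]^2, with the agreement improving as n grows.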

Proof in the Univariate Case

Demonstration of this result is fairly straightforward. To begin, we construct a first-order Taylor series expansion of g(X_n) around θ, with remainder term τ:

g(X_n)=g(\theta)+g'(\theta)(X_n-\theta)+\tau

Note that X_n - \theta = n^{-1/2} \cdot \sqrt{n}[X_n-\theta], the product of a deterministic sequence tending to zero and a sequence converging in distribution; by Slutsky's Theorem, X_n therefore converges in probability to θ:

\sqrt{n}[X_n-\theta] \rightarrow_d N(0,\sigma^2) \Rightarrow X_n - \theta \rightarrow_p 0.

Because g is differentiable at θ, the remainder term τ is of smaller order than (X_n - \theta); in particular it converges to 0 in probability:

X_n \rightarrow_p \theta \Rightarrow \tau \rightarrow_p 0.

In fact \sqrt{n}\,\tau \rightarrow_p 0 as well, since \sqrt{n}\,\tau = [\tau/(X_n-\theta)] \cdot \sqrt{n}[X_n-\theta] is the product of a factor converging in probability to zero and a factor converging in distribution.

Rearranging terms and multiplying by \sqrt{n} now gives

\sqrt{n}[g(X_n)-g(\theta)]=g'(\theta)\sqrt{n}[X_n-\theta]+\sqrt{n}\,\tau.

Since

\sqrt{n}[X_n-\theta] \rightarrow_d N(0,\sigma^2)

by assumption, and the remainder term \sqrt{n}\,\tau vanishes in probability, it follows from a further application of Slutsky's Theorem that

\sqrt{n}[g(X_n)-g(\theta)] \rightarrow_d N(0,\sigma^2[g'(\theta)]^2).

This concludes the proof.
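
As a concrete instance of the result just proved (an added illustration), let \bar{X}_n be the sample mean of n i.i.d. observations with mean θ ≠ 0 and finite variance σ^2, so that the Central Limit Theorem gives \sqrt{n}[\bar{X}_n-\theta] \rightarrow_d N(0,\sigma^2). Taking g(x) = x^2, so that g'(θ) = 2θ, the Delta Method yields

\sqrt{n}[\bar{X}_n^2-\theta^2] \rightarrow_d N(0,4\theta^2\sigma^2).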

Motivation of Multivariate Delta Method

By definition, a consistent, asymptotically normal estimator B converges in probability to its true value β, and typically a central limit theorem gives

\sqrt{n}\left(B-\beta\right) \rightarrow_d N\left(0, \operatorname{Var}(B) \right)

where n is the number of observations and Var(B) denotes the asymptotic covariance matrix of B. Suppose we want to approximate the variance of a function h of the estimator B. Keeping only the first two terms of its Taylor series, and using vector notation for the gradient, we can approximate h(B) as

h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)

which implies the variance of h(B) is approximately

\begin{align}
\operatorname{Var}\left(h(B)\right) & \approx \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)\right) \\
& = \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot B - \nabla h(\beta)^T \cdot \beta\right) \\
& = \operatorname{Var}\left(\nabla h(\beta)^T \cdot B\right) \\
& = \nabla h(\beta)^T \cdot \operatorname{Var}(B) \cdot \nabla h(\beta)
\end{align}


The delta method therefore implies that

\sqrt{n}\left(h(B)-h(\beta)\right) \rightarrow_d N\left(0, \nabla h(\beta)^T \cdot \operatorname{Var}(B) \cdot \nabla h(\beta) \right)

or in univariate terms,

\sqrt{n}\left(h(B)-h(\beta)\right) \rightarrow_d N\left(0, \operatorname{Var}(B) \cdot \left(h^\prime(\beta)\right)^2 \right).
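
The following Python sketch applies the multivariate formula to an assumed example, h(b_1, b_2) = b_1/b_2 (a ratio of two estimates); the point β and the covariance matrix are illustrative numbers, not taken from this article.

# Multivariate Delta Method sketch: Var(h(B)) is approximated by the
# quadratic form grad h(beta)^T . Var(B) . grad h(beta), for the assumed
# example h(b1, b2) = b1 / b2.
import numpy as np

def delta_method_var(grad, cov):
    # Quadratic form grad^T . cov . grad.
    grad = np.asarray(grad, dtype=float)
    return grad @ np.asarray(cov, dtype=float) @ grad

beta = np.array([2.0, 5.0])       # illustrative parameter values
cov_B = np.array([[0.04, 0.01],   # illustrative Var(B)
                  [0.01, 0.09]])

# Gradient of h(b1, b2) = b1 / b2, evaluated at beta.
grad_h = np.array([1.0 / beta[1], -beta[0] / beta[1] ** 2])

print("approximate Var(h(B)) =", delta_method_var(grad_h, cov_B))

In practice β and Var(B) are unknown, so the gradient is evaluated at the estimate B and Var(B) is replaced by an estimated covariance matrix.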

Note

The delta method is nearly identical to the formulae presented in Klein (1953, p. 258):

\operatorname{Var} \left( h_r \right) = \sum_i    \left( \frac{ \partial h_r }{ \partial B_i } \right)^2   \operatorname{Var}\left( B_i \right) + \sum_i \sum_{j \neq i}    \left( \frac{ \partial h_r }{ \partial B_i } \right)   \left( \frac{ \partial h_r }{ \partial B_j } \right)   \operatorname{Cov}\left( B_i, B_j \right)
\operatorname{Cov}\left( h_r, h_s \right) = \sum_i    \left( \frac{ \partial h_r }{ \partial B_i } \right)   \left( \frac{ \partial h_s }{ \partial B_i } \right)   \operatorname{Var}\left( B_i \right) + \sum_i \sum_{j \neq i}    \left( \frac{ \partial h_r }{ \partial B_i } \right)   \left( \frac{ \partial h_s }{ \partial B_j } \right)   \operatorname{Cov}\left( B_i, B_j \right)

where h_r is the rth element of h(B) and B_i is the ith element of B. The only difference is that Klein stated these as identities, whereas they are actually approximations.
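
The componentwise sums and the matrix form agree because the variance terms (i = j) and the covariance terms (j ≠ i) combine into a single double sum over all pairs (i, j). The short sketch below checks this numerically for an arbitrary, purely illustrative Jacobian and covariance matrix.

# Check that Klein's double-sum formula for Cov(h_r, h_s) matches the
# matrix form grad(h_r)^T . Var(B) . grad(h_s).  The Jacobian rows and
# the covariance matrix below are arbitrary illustrative numbers.
import numpy as np

cov_B = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])

jac = np.array([[1.0, -0.4, 0.0],   # partial derivatives of h_r w.r.t. B_i
                [0.5,  0.0, 2.0]])  # partial derivatives of h_s w.r.t. B_i

# One double sum over all pairs (i, j); the i == j terms are the Var terms
# and the i != j terms are the Cov terms of Klein's formula.
klein = sum(jac[0, i] * jac[1, j] * cov_B[i, j]
            for i in range(3) for j in range(3))

matrix_form = jac[0] @ cov_B @ jac[1]

print(klein, matrix_form)  # identical up to floating-point rounding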

References

Klein, L. R. (1953). A Textbook of Econometrics. p. 258.
