False discovery rate

False discovery rate (FDR) control is a statistical method used in multiple hypothesis testing to correct for multiple comparisons. It controls the expected proportion of incorrectly rejected null hypotheses (type I errors) among the rejected hypotheses.[1] It is less conservative than familywise error rate (FWER) control[2] and therefore has greater power, at the cost of a higher likelihood of type I errors.

The q-value is defined to be the FDR analogue of the p-value: the q-value of an individual hypothesis test is the minimum FDR at which the test may be called significant. One approach is to estimate q-values directly rather than fixing a level at which to control the FDR.
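As a rough illustration, one common convention equates q-values with Benjamini–Hochberg adjusted p-values (the step-up procedure described below). The sketch below assumes that convention and that NumPy is available; the function name is an illustrative choice, not part of the original entry's notation.

```python
import numpy as np

def bh_q_values(p_values):
    """Approximate q-values as Benjamini-Hochberg adjusted p-values.

    q_(i) = min_{j >= i} ( m * p_(j) / j ), computed on the sorted p-values
    and then mapped back to the original order.
    """
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                     # indices that sort p ascending
    ranked = p[order]
    # raw BH values m * p_(i) / i, then enforce monotonicity from the right
    raw = ranked * m / np.arange(1, m + 1)
    q_sorted = np.minimum.accumulate(raw[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.clip(q_sorted, 0, 1)        # q-values cannot exceed 1
    return q

# the q-value of a test is the minimum FDR at which it can be called significant
print(bh_q_values([0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]))
```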

Classification of m hypothesis tests

The following table defines some random variables related to the m hypothesis tests.

                             # declared non-significant   # declared significant   Total
# true null hypotheses       U                            V                        m0
# non-true null hypotheses   T                            S                        m - m0
Total                        m - R                        R                        m

The false discovery rate is given by \mathrm{E}\!\left [\frac{V}{V+S}\right ] = \mathrm{E}\!\left [\frac{V}{R}\right ] (with V/R defined to be 0 when R = 0), and one wants to keep this value below a chosen threshold α.
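As a rough illustration, the simulation below (assuming NumPy and SciPy, and a normal-means setup chosen purely for the example) tallies V, S and R as in the table above and estimates E[V/R] when every test is run at an uncorrected level α:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
m, m0, alpha, n_rep = 100, 80, 0.05, 2000     # assumed setup: 80 true nulls, 20 signals

fdp = []                                      # realised V/R ("false discovery proportion")
for _ in range(n_rep):
    # z-statistics: N(0,1) for true nulls, N(3,1) for non-nulls
    z = np.concatenate([rng.normal(0.0, 1.0, m0), rng.normal(3.0, 1.0, m - m0)])
    p = 2 * norm.sf(np.abs(z))                # two-sided p-values
    rejected = p <= alpha                     # naive per-test threshold, no correction
    R = rejected.sum()
    V = rejected[:m0].sum()                   # rejections among the true nulls
    fdp.append(V / R if R > 0 else 0.0)       # V/R defined as 0 when R = 0

print("estimated FDR =", np.mean(fdp))        # well above alpha without FDR control
```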

Controlling procedures

Independent tests

The Benjamini–Hochberg step-up procedure (based on the Simes inequality) ensures that the expected value \mathrm{E}\!\left[ \frac{V}{V + S} \right]\, is at most a given α (Benjamini and Hochberg 1995). This procedure is valid when the m tests are independent. Let H1, ..., Hm be the null hypotheses and P1, ..., Pm their corresponding p-values. Sort the p-values in increasing order and denote the ordered values by P(1), ..., P(m). For a given α, find the largest k such that

P_{(k)} \leq \frac{k}{m} \alpha.

Then reject (i.e. declare positive) all H(i) for i = 1, ..., k. Note that the mean of the per-rank thresholds \frac{k}{m}\alpha, k = 1, ..., m, is \frac{\alpha(m+1)}{2m}, which can be used as a rough FDR (RFDR), i.e. "α adjusted for m independent tests."
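A minimal sketch of this step-up rule, assuming NumPy (the function name and example p-values are illustrative choices): sort the p-values, find the largest k with P(k) ≤ (k/m)α, and reject the hypotheses with the k smallest p-values.

```python
import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Return a boolean mask of rejected hypotheses, in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    ranked = p[order]
    # largest k such that P_(k) <= (k / m) * alpha
    below = ranked <= (np.arange(1, m + 1) / m) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0]) + 1  # 1-based rank of the last p-value under the line
        reject[order[:k]] = True              # reject H_(1), ..., H_(k)
    return reject

# usage: with alpha = 0.05 and these p-values, only the two smallest are rejected
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p, alpha=0.05))
```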

Dependent tests

The Benjamini–Yekutieli procedure controls the false discovery rate under more general dependence assumptions. It modifies the threshold above, finding the largest k such that:

P_{(k)} \leq \frac{k}{m \cdot c(m)} \alpha
  • If the tests are independent: c(m) = 1 (same as above)
  • If the tests are positively correlated: c(m) = 1
  • If the tests are negatively correlated (or under arbitrary dependence): c(m) = \sum_{i=1}^{m} \frac{1}{i}

In this case, c(m) can be approximated using the Euler-Mascheroni constant:

\sum _{i=1} ^m \frac{1}{i} \approx \ln(m) + \gamma.

Using the RFDR above, an approximate FDR (AFDR) for m dependent tests is obtained by dividing by this factor: AFDR = RFDR / (ln(m) + 0.57721...).
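The same step-up rule with the c(m) factor gives a sketch of the Benjamini–Yekutieli variant (again assuming NumPy; the function name is an illustrative choice). For large m the harmonic sum is close to ln(m) + 0.57721, as noted above.

```python
import numpy as np

def benjamini_yekutieli(p_values, alpha=0.05):
    """Step-up rule with threshold (k / (m * c(m))) * alpha, c(m) = sum_{i=1}^m 1/i."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    c_m = np.sum(1.0 / np.arange(1, m + 1))   # harmonic sum; ~ ln(m) + 0.57721 for large m
    order = np.argsort(p)
    ranked = p[order]
    below = ranked <= (np.arange(1, m + 1) / (m * c_m)) * alpha
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0]) + 1
        reject[order[:k]] = True
    return reject

# c(m) versus its Euler-Mascheroni approximation for m = 8: about 2.718 vs 2.657
m = 8
print(np.sum(1.0 / np.arange(1, m + 1)), np.log(m) + 0.57721)
```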
