Confounding variable
A confounding variable (also known as a confounding factor, lurking variable, confound, or confounder) is an extraneous variable in a statistical or research model that should have been experimentally controlled, but was not. Failing to take a confounding variable into account can lead to the false conclusion that the dependent variable is in a causal relationship with the independent variable. Such a relation between two observed variables is termed a spurious relationship.

For example, assume that a child's weight and a country's gross domestic product (GDP) both rise with time. A researcher could measure weight and GDP, observe a strong association, and conclude that a higher GDP causes children to gain weight. However, the confounding variable, time, was not accounted for, and is the real cause of both rises.
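The weight–GDP example can be illustrated with a small simulation. All numbers below are hypothetical; the point is only that two series sharing an upward trend correlate strongly even though neither causes the other.

```python
# Hypothetical data: weight and GDP both trend upward with time,
# the common cause, so they correlate even without any causal link.
import random

random.seed(0)
years = range(2000, 2020)
weight = [20 + 0.8 * (y - 2000) + random.gauss(0, 0.5) for y in years]
gdp = [1000 + 50 * (y - 2000) + random.gauss(0, 20) for y in years]

def pearson(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

r = pearson(weight, gdp)
print(round(r, 2))  # close to 1: the association is spurious
```

Controlling for time (for example, correlating the detrended residuals) would make the apparent association vanish.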

By definition, a confounding variable is associated with both the putative cause and the outcome, but it must not lie on the causal pathway between them: if exposure A is suspected of causing disease C, a confounder B must not be caused solely by A, and B must not invariably lead to C. For example, being female does not always lead to smoking tobacco, and smoking tobacco does not always lead to cancer; therefore, any study of the relation between being female and cancer should take smoking into account as a possible confounder. In addition, a confounder is always a risk factor whose prevalence differs between the groups being compared (e.g. females and males) (Hennekens, Buring & Mayrent, 1987).

In statistical experimental design, attempts are made to remove lurking variables such as the placebo effect from the experiment. Because one can never be certain that observational data are not hiding a confounding variable, it is never safe to conclude that a regression model demonstrates a causal relationship with 100% certainty, no matter how strong the association.

Though criteria for causality in statistical studies have been researched intensely, Pearl has shown that confounding variables cannot be defined in terms of statistical notions alone; some causal assumptions are necessary.[1] In a 1965 paper, Austin Bradford Hill proposed a set of causal criteria.[2] Many working epidemiologists take these as a good starting point when considering confounding and causation, but they are of heuristic value at best. When causal assumptions are articulated in the form of a causal graph, a simple graphical test, known as the back-door criterion, is available to identify sets of confounding variables.

Anecdotal evidence does not take account of confounding variables.

How to remove confounding in a study setup

There are several ways to exclude or control confounding variables in a study; "Epidemiology in Medicine" by Hennekens, Buring and Mayrent (1987) gives an overview of the topic.

  • Case-control studies assign confounders equally to both groups, cases and controls. For example, in a study of the causes of myocardial infarction in which age is a probable confounder, each 67-year-old infarct patient would be matched with a healthy 67-year-old "control" person. In case-control studies, the most commonly matched variables are age and sex.
  • Cohort studies: A degree of matching is also possible, often achieved by admitting only certain age groups or a certain sex into the study population, so that all cohorts are comparable with regard to the possible confounding variable. For example, if age and sex are thought to be confounders, only males aged 40 to 50 would be admitted to a cohort study assessing the myocardial infarction risk of physically active versus inactive cohorts.
  • Stratification: As in the example above, physical activity is thought to protect against myocardial infarction, and age is assumed to be a possible confounder. The sampled data are then stratified by age group, meaning that the association between activity and infarction is analyzed separately within each age group. If the different age groups (or age strata) yield risk ratios markedly different from the crude risk ratio, age must be viewed as a confounding variable. Statistical tools such as Mantel-Haenszel methods handle stratified data.
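The stratification step can be made concrete with a sketch; the counts and stratum labels below are invented for illustration. Both age strata show the same risk ratio of 0.5, while the crude (unstratified) ratio is distorted because activity and age are associated; the Mantel-Haenszel pooled estimate recovers the common stratum ratio.

```python
# Hypothetical counts per age stratum, as
# (exposed_cases, exposed_total, unexposed_cases, unexposed_total),
# where "exposed" means physically active.
strata = {
    "40-49": (8, 800, 4, 200),    # mostly active, low baseline risk
    "50-59": (10, 200, 80, 800),  # mostly inactive, high baseline risk
}

def risk_ratio(a, n1, c, n0):
    return (a / n1) / (c / n0)

# Crude risk ratio: pool everyone and ignore age.
a_tot = sum(s[0] for s in strata.values())
n1_tot = sum(s[1] for s in strata.values())
c_tot = sum(s[2] for s in strata.values())
n0_tot = sum(s[3] for s in strata.values())
crude = risk_ratio(a_tot, n1_tot, c_tot, n0_tot)

# Mantel-Haenszel pooled risk ratio: weight each stratum's contribution
# by its size, so the age confounding cancels out.
num = sum(a * n0 / (n1 + n0) for a, n1, c, n0 in strata.values())
den = sum(c * n1 / (n1 + n0) for a, n1, c, n0 in strata.values())
mh = num / den

print(round(crude, 2), round(mh, 2))  # 0.21 (distorted) vs 0.5 (adjusted)
```

Here confounding exaggerates the apparent protection: the crude ratio of about 0.21 suggests a much stronger effect than the true within-stratum ratio of 0.5.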

All of these methods have drawbacks, as one example makes clear: a 45-year-old African-American vegetarian football player from Alaska who works in education suffers from a disease and is enrolled in a case-control study. Proper matching would call for a person with the same characteristics, with the sole difference of being healthy, but finding such a person would be an enormous task. There is also always the risk of over- or undermatching the study population. In cohort studies, too many people may be excluded; and in stratification, single strata can become so thin that they contain only a small, statistically insignificant number of samples.

  • Multivariate analysis is also possible. Multinomial and logistic regression models exist; the latter are especially suited to binary variables such as "Vaccinated against polio: yes/no". A drawback of these regression models is that, compared with stratification methods, they give little information about the strength of the confounding variable.
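As a hedged sketch of the regression approach, the code below fits a logistic model to synthetic data with and without a binary confounder Z (all effect sizes are invented, and a real analysis would use a statistics package rather than hand-rolled gradient descent). The outcome is caused only by Z, so adjusting for Z moves the estimated exposure coefficient toward its true value of zero.

```python
# Synthetic data: exposure X is associated with confounder Z, but the
# outcome Y depends only on Z. An unadjusted model wrongly credits X.
import math
import random

random.seed(1)
rows = []
for _ in range(1000):
    z = random.random() < 0.5                   # confounder, e.g. age group
    x = random.random() < (0.8 if z else 0.2)   # exposure associated with Z
    y = random.random() < (0.7 if z else 0.2)   # outcome caused by Z only
    rows.append((float(x), float(z), float(y)))

def fit_logistic(data, use_z, steps=1000, lr=1.0):
    """Fit logit P(y=1) = b0 + b1*x (+ b2*z) by plain gradient descent."""
    k = 3 if use_z else 2
    beta = [0.0] * k
    for _ in range(steps):
        grad = [0.0] * k
        for x, z, y in data:
            feats = (1.0, x, z)[:k]
            p = 1.0 / (1.0 + math.exp(-sum(b * f for b, f in zip(beta, feats))))
            for j, f in enumerate(feats):
                grad[j] += (p - y) * f
        beta = [b - lr * g / len(data) for b, g in zip(beta, grad)]
    return beta

b_crude = fit_logistic(rows, use_z=False)[1]    # inflated by confounding
b_adjusted = fit_logistic(rows, use_z=True)[1]  # much closer to zero
print(round(b_crude, 2), round(b_adjusted, 2))
```

The adjusted coefficient is a single summary number, which illustrates the drawback noted above: unlike stratification, the model does not directly show how strongly the confounder distorted each stratum.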

References

  1. ^ Pearl, Judea (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press. ISBN 0-521-77362-8. 
  2. ^ Hill, Austin Bradford (1965). "The environment or disease: association or causation?". Proc R Soc Med 58 (May): 295–300. PMID 14283879.
