Statistics

Nearly every day statistics are used to support assertions about health and what people can do to improve their health. The press frequently quotes scientific articles assessing the roles of diet, exercise, the environment, and access to medical care in maintaining and improving health. Because the effects are often small, and vary greatly from person to person, an understanding of statistics and how it allows researchers to draw conclusions from data is essential for every person interested in public health. Statistics is also of paramount importance in determining which claims regarding factors affecting our health are not valid, not supported by the data, or are based on faulty experimental design and observation.

When an assertion is made such as "electromagnetic fields are dangerous," or "smoking causes lung cancer," statistics plays a central role in determining the validity of such statements. Methods developed by statisticians are used to plan population surveys and to optimally design experiments aimed at collecting data that allows valid conclusions to be drawn, and thus either confirm or refute the assertions. Biostatisticians also develop the analytical tools necessary to derive the most appropriate conclusions based on the collected data.

Role of Biostatistics in Public Health

In the Institute of Medicine's report The Future of Public Health, the mission of public health is defined as assuring conditions in which people can be healthy. To achieve this mission three functions must be undertaken: (1) assessment, to identify problems related to the health of populations and determine their extent; (2) policy development, to prioritize the identified problems, determine possible interventions and/or preventive measures, set regulations in an effort to achieve change, and predict the effect of those changes on the population; and (3) assurance, to make certain that necessary services are provided to reach the desired goal—as determined by policy measures—and to monitor how well the regulators and other sectors of the society are complying with policy.

An additional theme that cuts across all of the above functions is evaluation, that is, determining how well the functions described above are being performed.

Biostatistics plays a key role in each of these functions. In assessment, the value of biostatistics lies in deciding what information to gather to identify health problems, in finding patterns in collected data, and in summarizing and presenting these in an effort to best describe the target population. In so doing, it may be necessary to design general surveys of the population and its needs, to plan experiments to supplement these surveys, and to assist scientists in estimating the extent of health problems and associated risk factors. Biostatisticians are adept at developing the necessary mathematical tools to measure problems, to ascertain associations of risk factors with disease, and to create models that predict the effects of policy changes. They create the mathematical tools necessary to prioritize problems and to estimate costs, including undesirable side effects of preventive and curative measures.

In assurance and policy development, biostatisticians use sampling and estimation methods to study the factors related to compliance and outcome. Questions that can be addressed include whether an improvement is due to compliance or to something else, how best to measure compliance, and how to increase the compliance level in the target population. In analyzing survey data, biostatisticians take into account possible inaccuracies in responses and measurements, both intentional and unintentional. This effort includes how to design survey instruments in a way that checks for inaccuracies, and the development of techniques that correct for nonresponse or for missing observations. Finally, biostatisticians are directly involved in the evaluation of the effects of interventions and whether to attribute beneficial changes to policy.

Understanding Variation in Data

Nearly all observations in the health field show considerable variation from person to person, making it difficult to identify the effects of a given factor or intervention on a person's health. Most people have heard of someone who smoked every day of his or her life and lived to be ninety, or of the death at age thirty of someone who never smoked. The key to sorting out seeming contradictions such as these is to study properly chosen groups of people (samples), and to look for the aggregate effect of something on one group as compared to another. Identifying a relationship, for example, between lung cancer and smoking, does not mean that everyone who smokes will get lung cancer, nor that if one refrains from smoking one will not die from lung cancer. It does mean, however, that the group of people who smoke are more likely than those who do not smoke to die from lung cancer.

How can we make statements about groups of people, but be unable to claim with any certainty that these statements apply to any given individual in the group? Statisticians do this through the use of models for the measurements, based on ideas of probability. For example, it can be said that the probability that an adult American male will die from lung cancer during one year is 9 in 100,000 for a nonsmoker, but is 190 in 100,000 for a smoker. Dying from lung cancer during a year is called an "event," and "probability" is the science that describes the occurrence of such events. For a large group of people, quite accurate statements can be made about the occurrence of events, even though for specific individuals the occurrence is uncertain and unpredictable. A simple but useful model for the occurrence of an event can be made based on two important assumptions: (1) for a group of individuals, the probability that an event occurs is the same for all members of the group; and (2) whether or not a given person experiences the event does not affect whether others do. These assumptions are known as (1) common distribution for events, and (2) independence of events. This simple model can apply to all sorts of public health issues. Its wide applicability lies in the freedom it affords researchers in defining events and population groups to suit the situation being studied.
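
For illustration, this simple model can be sketched in a few lines of Python; the simulation below uses the annual rates quoted above (9 and 190 per 100,000) with invented group sizes, and shows how group totals are predictable even though individual outcomes are not.

```python
# An illustrative sketch (not from the original entry): the simple event model
# described above, applied to the quoted annual lung cancer death rates of
# 9 per 100,000 for nonsmokers and 190 per 100,000 for smokers. Assumptions
# (1) and (2) -- a common probability within each group and independence
# between people -- are exactly what this simulation encodes.
import random

random.seed(1)

def count_events(group_size, annual_risk):
    """Each member independently experiences the event with the same probability."""
    return sum(random.random() < annual_risk for _ in range(group_size))

group_size = 100_000   # hypothetical group size
for year in (1, 2, 3):
    nonsmoker_deaths = count_events(group_size, 9 / 100_000)
    smoker_deaths = count_events(group_size, 190 / 100_000)
    print(f"year {year}: nonsmoker deaths {nonsmoker_deaths:4d}, smoker deaths {smoker_deaths:4d}")
# The group totals stay near 9 and 190 from year to year, even though no one
# can say which particular individuals will experience the event.
```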

Consider the example of brain injury and helmet use among bicycle riders. Here groups can be defined by helmet use (yes/no), and the "event" is severe head injury resulting from a bicycle accident. Of course, more comprehensive models can be used, but the simple ones described here are the basis for much public health research. Table 1 presents hypothetical data about bicycle accidents and helmet use in thirty cases.

It can be seen that 20 percent (2 out of 10) of those not wearing a helmet sustained severe head injury, compared to only 5 percent (1 out of 20) among those wearing a helmet, for a relative risk of four to one. Is this convincing evidence? An application of probability tells us that it is not, and the reason is that with such a small number of cases, this difference in rates is just not that unusual. To better understand this concept, the meaning of probability and what conclusions can be drawn after setting up a model for the data must be described.

Probability is the branch of mathematics that uses models to describe uncertainty in the occurrence of events. Suppose that the chance of severe head injury following a bicycle accident is one in ten. This risk can be simulated using a spinning disk with the numbers "1" through "10" equally spaced around its edge, with a pointer in the center to be spun. Since a spinner has no memory, spins will be independent. A spin will indicate severe head injury if a "1" shows up, and no severe head injury for "2" through "10." The pointer could be spun ten times to see what could happen among ten people not wearing a helmet. The theory of probability uses the binomial distribution to tell you exactly what could happen with ten spins, and how likely each outcome is. For example, the probability that we would not see a "1" in ten spins is .349, the probability that we will see

Table 1

                         Wearing helmet    Not wearing helmet
Severe head injury             1                   2
Not severe head injury        19                   8

SOURCE: Courtesy of author.

exactly one "1" in ten spins is .387, exactly two is .194, exactly three is .057, exactly four is .011, exactly five is .001, with negligible probability for six or more. So if this is a good model for head injury, the probability of two or more people experiencing severe head injury in ten accidents is .264.
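
These binomial probabilities are easy to verify directly; the short Python sketch below recomputes the values quoted here for ten accidents with a one-in-ten chance of severe head injury.

```python
# A quick check (illustration only) of the spinner probabilities quoted above,
# using the binomial distribution with n = 10 accidents and a 1-in-10 chance
# of severe head injury per accident.
from math import comb

n, p = 10, 0.1
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

for k in range(6):
    print(f"P(exactly {k} severe head injuries) = {pmf[k]:.3f}")
print(f"P(two or more severe head injuries) = {1 - pmf[0] - pmf[1]:.3f}")
# Matches the values in the text: .349, .387, .194, .057, .011, .001, and .264.
```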

A common procedure in statistical analysis is to hypothesize that no difference exists between two groups (called the "null" hypothesis) and then to use the theory of probability to determine how tenable such a hypothesis is. In the bicycle accident example, the null hypothesis states that the risk of injury is the same for both groups. Probability calculations then tell how likely it is under the null hypothesis to observe a risk ratio of four or more in samples of twenty people wearing helmets and ten people not wearing helmets. With a common risk of injury equal to one in ten for both groups, and with these sample sizes, the surprising answer is that one will observe a risk ratio of four or more quite often, about 16 percent of the time, which is far too large to give us confidence in asserting that wearing helmets prevents head injury.

This is the essence of statistical hypothesis testing. One assumes that there is no difference in the occurrences of events in our comparison groups, and then calculates the probabilities of various outcomes. If one then observes something that has a low probability of happening given the assumption of no differences between groups, then one rejects the hypothesis and concludes that there is a difference. To thoroughly test whether helmet use does reduce the risk of head injury, it is necessary to observe a larger sample—large enough so that any observed differences between groups cannot be simply attributed to chance.
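
The 16 percent figure quoted above can be checked with a brief simulation under the null hypothesis. The sketch below is illustrative only; it counts a sample as showing a risk ratio of four or more whenever the unhelmeted group's observed risk is at least four times the helmeted group's, including samples in which no helmeted rider is injured.

```python
# A brief simulation (illustration only) of the null hypothesis for the helmet
# example: both groups share the same 1-in-10 injury risk, with 20 helmeted and
# 10 unhelmeted riders per sample. It estimates how often an apparent risk
# ratio of four or more arises by chance alone.
import random

random.seed(2)

def binomial(n, p):
    """Number of injured riders among n, each independently at risk p."""
    return sum(random.random() < p for _ in range(n))

trials, hits = 100_000, 0
for _ in range(trials):
    helmet_risk = binomial(20, 0.1) / 20        # observed risk, helmeted group
    no_helmet_risk = binomial(10, 0.1) / 10     # observed risk, unhelmeted group
    # A sample counts if the unhelmeted risk is at least four times the helmeted
    # risk, including samples in which no helmeted rider is injured at all.
    if no_helmet_risk > 0 and (helmet_risk == 0 or no_helmet_risk / helmet_risk >= 4):
        hits += 1

print(f"Estimated P(risk ratio of 4 or more under the null) = {hits / trials:.2f}")
# Prints roughly 0.15, consistent with the "about 16 percent" figure above.
```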

Sources of Data

Data used for public health studies come from observational studies (as in the helmet-use example above), from planned experiments, and from carefully designed surveys of population groups. An example of a planned experiment is the use of a clinical trial to evaluate a new treatment for cancer. In these experiments, patients are randomly assigned to one of two groups—treatment or placebo (a mock treatment)—and then followed to ascertain whether the treatment affects clinical outcome. An example of a survey is the National Health and Nutrition Examination Survey (NHANES) conducted by the National Center for Health Statistics. NHANES consists of interviews of a carefully chosen subset of the population to determine their health status, but chosen so that the conclusions apply to the entire U.S. population.

Both planned experiments and surveys of populations can give very good data and conclusions, partly because the assumptions necessary for the underlying probability calculations are more likely to be true than for observational studies. Nonetheless, much of our knowledge about public health issues comes from observational studies, and as long as care is taken in the choice of subjects and in the analysis of the data, the conclusions can be valid.

The biggest problem arising from observational studies is inferring a cause-and-effect relationship between the variables studied. The original studies relating lung cancer to smoking showed a striking difference in smoking rates between lung cancer patients and other patients in the hospitals studied, but they did not prove that smoking was the cause of lung cancer. Indeed, some of the original arguments put forth by the tobacco companies followed this logic, stating that a significant association between factors does not by itself prove a causal relationship. Although statistical inference can point out interesting associations that could have a significant influence on public health policy and decision making, these statistical conclusions require further study to substantiate a cause-and-effect relationship, as has been done convincingly in the case of smoking.

A tremendous amount of recent data is readily available through the Internet, including already tabulated observations and reports as well as the raw data itself. Some of these data cannot be accessed directly because of confidentiality requirements, but even in this case it is sometimes possible to get permission to analyze the data at a secure site under the supervision of employees of the agency. Described here are some of the key places to go for data on health. Full Internet addresses for the sites below can be found in the bibliography.

A comprehensive source can be found at Fedstats, which provides a gateway to over one hundred federal government agencies that compile publicly available data. The links here are rather comprehensive and include many not directly related to health. The key government agency providing statistics and data on the extent of the health, illness, and disability of the U.S. population is the National Center for Health Statistics (NCHS), which is one of the centers of the Centers for Disease Control and Prevention (CDC). The CDC provides data on morbidity, infectious and chronic diseases, occupational diseases and injuries, vaccine efficacy, and safety studies. All the centers of the CDC maintain online lists of their thousands of publications related to health, many of which are now available in electronic form. Other major governmental sources for health data are the National Cancer Institute (which is part of the National Institutes of Health), the U.S. Bureau of the Census, and the Bureau of Labor Statistics. The Agency for Healthcare Research and Quality is an excellent source for data relating to the quality, access, and medical effectiveness of health care in the United States. The National Highway Traffic Safety Administration, in addition to publishing research reports on highway safety, maintains the Fatality Analysis Reporting System, which can be queried to provide data on traffic fatalities in the United States.

A number of nongovernmental agencies share data or provide links to online data related to health. The American Public Health Association provides links to dozens of databases and research summaries. The American Cancer Society provides many links to data sources related to cancer. The Research Forum on Children, Families, and the New Federalism lists links to numerous studies and data on children's health, including the National Survey of America's Families.

Table 2

                         Wearing helmet    Not wearing helmet
Severe head injury             5                  10
Not severe head injury        95                  41

SOURCE: Courtesy of author.

Much of public health is concerned with international health, and the World Health Organization (WHO) makes available a large volume of data on international health issues and provides links to its publications. The Center for International Earth Science Information Network provides data on world population, and its goals are to support scientists engaged in international research.

The Internet provides an opportunity for health research unequaled in the history of public health. The accessibility, quality, and quantity of data are increasing so rapidly that anyone with an understanding of statistical methodology will soon be able to access the data necessary to answer questions relating to health.

Analysis of Tabulated Data

One of the most commonly used statistical techniques in public health is the analysis of tabled data, which is generally referred to as "contingency table analysis." In these tables, observed proportions of adverse events are compared in the columns of the table by a method known as a "chi-square test." Such data can naturally arise from any of the three data collection schemes mentioned. In our helmet example, observational data from bicycle accidents were used to create a table with helmet use defining the columns, and head injury, the rows. In a clinical trial, the columns are defined by the treatment/placebo groups, and the rows by outcome (e.g., disease remission or not). In a population survey, columns can be different populations surveyed, and rows indicators of health status (e.g., availability of health insurance). The chi-square test assumes that there is no difference between groups, and calculates a statistic based on what would be expected if no

Table 3

                         Wearing helmet    Not wearing helmet    Row totals
Severe head injury           9.934               5.066                15
Not severe head injury      90.066              45.933               136
Column totals                  100                  51               151

SOURCE: Courtesy of author.

difference truly existed, and on what is actually observed. The calculation of the test statistic and the conclusions proceed as follows:

  1. The expected frequency (E) is calculated (assuming no difference) for each cell in the table by first adding to get the totals for each row, the totals for each column, and the grand total (equal to the total sample size); then for each cell we find the expected count: E = (row total × column total)/(grand total).
  2. For each cell in the table, let O be the observed count, and calculate (O − E)²/E.
  3. The values from step 2 are summed over all cells in the table. This is the test statistic, X.
  4. If X exceeds 3.84 for a table with four cells, then the contingency table is said to be statistically "significant." If the sampling and the resulting analysis were repeated a large number of times, this significant result would happen only 5 percent of the time when there truly is no difference between groups, and hence this is called a "5 percent significance test."
  5. For tables with more than two rows and/or columns, different comparison values are needed. For a table with six cells, we check to see if X is larger than 5.99; with eight cells, we compare X to 7.81; and for nine or ten cells we compare X to 9.49.
  6. This method has problems if there are too many rows and columns and not enough observations. In such cases, the table should be reconfigured to have a smaller number of cells. If any of the E values in a table fall below five, some of the rows and/or columns should be combined to make all values of E five or more.

Table 4

                         Wearing helmet    Not wearing helmet
Severe head injury           2.450               4.805
Not severe head injury       0.270               0.530

SOURCE: Courtesy of author.

Suppose a larger data set is used for our helmet use/head injury example (see Table 2).

It is worthwhile noting that the proportions showing head injury for these data are almost the same as before, but that the sample size is now considerably larger, 151. Following the steps outlined above, the chi-square statistic can be calculated and conclusions can be drawn:

  1. The total for the first column is 100, for the second column 51, for the first row 15, and for the second row 136. The grand total is 151, the total sample size. The expected frequencies, that is, the "Es," assuming no difference in injury rates between the helmet group and the no-helmet group, are given in Table 3. It is quite possible for the Es to be non-integer, and if so, we keep the decimal part in all our calculations.
  2. The values for (O − E)²/E are presented in Table 4.
  3. X = 8.055
  4. Because this exceeds 3.84, we have a significant association between helmet use and head injury at the 5 percent level; the short sketch below retraces these calculations.
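
The sketch below retraces steps 1 through 4 in Python for the Table 2 counts, recomputing the expected frequencies of Table 3, the cell contributions of Table 4, and the statistic X = 8.055; it is offered only as an illustration of the arithmetic.

```python
# An illustrative retracing of steps 1-4 above for the Table 2 counts: it
# recomputes the Table 3 expected frequencies, the Table 4 cell contributions,
# and the chi-square statistic X = 8.055.
observed = [
    [5, 10],    # severe head injury:     wearing helmet, not wearing helmet
    [95, 41],   # not severe head injury: wearing helmet, not wearing helmet
]

row_totals = [sum(row) for row in observed]           # 15, 136
col_totals = [sum(col) for col in zip(*observed)]     # 100, 51
grand_total = sum(row_totals)                         # 151

X = 0.0
for i, row in enumerate(observed):
    for j, O in enumerate(row):
        E = row_totals[i] * col_totals[j] / grand_total   # step 1
        contribution = (O - E) ** 2 / E                   # step 2
        X += contribution                                 # step 3
        print(f"O = {O:3d}, E = {E:7.3f}, (O - E)^2/E = {contribution:.3f}")

print(f"X = {X:.3f}; significant at the 5 percent level: {X > 3.84}")   # step 4
```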

The chi-square procedure presented here is one of the most important analytic techniques used in public health research. Its simplicity allows it to be widely used and understood by nearly all professionals in the field, as well as by interested third parties. Much of what we know about what

Table 5

                        Low blood-lead level    High blood-lead level
Low soil-lead level               63                      37
High soil-lead level              37                      63

SOURCE: Courtesy of author.

makes us healthy or what endangers our lives has been shown through the use of contingency tables.

Studying Relationships Among Variables

A major contribution to our knowledge of public health comes from understanding trends in disease rates and examining relationships among different predictors of health. Biostatisticians accomplish these analyses through the fitting of mathematical models to data. The models can vary from a simple straight-line fit to a scatter plot of XY observations, all the way to models with a variety of nonlinear multiple predictors whose effects change over time. Before beginning the task of model fitting, the biostatistician must first be thoroughly familiar with the science behind the measurements, be this biology, medicine, economics, or psychology. This is because the process must begin with an appropriate choice of a model. Major tools used in this process include graphics programs for personal computers, which allow the biostatistician to visually examine complex relationships among multiple measurements on subjects.

The simplest graph is a two-variable scatter plot, using the y-axis to represent a response variable of interest (the outcome measurement), and the x-axis for the predictor, or explanatory, variable. Typically, both the x and y measurements take on a whole range of values—commonly referred to as continuous measurements, or sometimes as quantitative variables. Consider, for example, the serious health problem of high blood-lead levels in children, known to cause serious brain and neurologic damage at levels as low as 10 micrograms per deciliter. Since lead was removed from gasoline, blood-levels of lead in children in the United States have been steadily declining, but there is still a residual risk from environmental pollution. One way to assess this problem is to relate soil-lead levels to blood levels in a survey of children—taking a measurement of the blood-lead concentration on each child, and measuring the soil-lead concentration (in milligrams per kilogram) from a sample of soil near their residences. As is often the case, a plot of the blood levels and soil concentrations shows some curvature, so transformations of the measurements are taken to make the relationship more nearly linear. Choices commonly used for transforming data include taking square roots, logarithms, and sometimes reciprocals of the measurements. For the case of lead, logarithms of both the blood levels and of the soil concentrations produce an approximately linear relationship. Of course, this is not a perfect relationship, so, when plotted, the data will appear as a cloud of points as shown in Figure 1.

This plot, representing two hundred children, was produced by a statistical software program called Stata, using, as input, values from a number of different studies on this subject. On the graph, the software program plotted the fitted straight line to the data, called the regression equation of y on x. The software also prints out the fitted regression equation: y = .29x + .01. How does one interpret this regression? First, it is not appropriate to interpret it for an individual; it applies to the population from which the sample was taken. It says that an increase of 1 in log (soil-lead) concentration will correspond, on average, to an increase in log (blood-lead) of .29. To predict the average blood-lead level given a value for soil lead, the entire equation is used. For example, a soil-lead level of 1,000 milligrams per kilogram, whose log is three, predicts an average log blood-lead level of .29 × 3 + .01 = .88, corresponding to a measured blood level of 7.6 micrograms per deciliter. The main point is that, from the public health viewpoint, there is a positive relationship between the level of lead in the soil and blood-lead levels in the population. An alternative interpretation is to state that soil-lead and blood-lead levels are positively correlated.
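
The prediction just described can be reproduced in a few lines; the sketch assumes base-10 logarithms (consistent with a soil-lead level of 1,000 having a log of three) and uses only the fitted slope and intercept quoted above.

```python
# A minimal sketch of the prediction described above. It assumes base-10
# logarithms and uses only the fitted equation y = .29x + .01 quoted in the text.
from math import log10

slope, intercept = 0.29, 0.01

def predicted_blood_lead(soil_lead_mg_per_kg):
    """Predicted average blood-lead level (micrograms per deciliter)."""
    log_blood = slope * log10(soil_lead_mg_per_kg) + intercept
    return 10 ** log_blood

print(round(predicted_blood_lead(1000), 1))   # 7.6, as in the text
```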

We note that many calculators, and all statistical software for personal computers, can calculate the best line for a given data set. Commercially available statistical software packages such as Stata, SAS, and SPSS can be purchased in versions for both IBM and Macintosh PCs. One comprehensive package, EpiInfo 2000, may be acquired free from CDC and can be downloaded on the Internet.

As in the case of contingency tables, the significance of the regression can be tested. In this case, as in all of statistics, statistical significance does not refer to the scientific importance of the relationship, but rather to a test of whether or not the observed relationship is the result of random association. Every statistical software package for personal computers includes a test of significance as part of its standard output. These packages, and some hand calculators, will produce, along with the line itself, an estimate of the correlation between the two variables, called the correlation coefficient, or "r." For the data in Figure 1, r is 0.42. This number can very easily be used to test for the statistical significance of the regression through the following formula, where n is the number of observations:

t = r √(n − 2) / √(1 − r²)

If t is larger than 2, or smaller than −2, the regression is declared significant at the 5 percent level.

The number "r" must lie between −1 and +1, and is often interpreted as a measure of how close to a straight line the data lie. Values near −1 or +1 indicate a nearly perfect linear relationship, while values near 0 indicate no linear relationship. It is important not to make the mistake of interpreting r near 0 as meaning there is no relationship whatever; a curved relation can lead to low values of r.

The relationship between soil-lead and blood-lead could be studied using the contingency table analysis discussed earlier. For each child, both the soil-lead levels and the blood-lead levels could be classified as high or low, choosing appropriate criteria for the definitions of high and low. This too would have shown a relationship, but it would not be as powerful, nor would it have quantified the relationship between the two measurements as the regression did. Choosing a cutoff value for low and high on each measurement that divides each group into two equal-size subgroups gives the counts shown in Table 5.

The chi-square statistic calculated from this table is 13.5, which also indicates an association between blood-lead level and soil-lead levels in children. The conclusion is not as compelling as in the linear regression analysis, and a lot of information in the data has been lost by simplifying them in this way. One benefit, however, of this simpler analysis is that we do not have to take logarithms of the data and worry about the appropriate choice of a model.

Regression is a very powerful tool, and it is used for many different data analyses. It can be used to compare quantitative measurements on two groups by setting x = 1 for each subject in group one, and setting x = 2 for each subject in group two. The resulting analysis is equivalent to the two-sample t-test discussed in every elementary statistics text.

The most common application of regression analysis occurs when an investigator wishes to relate an outcome measurement y to several x variables—multiple linear regression. For example, regression can be used to relate blood lead to soil lead, environmental dust, income, education, and sex. Note that, as in this example, the x variables can be either quantitative, such as soil lead, or qualitative, such as sex, and they can be used together in the same equation. The statistical software will easily fit the regression equation and print out significance tests for each explanatory variable and for the model as a whole. When we have more than one x variable, there is no simple way to perform the calculations (or to represent them) and one must rely on a statistical package to do the work.
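
As a rough illustration of how such an equation is fit, the sketch below builds a small invented data set with quantitative predictors and one qualitative (0/1) predictor and solves the least-squares problem directly; the variable names and coefficients are hypothetical, and a real analysis would rely on a statistical package, as noted above.

```python
# A hedged sketch of multiple linear regression with a mix of quantitative and
# qualitative predictors, as described above. All numbers are invented; the
# design matrix and the least-squares fit are the point of the example.
import numpy as np

rng = np.random.default_rng(0)
n = 200

log_soil_lead = rng.normal(2.5, 0.6, n)   # quantitative predictor
income = rng.normal(40, 12, n)            # quantitative predictor (hypothetical, thousands)
sex = rng.integers(0, 2, n)               # qualitative predictor coded 0/1

# Hypothetical "true" relationship, used only to generate example data.
log_blood_lead = (0.3 * log_soil_lead - 0.005 * income + 0.05 * sex
                  + rng.normal(0, 0.15, n))

# Design matrix: a column of ones for the intercept plus one column per predictor.
X = np.column_stack([np.ones(n), log_soil_lead, income, sex])
coefficients, *_ = np.linalg.lstsq(X, log_blood_lead, rcond=None)

# Estimates should land near the generating values, up to sampling noise.
for name, b in zip(["intercept", "log soil lead", "income", "sex"], coefficients):
    print(f"{name:>14}: {b:+.3f}")
```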

Regression methods are easily extended to compare a continuous response measurement across several groups—this is known as analysis of variance, also discussed in every elementary statistics text. It is done by choosing for the x variables indicators for the different groups—so-called dummy variables.

An important special case of multiple linear regression occurs when the outcome measurement y is dichotomous—indicating presence or absence of an attribute. In fact, this technique, called logistic regression, is one of the most commonly used statistical techniques in public health research today, and every statistical software package includes one or more programs to perform the analysis. The predictor x-variables used for logistic regression are almost always a mixture of quantitative and qualitative variables. When only qualitative variables are used, the result is essentially equivalent to a complicated contingency table analysis.
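
The idea can be sketched on invented data as follows; the outcome is a 0/1 indicator, the logistic model is fit here by Newton-Raphson, and the predictors and coefficients are hypothetical. In practice one would use a package's logistic regression routine.

```python
# A bare-bones logistic regression sketch on invented data (one quantitative
# and one qualitative predictor), fit by Newton-Raphson. In practice a
# statistical package's logistic regression routine would be used instead.
import numpy as np

rng = np.random.default_rng(1)
n = 500
log_soil_lead = rng.normal(2.5, 0.6, n)   # quantitative predictor
sex = rng.integers(0, 2, n)               # qualitative predictor coded 0/1

# Hypothetical true model, used only to generate example 0/1 outcomes
# (1 = elevated blood lead, 0 = not elevated).
true_logit = -4.0 + 1.6 * log_soil_lead + 0.3 * sex
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

X = np.column_stack([np.ones(n), log_soil_lead, sex])
beta = np.zeros(X.shape[1])
for _ in range(25):                       # Newton-Raphson iterations
    p = 1 / (1 + np.exp(-X @ beta))       # current predicted probabilities
    W = p * (1 - p)                       # weights for the Hessian
    hessian = X.T @ (X * W[:, None])
    beta += np.linalg.solve(hessian, X.T @ (y - p))

print("fitted coefficients:", np.round(beta, 2))   # near -4.0, 1.6, 0.3, up to sampling noise
```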

Methods of regression and correlation are essential tools for biostatisticians and public health researchers when studying complex relationships among different quantitative and qualitative measurements related to health. Many of the studies widely quoted in the public health literature have relied on this powerful technique to reach their conclusions.

(SEE ALSO: Bayes' Theorem; Biostatistics; Chi-Square Test; Epidemiology; Probability Model; Rates; Rates: Adjusted; Sampling; Survey Research Methods; T-Test)

Bibliography

Afifi, A. A., and Clark, V. A. (1996). Computer-Aided Multivariate Analysis, 3rd edition. London: Chapman and Hall.

Armitage, P., and Berry, G. (1994). Statistical Methods in Medical Research. Oxford: Blackwell Science.

Doll, R., and Hill, A. B. (1950). "Smoking and Carcinoma of the Lung. Preliminary Report." British Medical Journal 2:739–748.

Dunn, O. J., and Clark, V. A. (2001). Basic Statistics, 3rd edition. New York: John Wiley and Sons.

Glantz, S. A., and Slinker, B. K. (2001). Primer of Applied Regression and Analysis of Variance. New York: McGraw-Hill.

Hosmer, D. W., and Lemeshow, S. (2000). Applied Logistic Regression. New York: John Wiley and Sons.

Institute of Medicine. Committee for the Study of the Future of Public Health (1988). The Future of Public Health. Washington, DC: National Academy Press.

Kheifets, L.; Afifi, A. A.; Buffler, P. A.; and Zhang, Z. W. (1995). "Occupational Electric and Magnetic Field Exposure and Brain Cancer: A Meta-Analysis." Journal of Occupational and Environmental Medicine 37:1327–1341.

Lewin, M. D.; Sarasua, S.; and Jones, P. A. (1999). "A Multivariate Linear Regression Model for Predicting Children's Blood Lead Levels Based on Soil Lead Levels: A Study at Four Superfund Sites." Environmental Research 81(A):52–61.

National Center for Health Statistics (1996). NHANES: National Health and Nutrition Examination Survey. Hyattsville, MD: Author.

Thompson, D. C.; Rivara, F. P.; and Thompson, R. S. (1996). "Effectiveness of Bicycle Safety Helmets in Preventing Head Injuries—A Case-Control Study." Journal of the American Medical Association 276: 1968–1973.

STATISTICAL SOFTWARE

Dean, A. G.; Arner, T. G.; Sangam, S.; Sunki, G. G.; Friedman, R.; Lantinga, M.; Zubieta, J. C.; Sullivan, K. M.; and Smith, D. C. (2000). EpiInfo 2000, A Database and Statistics Program for Public Health Professionals, for Use on Windows 95, 98, NT, and 2000 Computers. Atlanta, GA: Centers for Disease Control and Prevention.

The SAS System for Windows, Version 8.1. Cary, NC: SAS Institute.

SPSS Version 10.1 for Windows. Chicago, IL: SPSS.

Stata Statistical Software: Release 6.0. College Station, TX: Stata Corp.

LINKS FOR DATA SOURCES

Agency for Healthcare Research and Quality: http://www.ahcpr.gov/data/.

American Cancer Society: http://www.cancer.org/.

American Public Health Association: http://www.apha.org/public_health/.

Bureau of Labor Statistics: http://stats.bls.gov/datahome.htm.

Center for International Earth Science Information Network: http://www.ciesin.org/.

Centers for Disease Control and Prevention: http://www.cdc.gov/scientific.htm.

Fedstats: http://www.fedstats.gov.

National Cancer Institute: http://cancernet.nci.nih.gov/statistics.shtml.

National Center for Health Statistics: http://www.cdc.gov/nchs/.

National Highway Traffic Safety Administration: http://www-fars.nhtsa.dot.gov/.

National Institutes of Health: http://www.nih.gov.

National Survey of America's Families: http://newfederalism.urban.org/nsaf/cpuf/index.htm.

Research Forum on Children Families and the New Federalism: http://www.researchforum.org/.

U.S. Bureau of the Census: http://www.census.gov.

World Health Organization: http://www.who.int/whosis.

— WILLIAM G. CUMBERLAND; ABDELMONEM A. AFIFI



Statistics is the set of mathematical tools and techniques that are used to analyze data. In genetics, statistical tests are crucial for determining if a particular chromosomal region is likely to contain a disease gene, for instance, or for expressing the certainty with which a treatment can be said to be effective.

Statistics is a relatively new science, with most of the important developments occurring within the last 100 years. Motivation for statistics as a formal scientific discipline came from a need to summarize and draw conclusions from experimental data. For example, Sir Ronald Aylmer Fisher, Karl Pearson, and Sir Francis Galton each made significant contributions to early statistics in response to their need to analyze experimental agricultural and biological data. One of Fisher's interests, for instance, was whether crop yield could be predicted from meteorological readings. This problem was one of several that motivated Fisher to develop some of the early methods of data analysis. Much of modern statistics can be categorized as exploratory data analysis, point estimation, or hypothesis testing.

The goal of exploratory data analysis is to summarize and visualize data and information in a way that facilitates the identification of trends or interesting patterns that are relevant to the question at hand. A fundamental exploratory data-analysis tool is the histogram, which describes the frequency with which various outcomes occur. Histograms summarize the distribution of the outcomes and facilitate the comparison of outcomes from different experiments. Histograms are usually plotted as bar plots, with the range of outcomes plotted on the x-axis and the frequency of the individual outcome represented by a bar on the y-axis. For instance, one might use a histogram to describe the number of people in a population with each of the different genotypes for the ApoE alleles, which influence the risk of Alzheimer's disease.
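
A histogram is, at bottom, a tally of outcomes; the short sketch below prints a crude text histogram from invented genotype counts (the ApoE allele labels are real, the numbers are not).

```python
# A crude text histogram of invented ApoE genotype counts, tallying how often
# each outcome occurs. The allele labels are real; the counts are made up.
from collections import Counter

genotypes = (["e3/e3"] * 60 + ["e3/e4"] * 25 + ["e2/e3"] * 10 +
             ["e4/e4"] * 3 + ["e2/e4"] * 2)

for genotype, count in Counter(genotypes).most_common():
    print(f"{genotype:6s} {'#' * count} ({count})")
```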

The range of outcomes from an experiment are also described mathematically by their central tendency and their dispersion. Central tendency is a measure of the center of the distribution. This can be characterized by the mean (the arithmetic average) of the outcomes or by the median, which is the value above and below which the number of outcomes is the same. The mean of 3, 4, and 8 is 5, whereas the median is 4. The median length of response to a gene therapy trial might be 30 days, meaning as many people had less than 30 days' benefit as had more than that. The mean might be considerably more—if one person benefited for 180 days, for instance.

Dispersion is a measure of how spread out the outcomes of the random variable are from their mean. It is characterized by the variance or standard deviation. The spread of the data can often be as important as the central tendency in estimating the value of the results. For instance, suppose the median number of errors in a gene-sequencing procedure was 3 per 10,000 bases sequenced. This error rate might be acceptable if the range that was found in 100 trials was between 0 and 5 errors, but it would be unacceptable if the range was between 0 and 150 errors. The occasional large number of errors makes the data from any particular procedure suspect.
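
Both ideas are simple to compute; the snippet below reproduces the mean and median example above and contrasts two invented sets of sequencing-error counts that share a median of 3 but differ greatly in spread.

```python
# Central tendency and dispersion with Python's statistics module. The first
# numbers reproduce the example in the text; the two error-count lists are
# invented to contrast a tight spread with a loose one at the same median.
import statistics

values = [3, 4, 8]
print(statistics.mean(values), statistics.median(values))   # 5 and 4

tight = [2, 3, 3, 4, 3, 2, 5, 3, 4, 3]      # errors per 10,000 bases, narrow range
loose = [0, 3, 1, 150, 3, 2, 5, 3, 0, 3]    # same median, occasional large errors
for sample in (tight, loose):
    print(statistics.median(sample), round(statistics.stdev(sample), 1))
```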

Another important concept in statistics is that of populations and samples. The population represents every possible experimental unit that could be measured. For example, every zebra on the continent of Africa might represent a population. If we were interested in the mean genetic diversity of zebras in Africa, it would be nearly impossible to actually analyze the DNA of every single zebra; neither can we sequence the entire DNA of any individual. Therefore we must take a random selection of some smaller number of zebras and some smaller amount of DNA, and then use the mean differences among these zebras to make inferences about the mean diversity in the entire population.

Any summary measure of the data, such as the mean or variance in a subset of the population, is called a sample statistic. The summary measure of the entire group is called a population parameter. Therefore, we use statistics to estimate parameters. Much of statistics is concerned with the accuracy of parameter estimates. This is the statistical science of point estimation.

The final major discipline of statistics is hypothesis testing. All scientific investigations begin with a motivating question. For example, do identical twins have a higher likelihood than fraternal twins of both developing alcoholism?

From the question, two types of hypotheses are derived. The first is called the null hypothesis. This is generally a theory about the value of one or more population parameters and is the status quo, or what is commonly believed or accepted. In the case of the twins, the null hypothesis might be that the rates of concordance (i.e., both twins are or are not alcoholic) are the same for identical and fraternal twins. The alternate hypothesis is generally what you are trying to show. This might be that identical twins have a higher concordance rate for alcoholism, supporting a genetic basis for this disorder. It is important to note that statistics cannot prove one or the other hypothesis. Rather, statistics provides evidence from the data that supports one hypothesis or the other.

Much of hypothesis testing is concerned with making decisions about the null and alternate hypotheses. You collect the data, estimate the parameter, calculate a test statistic that summarizes the value of the parameter estimate, and then decide whether the value of the test statistic would be expected if the null hypothesis were true or the alternate hypothesis were true. In our case, we collect data on alcoholism in a limited number of twins (which we hope accurately represent the entire twin population) and decide whether the results we obtain better match the null hypothesis (no difference in rates) or the alternate hypothesis (higher rate in identical twins).

Of course, there is always a chance that you have made the wrong decision—that you have interpreted your data incorrectly. In statistics, there are two types of errors that can be made. A type I error is when the conclusion was made in favor of the alternate hypothesis, when the null hypothesis was really true. A type II error refers to the converse situation, where the conclusion was made in favor of the null hypothesis when the alternate hypothesis was really true. Thus a type I error is when you see something that is not there, and a type II error is when you do not see something that is really there. In general, type I errors are thought to be worse than type II errors, since you do not want to spend time and resources following up on a finding that is not true.

How can we decide if we have made the right choice about accepting or rejecting our null hypothesis? These statistical decisions are often made by calculating a probability value, or p-value. P-values for many test statistics are easily calculated using a computer, thanks to the theoretical work of mathematical statisticians such as Jerzy Neyman.

A p-value is simply the probability of observing a test statistic as large or larger than the one observed from your data, if the null hypothesis were really true. It is common in many statistical analyses to accept a type I error rate of one in twenty, or 0.05. This means there is less than a one-in-twenty chance of making a type I error.

To see what this means, let us imagine that our data show that identical twins have a 10 percent greater likelihood of being concordant for alcoholism than fraternal twins. Is this a significant enough difference that we should reject the null hypothesis of no difference between twin types? By examining the number of individuals tested and the variance in the data, we can come up with an estimate of the probability that we could obtain this difference by chance alone, even if the null hypothesis were true. If this probability is less than 0.05—if the likelihood of obtaining this difference by chance is less than one in twenty—then we reject the null hypothesis in favor of the alternate hypothesis.
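
The decision rule can be illustrated with a small simulation; the twin counts below are invented solely to mirror the 10 percent difference described here, and the p-value is estimated by resampling under the null hypothesis of a single shared concordance rate.

```python
# Illustrative only: invented twin counts mirroring the 10 percent difference
# described above, with a one-sided p-value estimated by simulating samples
# under the null hypothesis of a single shared concordance rate.
import random

random.seed(3)

identical_concordant, identical_n = 40, 100   # hypothetical identical-twin pairs
fraternal_concordant, fraternal_n = 30, 100   # hypothetical fraternal-twin pairs

observed_diff = identical_concordant / identical_n - fraternal_concordant / fraternal_n
pooled_rate = (identical_concordant + fraternal_concordant) / (identical_n + fraternal_n)

def simulated_diff():
    """Difference in concordance rates when both groups share the pooled rate."""
    a = sum(random.random() < pooled_rate for _ in range(identical_n)) / identical_n
    b = sum(random.random() < pooled_rate for _ in range(fraternal_n)) / fraternal_n
    return a - b

trials = 20_000
p_value = sum(simulated_diff() >= observed_diff for _ in range(trials)) / trials
print(f"estimated one-sided p-value: {p_value:.3f}")
# For these made-up counts the p-value comes out above 0.05, so the null
# hypothesis of no difference would not be rejected.
```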

Prior to carrying out a scientific investigation and a statistical analysis of the resulting data, it is possible to get a feel for your chances of seeing something if it is really there to see. This is referred to as the power of a study and is simply one minus the probability of making a type II error. A commonly accepted power for a study is 80 percent or greater. That is, you would like to know that you have at least an 80 percent chance of seeing something if it is really there. Increasing the size of the random sample from the population is perhaps the best way to improve the power of a study. The closer your sample is to the true population size, the more likely you are to see something if it is really there.
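
Power, too, can be estimated by simulation before any data are collected; the sketch below assumes a hypothetical true difference in concordance (45 percent for identical versus 35 percent for fraternal twins) and shows how the chance of detecting it at the 5 percent level grows with the number of twin pairs sampled.

```python
# A rough sketch (invented effect sizes) of estimating power by simulation, as
# described above: the chance of detecting a true 10-point difference in
# concordance (45% vs. 35%) grows as the number of twin pairs per group grows.
import random
from statistics import NormalDist

random.seed(4)

def estimated_power(n_pairs, p_identical=0.45, p_fraternal=0.35, alpha=0.05, trials=2000):
    z_crit = NormalDist().inv_cdf(1 - alpha)   # one-sided 5 percent test
    rejections = 0
    for _ in range(trials):
        a = sum(random.random() < p_identical for _ in range(n_pairs)) / n_pairs
        b = sum(random.random() < p_fraternal for _ in range(n_pairs)) / n_pairs
        pooled = (a + b) / 2
        se = (2 * pooled * (1 - pooled) / n_pairs) ** 0.5
        if se > 0 and (a - b) / se > z_crit:
            rejections += 1
    return rejections / trials

for n in (50, 100, 200, 400):
    print(f"{n:4d} pairs per group: estimated power = {estimated_power(n):.2f}")
```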

Thus, statistics is a relatively new scientific discipline that uses both mathematics and philosophy for exploratory data analysis, point estimation, and hypothesis testing. The ultimate utility of statistics is for making decisions about hypotheses to make inferences about the answers to scientific questions.

Bibliography

Gonick, Larry, and Woollcott Smith. The Cartoon Guide to Statistics. New York: HarperCollins, 1993.

Jaisingh, Lloyd R. Statistics for the Utterly Confused. New York: McGraw-Hill, 2000.

Salsberg, David. The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century. New York: W. H. Freeman, 2001.

Internet Resource

HyperStat Online: An Introductory Statistics Book and Online Tutorial for Help in Statistics Courses. David M. Lane, ed. http://davidmlane.com/hyperstat/.

—Jason H. Moore

Statistics, the scientific discipline that deals with the collection, classification, analysis, and interpretation of numerical facts or data, was invented primarily in the nineteenth and twentieth centuries in Western Europe and North America. In the eighteenth century, when the term came into use, "statistics" referred to a descriptive analysis of the situation of a political state—its people, resources, and social life. In the early nineteenth century, the term came to carry the specific connotation of a quantitative description and analysis of the various aspects of a state or other social or natural phenomenon. Many statistical associations were founded in the 1830s, including the Statistical Society of London (later the Royal Statistical Society) in 1833 and the American Statistical Association in 1839.

Early Use of Statistics

Although scientific claims were made for the statistical enterprise almost from the beginning, it had few characteristics of an academic discipline before the twentieth century, except as a "state science" or Staatswissenschaft in parts of central Europe. The role of statistics as a tool of politics, administration, and reform defined its character in the United States throughout the nineteenth century. Advocates of statistics, within government and among private intellectuals, argued that their new field would supply important political knowledge. Statistics could provide governing elites with concise, systematic, and authoritative information on the demographic, moral, medical, and economic characteristics of populations. In this view, statistical knowledge was useful, persuasive, and hence powerful, because it could capture the aggregate and the typical, the relationship between the part and the whole, and when data were available, their trajectory over time. It was particularly appropriate to describe the new arrays of social groups in rapidly growing, industrializing societies, the character and trajectory of social processes in far-flung empires, and the behavior and characteristics of newly mobilized political actors in the age of democratic revolutions.

One strand in this development was the creation of data sets and the development of rules and techniques of data collection and classification. In America, the earliest statistical works were descriptions of the American population and economy dating from the colonial period. British officials watched closely the demographic development of the colonies. By the time of the American Revolution (1775–1783), colonial leaders were aware of American demographic realities, and of the value of statistics. To apportion the tax burden and raise troops for the revolution, Congress turned to population and wealth measures to assess the differential capacities among the colonies. In 1787, the framers institutionalized the national population census to apportion seats among the states in the new Congress, and required that statistics on revenues and expenditures of the national state be collected and published by the new government. Almanacs, statistical gazetteers, and the routine publication of numerical data in the press signaled the growth of the field. Government activities produced election numbers, shipping data from tariff payments, value of land sales, and population distributions. In the early nineteenth century, reform organizations and the new statistical societies published data on the moral status of the society in the form of data on church pews filled, prostitutes arrested, patterns of disease, and drunkards reformed. The collection and publication of statistics thus expanded in both government and private organizations.

Professionalization of Statistics

The professionalization of the discipline began in the late nineteenth century. An International Statistical Congress, made up of government representatives from many states, met for the first time in 1853 and set about the impossible task of standardizing statistical categories across nations. In 1885, a new, more academic organization was created, the International Statistical Institute. Statistical work grew in the new federal agencies such as the Departments of Agriculture and Education in the 1860s and 1870s. The annual Statistical Abstract of the United States first appeared in 1878. The states began to create bureaus of labor statistics to collect data on wages, prices, strikes, and working conditions in industry, the first in Massachusetts in 1869; the federal Bureau of Labor, now the Bureau of Labor Statistics, was created in 1884. Statistical analysis became a university subject in the United States with Richmond Mayo Smith's text and course at Columbia University in the 1880s. Governments created formal posts for "statisticians" in government service, and research organizations devoted to the development of the field emerged. The initial claims of the field were descriptive, but soon, leaders also claimed the capacity to draw inferences from data.

Throughout the nineteenth century, a strong statistical ethic favored complete enumerations whenever possible, to avoid what seemed the speculative excess of early modern "political arithmetic." In the first decades of the twentieth century, there were increasingly influential efforts to define formal procedures of sampling. Agricultural economists in the U.S. Department of Agriculture were pioneers of such methods. By the 1930s, sampling was becoming common in U.S. government statistics. Increasingly, this was grounded in the mathematical methods of probability theory, which favored random rather than "purposive" samples. A 1934 paper by the Polish-born Jerzy Neyman, who was then in England but would soon emigrate to America, helped to define the methods of random sampling. At almost the same time, a notorious failure of indiscriminate large-scale polling in the 1936 election—predicting a landslide victory by Alf Landon over Franklin D. Roosevelt—gave credence to the more mathematical procedures.

Tools and Strategies

The new statistics of the twentieth century was defined not by an object of study—society—nor by counting and classifying, but by its mathematical tools, and by strategies of designing and analyzing observational and experimental data. The mathematics was grounded in an eighteenth-century tradition of probability theory, and was first institutionalized as a mathematical statistics in the 1890s by the English biometrician and eugenicist Karl Pearson. The other crucial founding figure was Sir R. A. Fisher, also an English student of quantitative biology and eugenics, whose statistical strategies of experimental design and analysis date from the 1920s. Pearson and Fisher were particularly influential in the United States, where quantification was associated with Progressive reforms of political and economic life. A biometric movement grew up in the United States under the leadership of scientists such as Raymond Pearl, who had been a postdoctoral student in Pearson's laboratory in London. Economics, also, was highly responsive to the new statistical methods, and deployed them to find trends, correlate variables, and detect and analyze business cycles. The Cowles Commission, set up in 1932 and housed at the University of Chicago in 1939, deployed and created statistical methods to investigate the causes of the worldwide depression of that decade. An international Econometric Society was established at about the same time, in 1930, adapting its name from Pearson's pioneering journal Biometrika.

Also prominent among the leading statistical fields in America were agriculture and psychology. Both had statistical traditions reaching back into the nineteenth century, and both were particularly receptive to new statistical tools. Fisher had worked out his methods of experimental design and tests of statistical significance with particular reference to agriculture. In later years he often visited America, where he was associated most closely with a statistical group at Iowa State University led by George Snedecor. The agriculturists divided their fields into many plots and assigned them randomly to experimental and control groups in order to determine, for example, whether a fertilizer treatment significantly increased crop yields. This strategy of collective experiments and randomized treatment also became the model for much of psychology, and especially educational psychology, where the role of the manure (the treatment) was now filled by novel teaching methods or curricular innovations to test for differences in educational achievement. The new experimental psychology was closely tied to strategies for sorting students using tests of intelligence and aptitude in the massively expanded public school systems of the late nineteenth and early twentieth centuries.

The methods of twentieth-century statistics also had a decisive role in medicine. The randomized clinical trial was also in many ways a British innovation, exemplified by a famous test of streptomycin in the treatment of tuberculosis just after World War II (1939–1945). It quickly became important also in America, where medical schools soon began to form departments of biostatistics. Statistics provided a means to coordinate treatments by many physicians in large-scale medical trials, which provided, in effect, a basis for regulating the practice of medicine. By the 1960s, statistical results had acquired something like statutory authority in the evaluation of pharmaceuticals. Not least among the sources of their appeal was the presumed objectivity of their results. The "gold standard" was a thoroughly impersonal process—a well-designed experiment generating quantitative results that could be analyzed by accepted statistical methods to give an unambiguous result.

Historical analysis was fairly closely tied to the field of statistics in the nineteenth century, when statistical work focused primarily on developing data and information systems to analyze "state and society" questions. Carroll Wright, first Commissioner of Labor, often quoted August L. von Schloezer's aphorism that "history is statistics ever advancing, statistics is history standing still." The twentieth-century turn in statistics to experimental design and the analysis of biological processes broke that link, which was tenuously restored with the development of cliometrics, or quantitative history, in the 1960s and 1970s. But unlike the social sciences of economics, political science, psychology, and sociology, the field of history did not fully restore its relationship with statistics, for example, by making such training a graduate degree requirement. Thus the use of statistical analysis and "statistics" in the form of data in historical writing has remained a subfield of American historical writing, as history has eschewed a claim to being a "scientific" discipline.

Statistics as a field embraces the scientific ideal. That ideal, which replaces personal judgment with impersonal law, resonates with an American political tradition reaching back to the eighteenth century. The place of science, and especially statistics, as a source of such authority grew enormously in the twentieth century, as a greatly expanded state was increasingly compelled to make decisions in public, and to defend them against challenges.

Bibliography

Anderson, Margo. The American Census: A Social History. New Haven, Conn.: Yale University Press, 1988.

Cassedy, James H. American Medicine and Statistical Thinking, 1800–1860. Cambridge, Mass.: Harvard University Press, 1984.

Cohen, Patricia Cline. A Calculating People: The Spread of Numeracy in Early America. Chicago: University of Chicago Press, 1982.

Cullen, M. J. The Statistical Movement in Early Victorian Britain: The Foundations of Empirical Social Research. New York: Barnes and Noble, 1975.

Curtis, Bruce. The Politics of Population: State Formation, Statistics, and the Census of Canada, 1840–1875. Toronto: University of Toronto Press, 2001.

Desrosières, Alain. The Politics of Large Numbers: A History of Statistical Reasoning (English translation of the 1993 French study La politique des grands nombres: Histoire de la raison statistique). Cambridge, Mass.: Harvard University Press, 1998.

Gigerenzer, G., et al. The Empire of Chance: How Probability Changed Science and Everyday Life. Cambridge, U.K.: Cambridge University Press, 1989.

Glass, D. V. Numbering the People: The Eighteenth-Century Population Controversy and the Development of Census and Vital Statistics in Britain. New York: D.C. Heath, 1973.

Marks, Harry M. The Progress of Experiment: Science and Therapeutic Reform in the United States, 1900–1990. New York: Cambridge University Press, 1997.

Morgan, Mary S. The History of Econometric Ideas. New York: Cambridge University Press, 1990.

Patriarca, Silvana. Numbers and Nationhood: Writing Statistics in Nineteenth-Century Italy. New York: Cambridge University Press, 1996.

Porter, Theodore M. The Rise of Statistical Thinking, 1820–1900. Princeton, N.J.: Princeton University Press, 1986.

———. Trust in Numbers: The Pursuit of Objectivity in Science and Public Life. Princeton, N.J.: Princeton University Press, 1995.

Stigler, Stephen M. The History of Statistics: The Measurement of Uncertainty Before 1900. Cambridge, Mass.: Belknap Press of Harvard University Press, 1986.

———. Statistics on the Table: The History of Statistical Concepts and Methods. Cambridge, Mass.: Harvard University Press, 1999.

The word statistics comes from the German Statistik and was coined by Gottfried Achenwall (1719–1772) in 1749. This term referred to a thorough, generally nonquantitative description of features of the state—its geography, peoples, customs, trade, administration, and so on. Hermann Conring (1606–1681) introduced this field of inquiry under the name Staatenkunde in the seventeenth century, and it became a standard part of the university curriculum in Germany and in the Netherlands. Recent histories of statistics in France, Italy, and the Netherlands have documented the strength of this descriptive approach. The descriptive sense of statistics continued throughout the eighteenth century and into the nineteenth century.

The numerical origins of statistics are found in distinct national traditions of quantification. In England, self-styled political and medical arithmeticians working outside government promoted numerical approaches to the understanding of the health and wealth of society. In Germany, the science of cameralism provided training and rationale for government administrators to count population and economic resources for local communities. In France, royal ministers, including the duke of Sully (1560–1641) and Jean-Baptiste Colbert (1619–1683), initiated statistical inquiries into state finance and population that were continued through the eighteenth century.

Alongside these quantitative studies of society, mathematicians developed probability theory, which made use of small sets of numerical data. The emergence of probability, itself the subject of several recent histories, proceeded largely independently of statistics. The two traditions of collecting numbers and of analyzing them with the calculus of probabilities did not merge until the nineteenth century, and it was this merger that created the modern discipline of statistics.

The early modern field of inquiry that most closely resembles modern statistics was political arithmetic, created in the 1660s and 1670s by two Englishmen, John Graunt (1620–1674) and William Petty (1623–1687). Graunt's Natural and Political Observations Made upon the Bills of Mortality (1662) launched quantitative studies of population and society, which Petty labeled political arithmetic. In their work, they showed how numerical accounts of population could be used to answer medical and political questions such as the comparative mortality of specific diseases and the number of men of fighting age. Graunt developed new methods to calculate population from the numbers of christenings and burials. He created the first life table, a numerical table that showed how many individuals out of a given population survived at each year of life. Petty created sample tables to be used in Ireland to collect vital statistics and urged that governments collect regular and accurate accounts of the numbers of christenings, burials, and total population. Such accounts, Petty argued, would put government policy on a firm foundation.

Political arithmetic was originally associated with strengthening monarchical authority, but several other streams of inquiry flowed from Graunt's and Petty's early work. One tradition was medical statistics, which developed most fully in England during the eighteenth century. Physicians such as James Jurin (1684–1750) and William Black (1749–1829) advocated the collection and evaluation of numerical information about the incidence and mortality of diseases. Jurin pioneered the use of statistics in the 1720s to evaluate medical practice in his studies of the risks associated with smallpox inoculation. William Black coined the term medical arithmetic to refer to the tradition of using numbers to analyze the comparative mortality of different diseases. New hospitals and dispensaries such as the London Smallpox and Inoculation Hospital, established in the eighteenth century, provided institutional support for the collection of medical statistics; some treatments were evaluated numerically.

Theology provided another context for the development of statistics. Graunt had identified a constant birth ratio of males to females (14 to 13) and had used this as an argument against polygamy. The physician John Arbuthnot (1667–1735) argued in a 1710 article that this regularity was "an Argument for Divine Providence." Later writers, including William Derham (1657–1735), author of Physico-Theology (1713), and Johann Peter Süssmilch (1707–1767), author of Die Göttliche Ordnung (1741), made the stability of this statistical ratio a part of the larger argument about the existence of God.
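
In modern terms, the arithmetic behind Arbuthnot's argument can be reconstructed as follows (a sketch in present-day notation, not his own): if male and female births were equally likely, then the probability that male christenings would exceed female christenings in every one of $n$ successive years is

\[ P(\text{males exceed females in all } n \text{ years}) = \left(\tfrac{1}{2}\right)^{n}, \]

a quantity vanishingly small for the decades of London records he examined. Arbuthnot therefore rejected chance in favor of providential design, a line of reasoning often described as an early form of the significance test.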

One final area of statistics that flowed from Graunt's work, and the one most closely associated with probability theory, was the development of life (or mortality) tables. Immediately following the publication of Graunt's book, several mathematicians, including Christiaan Huygens (1629–1695), Gottfried Leibniz (1646–1716), and Edmund Halley (1656–1742), refined Graunt's table. Halley, for example, based his life table on numerical data from the town of Breslau that listed ages at death. (Graunt had to estimate ages of death.) In the eighteenth century, further modifications were introduced by the Dutchmen Willem Kersseboom (1690–1771) and Nicolaas Struyck (1686–1769), the Frenchman Antoine Deparcieux (1703–1768), the Swiss-born Leonhard Euler (1707–1783), and the Swede Pehr Wargentin (1717–1783). A French historian has recently argued that the creation of life tables was one of the leading achievements of the scientific revolution. Life tables were used to predict life expectancy and to put annuities and tontines on a sounder financial footing.
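
The following is a minimal sketch of the two calculations such tables supported: expected further years of life, and the pricing of a life annuity in the spirit of Halley's method. The numbers and function names are invented for illustration; they are not Graunt's or Halley's actual figures.

```python
# A toy life table: survivors[x] is the number of a hypothetical cohort of 1,000
# newborns still alive at the start of year x. The figures are purely illustrative
# (not Graunt's or Halley's data), and the table is compressed so that everyone
# has died by year 10.
survivors = [1000, 700, 550, 450, 370, 300, 230, 160, 90, 30, 0]


def curtate_life_expectancy(table, age=0):
    """Expected number of whole further years lived beyond `age`:
    the sum of table[age + k] / table[age] over k >= 1."""
    return sum(table[k] for k in range(age + 1, len(table))) / table[age]


def annuity_value(table, age=0, interest=0.06):
    """Price of an annuity paying 1 at the end of every year the annuitant
    survives: each payment is weighted by a discount factor and by the
    probability of still being alive, read from the table."""
    v = 1.0 / (1.0 + interest)
    return sum((table[k] / table[age]) * v ** (k - age)
               for k in range(age + 1, len(table)))


print(curtate_life_expectancy(survivors))   # expected further whole years at birth
print(annuity_value(survivors, age=0))      # fair purchase price of the annuity
```

Halley's innovation in pricing annuities was precisely this weighting of each future payment by a survival probability taken from an empirical table rather than by guesswork.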

The administrative demands brought about by state centralization in early modern Europe also fostered the collection and analysis of numerical information about births, deaths, marriages, trade, and so on. In France, for example, Sébastien le Prestre de Vauban (1633–1707), adviser to Louis XIV (ruled 1643–1715), provided a model for the collection of this data in his census of Vézelay (1696), a small town in Burgundy. Although his recommendations were not adopted, a similar approach was pursued decades later by the Controller-General Joseph Marie Terray (1715–1778), who requested in 1772 that the provincial intendants collect accounts of births and deaths from parish clergy and forward them to Paris. Sweden created the most consistent system for the collection of vital statistics through parish clerks in 1749. Efforts in other countries failed. In England, two bills were put before Parliament in the 1750s to institute a census and to ensure the collection of vital statistics. Both bills were defeated because of issues concerning personal liberty. While these initiatives enjoyed mixed success, they all spoke to the desire to secure numerical information about the population. Regular censuses, which would provide data for statistical analysis, were not instituted until the nineteenth century.

Bibliography

Primary Sources

Arbuthnot, John. "An Argument for Divine Providence Taken from the Regularity Observ'd in the Birth of Both Sexes." Philosophical Transactions 27 (1710–1712): 186–190.

Black, William. An Arithmetical and Medical Analysis of the Diseases and Mortality of the Human Species. London, 1789. Reprinted with an introduction by D. V. Glass. Farnborough, U.K., 1973.

Jurin, James. An Account of the Success of Inoculating the Small Pox in Great Britain with a Comparison between the Miscarriages in That Practice, and the Mortality of the Natural Small Pox. London, 1724.

Petty, William. The Economic Writings of Sir William Petty. Edited by Charles Henry Hull. 2 vols. Cambridge, U.K., 1899.

Secondary Sources

Bourguet, Marie-Noëlle. Déchiffer la France: La statistique départementale à l'époque napoléonienne. Paris, 1988.

Buck, Peter. "People Who Counted: Political Arithmetic in the Eighteenth Century." Isis 73 (1982): 28–45.

——. "Seventeenth-Century Political Arithmetic: Civil Strife and Vital Statistics." Isis 68 (1977): 67–84.

Daston, Lorraine. Classical Probability in the Enlightenment. Princeton, 1988.

Dupâquier, Jacques. L'invention de la table de mortalité, de Graunt à Wargentin, 1622–1766. Paris, 1996.

Dupâquier, Jacques, and Michel Dupâquier. Histoire de la démographie. Paris, 1985.

Hacking, Ian. The Emergence of Probability. Cambridge, U.K., 1975.

——. The Taming of Chance. Cambridge, U.K., 1990.

Hald, Anders. A History of Probability and Statistics and Their Applications before 1750. New York, 1990.

Klep, Paul M. M., and Ida H. Stamhuis, eds. The Statistical Mind in a Pre-Statistical Era: The Netherlands, 1750–1850. Amsterdam, 2002.

Patriarca, Silvana. Numbers and Nationhood: Writing Statistics in Nineteenth-Century Italy. Cambridge, U.K., 1996.

Pearson, Karl. The History of Statistics in the 17th and 18th Centuries against the Changing Background of Intellectual, Scientific and Religious Thought. Edited by E. S. Pearson. London, 1978.

Porter, Theodore M. The Rise of Statistical Thinking, 1820–1900. Princeton, 1986.

Rusnock, Andrea. Vital Accounts: Quantifying Health and Population in Eighteenth-Century England and France. Cambridge, U.K., 2002.

—ANDREA RUSNOCK