pubmed.ncbi.nlm.nih.gov

Should I use fixed effects or random effects when I have fewer than five levels of a grouping factor in a mixed-effects model?

  • Sat Jan 01 2022

Dylan G E Gomes. PeerJ. 2022.

Abstract

As linear mixed-effects models (LMMs) have become a widespread tool in ecology, guidance on their use has become increasingly important. One common guideline is that one needs at least five levels of the grouping variable associated with a random effect. Having fewer levels makes the variance of random effects terms (such as ecological sites, individuals, or populations) difficult to estimate, but it need not muddy one's ability to estimate fixed effects terms, which are often of primary interest in ecology. Here, I simulate datasets and fit simple models to show that having few random effects levels does not strongly influence the parameter estimates, or the uncertainty around those estimates, for fixed effects terms, at least in the case presented here. Instead, the coverage probability of fixed effects estimates is sample size dependent. Including random effects terms with few levels in LMMs may come at the expense of more singular fits, but this did not appear to influence coverage probability or RMSE, except in low sample size (N = 30) scenarios. Thus, it may be acceptable to use fewer than five levels of random effects if one is not interested in making inferences about the random effects terms (i.e. when they are 'nuisance' parameters used to group non-independent data), but further work is needed to explore alternative scenarios. Given the widespread accessibility of LMMs in ecology and evolution, future simulation studies and further assessments of these statistical methods are necessary to understand the consequences both of violating and of routinely following simple guidelines.
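
The kind of simulation the abstract describes can be sketched roughly as follows. This is a minimal illustration, not the author's code: the data-generating model (a random intercept per group plus one slope), the standard deviations, and the function names `simulate_dataset` and `slope_with_group_fixed_effects` are all assumptions. It covers only the LM side, where the grouping factor enters as a fixed effect; fitting the LMM side would need a mixed-model library and is omitted.

```python
import random

def simulate_dataset(n_obs, n_levels, slope, sd_group=1.0, sd_resid=1.0, rng=None):
    """Simulate y = slope * x + group intercept + noise (hypothetical settings)."""
    rng = rng or random.Random()
    # One random intercept per level of the grouping factor.
    group_effects = [rng.gauss(0.0, sd_group) for _ in range(n_levels)]
    rows = []
    for i in range(n_obs):
        g = i % n_levels  # assign observations to groups as evenly as possible
        x = rng.gauss(0.0, 1.0)
        y = slope * x + group_effects[g] + rng.gauss(0.0, sd_resid)
        rows.append((g, x, y))
    return rows

def slope_with_group_fixed_effects(rows):
    """OLS slope of y on x with one intercept per group.

    Centering x and y within each group gives the same slope as a
    regression with group dummy variables (the "within" estimator).
    """
    sums = {}
    for g, x, y in rows:
        s = sums.setdefault(g, [0, 0.0, 0.0])
        s[0] += 1
        s[1] += x
        s[2] += y
    sxy = 0.0
    sxx = 0.0
    for g, x, y in rows:
        n, sum_x, sum_y = sums[g]
        dx = x - sum_x / n
        dy = y - sum_y / n
        sxy += dx * dy
        sxx += dx * dx
    return sxy / sxx

rows = simulate_dataset(n_obs=120, n_levels=3, slope=2.0, rng=random.Random(1))
estimate = slope_with_group_fixed_effects(rows)
```

Repeating this over many simulated datasets, and over different numbers of levels and sample sizes, yields the distribution of slope estimates that summaries such as coverage probability and RMSE are computed from.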

Keywords: ANOVA; Block-design; Experimental design; Hierarchical modelling; Quantitative; Statistics; Varying effects; regression.

© 2022 Gomes.

Conflict of interest statement

The author declares no competing interests.

Figures

Figure 1. Number of levels of random effects terms from 50 recent papers in Ecology.

The type (y axis) and number of levels (x axis) of random effect terms are displayed from a survey of the 50 most recent papers in the journal Ecology. The dotted line at 5 levels shows a common cutoff for estimating grouping factors as random effects. Twenty-nine random effects were mentioned in 18 papers; two of them are omitted here because their number of levels was unclear (see Table S1).

Figure 2. 95% Coverage probability of fixed effects in LMs and LMMs.

The proportion of simulation runs in which the true value was found within the 95% confidence intervals of model estimates is displayed on the y axis, while the number of levels of a grouping factor is on the x axis. The symbols represent linear models (LM) vs linear mixed-effect models (LMM) where the grouping factor is specified as a fixed effect vs a random effect, respectively, within the same datasets. Colors denote overall sample size (N = 30, 60, or 120). The left column shows data for the relatively weaker slope (0.2), and the right column shows data for the relatively stronger slope (2).
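
As a small illustration of the quantity on the y axis, coverage probability can be computed from a list of confidence intervals as below. The function name `coverage_probability` and the example intervals are hypothetical and are not taken from the paper.

```python
def coverage_probability(intervals, true_value):
    """Proportion of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)

# Made-up 95% intervals for a slope whose true value is 2.
example_intervals = [(1.8, 2.2), (2.1, 2.5), (1.5, 2.0), (0.0, 1.9)]
coverage = coverage_probability(example_intervals, true_value=2.0)
```

For a well-calibrated 95% interval, this proportion should sit near 0.95 when computed over many simulation runs.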

Figure 3. Relative RMSE of model fixed effects estimates.

The relative RMSE of model fixed effects estimates is plotted against the number of levels of a grouping factor. Triangles represent linear model RMSE values (dotted line) and circles represent linear mixed-effect model RMSE values (solid line) for the slope = 2 estimates (see Fig. S2 for the slope = 0.2 plot, which is qualitatively similar). Colors denote simulated dataset sample size (N = 30, 60, or 120).
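
The caption does not define "relative" RMSE. One plausible reading, assumed here purely for illustration, is the root mean squared error of the estimates divided by the absolute true value; the function name `relative_rmse` is hypothetical.

```python
import math

def relative_rmse(estimates, true_value):
    """RMSE of the estimates, scaled by the absolute true value (assumed definition)."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / abs(true_value)

# Two made-up slope estimates around a true slope of 2.
value = relative_rmse([1.8, 2.2], true_value=2.0)
```

Scaling by the true value makes errors comparable between the slope = 0.2 and slope = 2 scenarios, which is one reason such a normalization might be used.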

Figure 4. Random effects variance estimates from LMMs of simulated data.

Each point is the mean of the point estimates from 10,000 simulated runs, whereas error bars are the 95% intervals (0.025 and 0.975 quantiles) of the distribution of 10,000 point estimates for the random effects variance. N = the number of observations (i.e. number of rows) in each dataset. Dashed lines indicate the true value.
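
A summary of this kind (a mean plus the 0.025 and 0.975 quantiles of a set of point estimates) can be sketched with the Python standard library. The function name `summarize_estimates` and the choice of the "inclusive" quantile method are assumptions; other quantile conventions give slightly different endpoints.

```python
import statistics

def summarize_estimates(estimates):
    """Return (mean, 2.5% quantile, 97.5% quantile) of a list of point estimates."""
    # n=40 splits the data into 40 equal-probability bins, so the first and
    # last of the 39 cut points are the 0.025 and 0.975 quantiles.
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    return statistics.fmean(estimates), cuts[0], cuts[-1]

# Toy input: the integers 0..1000 stand in for simulated variance estimates.
mean, lower, upper = summarize_estimates(list(range(1001)))
```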

Figure 5. Singular fits in LMM.

Each point is the proportion of linear mixed-effect model runs (total = 10,000) that had a singular fit (variances of one or more effects are zero, or close to zero). N = the number of observations (i.e. number of rows) in each simulated dataset.
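
To give a feel for why zero variance estimates become more common with few levels, the sketch below uses a deliberately simplified stand-in for a fitted LMM: an intercept-only, balanced design in which the between-group variance is estimated by the ANOVA (method-of-moments) estimator truncated at zero. The function names, the tolerance `tol`, the standard deviations, and the number of runs are all assumptions for illustration and are not the paper's settings.

```python
import random

def group_variance_estimate(groups):
    """ANOVA estimate of between-group variance for balanced groups, truncated at 0."""
    k = len(groups)
    n = len(groups[0])
    means = [sum(g) / n for g in groups]
    grand_mean = sum(means) / k
    ms_between = n * sum((m - grand_mean) ** 2 for m in means) / (k - 1)
    ms_within = sum(
        (y - m) ** 2 for g, m in zip(groups, means) for y in g
    ) / (k * (n - 1))
    return max(0.0, (ms_between - ms_within) / n)

def proportion_singular(n_obs, n_levels, n_runs=500, sd_group=1.0,
                        sd_resid=1.0, tol=1e-4, seed=0):
    """Share of simulated datasets whose group variance estimate is (near) zero."""
    rng = random.Random(seed)
    per_group = n_obs // n_levels
    singular = 0
    for _ in range(n_runs):
        groups = []
        for _ in range(n_levels):
            b = rng.gauss(0.0, sd_group)  # random intercept for this level
            groups.append([b + rng.gauss(0.0, sd_resid) for _ in range(per_group)])
        if group_variance_estimate(groups) < tol:
            singular += 1
    return singular / n_runs

p_two_levels = proportion_singular(n_obs=60, n_levels=2)
p_ten_levels = proportion_singular(n_obs=60, n_levels=10)
```

Under these assumed settings, a zero variance estimate should be far more common with two levels than with ten, which is qualitatively the pattern the figure describes.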

Figure 6. 95% Coverage probability of LMMs with singular fits.

The proportion of simulation runs in which the true value was found within the 95% confidence intervals of model estimates is displayed on the y axis, while the number of levels of a grouping factor is on the x axis. The symbols represent linear mixed-effect models (LMM) with a singular fit (variances of one or more effects are zero, or close to zero; singular = 1) or a non-singular fit (singular = 0). Colors denote overall sample size (N = 30, 60, or 120). The left column shows the relatively weaker slope (0.2), and the right column shows the relatively stronger slope (2). Note that there was only one singular fit (out of 10,000 runs) for the largest dataset (N = 120) with the most levels of the random effect (N levels = 10). Since this single point can only be 0 or 1, and is unlikely to reflect a mean value (which more simulation runs might elucidate), it was omitted here.

Figure 7. Relative RMSE of LMM fixed effects estimates with singular fits.

The relative RMSE of model estimates is plotted on the y axis, while the number of levels of a grouping factor is on the x axis. The symbols represent linear mixed-effect models (LMM) with a singular fit (variances of one or more effects are zero, or close to zero; triangles; singular = 1) or a non-singular fit (circles; singular = 0). Colors denote overall sample size (N = 30, 60, or 120). The left column shows the relatively weaker slope (0.2), and the right column shows the relatively stronger slope (2). Note that there was only one singular fit (out of 10,000 runs) for the largest dataset (N = 120) with the most levels of the random effect (N levels = 10). Since this single point may not reflect a mean value (which more simulation runs might elucidate), it was omitted here.

Grants and funding

The work was supported by the National Science Foundation (NSF GRFP 2018268606). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.