pubmed.ncbi.nlm.nih.gov

Should I use fixed effects or random effects when I have fewer than five levels of a grouping factor in a mixed-effects model? - PubMed

Sat Jan 01 2022

Should I use fixed effects or random effects when I have fewer than five levels of a grouping factor in a mixed-effects model?

Dylan G E Gomes. PeerJ. 2022.

Abstract

As linear mixed-effects models (LMMs) have become a widespread tool in ecology, the need to guide the use of such tools is increasingly important. One common guideline is that one needs at least five levels of the grouping variable associated with a random effect. Having fewer than five levels makes the estimation of the variance of random effects terms (such as ecological sites, individuals, or populations) difficult, but it need not muddy one's ability to estimate fixed effects terms, which are often of primary interest in ecology. Here, I simulate datasets and fit simple models to show that having few random effects levels does not strongly influence the parameter estimates or uncertainty around those estimates for fixed effects terms, at least in the case presented here. Instead, the coverage probability of fixed effects estimates is sample-size dependent. LMMs including low-level random effects terms may come at the expense of increased singular fits, but this did not appear to influence coverage probability or RMSE, except in low sample size (N = 30) scenarios. Thus, it may be acceptable to use fewer than five levels of random effects if one is not interested in making inferences about the random effects terms (i.e. when they are 'nuisance' parameters used to group non-independent data), but further work is needed to explore alternative scenarios. Given the widespread accessibility of LMMs in ecology and evolution, future simulation studies and further assessments of these statistical methods are necessary to understand the consequences both of violating and of routinely following simple guidelines.

Keywords: ANOVA; Block-design; Experimental design; Hierarchical modelling; Quantitative; Statistics; Varying effects; regression.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

**Figure 1. Number of levels of random effects terms from 50 recent papers in Ecology.**
The type (y axis) and number of levels (x axis) of random effect terms are displayed from a survey of the 50 most recent papers in the journal Ecology. The dotted line at 5 levels shows a common cutoff for estimation of grouping factors as random effects. Twenty-nine random effects were mentioned in 18 papers, although it was unclear how many levels two random effects had, so they were omitted here (see Table S1).

**Figure 2. 95% Coverage probability of fixed effects in LMs and LMMs.**
The proportion of simulation runs in which the true value was found within the 95% confidence intervals of model estimates is displayed on the y axis, while the number of levels of a grouping factor is on the x axis. The symbols represent linear models (LM) vs linear mixed-effect models (LMM) where the grouping factor is specified as a fixed effect vs a random effect, respectively, within the same datasets. Colors denote overall sample size (N = 30, 60, or 120). The left column shows data for the relatively weaker slope (0.2), and the right column shows data for the relatively stronger slope (2).
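
As a rough illustration of how a coverage probability like the one in this caption could be computed, the Python sketch below simulates grouped data with random intercepts, fits the linear model with the grouping factor as a fixed effect (via within-group centering, which yields the same slope as fitting group dummies), and counts how often the interval contains the true slope. The function names, the group and residual standard deviations (both 1.0), the 2,000 runs, and the normal-approximation interval are assumptions made for this sketch rather than details taken from the paper, and the mixed-model counterpart is not shown.

```python
import random
from statistics import NormalDist, fmean


def simulate_dataset(n_obs, n_levels, slope, group_sd, resid_sd, rng):
    """Simulate y = slope * x + group intercept + noise (balanced groups)."""
    intercepts = [rng.gauss(0.0, group_sd) for _ in range(n_levels)]
    data = []
    for i in range(n_obs):
        g = i % n_levels  # cycle through groups so the design is balanced
        x = rng.gauss(0.0, 1.0)
        y = slope * x + intercepts[g] + rng.gauss(0.0, resid_sd)
        data.append((g, x, y))
    return data


def fixed_group_slope_ci(data, n_levels, level=0.95):
    """Slope and interval from an LM with the grouping factor as a fixed effect."""
    xs = {g: [] for g in range(n_levels)}
    ys = {g: [] for g in range(n_levels)}
    for g, x, y in data:
        xs[g].append(x)
        ys[g].append(y)
    # Centering x and y within each group absorbs the group intercepts.
    xc, yc = [], []
    for g in range(n_levels):
        mx, my = fmean(xs[g]), fmean(ys[g])
        xc += [x - mx for x in xs[g]]
        yc += [y - my for y in ys[g]]
    sxx = sum(x * x for x in xc)
    beta = sum(x * y for x, y in zip(xc, yc)) / sxx
    rss = sum((y - beta * x) ** 2 for x, y in zip(xc, yc))
    df = len(data) - n_levels - 1  # one intercept per group plus one slope
    se = (rss / df / sxx) ** 0.5
    # Assumption: normal quantile instead of the t quantile (stdlib only),
    # so the interval is slightly too narrow in small samples.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta, (beta - z * se, beta + z * se)


def coverage_probability(intervals, true_value):
    """Proportion of intervals that contain the true value."""
    hits = sum(lo <= true_value <= hi for lo, hi in intervals)
    return hits / len(intervals)


rng = random.Random(1)
true_slope = 2.0
intervals = [
    fixed_group_slope_ci(simulate_dataset(120, 5, true_slope, 1.0, 1.0, rng), 5)[1]
    for _ in range(2000)
]
coverage = coverage_probability(intervals, true_slope)
print(f"Estimated coverage: {coverage:.3f}")
```

Repeating the loop over several sample sizes and numbers of levels would give one point per combination, which is the layout the figure describes.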

**Figure 3. Relative RMSE of model fixed effects estimates.**
The relative RMSE of model fixed effects estimates is plotted against the number of levels of a grouping factor. Triangles represent linear model RMSE values (dotted line) and circles represent linear mixed-effect model RMSE values (solid line) for the slope = 2 estimates (see Fig. S2 for a plot of slope = 0.2, which is qualitatively similar). Colors denote simulated dataset sample size (N = 30, 60, or 120).
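
The caption does not spell out how the RMSE is made relative. One common convention, assumed here and not confirmed by this page, divides the root mean squared error of the estimates by the absolute true value. A minimal Python sketch under that assumption (the function name is hypothetical):

```python
import math


def relative_rmse(estimates, true_value):
    """RMSE of the estimates divided by |true_value| (assumed definition)."""
    mse = sum((est - true_value) ** 2 for est in estimates) / len(estimates)
    return math.sqrt(mse) / abs(true_value)


# Two estimates that each miss a true slope of 2 by 0.2 give an RMSE of 0.2,
# which is 10% of the true value.
example = relative_rmse([1.8, 2.2], 2.0)
print(round(example, 3))
```

Under this definition, scaling by the true value puts the slope = 0.2 and slope = 2 scenarios on a comparable footing.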

**Figure 4. Random effects variance estimates from LMMs of simulated data.**
Each point is the mean point estimate for 10,000 simulated runs, whereas error bars are the 95% intervals (0.025 and 0.975 quantiles) of the distribution of 10,000 point estimates for the random effects variance. N = the number of observations (*i.e.*, number of rows) in each dataset. Dashed lines indicate the true value.
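
The summary described in this caption (a mean plus the 2.5% and 97.5% cut points of the simulated point estimates) can be sketched in Python with the standard library alone. The function name is hypothetical, and the interpolation rule is the default of `statistics.quantiles`, which may differ slightly from whatever the author's software used.

```python
from statistics import fmean, quantiles


def summarize_estimates(estimates):
    """Return (mean, 2.5% cut point, 97.5% cut point) of simulated estimates."""
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%.
    cuts = quantiles(estimates, n=40)
    return fmean(estimates), cuts[0], cuts[-1]


# Toy input: the integers 1..999 stand in for a vector of variance estimates.
mean_est, lower, upper = summarize_estimates(list(range(1, 1000)))
print(mean_est, lower, upper)
```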

**Figure 5. Singular fits in LMM.**
Each point is the proportion of linear mixed-effect model runs (total = 10,000) that had a singular fit (variances of one or more effects are zero, or close to zero). N = the number of observations (*i.e.*, number of rows) in each simulated dataset.
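
To give a feel for why a variance estimate can land at zero, the Python sketch below swaps in a deliberately simpler technique than the paper's LMMs: the ANOVA (method-of-moments) estimate of between-group variance in a balanced, intercept-only design, truncated at zero. A truncated estimate is treated here as the analogue of a singular fit. The function names, the tolerance, and all parameter values are assumptions for illustration only.

```python
import random
from statistics import fmean


def group_variance_estimate(groups):
    """Method-of-moments between-group variance for equally sized groups."""
    n = len(groups[0])  # observations per group (balanced design assumed)
    k = len(groups)     # number of levels of the grouping factor
    means = [fmean(g) for g in groups]
    grand = fmean(means)
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    # A negative moment estimate is truncated to the boundary value 0.
    return max(0.0, (msb - msw) / n)


def proportion_singular(n_runs, n_levels, n_per_group, group_sd, resid_sd, rng, tol=1e-8):
    """Share of simulated datasets whose group variance estimate is (near) zero."""
    singular = 0
    for _ in range(n_runs):
        groups = []
        for _ in range(n_levels):
            u = rng.gauss(0.0, group_sd)  # random intercept for this level
            groups.append([u + rng.gauss(0.0, resid_sd) for _ in range(n_per_group)])
        if group_variance_estimate(groups) <= tol:
            singular += 1
    return singular / n_runs


# Hypothetical scenario: 3 levels, 10 observations each (N = 30).
share = proportion_singular(500, 3, 10, 0.5, 1.0, random.Random(0))
print(f"Proportion with zero variance estimate: {share:.2f}")
```

With few levels, the between-group mean square rests on very few degrees of freedom, so it falls below the within-group mean square fairly often, which pushes the estimate to the boundary.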

**Figure 6. 95% Coverage probability of LMMs with singular fits.**
The proportion of simulation runs in which the true value was found within the 95% confidence intervals of model estimates is displayed on the y axis, while the number of levels of a grouping factor is on the x axis. The symbols represent linear mixed-effect models (LMM) with a singular fit (variances of one or more effects are zero, or close to zero; singular = 1) or a non-singular fit (singular = 0). Colors denote overall sample size (N = 30, 60, or 120). The left column is where the slope is relatively weaker (slope = 0.2), and the right column is where the slope is relatively stronger (slope = 2). Note that there was only one singular fit (out of 10,000 runs) for the largest data set (N = 120) with the most levels of the random effect (N levels = 10). Since this single point can only be 0 or 1, and is unlikely to reflect a mean value (that more simulation runs might elucidate), it was omitted here.

**Figure 7. Relative RMSE of LMM fixed effects estimates with singular fits.**
The relative RMSE of model estimates is plotted on the y axis, while the number of levels of a grouping factor is on the x axis. The symbols represent linear mixed-effect models (LMM) with a singular fit (variances of one or more effects are zero, or close to zero; triangles; singular = 1) or a non-singular fit (circles; singular = 0). Colors denote overall sample size (N = 30, 60, or 120). The left column is where the slope is relatively weaker (slope = 0.2), and the right column is where the slope is relatively stronger (slope = 2). Note that there was only one singular fit (out of 10,000 runs) for the largest data set (N = 120) with the most levels of the random effect (N levels = 10). Since this single point may not reflect a mean value (that more simulation runs might elucidate), it was omitted here.

Cited by

Territory and population attributes affect Florida scrub-jay fecundity in fire-adapted ecosystems.
Breininger DR, Stolen ED, Carter GM, Legare SA, Payne WV, Breininger DJ, Lyon JE, Schumann CD, Hunt DK. Ecol Evol. 2023 Jan 15;13(1):e9704. doi: 10.1002/ece3.9704. eCollection 2023 Jan. PMID: 36687801. Free PMC article.
How to be a good partner and father? The role of adult males in pair bond maintenance and parental care in Javan gibbons.
Yi Y, Mardiastuti A, Choe JC. Proc Biol Sci. 2023 Jun 28;290(2001):20230950. doi: 10.1098/rspb.2023.0950. Epub 2023 Jun 28. PMID: 37369349. Free PMC article.
Gradual changes in model shape affect egg-directed behaviours by parasitic shiny cowbirds Molothrus bonariensis in captivity.
Crudele I, Hauber ME, Reboreda JC, Fiorini VD. R Soc Open Sci. 2023 May 10;10(5):221477. doi: 10.1098/rsos.221477. eCollection 2023 May. PMID: 37181795. Free PMC article.
Low hybridization temperatures improve target capture success of invertebrate loci: a case study of leaf-footed bugs (Hemiptera: Coreoidea).
Forthman M, Gordon ERL, Kimball RT. R Soc Open Sci. 2023 Jun 28;10(6):230307. doi: 10.1098/rsos.230307. eCollection 2023 Jun. PMID: 37388308. Free PMC article.
Exploring the Feasibility and Usability of Smartphones for Monitoring Physical Activity in Orthopedic Patients: Prospective Observational Study.
Ghaffari A, Kildahl Lauritsen RE, Christensen M, Rolighed Thomsen T, Mahapatra H, Heck R, Kold S, Rahbek O. JMIR Mhealth Uhealth. 2023 Jul 4;11:e44442. doi: 10.2196/44442. PMID: 37283228. Free PMC article.

Grants and funding

The work was supported by the National Science Foundation (NSF GRFP 2018268606). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
