Should I use fixed effects or random effects when I have fewer than five levels of a grouping factor in a mixed-effects model?
Sat Jan 01 2022
Dylan G E Gomes. PeerJ. 2022.
Abstract
As linear mixed-effects models (LMMs) have become a widespread tool in ecology, guidance on their use has become increasingly important. One common guideline is that one needs at least five levels of the grouping variable associated with a random effect. Having fewer levels makes it difficult to estimate the variance of random effects terms (such as ecological sites, individuals, or populations), but it need not muddy one's ability to estimate fixed effects terms, which are often of primary interest in ecology. Here, I simulate datasets and fit simple models to show that having few random effects levels does not strongly influence the parameter estimates, or the uncertainty around those estimates, for fixed effects terms, at least in the case presented here. Instead, the coverage probability of fixed effects estimates depends on sample size. LMMs that include random effects terms with few levels may produce more singular fits, but this did not appear to influence coverage probability or RMSE, except in low sample size (N = 30) scenarios. Thus, it may be acceptable to use fewer than five levels of random effects if one is not interested in making inferences about the random effects terms (i.e. when they are 'nuisance' parameters used to group non-independent data), but further work is needed to explore alternative scenarios. Given the widespread accessibility of LMMs in ecology and evolution, future simulation studies and further assessments of these statistical methods are necessary to understand the consequences both of violating and of routinely following simple guidelines.
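The simulation approach summarized in the abstract can be sketched in a few lines. The example below is a minimal, hypothetical version, not the author's code. It assumes a balanced design, a data-generating process y = 1 + slope·x + u_g + e with normally distributed group effects u_g and residuals e (both SD 1), and the grouping factor entered as a fixed effect (one intercept per group). With group intercepts, the OLS slope equals the within-group estimator, so no matrix algebra is needed. Function names and defaults are illustrative assumptions, and the Student-t quantile is obtained by numerical integration only to stay within the Python standard library.

```python
import math
import random
import statistics


def t_cdf(x, df, steps=2000):
    """Student-t CDF via Simpson's rule on the density over [0, |x|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(t):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))

    h = abs(x) / steps
    total = pdf(0.0) + pdf(abs(x))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p, df):
    """Quantile for p > 0.5, found by bisection on t_cdf."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def simulate(rng, n_levels, n_per, slope, sd_group=1.0, sd_resid=1.0):
    """Hypothetical DGP: y = 1 + slope*x + u_g + e, balanced over groups."""
    data = []
    for _ in range(n_levels):
        u = rng.gauss(0, sd_group)
        xs = [rng.gauss(0, 1) for _ in range(n_per)]
        ys = [1 + slope * x + u + rng.gauss(0, sd_resid) for x in xs]
        data.append((xs, ys))
    return data


def fixed_group_slope(data):
    """OLS slope and SE with one intercept per group (group as fixed effect)."""
    centered = []
    for xs, ys in data:
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        centered += [(x - mx, y - my) for x, y in zip(xs, ys)]
    sxx = sum(dx * dx for dx, _ in centered)
    b = sum(dx * dy for dx, dy in centered) / sxx
    rss = sum((dy - b * dx) ** 2 for dx, dy in centered)
    df = len(centered) - len(data) - 1  # N minus group intercepts minus slope
    return b, math.sqrt(rss / df / sxx)


def coverage_and_rmse(n_levels, n_total, slope, runs=2000, seed=1):
    """Share of 95% CIs containing the true slope, and RMSE of the slope."""
    rng = random.Random(seed)
    n_per = n_total // n_levels
    tcrit = t_quantile(0.975, n_per * n_levels - n_levels - 1)
    hits, sq_err = 0, 0.0
    for _ in range(runs):
        b, se = fixed_group_slope(simulate(rng, n_levels, n_per, slope))
        hits += (b - tcrit * se) <= slope <= (b + tcrit * se)
        sq_err += (b - slope) ** 2
    return hits / runs, math.sqrt(sq_err / runs)


if __name__ == "__main__":
    for g in (2, 3, 5, 10):
        cov, rmse = coverage_and_rmse(g, 30, slope=0.2)
        print(f"levels={g:2d}  coverage={cov:.3f}  RMSE={rmse:.3f}")
```

Under these assumptions the fixed-effects model is correctly specified and the t interval is exact, so coverage should stay near 0.95 up to Monte Carlo error. Reproducing the random-effects side of the comparison would additionally require a mixed-model fitter, which this sketch does not attempt.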
Keywords: ANOVA; Block-design; Experimental design; Hierarchical modelling; Quantitative; Statistics; Varying effects; Regression.
© 2022 Gomes.
Conflict of interest statement
The author declares no competing interests.
Figures

The type (y axis) and number of levels (x axis) of random effects terms reported in a survey of the 50 most recent papers in the journal Ecology. The dotted line at 5 levels marks a common cutoff for estimating grouping factors as random effects. Twenty-nine random effects were mentioned in 18 papers; because the number of levels was unclear for two of them, those two were omitted here (see Table S1).

The proportion of simulation runs in which the true value was found within the 95% confidence intervals of model estimates is displayed on the y axis, while the number of levels of a grouping factor is on the x axis. Symbols distinguish linear models (LM), in which the grouping factor is specified as a fixed effect, from linear mixed-effects models (LMM), in which it is specified as a random effect; both were fit to the same datasets. Colors denote overall sample size (N = 30, 60, or 120). The left column shows data for the relatively weaker slope (0.2), and the right column shows data for the relatively stronger slope (2).
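Coverage probability in this figure is a Monte Carlo proportion. A minimal formulation, assuming R independent simulation runs, a 95% interval [L_r, U_r] from run r, and a true slope β (notation introduced here, not taken from the paper), together with its Monte Carlo standard error:

$$\widehat{p} = \frac{1}{R}\sum_{r=1}^{R} \mathbf{1}\{L_r \le \beta \le U_r\}, \qquad \mathrm{MCSE}(\widehat{p}) = \sqrt{\frac{\widehat{p}\,(1-\widehat{p})}{R}}$$

Assuming the 10,000 runs per scenario reported in the other captions and evaluating at the nominal 0.95, the standard error is $\sqrt{0.95 \times 0.05 / 10000} \approx 0.0022$, so estimated coverage between roughly 0.946 and 0.954 (±2 MCSE) is consistent with nominal 95% coverage.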

The relative RMSE of model fixed effects estimates is plotted against the number of levels of a grouping factor. Triangles (dotted line) represent linear model RMSE values and circles (solid line) represent linear mixed-effects model RMSE values for the slope = 2 estimates; the plot for slope = 0.2 is qualitatively similar (see Fig. S2). Colors denote simulated dataset sample size (N = 30, 60, or 120).
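The caption does not define relative RMSE. One common convention, assumed here rather than taken from the paper, scales the RMSE of the R slope estimates by the magnitude of the true slope β so that slopes of different size can share an axis:

$$\mathrm{RMSE} = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\hat{\beta}_r - \beta\right)^2}, \qquad \mathrm{relative\ RMSE} = \frac{\mathrm{RMSE}}{|\beta|}$$

Under this assumption, the same absolute RMSE of 0.1 gives a relative RMSE of 0.1 / 2 = 0.05 for the stronger slope but 0.1 / 0.2 = 0.5 for the weaker slope.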

Each point is the mean of 10,000 simulated point estimates of the random effects variance, and error bars span the 95% interval (0.025 and 0.975 quantiles) of that distribution of estimates. N = the number of observations (i.e. number of rows) in each dataset. Dashed lines indicate the true value.

Each point is the proportion of linear mixed-effect model runs (total = 10,000) that had a singular fit (variances of one or more effects are zero, or close to zero). N = the number of observations (i.e. number of rows) in each simulated dataset.
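The spread of variance estimates and the frequency of singular fits described in the two captions above share a cause: with few levels, the between-group variance is estimated from very few group means. The sketch below is hypothetical and simplified. It assumes an intercept-only balanced design (y = u_g + e, both true variances 1) and uses the classical method-of-moments (ANOVA) estimator truncated at zero as a stand-in for the REML fit an LMM package would typically use. It reports the mean estimate, its 0.025 and 0.975 quantiles, and the share of runs whose estimate lands exactly on the zero boundary, the analogue of a singular fit in this simplified setting.

```python
import random
import statistics


def anova_group_variance(groups):
    """Method-of-moments estimate of the between-group variance for a
    balanced one-way layout, truncated at zero (the boundary)."""
    g, n = len(groups), len(groups[0])
    means = [statistics.fmean(ys) for ys in groups]
    grand = statistics.fmean(means)
    msb = n * sum((m - grand) ** 2 for m in means) / (g - 1)
    msw = sum((y - m) ** 2
              for ys, m in zip(groups, means) for y in ys) / (g * (n - 1))
    return max(0.0, (msb - msw) / n)


def variance_summary(n_levels, n_total, runs=2000, seed=1,
                     sd_group=1.0, sd_resid=1.0):
    """Mean, 0.025/0.975 quantiles, and share of zero (boundary) estimates."""
    rng = random.Random(seed)
    n_per = n_total // n_levels
    estimates = []
    for _ in range(runs):
        groups = []
        for _ in range(n_levels):
            u = rng.gauss(0, sd_group)
            groups.append([u + rng.gauss(0, sd_resid) for _ in range(n_per)])
        estimates.append(anova_group_variance(groups))
    # n=40 gives 39 cut points; the first and last are the 2.5% and 97.5% quantiles.
    cuts = statistics.quantiles(estimates, n=40, method="inclusive")
    at_zero = sum(e == 0.0 for e in estimates) / runs
    return statistics.fmean(estimates), cuts[0], cuts[-1], at_zero


if __name__ == "__main__":
    for g in (2, 3, 5, 10):
        mean, q025, q975, zero = variance_summary(g, 120)
        print(f"levels={g:2d}  mean={mean:.2f}  "
              f"95% interval=({q025:.2f}, {q975:.2f})  boundary={zero:.3f}")
```

Holding N fixed and varying the number of levels makes the pattern visible: fewer levels give a much wider interval of variance estimates and more boundary estimates, while the fixed effects in the earlier figures are not estimated from the group means alone.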

The proportion of simulation runs in which the true value was found within the 95% confidence intervals of model estimates is displayed on the y axis, while the number of levels of a grouping factor is on the x axis. The symbols represent linear mixed-effects models (LMM) with a singular fit (variances of one or more effects are zero, or close to zero; singular = 1) or a non-singular fit (singular = 0). Colors denote overall sample size (N = 30, 60, or 120). The left column is where the slope is relatively weaker (slope = 0.2), and the right column is where the slope is relatively stronger (slope = 2). Note that there was only one singular fit (out of 10,000 runs) for the largest data set (N = 120) with the most levels of the random effect (N levels = 10). Because a coverage value based on a single run can only be 0 or 1, and is unlikely to reflect the mean value that more simulation runs would reveal, it was omitted here.

The relative RMSE of model estimates is plotted on the y axis, while the number of levels of a grouping factor is on the x axis. The symbols represent linear mixed-effects models (LMM) with a singular fit (variances of one or more effects are zero, or close to zero; triangles; singular = 1) or a non-singular fit (circles; singular = 0). Colors denote overall sample size (N = 30, 60, or 120). The left column is where the slope is relatively weaker (slope = 0.2), and the right column is where the slope is relatively stronger (slope = 2). Note that there was only one singular fit (out of 10,000 runs) for the largest data set (N = 120) with the most levels of the random effect (N levels = 10). Because this single point is unlikely to reflect the mean value that more simulation runs would reveal, it was omitted here.
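Splitting runs by whether the fit was singular, as in the coverage and RMSE captions above, amounts to a group-by over per-run records. A minimal sketch, assuming each run is stored as a hypothetical (estimate, lower, upper, singular) tuple; the `min_runs` threshold is an illustrative stand-in for the decision to omit the single singular run at N = 120 with 10 levels, not a rule stated in the paper. RMSE here is the plain (unscaled) version.

```python
import math
from collections import defaultdict


def summarize_by_singular(runs, true_value, min_runs=2):
    """runs: iterable of (estimate, lower, upper, singular) tuples.

    Returns {singular_flag: (n_runs, coverage, rmse)}. Strata with fewer
    than min_runs runs are dropped, since a coverage value from a single
    run can only be 0 or 1.
    """
    strata = defaultdict(list)
    for est, lo, hi, singular in runs:
        strata[int(singular)].append((est, lo, hi))
    summary = {}
    for flag, rows in strata.items():
        if len(rows) < min_runs:
            continue
        coverage = sum(lo <= true_value <= hi for _, lo, hi in rows) / len(rows)
        rmse = math.sqrt(sum((est - true_value) ** 2 for est, _, _ in rows) / len(rows))
        summary[flag] = (len(rows), coverage, rmse)
    return summary


if __name__ == "__main__":
    example = [(2.1, 1.8, 2.4, 0), (1.9, 1.5, 2.3, 0),
               (2.5, 2.2, 2.8, 0), (2.0, 1.0, 3.0, 1)]
    print(summarize_by_singular(example, true_value=2.0))
```

In this toy input the lone singular run is dropped, mirroring the omitted point in the captions.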
Grants and funding
The work was supported by the National Science Foundation (NSF GRFP 2018268606). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.