Explaining the Shortcomings of Log‐Transforming the Dependent Variable in Regression Models and Recommending a Better Alternative: Evidence From Soil CO2 Emission Studies.
- Source :
- Journal of Geophysical Research. Biogeosciences; May 2021, Vol. 126, Issue 5, pp. 1-18 (18 pages)
- Publication Year :
- 2021
Abstract
- Log‐transforming the dependent variable of a regression model, though convenient and frequently used, is accompanied by an under‐prediction problem. We found that this under‐prediction can reach 20%, which is significant in studies that aim to estimate annual budgets. The fundamental reason for this problem is simply that the log‐function is concave, and it has nothing to do with whether the dependent variable has a log‐normal distribution or not. Using field‐observed data of soil CO2 emission, soil temperature and soil moisture in a saturated specification of a regression model for predicting emissions, we revealed that the under‐predictions of the log‐transformed approach were pervasive and systematically biased. The key determinant of the problem's severity was the coefficient of variation in the dependent variable, which differed among different combinations of the values of the explanatory factors. By applying a parsimonious (Gaussian‐Gamma) specification of the regression model to data from four different ecosystems, we found that this under‐prediction problem was serious to various extents, and that for a relatively weak explanatory factor, the log‐transformed approach is prone to yield a physically nonsensical estimated coefficient. Finally, we showed and concluded that the problem can be avoided by switching to the nonlinear approach, which does not require the assumption of homoscedasticity for the error term in computing the standard errors of the estimated coefficients.

Plain Language Summary: The goal of this study is to persuade empirical researchers to switch from the conventional practice of log‐transforming the dependent variable in a regression model to a nonlinear approach, because the conventional practice has a pervasive and systematically biased under‐prediction problem that can be quite serious. For many decades, this problem was mistakenly assumed to result from the dependent variable being log‐normally distributed; hence it could not be properly corrected by an adjustment factor derived from that assumption. Using the examples of predicting soil CO2 emission from soil temperature and soil moisture in four ecosystems, we showed (1) that the fundamental reason for this problem is the concavity of the log‐function, (2) that the under‐predictions by the conventional practice were indeed pervasive and systematically biased, and (3) that the under‐prediction problem was quite serious, but could be avoided by switching to a nonlinear approach.

Key Points:
(1) Log‐transforming the dependent variable of a regression model has a pervasive and systematically biased under‐prediction problem.
(2) The problem is due to the concavity of the log‐function and has nothing to do with whether the dependent variable has a log‐normal distribution or not.
(3) The under‐prediction was up to 20% when soil CO2 emission was predicted by soil temperature and moisture in four ecosystems, but can be avoided by using a nonlinear approach.

[ABSTRACT FROM AUTHOR]
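
The concavity argument in the abstract can be illustrated with a short simulation. This is a minimal sketch and not the authors' code or data. It assumes that, in a saturated model with one parameter per combination of explanatory factor values ("cell"), least squares on log(y) followed by back-transformation predicts the geometric mean of the cell, while least squares on the raw scale predicts the arithmetic mean. Because the logarithm is concave, Jensen's inequality gives exp(mean(log y)) <= mean(y), so the back-transformed prediction can never exceed the arithmetic mean. The data are deliberately drawn from a gamma distribution (not a log-normal one), and the function names, the cell mean of 2.0, the sample size and the coefficients of variation are all hypothetical choices made for illustration.

```python
import math
import random
import statistics


def cell_predictions(values):
    """Return (log-transformed prediction, raw-scale prediction) for one cell.

    In a saturated model each cell has its own parameter, so least squares on
    log(y) recovers the mean of log(y); back-transforming gives the geometric
    mean. Least squares on the raw scale recovers the arithmetic mean.
    """
    log_pred = math.exp(statistics.fmean([math.log(v) for v in values]))
    raw_pred = statistics.fmean(values)
    return log_pred, raw_pred


def underprediction_percent(values):
    """Percentage by which the back-transformed prediction falls short."""
    log_pred, raw_pred = cell_predictions(values)
    return 100.0 * (1.0 - log_pred / raw_pred)


def simulate_cell(mean, cv, n, rng):
    """Draw n positive values from a gamma distribution (assumed, not log-normal).

    shape = 1 / cv**2 and scale = mean / shape give the requested mean and
    coefficient of variation.
    """
    shape = 1.0 / cv ** 2
    scale = mean / shape
    return [rng.gammavariate(shape, scale) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the illustration is repeatable
results = {}
for cv in (0.1, 0.3, 0.6):  # hypothetical coefficients of variation
    sample = simulate_cell(mean=2.0, cv=cv, n=5000, rng=rng)
    results[cv] = underprediction_percent(sample)
    print(f"CV = {cv:.1f}: back-transformed prediction is {results[cv]:.1f}% too low")
```

Under these assumptions the shortfall is positive for every cell and grows with the coefficient of variation, which is consistent with the abstract's statement that the coefficient of variation of the dependent variable determines the severity of the problem.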
- Subjects :
- CARBON dioxide
SOILS
REGRESSION analysis
LOGNORMAL distribution
SOIL temperature
Details
- Language :
- English
- ISSN :
- 2169-8953
- Volume :
- 126
- Issue :
- 5
- Database :
- Complementary Index
- Journal :
- Journal of Geophysical Research. Biogeosciences
- Publication Type :
- Academic Journal
- Accession number :
- 150539966
- Full Text :
- https://doi.org/10.1029/2021JG006238