1. A multiple linear regression model with multiplicative log-normal error term for atmospheric concentration data
- Author
- Liao, Kezheng; Park, Eun Sug; Zhang, Jie; Cheng, Linjun; Ji, Dongsheng; Ying, Qi; and Yu, Jianzhen
- Abstract
The homoscedasticity assumption (the variance of the error term is the same across all observations) is a key assumption in the ordinary least squares (OLS) solution of a linear regression model. The validity of this assumption is examined for a multiple linear regression model used to determine the source contributions to the observed black carbon concentrations at 12 background monitoring sites across China using a hybrid modeling approach. Residual analysis from the traditional OLS method, which assumes that the error term is additive and normally distributed with a mean of zero, shows pronounced heteroscedasticity based on the Breusch–Pagan test for 11 datasets. Noticing that the atmospheric black carbon data are log-normally distributed, we make a new assumption that the error terms are multiplicative and log-normally distributed. When the coefficients of the multiple linear regression model are determined using maximum likelihood estimation (MLE), the distribution of the residuals in 8 out of the 12 datasets is in good accordance with the revised assumption. Furthermore, the MLE computation under this novel assumption can be shown to be mathematically identical to minimizing a log-scale objective function, which considerably reduces the complexity of the MLE calculation. The new method is further demonstrated to have clear advantages in numerical simulation experiments of a 5-variable multiple linear regression model using synthesized data with prescribed coefficients and log-normally distributed multiplicative errors. Under all 9 simulation scenarios, the new method yields the most accurate estimates of the regression coefficients and has significantly higher coverage probability (on average, 95% for all five coefficients) than the OLS (79%) and weighted least squares (WLS, 72%) methods. © 2020 Elsevier B.V.
- Published
- 2021
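
The equivalence stated in the abstract, between MLE under a multiplicative log-normal error and minimizing a log-scale objective, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the model y_i = (x_i · β) · ε_i with ln ε_i ~ N(0, σ²), which is one plausible reading of the abstract. Under that assumption the log-likelihood depends on β only through Σ (ln y_i − ln(x_i · β))², so maximizing it is the same as minimizing that sum. The function names (`predict`, `log_scale_sse`, `solve_linear`, `fit_log_scale`), the damped Gauss-Newton optimizer, the three-coefficient model, and the synthetic data are all hypothetical choices made for illustration.

```python
import math
import random


def predict(beta, X):
    """Linear predictor mu_i = sum_j beta_j * x_ij for each row of X."""
    return [sum(b * x for b, x in zip(beta, row)) for row in X]


def log_scale_sse(beta, X, y):
    """Log-scale objective: sum_i (ln y_i - ln mu_i)^2. Requires mu_i > 0."""
    return sum((math.log(yi) - math.log(mi)) ** 2
               for yi, mi in zip(y, predict(beta, X)))


def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_log_scale(X, y, beta_init, n_iter=50):
    """Damped Gauss-Newton on residuals r_i = ln y_i - ln(x_i . beta).

    beta_init must give strictly positive predictions for every row.
    """
    beta = list(beta_init)
    p = len(beta)
    sse = log_scale_sse(beta, X, y)
    for _ in range(n_iter):
        mu = predict(beta, X)
        r = [math.log(yi) - math.log(mi) for yi, mi in zip(y, mu)]
        # Jacobian of ln(mu_i) with respect to beta_j is x_ij / mu_i.
        J = [[xij / mi for xij in row] for row, mi in zip(X, mu)]
        JtJ = [[sum(Ji[a] * Ji[b] for Ji in J) for b in range(p)]
               for a in range(p)]
        Jtr = [sum(Ji[a] * ri for Ji, ri in zip(J, r)) for a in range(p)]
        step = solve_linear(JtJ, Jtr)
        # Halve the step until predictions stay positive and the
        # objective does not increase.
        scale, accepted = 1.0, False
        for _halving in range(30):
            trial = [b + scale * s for b, s in zip(beta, step)]
            if min(predict(trial, X)) > 0:
                trial_sse = log_scale_sse(trial, X, y)
                if trial_sse <= sse:
                    beta, sse, accepted = trial, trial_sse, True
                    break
            scale *= 0.5
        if not accepted:
            break
    return beta


# Synthetic data with prescribed coefficients (illustrative values only).
random.seed(0)
beta_true = [0.5, 2.0, 1.0]
X = [[1.0, random.uniform(0.5, 5.0), random.uniform(0.5, 5.0)]
     for _ in range(200)]
y = [m * math.exp(random.gauss(0.0, 0.3)) for m in predict(beta_true, X)]

beta_hat = fit_log_scale(X, y, beta_init=[1.0, 1.0, 1.0])
print("true coefficients:  ", beta_true)
print("log-scale estimates:", [round(b, 3) for b in beta_hat])
```

Because the objective works on ln y_i − ln(x_i · β), every observation contributes on a relative scale, which is what makes the fit insensitive to the heteroscedasticity that an additive-error OLS fit would face on the same data. A library optimizer could replace the hand-written Gauss-Newton loop; it is written out here only to keep the sketch dependency-free.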