Unmasking the sky: high-resolution PM 2.5 prediction in Texas using machine learning techniques.
- Source :
- Journal of exposure science & environmental epidemiology [J Expo Sci Environ Epidemiol] 2024 Sep; Vol. 34 (5), pp. 814-820. Date of Electronic Publication: 2024 Apr 01.
- Publication Year :
- 2024
Abstract
- Background: Although PM2.5 (fine particulate matter with an aerodynamic diameter less than 2.5 µm) is an air pollutant of great concern in Texas, limited regulatory monitors pose a significant challenge for decision-making and environmental studies.
- Objective: This study aimed to predict PM2.5 concentrations at a fine spatial scale on a daily basis by using novel machine learning approaches and incorporating satellite-derived Aerosol Optical Depth (AOD) and a variety of weather and land use variables.
- Methods: We compiled a comprehensive dataset in Texas from 2013 to 2017, including ground-level PM2.5 concentrations from regulatory monitors; AOD values at 1-km resolution based on images retrieved from the MODIS satellite; and weather, land-use, and population density variables, among others. We built predictive models for each year separately to estimate PM2.5 concentrations using two machine learning approaches: gradient boosted trees and random forest. We evaluated the model prediction performance using in-sample and out-of-sample validations.
- Results: Our predictive models demonstrate excellent in-sample model performance, as indicated by high R² values generated from the gradient boosting models (0.94-0.97) and random forest models (0.81-0.90). However, the out-of-sample R² values fall within a range of 0.52-0.75 for gradient boosting models and 0.44-0.69 for random forest models. Model performance varies slightly across years. A generally decreasing trend in predicted PM2.5 concentrations over time is observed in Eastern Texas.
- Impact Statement: We utilized machine learning approaches to predict PM2.5 levels in Texas. Both gradient boosting and random forest models perform well. Gradient boosting models perform slightly better than random forest models. Our models showed excellent in-sample prediction performance (R² > 0.9).
- (© 2024. The Author(s), under exclusive licence to Springer Nature America, Inc.)
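The abstract reports R² from both in-sample and out-of-sample validation. As a rough illustration of why those two numbers can differ, the sketch below computes R² on training data and on held-out data for a flexible model. It is not the authors' pipeline: the data are synthetic, the 150/50 split is arbitrary, and a one-nearest-neighbour predictor (`predict_1nn`, a hypothetical helper) stands in for gradient boosted trees and random forest so that the example needs only the Python standard library.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def predict_1nn(train_x, train_y, x):
    """Stand-in model (assumption): return the target of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


# Synthetic data (assumption): one predictor, a linear signal plus Gaussian noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0, 3.0) for x in xs]

# Hold out the last 50 points; the model never sees them during "training".
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# In-sample: score the model on the same points it memorised.
in_sample_r2 = r_squared(
    train_y, [predict_1nn(train_x, train_y, x) for x in train_x]
)

# Out-of-sample: score the model on the held-out points.
out_of_sample_r2 = r_squared(
    test_y, [predict_1nn(train_x, train_y, x) for x in test_x]
)

print(f"in-sample R^2:     {in_sample_r2:.3f}")
print(f"out-of-sample R^2: {out_of_sample_r2:.3f}")
```

Because the stand-in model reproduces its training targets exactly, its in-sample R² is at the maximum, while the held-out R² is lower. Tree ensembles show a milder version of the same gap, which is why the out-of-sample figures are the more informative guide to how well predictions generalise to unmonitored locations.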
Details
- Language :
- English
- ISSN :
- 1559-064X
- Volume :
- 34
- Issue :
- 5
- Database :
- MEDLINE
- Journal :
- Journal of exposure science & environmental epidemiology
- Publication Type :
- Academic Journal
- Accession number :
- 38561475
- Full Text :
- https://doi.org/10.1038/s41370-024-00659-w