
Unmasking the sky: high-resolution PM2.5 prediction in Texas using machine learning techniques.

Authors :
Zhang K
Lin J
Li Y
Sun Y
Tong W
Li F
Chien LC
Yang Y
Su WC
Tian H
Fu P
Qiao F
Romeiko XX
Lin S
Luo S
Craft E
Source :
Journal of exposure science & environmental epidemiology [J Expo Sci Environ Epidemiol] 2024 Sep; Vol. 34 (5), pp. 814-820. Date of Electronic Publication: 2024 Apr 01.
Publication Year :
2024

Abstract

Background: Although PM2.5 (fine particulate matter with an aerodynamic diameter less than 2.5 µm) is an air pollutant of great concern in Texas, limited regulatory monitors pose a significant challenge for decision-making and environmental studies.

Objective: This study aimed to predict PM2.5 concentrations at a fine spatial scale on a daily basis by using novel machine learning approaches and incorporating satellite-derived Aerosol Optical Depth (AOD) and a variety of weather and land use variables.

Methods: We compiled a comprehensive dataset in Texas from 2013 to 2017, including ground-level PM2.5 concentrations from regulatory monitors; AOD values at 1-km resolution based on images retrieved from the MODIS satellite; and weather, land-use, and population density variables, among others. We built predictive models for each year separately to estimate PM2.5 concentrations using two machine learning approaches, gradient boosted trees and random forest. We evaluated the model prediction performance using in-sample and out-of-sample validations.

Results: Our predictive models demonstrate excellent in-sample model performance, as indicated by high R² values generated from the gradient boosting models (0.94-0.97) and random forest models (0.81-0.90). However, the out-of-sample R² values fall within a range of 0.52-0.75 for gradient boosting models and 0.44-0.69 for random forest models. Model performance varies slightly across years. A generally decreasing trend in predicted PM2.5 concentrations over time is observed in Eastern Texas.

Impact Statement: We utilized machine learning approaches to predict PM2.5 levels in Texas. Both gradient boosting and random forest models perform well. Gradient boosting models perform slightly better than random forest models. Our models showed excellent in-sample prediction performance (R² > 0.9).

(© 2024. The Author(s), under exclusive licence to Springer Nature America, Inc.)
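The gap between in-sample and out-of-sample R² reported in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it uses synthetic data and a toy nearest-neighbour predictor as a stand-in for gradient boosting or random forest, and the names `r_squared` and `nearest_neighbour_predict` are hypothetical helpers introduced only for illustration. It shows how a flexible model can score a perfect R² on the data it was fitted to while scoring lower on held-out data.

```python
import random


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def nearest_neighbour_predict(train_x, train_y, query_x):
    """Toy stand-in model: return the target of the closest training point."""
    preds = []
    for q in query_x:
        idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - q))
        preds.append(train_y[idx])
    return preds


# Synthetic data (an assumption, not from the study): linear signal plus noise.
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

# Hold out the last 50 points for out-of-sample validation.
train_x, test_x = xs[:150], xs[150:]
train_y, test_y = ys[:150], ys[150:]

# In-sample: evaluate on the same points the model memorised.
r2_in = r_squared(train_y, nearest_neighbour_predict(train_x, train_y, train_x))
# Out-of-sample: evaluate on points the model never saw.
r2_out = r_squared(test_y, nearest_neighbour_predict(train_x, train_y, test_x))

print(f"in-sample R^2 = {r2_in:.2f}, out-of-sample R^2 = {r2_out:.2f}")
```

Because the toy model reproduces its training targets exactly, its in-sample R² is 1 by construction; the held-out score is the more informative estimate of how the model would perform at unmonitored locations.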

Details

Language :
English
ISSN :
1559-064X
Volume :
34
Issue :
5
Database :
MEDLINE
Journal :
Journal of exposure science & environmental epidemiology
Publication Type :
Academic Journal
Accession number :
38561475
Full Text :
https://doi.org/10.1038/s41370-024-00659-w