1. Improving river water quality prediction with hybrid machine learning and temporal analysis.
- Author
- del Castillo, Alberto Fernández; Garibay, Marycarmen Verduzco; Díaz-Vázquez, Diego; Yebra-Montes, Carlos; Brown, Lee E.; Johnson, Andrew; Garcia-Gonzalez, Alejandro; and Gradilla-Hernández, Misael Sebastián
- Subjects
ARTIFICIAL neural networks, TIME series analysis, WATER quality, SUPPORT vector machines, WATER analysis
- Abstract
River systems provide multiple ecosystem services to society globally, but these are already degraded or threatened in many areas of the world due to water quality issues linked to diffuse and point-source pollutant inputs. Water quality evaluation is essential to develop remediation and management strategies. Computational tools such as machine learning based predictive models have been developed to improve monitoring network capabilities. A model's performance is reduced when datasets composed of redundant information are used for training; on the other hand, selecting the most representative and variable water quality scenarios could result in higher precision. This study analyzed historical water quality behavior in the Santiago River, Mexico, to identify the most variable and representative data available to train machine learning models: Adaptive Neuro-Fuzzy Inference System (ANFIS), Artificial Neural Network (ANN), and Support Vector Machine (SVM). Thirteen monitoring sites were clustered according to their water quality variability from 2009 to 2022. Subsequently, a Time Series Analysis (TSA) was used to select the most representative monitoring station from each cluster. Data from 6 of the 13 monitoring sites were retained for the Best Training Subset (BTS) used to train restricted models, which performed with similar (ANN and SVM) or higher (ANFIS) prediction accuracy (in terms of RMSE, MAE, MSE and R²) for both training and testing. This study provides evidence that water quality data can contain redundant information that is not useful for improving machine learning model performance, in turn leading to overtraining. Combined analytical approaches can maximize the representativeness and variability of data selected for machine learning applications, leading to improved prediction.
• Selection of the most variable scenarios for training can improve predictive models' precision.
• The most representative water quality scenarios were efficiently selected by CA and TSA.
• Training with a reduced dataset yielded better predictions than using the complete dataset.
• Using a larger dataset for training does not necessarily result in enhanced model performance.
[ABSTRACT FROM AUTHOR]
- Published
- 2024