1. Mitigating Prediction Error of Deep Learning Streamflow Models in Large Data‐Sparse Regions With Ensemble Modeling and Soft Data.
- Author
- Feng, Dapeng; Lawson, Kathryn; and Shen, Chaopeng
- Subjects
- *DEEP learning, *STREAMFLOW, *HYDROLOGIC cycle, *SOIL moisture, *FLOW simulations, *STREAM measurements
- Abstract
Predicting discharge in contiguously data‐scarce or ungauged regions is needed for quantifying the global hydrologic cycle. We show that prediction in ungauged regions (PUR) has major, underrecognized uncertainty and is drastically more difficult than previous problems where basins can be represented by neighboring or similar basins (known as prediction in ungauged basins). While deep neural networks demonstrated stellar performance for streamflow predictions, performance nonetheless declined for PUR, benchmarked here with a new stringent region‐based holdout test on a US data set with 671 basins. We tested approaches to reduce such errors, leveraging deep network's flexibility to integrate "soft" data, such as satellite‐based soil moisture product, or daily flow distributions which improved low flow simulations. A novel input‐selection ensemble improved average performance and greatly reduced catastrophic failures. Despite challenges, deep networks showed stronger performance metrics for PUR than traditional hydrologic models. They appear competitive for geoscientific modeling even in data‐scarce settings.
Plain Language Summary: Large areas of land exist on all continents where we do not have access to daily streamflow rate measurements, but predictions are still needed to plan for a changing climate. In such a scenario, if we apply hydrologic models calibrated elsewhere, how reliable are they? Previously, hydrologists studied the problem of prediction in ungauged basins (PUB), where a basin needing predictions can be represented by neighboring or similar basins. There has not been ample recognition that prediction in large, contiguously data‐sparse regions, which we call prediction in ungauged regions (PUR), is a substantially more difficult problem than PUB. Using a new benchmark problem with 671 US basins, we highlight this danger using machine learning models: performance dropped significantly from PUB to PUR scenarios. We proposed multiple approaches to improve model robustness, including a novel input‐selection ensemble method which combines models with different input options, as well as assimilating more widely available data. Despite the challenges with PUR, the machine learning model appears to be the best available tool for PUB and PUR applications. Perhaps counterintuitively, deep networks seem to be highly competitive even in data‐scarce settings, and geoscientific domains should carefully consider the extrapolation performance of their models.
Key Points:
We highlight increased model error with predictions in contiguously data‐scarce (or ungauged) regions (PUR) and propose a streamflow benchmark.
We introduce methods to mitigate error, including an input‐selection ensemble and integrating satellite data or flow distributions.
With these methods, our machine learning PUR model presents metrics competitive with basin‐by‐basin calibrated hydrologic models in the conterminous United States.
[ABSTRACT FROM AUTHOR]
- Published
- 2021