Identifying Trustworthiness Challenges in Deep Learning Models for Continental-Scale Water Quality Prediction

Xiaobo Xia, Xiaofeng Liu, Jiale Liu, Kuai Fang, Lu Lu, Samet Oymak, William S. Currie, Tongliang Liu

Water quality is foundational to environmental sustainability, ecosystem resilience, and public health. Deep learning offers transformative potential for large-scale water quality prediction and for generating scientific insight. However, the widespread adoption of deep learning models in high-stakes operational decision-making, such as pollution mitigation and equitable resource allocation, is prevented by unresolved trustworthiness challenges, including performance disparity, robustness, uncertainty, interpretability, generalizability, and reproducibility. In this work, we present a multi-dimensional, quantitative evaluation of trustworthiness, benchmarking three state-of-the-art deep learning architectures: recurrent (LSTM), operator-learning (DeepONet), and transformer-based (Informer), each trained on 37 years of data from 482 U.S. basins to predict 20 water quality variables. Our investigation reveals systematic performance disparities tied to process complexity, data availability, and basin heterogeneity. Management-critical variables remain the least predictable and most uncertain. Robustness tests reveal pronounced sensitivity to outliers and corrupted targets; notably, the architecture with the strongest baseline performance (LSTM) proves most vulnerable under data corruption. Attribution analyses align for simple variables but diverge for nutrients, underscoring the need for multi-method interpretability. Spatial generalization to ungauged basins remains poor across all models. This work serves as a timely call to action for advancing trustworthy data-driven methods for water resources management and offers critical insights for researchers, decision-makers, and practitioners seeking to leverage artificial intelligence (AI) responsibly in environmental management.

Comments: Accepted by Nexus (Cell Press). 61 pages, 24 figures, 2 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2503.09947 [cs.LG]
 (or arXiv:2503.09947v3 [cs.LG] for this version)
 https://doi.org/10.48550/arXiv.2503.09947

Submission history

From: Xiaobo Xia
[v1] Thu, 13 Mar 2025 01:50:50 UTC (9,275 KB)
[v2] Sun, 15 Jun 2025 11:47:43 UTC (9,274 KB)
[v3] Sat, 25 Oct 2025 01:57:51 UTC (22,023 KB)

https://arxiv.org/abs/2503.09947