The Water Health Open Knowledge Graph

Scientific Data volume 12, Article number: 274 (2025)

Abstract

Global sustainability challenges have recently led to increasing interest in the management of water and health resources. The availability of effective, meaningful and open data is therefore crucial to addressing those issues in the broader context of the Sustainable Development Goals on clean water and sanitation set by the United Nations. In this paper, we present the Water Health Open Knowledge Graph (WHOW-KG), along with its design methodology and an analysis of its impact. Developed in the context of the EU-funded WHOW (Water Health Open Knowledge) project, the WHOW-KG is a semantic knowledge graph that models data on water consumption, pollution, extreme weather events, infectious disease rates and drug distribution. It aims to support a wide range of applications, from knowledge discovery to decision-making, making it a valuable resource for researchers, policymakers, and practitioners in the water and health domains. The WHOW-KG consists of a network of five ontologies and related linked open data, modelled according to those ontologies. As a fully distributed system, it is sustainable over time, can handle large datasets, and allows data providers full control, establishing it as a vital European asset in the fields of water consumption and pollution.

Introduction

Interest in water and sanitation management has grown in recent years, driven by global sustainability challenges that prioritise, among others, clean water and sanitation, as outlined in the UN Sustainable Development Goals1.

To respond effectively to these global issues, high-quality open data is an essential requirement. However, the heterogeneity and complexity of water and health data, when available, can pose significant challenges. Not only is the data heterogeneous in both format and semantics, but for the most part it also fails to meet, at any level, the FAIR principles2, which were designed to assess the extent to which data is Findable, Accessible, Interoperable, and Reusable. The FAIR principles aim to enhance data sharing and reuse in both human and machine contexts. More specifically, findable means that both data and metadata should be easy to locate for humans and machines alike. This includes assigning persistent identifiers (e.g. DOIs) and ensuring that metadata is richly described and indexed in searchable repositories. Accessible concerns the use of standardised protocols to retrieve data and metadata. Interoperable refers to the use of standardised formats, vocabularies, ontologies, and frameworks to ensure compatibility with other datasets, tools, and workflows, facilitating integration across disciplines. Finally, reusable refers to describing data with rich and detailed metadata, including clear licensing terms, and to adhering to reproducible processes, thereby supporting reuse by third parties. However, some research studies3,4,5 show that (open) data are often neither findable, accessible, nor interoperable. This finding is especially relevant since the FAIR principles currently do not include detailed guidelines on data or software quality, nor do they address issues of trustworthiness or content interoperability, gaps that ontologies can help bridge6. Furthermore, in the absence of clear licensing frameworks, datasets with unspecified licenses are common, which makes direct reuse of the data impossible7.
In response, only a few ontological modelling works have emerged to represent this fragmented knowledge within a FAIR framework, aiming to cater to the need for coverage of heterogeneous datasets in the international landscape.
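
To make the four FAIR facets described above concrete, the following is a toy sketch of a metadata completeness check. It is not an established FAIR assessment method: the field names (identifier, access_url, format, license) and the example record are hypothetical placeholders chosen for illustration, one per facet.

```python
# Hypothetical metadata keys, one per FAIR facet described in the text.
FAIR_FIELDS = {
    "findable": "identifier",      # e.g. a DOI or other persistent identifier
    "accessible": "access_url",    # where a standard protocol can retrieve the data
    "interoperable": "format",     # a standardised format or vocabulary
    "reusable": "license",         # explicit licensing terms
}


def missing_fair_facets(record: dict) -> list:
    """Return the FAIR facets whose metadata field is absent or empty."""
    return [facet for facet, key in FAIR_FIELDS.items() if not record.get(key)]


# Placeholder record (not a real dataset) with an unspecified license.
record = {
    "identifier": "https://example.org/dataset/1",
    "access_url": "https://example.org/sparql",
    "format": "text/turtle",
    "license": "",
}
print(missing_fair_facets(record))
```

In this sketch, a record with an empty license field is flagged on the reusable facet only, mirroring the licensing gap noted above. A real assessment would also need to examine identifier persistence, protocol behaviour, and vocabulary use, which a presence check cannot capture.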

This paper introduces the Water Health Open Knowledge Graph (WHOW-KG), the first European open distributed knowledge graph aimed at linking, through a common semantics, data on water consumption and quality with health parameters (e.g., infectious disease rates, general health conditions of the population). Designed to support understanding of the impact of water-related climate events, water quality, and water consumption on health, it provides a harmonised data layer that can be re-used for analysis, research, and the development of innovative services and applications. The project’s primary driver was to establish a sustainable methodology for open knowledge graph production that ensures the data quality characteristics of authoritativeness, timeliness, semantic accuracy, and consistency, as well as metadata compliance with the European DCAT-AP profile8 and related national and thematic extensions.

The WHOW-KG currently consists of more than 100 million RDF triples drawn from 19 datasets selected according to three use cases. It is distributed and available via three SPARQL endpoints: two provided by data providers, i.e. Lombardy Region (https://lod.dati.lombardia.it/sparql) and ISPRA – Italian National Institute of Environmental Research (https://dati.isprambiente.it/sparql), and one provided by CNR – Institute of Cognitive Sciences and Technologies (https://semscout.istc.cnr.it/sparql). The Lombardy Region was included in this project because one of the consortium partners, ARIA SpA, is the region's in-house company responsible for creating, managing, and curating open data on its behalf. Furthermore, Lombardy is recognised for its excellence in open data production. Its open data include extensive datasets covering microbiological, chemical, and physical parameters of water. Additional data from the Region’s Agency for Environmental Protection (ARPA) and its epidemiological observatory contribute to a comprehensive overview of the topics covered by WHOW, from bathing water quality to infectious diseases and associated health services. All resources from the Lombardy Region are licensed under the Creative Commons Public Domain License (CC0), and those from ISPRA under the Creative Commons Attribution 4.0 International (CC-BY 4.0) License.
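
As a minimal sketch of how the three endpoints listed above might be queried, the snippet below builds a SPARQL request URL for each of them. It assumes the endpoints accept standard SPARQL 1.1 Protocol GET requests and can return JSON results, neither of which is confirmed by the text; the function names are illustrative, and the probe query is deliberately generic so that it assumes nothing about the WHOW ontologies.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URLs as listed in the paper.
ENDPOINTS = {
    "lombardy": "https://lod.dati.lombardia.it/sparql",
    "ispra": "https://dati.isprambiente.it/sparql",
    "cnr": "https://semscout.istc.cnr.it/sparql",
}

# Generic probe query; it assumes nothing about the WHOW vocabularies.
PROBE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_url(endpoint: str, query: str) -> str:
    """Encode a query as a SPARQL 1.1 Protocol GET request URL."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list:
    """Send the query and return the JSON result bindings (needs network access).

    Assumption: the endpoint honours the SPARQL JSON results media type.
    """
    request = urllib.request.Request(
        build_sparql_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # Only builds and prints the request URLs; no network call is made here.
    for name, endpoint in ENDPOINTS.items():
        print(name, build_sparql_url(endpoint, PROBE_QUERY))
```

Calling run_query(ENDPOINTS["ispra"], PROBE_QUERY) would then attempt the actual request. Because the graph is distributed, a query spanning water and health data would have to be issued against each endpoint separately or federated, depending on what the endpoints support.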

In summary, this paper presents the following contributions:

  • An analysis of the five WHOW ontologies (the Hydrography ontology, the Water Monitoring ontology, the Water Indicator ontology, the Weather Monitoring ontology, and the Health Monitoring ontology), including a review of the state of the art in terms of similar works in both the water and health domains;
  • The WHOW-KG and a discussion of its impact;
  • A design methodology to support data providers in the publication of FAIR, highly extensible and sustainable Linked Open Data.

CLICK HERE FOR MORE INFORMATION

https://www.nature.com/articles/s41597-025-04537-4?

Identifying Trustworthiness Challenges in Deep Learning Models for Continental-Scale Water Quality Prediction

Xiaobo Xia, Xiaofeng Liu, Jiale Liu, Kuai Fang, Lu Lu, Samet Oymak, William S. Currie, Tongliang Liu

Water quality is foundational to environmental sustainability, ecosystem resilience, and public health. Deep learning offers transformative potential for large-scale water quality prediction and for generating scientific insights. However, its widespread adoption in high-stakes operational decision-making, such as pollution mitigation and equitable resource allocation, is prevented by unresolved trustworthiness challenges, including performance disparity, robustness, uncertainty, interpretability, generalizability, and reproducibility. In this work, we present a multi-dimensional, quantitative evaluation of trustworthiness, benchmarking three state-of-the-art deep learning architectures: recurrent (LSTM), operator-learning (DeepONet), and transformer-based (Informer), trained on 37 years of data from 482 U.S. basins to predict 20 water quality variables. Our investigation reveals systematic performance disparities tied to process complexity, data availability, and basin heterogeneity. Management-critical variables remain the least predictable and most uncertain. Robustness tests reveal pronounced sensitivity to outliers and corrupted targets; notably, the architecture with the strongest baseline performance (LSTM) proves most vulnerable under data corruption. Attribution analyses align for simple variables but diverge for nutrients, underscoring the need for multi-method interpretability. Spatial generalization to ungauged basins remains poor across all models. This work serves as a timely call to action for advancing trustworthy data-driven methods for water resources management and offers critical insights for researchers, decision-makers, and practitioners seeking to leverage artificial intelligence (AI) responsibly in environmental management.

Comments: Accepted by Nexus (Cell Press). 61 pages, 24 figures, 2 tables
Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Cite as: arXiv:2503.09947 [cs.LG]
 (or arXiv:2503.09947v3 [cs.LG] for this version)
 https://doi.org/10.48550/arXiv.2503.09947

Submission history

From: Xiaobo Xia
[v1] Thu, 13 Mar 2025 01:50:50 UTC (9,275 KB)
[v2] Sun, 15 Jun 2025 11:47:43 UTC (9,274 KB)
[v3] Sat, 25 Oct 2025 01:57:51 UTC (22,023 KB)

CLICK HERE TO READ MORE

https://arxiv.org/abs/2503.09947?

ChatGPT ‘drinks’ a bottle of fresh water for every 20 to 50 questions we ask, study warns

By Rosie Frost

Published on 20/04/2023 – 13:13 GMT+2 • Updated 21/04/2023 – 8:19 GMT+2

There is still very little research into the environmental impact of AI.

For every 20 to 50 questions ChatGPT is asked, it “drinks” a bottle of water, according to new research.

OpenAI’s AI chatbot has soared in popularity thanks to its uncanny ability to accurately answer our questions. After being made available to the public for testing last November, it has been used for everything from poetry to coding and even answering exam questions meant for medical students.

But despite billions of users around the world, there’s still very little research on what environmental impact AI like this is having.

A new study from researchers at the University of California, Riverside and the University of Texas at Arlington in the US gives some insight into its water consumption. The paper has not yet been peer-reviewed and has been shared ahead of its publication.

Its authors say that the “water footprint” of these AI models has so far “remained under the radar”.

How do AI chatbots use water?

The study’s water consumption figures refer to fresh, clean water used to generate the electricity that powers data centres and to cool their racks of servers.

Most of the prominent chatbots’ cloud computing relies on thousands of servers inside data centres around the world. Computers are used to train algorithms known as ‘models’ to perform tasks like answering questions from users.

The researchers estimate that, over a conversation of 20 to 50 questions, the AI chatbot could “drink” a 500ml bottle of water.

“While a 500ml bottle of water might not seem too much, the total combined water footprint for inference is still extremely large, considering ChatGPT’s billions of users,” the researchers say.

Scientists believe that while training GPT-3 alone, Microsoft may have consumed an incredible 700,000 litres of water.

More complex next-generation models like GPT-4 could consume even more during training, they say, but there is hardly any publicly available data with which to make an accurate estimate.
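
To put those figures in perspective, here is a rough back-of-the-envelope sketch using only the numbers quoted above: a 500ml bottle per 20 to 50 questions, and 700,000 litres for training GPT-3. The function names are illustrative, and the results are simple unit conversions rather than figures taken from the study.

```python
BOTTLE_ML = 500                   # bottle size quoted in the article
QUESTIONS_PER_BOTTLE = (20, 50)   # range of questions quoted in the article
GPT3_TRAINING_LITRES = 700_000    # training estimate quoted in the article


def ml_per_question(bottle_ml: float, questions: int) -> float:
    """Average water per question if one bottle covers `questions` questions."""
    return bottle_ml / questions


def bottles_for(litres: float, bottle_ml: float = BOTTLE_ML) -> float:
    """Number of bottles of size `bottle_ml` needed to hold `litres` of water."""
    return litres * 1000 / bottle_ml


low = ml_per_question(BOTTLE_ML, QUESTIONS_PER_BOTTLE[1])   # 50 questions per bottle
high = ml_per_question(BOTTLE_ML, QUESTIONS_PER_BOTTLE[0])  # 20 questions per bottle
print(f"{low:.0f} to {high:.0f} ml per question")
print(f"{bottles_for(GPT3_TRAINING_LITRES):,.0f} bottles for GPT-3 training")
```

On these numbers, each question works out to roughly 10 to 25ml of water, and the GPT-3 training estimate is equivalent to about 1.4 million 500ml bottles.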

Companies need to ‘take responsibility and lead by example’

The study’s authors have urged companies to “take social responsibility and lead by example” to address their water footprint in the face of global shortages.

Earlier this year, a landmark report on water economics said that demand is expected to outstrip the supply of fresh water by 40 per cent by the end of this decade. The report from the Global Commission on the Economics of Water said that all industries need to overhaul their wasteful practices.

The study’s authors are also asking for more data transparency so that the environmental impact of these AI systems can be better assessed through research like this.

“AI models’ water footprint can no longer stay under the radar – water footprint must be addressed as a priority as part of collective efforts to combat global water challenges,” they conclude.

OpenAI didn’t immediately respond to Euronews’ request for comment and Microsoft declined to comment on the study.

CLICK HERE FOR MORE INFORMATION

https://www.euronews.com/green/2023/04/20/chatgpt-drinks-a-bottle-of-fresh-water-for-every-20-to-50-questions-we-ask-study-warns