The Water Health Open Knowledge Graph

Scientific Data volume 12, Article number: 274 (2025)

Abstract

Global sustainability challenges have recently led to increasing interest in the management of water and health resources. The availability of effective, meaningful and open data is therefore crucial to addressing those issues in the broader context of the Sustainable Development Goals on clean water and sanitation set by the United Nations. In this paper, we present the Water Health Open Knowledge Graph (WHOW-KG), along with its design methodology and an analysis of its impact. Developed in the context of the EU-funded WHOW (Water Health Open Knowledge) project, the WHOW-KG is a semantic knowledge graph that models data on water consumption, pollution, extreme weather events, infectious disease rates and drug distribution. It aims to support a wide range of applications, from knowledge discovery to decision-making, making it a valuable resource for researchers, policymakers, and practitioners in the water and health domains. The WHOW-KG consists of a network of five ontologies and related linked open data modelled according to those ontologies. As a fully distributed system, it is sustainable over time, can handle large datasets, and gives data providers full control, establishing it as a vital European asset in the fields of water consumption and pollution.

Introduction

Interest in water and sanitation management has grown in recent years, driven by global sustainability challenges that prioritise, among others, clean water and sanitation, as outlined in the UN Sustainable Development Goals1.

To provide effective responses to these global issues, the availability of high-quality, open data is an essential requirement. However, the heterogeneity and complexity of water and health data, when available, can pose significant challenges. Not only are the data heterogeneous in both format and semantics, but in most cases they also fail to comply, at any level, with the FAIR principles2, designed to assess to what extent data are Findable, Accessible, Interoperable, and Reusable. The FAIR principles aim at enhancing data sharing and reuse in both human and machine contexts. More specifically, findable means that both data and metadata should be easy to locate for humans and machines alike. This includes assigning persistent identifiers (e.g. DOIs) and ensuring that metadata are rich and indexed in searchable repositories. Accessible concerns the use of standardised protocols to retrieve data and metadata. Interoperable refers to the use of standardised formats, vocabularies, ontologies, and frameworks to ensure compatibility with other datasets, tools, and workflows, facilitating integration across disciplines. Finally, reusable refers to describing data with rich and detailed metadata, including clear licensing terms, and adhering to reproducible processes, hence supporting reuse by third parties. However, some research studies3,4,5 show that (open) data are often neither findable, accessible, nor interoperable. This finding is especially relevant since the FAIR principles currently do not include detailed guidelines on data or software quality, nor do they address issues of trustworthiness or content interoperability, gaps that ontologies can help bridge6. Furthermore, in the absence of clear licensing frameworks, it is common to encounter datasets with unspecified licenses, which makes direct reuse of the data impossible7.
In response, only a few ontological modelling works have emerged to represent this fragmented knowledge within a FAIR framework, aiming to cater to the need for coverage of heterogeneous datasets in the international landscape.
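
As a rough illustration, the four FAIR facets described above can be read as a checklist over a dataset's metadata record. The sketch below is only that: the field names (`identifier`, `access_url`, `conforms_to`, and so on) are hypothetical, loosely inspired by DCAT-style metadata, and the sample record is invented. It is not the assessment procedure used by the authors or by any FAIR evaluation tool.

```python
# Hypothetical metadata fields grouped by FAIR facet (an assumption for
# illustration, not a standard mapping).
CHECKS = {
    "findable": ("identifier", "title", "description"),
    "accessible": ("access_url",),
    "interoperable": ("format", "conforms_to"),
    "reusable": ("license",),
}


def missing_fields(record: dict) -> dict[str, list[str]]:
    """Return, per FAIR facet, the metadata fields that are absent or empty."""
    report = {}
    for facet, fields in CHECKS.items():
        gaps = [f for f in fields if not str(record.get(f) or "").strip()]
        if gaps:
            report[facet] = gaps
    return report


# Invented example record: it has no license and no declared vocabulary.
record = {
    "identifier": "https://example.org/dataset/1",
    "title": "Bathing water quality",
    "description": "Illustrative record, not real data.",
    "access_url": "https://example.org/sparql",
    "format": "text/turtle",
}
print(missing_fields(record))
```

A record with an unspecified license, the situation the text singles out as blocking direct reuse, shows up here as a gap under the `reusable` facet.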

This paper introduces the Water Health Open Knowledge Graph (WHOW-KG), the first European open distributed knowledge graph aimed at linking, through a common semantics, data on water consumption and quality with health parameters (e.g., infectious disease rates, general health conditions of the population). Designed to help understand the impact of water-related climate events, water quality, and water consumption on health, it provides a harmonised data layer that can be re-used for analysis, research, and the development of innovative services and applications. The project’s primary driver was to establish a sustainable methodology for open knowledge graph production that ensures the data quality characteristics of authoritativeness, timeliness, semantic accuracy, and consistency, as well as metadata compliance with the European DCAT-AP profile8 and related national and thematic extensions.

The WHOW-KG currently consists of more than 100 million RDF triples from 19 datasets selected according to three use cases. It is distributed and available via three SPARQL endpoints: two hosted by data providers, i.e. the Lombardy Region (https://lod.dati.lombardia.it/sparql) and ISPRA – Italian National Institute of Environmental Research (https://dati.isprambiente.it/sparql), and one hosted by CNR – Institute of Cognitive Sciences and Technologies (https://semscout.istc.cnr.it/sparql). The Lombardy Region was included in this project because one of the consortium partners, ARIA SpA, is the Region’s in-house company responsible for creating, managing, and curating open data on its behalf. Furthermore, Lombardy is recognised for its excellence in open data production. Its open data include extensive datasets covering microbiological, chemical, and physical parameters of water. Additional data from the Region’s Agency for Environmental Protection (ARPA) and its epidemiological observatory contribute to a comprehensive overview of the topics covered by WHOW, from bathing water quality to infectious diseases and associated health services. All the resources from the Lombardy Region are licensed under the Creative Commons Public Domain License (CC0), and those from ISPRA under the Creative Commons Attribution 4.0 International (CC-BY 4.0) License.
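
To make the access path concrete, the following minimal sketch shows how one of the three endpoints listed above might be queried from Python using only the standard library. It assumes the endpoints follow the SPARQL 1.1 Protocol (query passed as the `query` parameter of a GET request) and can return results in the SPARQL JSON results format. The helper names and the generic sample query are illustrative; the query deliberately uses no WHOW class or property names, since none are given in this section.

```python
import json
import urllib.parse
import urllib.request

# Endpoint URLs as listed in the text; the dictionary keys are our own labels.
ENDPOINTS = {
    "lombardy": "https://lod.dati.lombardia.it/sparql",
    "ispra": "https://dati.isprambiente.it/sparql",
    "cnr": "https://semscout.istc.cnr.it/sparql",
}

# Generic query that fetches any five triples; it assumes nothing about
# the WHOW ontologies.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def parse_bindings(payload: str) -> list[dict[str, str]]:
    """Flatten a SPARQL JSON results document into plain dict rows."""
    doc = json.loads(payload)
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in doc["results"]["bindings"]
    ]


def run_query(endpoint: str, query: str, timeout: float = 30.0) -> list[dict[str, str]]:
    """Send the query over HTTP and return the parsed rows (needs network access)."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=timeout) as resp:
        return parse_bindings(resp.read().decode("utf-8"))
```

Calling `run_query(ENDPOINTS["ispra"], SAMPLE_QUERY)` would then return up to five rows, provided the endpoint is reachable and honours the requested result format. Because the graph is distributed, a query spanning several providers would have to be sent to each endpoint separately or expressed as a federated query, depending on what the endpoints support.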

In summary, this paper presents the following contributions:

  • An analysis of the five WHOW ontologies (the Hydrography ontology, the Water Monitoring ontology, the Water Indicator ontology, the Weather Monitoring ontology, and the Health Monitoring ontology), including a review of the state of the art on similar works in both the water and health domains;
  • The WHOW-KG and a discussion of its impact;
  • A design methodology to support data providers in the publication of FAIR, highly extensible and sustainable Linked Open Data.

https://www.nature.com/articles/s41597-025-04537-4
