LODQuator - Linked Data Quality Assessment Monitor with Luzzu

Nominees

The widespread and rapid adoption of the Linked Data principles has added an unprecedented dimension to the Web, contributing to its transformation from a Web of Documents into a Web of Data. Thanks to links between the data, one can jump from one source to another in order to retrieve more complete information and answers. As in the Web of Documents, these sources are heterogeneous with regard to their domain and vary highly in quality [1]. The quality of a document can often only be assessed subjectively, but indirect measures such as PageRank and HITS (hubs and authorities), which calculate the importance of a document relative to the rest of the Web via its links, give a good indication of whether a document is of good quality or an authoritative source. In the Web of Data, however, resources (the data) are not simply text (or other HTML components such as tables and images) and links. For LOD datasets, indirect link-based quality measures are much less meaningful (e.g. because they are even more prone to link spamming than on the Web of Documents), but at the same time a number of more direct quality indicators exist. Data resources are usually complex structures that describe some existing thing (an object in the real world), give it semantics (i.e. meaning), and possibly link it to other resources, in a form that both machines and humans can understand. According to the editors of the Data on the Web Best Practices document, "data quality can affect the potentiality of the application that uses data, as a consequence, its inclusion in the data publishing and consumption pipelines is of primary importance" [2].

In this project we monitor a number of the linked datasets depicted in the Linked Open Data cloud and periodically assess their quality using Luzzu, a quality assessment framework for Linked Data [3]. We currently assess around 130 linked datasets against 27 different quality metrics and publish the resulting quality metadata as Linked Data, which can be linked to the datasets themselves. LODQuator, together with the Luzzu framework, is the first known service that provides quality metadata based on the Dataset Quality Vocabulary (daQ) [4]; this metadata can easily be transformed (by an inference engine) into the upcoming W3C Data Quality Vocabulary (DQV) [5].

The web interface assists users in analysing the data visually, providing (1) a faceted search facility that allows data consumers to explore, filter, and rank datasets that may be 'fit for use'; (2) visualisations of the quality metadata of the assessed datasets; and (3) an endpoint for querying the quality metadata. The exploration and ranking features allow users to search through the quality-assessed datasets according to the quality criteria that matter for their use case. Quality metadata can also be visualised as charts based on the data cube structure definition. A visualisation wizard helps stakeholders choose the right visualisation type and chart: (a) multiple datasets vs. a metric; (b) a dataset vs. a metric over time; (c) the quality of a single dataset. On the whole, the Luzzu web interface ensures that stakeholders (both publishers and consumers) can easily assess and analyse data quality from a single visual entry point.
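The filter-and-rank step of the faceted search can be sketched roughly as follows. This is a minimal, hypothetical Python sketch, not LODQuator's actual implementation: it assumes each quality observation has been reduced to a (dataset, metric, value, date) record with values normalised to [0, 1], and that a data consumer expresses "fit for use" as per-metric weights plus optional minimum thresholds. The names Observation, latest_values and rank_datasets, the weighted-mean score, and the metric names in the usage example are all illustrative assumptions. Because datasets are assessed periodically, only the newest observation per (dataset, metric) pair is used.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """Hypothetical quality observation: one metric value for one dataset on one date."""
    dataset: str
    metric: str
    value: float  # assumption: values are normalised to [0, 1]
    observed_on: date


def latest_values(observations):
    """Return {dataset: {metric: value}}, keeping only the newest observation per pair."""
    newest = {}
    for obs in observations:
        key = (obs.dataset, obs.metric)
        if key not in newest or obs.observed_on > newest[key].observed_on:
            newest[key] = obs
    per_dataset = {}
    for (dataset, metric), obs in newest.items():
        per_dataset.setdefault(dataset, {})[metric] = obs.value
    return per_dataset


def rank_datasets(observations, weights, thresholds=None):
    """Filter datasets by per-metric minimum thresholds, then rank by weighted mean.

    Datasets lacking a value for any weighted or thresholded metric are skipped.
    """
    thresholds = thresholds or {}
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    required = set(weights) | set(thresholds)
    ranked = []
    for dataset, values in latest_values(observations).items():
        if not required.issubset(values):
            continue  # not assessed on every metric the user asked about
        if any(values[m] < minimum for m, minimum in thresholds.items()):
            continue  # filtered out by a facet threshold
        score = sum(w * values[m] for m, w in weights.items()) / total_weight
        ranked.append((dataset, score))
    # Highest score first; ties broken by dataset name for a stable order.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


if __name__ == "__main__":
    # Illustrative data only; dataset and metric names are hypothetical.
    observations = [
        Observation("dataset-a", "dereferenceability", 0.5, date(2016, 1, 1)),
        Observation("dataset-a", "dereferenceability", 0.75, date(2016, 2, 1)),
        Observation("dataset-a", "licensing", 1.0, date(2016, 2, 1)),
        Observation("dataset-b", "dereferenceability", 1.0, date(2016, 2, 1)),
        Observation("dataset-b", "licensing", 0.0, date(2016, 2, 1)),
        Observation("dataset-c", "dereferenceability", 0.25, date(2016, 2, 1)),
    ]
    weights = {"dereferenceability": 1.0, "licensing": 1.0}
    for dataset, score in rank_datasets(observations, weights):
        print(f"{dataset}: {score:.3f}")
    # dataset-a: 0.875
    # dataset-b: 0.500
```

In this sketch a dataset that has not been assessed on a requested metric is excluded rather than scored as zero, so an unassessed dataset is not mistaken for a poor one; ranking such datasets last would be an equally reasonable choice.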

[1] Pascal Hitzler, Krzysztof Janowicz. Linked Data, Big Data, and the 4th Paradigm.
[2] W3C. Data on the Web Best Practices. https://www.w3.org/TR/dwbp/
[3] Jeremy Debattista, Sören Auer, Christoph Lange. Luzzu - A Framework for Linked Data Quality Assessment.
[4] Jeremy Debattista, Christoph Lange, Sören Auer. Representing dataset quality metadata using multi-dimensional views.
[5] W3C. Data on the Web Best Practices: Data Quality Vocabulary. https://www.w3.org/TR/vocab-dqv/

Jeremy Debattista - Enterprise Information Systems, University of Bonn/Fraunhofer IAIS

Enterprise Information Systems, Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS)
Schloss Birlinghoven
53757 Sankt Augustin
Germany