Nominees
Nowadays, business processes are being digitized across all industries. Just-in-time manufacturing and mass customization generate vast amounts of data at a faster pace than ever. Specialization and outsourcing multiply the number of actors involved in business exchanges. Data management is adapting to these trends: data quality is assured proactively and data is increasingly considered a strategic asset.
The problem of integrating data from different systems is receiving ever-increasing attention. An effective approach is to identify the main terms across heterogeneous data sources, reach a consensus among the developers, and define a shared vocabulary. However, this process, which we refer to as distributed vocabulary development, can be quite complex. The main challenge for vocabulary engineers is to work collaboratively on a shared objective in a harmonious and efficient way, while avoiding misunderstandings, uncertainty, and ambiguity.
We present VoCol, an integrated environment that supports the development of vocabularies using version control systems (VCSs). We implemented VoCol as a loose coupling of components for validation, querying, analytics, visualization, and documentation generation. VoCol is a core component of the Industrial Data Space initiative. It supports a fundamental round-trip model of vocabulary development, consisting of three core activities: modeling, population, and testing.
In the spirit of test-driven software engineering, VoCol allows users to formulate queries that represent competency questions, so that the expressivity and applicability of a vocabulary can be tested a priori. Modeling comprises the analysis and conceptualization of the domain and the specification of the vocabulary terms, such as classes, properties, and the relationships between them. During the modeling activity, this terminology is expressed in a logical formalism. VoCol integrates a number of techniques that facilitate the conceptual work, such as automatically generated documentation and visualizations that provide different views on the vocabulary, as well as an evolution timeline that supports traceability. Once the vocabulary modeling has been completed, the next activity is typically population, which adds actual data in line with the defined classes and properties. For population, VoCol supports the integration of mappings between data sources and the vocabulary, including R2RML mappings to relational databases.
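The idea of testing a vocabulary with competency questions can be illustrated with a minimal sketch. This is not VoCol's implementation. The vocabulary is a hypothetical in-memory set of triples, all `ex:` terms and the helper functions `match`, `properties_linking`, and `run_competency_questions` are invented for illustration, and in practice such questions would typically be written in a query language such as SPARQL.

```python
# Hypothetical toy vocabulary: a set of (subject, predicate, object) triples.
# All ex: terms are illustrative and not taken from VoCol or a real vocabulary.
VOCABULARY = {
    ("ex:Machine", "rdf:type", "rdfs:Class"),
    ("ex:Sensor", "rdf:type", "rdfs:Class"),
    ("ex:hasSensor", "rdf:type", "rdf:Property"),
    ("ex:hasSensor", "rdfs:domain", "ex:Machine"),
    ("ex:hasSensor", "rdfs:range", "ex:Sensor"),
}


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def properties_linking(triples, domain, range_):
    """Competency question: which properties connect `domain` to `range_`?"""
    with_domain = {t[0] for t in match(triples, p="rdfs:domain", o=domain)}
    with_range = {t[0] for t in match(triples, p="rdfs:range", o=range_)}
    return sorted(with_domain & with_range)


def run_competency_questions(triples):
    """Run each question and report whether the vocabulary can answer it."""
    questions = {
        "Can a machine be linked to its sensors?":
            lambda: properties_linking(triples, "ex:Machine", "ex:Sensor"),
        "Can a sensor be linked to a measurement?":
            lambda: properties_linking(triples, "ex:Sensor", "ex:Measurement"),
    }
    # A question passes if its query returns at least one result.
    return {text: bool(query()) for text, query in questions.items()}


if __name__ == "__main__":
    for question, passed in run_competency_questions(VOCABULARY).items():
        print("PASS" if passed else "FAIL", question)
```

A failing question, such as the second one here, signals that the vocabulary lacks the terms needed to answer it, which feeds back into the next modeling iteration.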
The governance of distributed vocabulary development is supported by the access control as well as the branching and merging mechanisms of the underlying VCS. As a result, VoCol bridges the gap between the conceptual development of vocabularies and their operational execution in a concrete IT landscape. The implementation of VoCol is based on loose coupling: the webhook mechanism provided by many VCSs connects the repository with tools and techniques that each focus on a particular aspect of vocabulary development. By providing Vagrant and Docker containers that bundle all tools and encapsulate dependencies, VoCol is easily deployable or even usable as a service in conjunction with arbitrary VCS installations.
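The webhook-based loose coupling described above can be sketched as follows. This is a simplified illustration under stated assumptions, not VoCol's actual code. The payload shape (a `commits` list with `added` and `modified` file lists) is modeled loosely on the push events of common Git hosting services. The component registry, the two components, and the `.ttl` extension check are hypothetical stand-ins for real validation and documentation tools.

```python
import json

# Hypothetical registry of loosely coupled components. Each component is a
# plain function that receives the set of changed files and returns a report.
COMPONENTS = []


def component(func):
    """Register a function as a component that runs on every push."""
    COMPONENTS.append(func)
    return func


@component
def validate(changed_files):
    # Stand-in check: flag files that are not Turtle (.ttl) files.
    # A real validator would parse the files and check their syntax.
    bad = sorted(f for f in changed_files if not f.endswith(".ttl"))
    return {"component": "validation", "ok": not bad, "details": bad}


@component
def document(changed_files):
    # Stand-in for documentation generation: list the files to document.
    return {
        "component": "documentation",
        "ok": True,
        "details": sorted(changed_files),
    }


def handle_push(raw_body):
    """Parse a push payload and run every registered component on it."""
    payload = json.loads(raw_body)
    changed = set()
    for commit in payload.get("commits", []):
        changed.update(commit.get("added", []))
        changed.update(commit.get("modified", []))
    return [run(changed) for run in COMPONENTS]


if __name__ == "__main__":
    body = json.dumps(
        {"commits": [{"added": ["vocab.ttl"], "modified": ["README.md"]}]}
    )
    for report in handle_push(body):
        print(report)
```

Because each component only depends on the list of changed files, a new tool can be added by registering another function, without touching the existing ones.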
VoCol has been successfully applied in industrial use cases to enable semantic data integration over multiple heterogeneous data sources. It facilitates the development and maintenance of vocabularies that are based on standards and on the intellectual property of the industrial partners. VoCol’s loosely coupled architecture allows additional features to be integrated easily, such as components that support the definition of mappings between the developed vocabularies and the legacy data sources of industry systems. Users can thus execute queries against multiple data sources of a legacy system and gain new insights from the integrated data.
The semantic integration of heterogeneous data sources, as supported by VoCol, can significantly improve data quality and ease data access. Ultimately, it can lead to new business models and applications as well as enhanced traceability throughout the supply and value chain.