Taking data quality assurance as an opportunity, not just a duty

August 19, 2016

In the run-up to SEMANTiCS 2016 we invited several experts from Linked Enterprise Data Services (LEDS), a "Wachstumskern" project funded by the German Federal Ministry of Education and Research (BMBF), to talk a bit about their work and visions. They will share their insights into the fields of natural language processing, e-commerce, e-government, data integration and quality assurance right here. This is part 7, so stay tuned.

Prof. Dr. Eng. Martin Gaedke is one of the leading German experts in the fields of Web Engineering and Linked Data. He is Vice-Dean of the Faculty of Computer Science at Technical University Chemnitz and holds the Chair of Distributed and Self-Organizing Systems, where he has been researching the optimized interaction between people, software services and cyber-physical systems for many years. He has published more than 170 papers on these topics. The focus of his research is the question: How can the cooperation of people and new technologies sustainably improve our lives in a highly networked society?

In addition to his academic obligations, Martin Gaedke is president of the International Society of Web Engineering (ISWE), a consultant on digital transformation, agile management and ICT strategy, a consultant for the European Commission, the European Research Council and other international funding organizations, as well as editor-in-chief of the Journal of Web Engineering (JWE). He also supports the WebID and Social Web activities at the World Wide Web Consortium (W3C), working towards a more secure Web.
When exactly he has spare time, we really can’t tell you.


What does quality control / quality assurance mean in relation to the use of semantic data?

In a nutshell, data quality describes whether data is fit for its purpose. A simple example can be illustrated with customer data. The so-called master data are data about our customers, such as the postal or e-mail address. If a customer support employee travels to a customer to repair a device, the address in our database should be correct. This may sound trivial, but it is not. Even such very simple quality requirements are often not met, because customer addresses and other master data are not systematically maintained, or because multiple databases with master data are in use that are not kept in sync. This is where quality control and quality assurance come in: by establishing quality standards and monitoring that they are complied with. In addition, processes need to be established to ensure this compliance with the quality standards. Of course, quality control / quality assurance considers much more than just addresses or master data - it addresses several dimensions of data quality, in particular these four:

  • intrinsic data quality, which describes the value of the data itself,
  • context-related data quality, i.e. data requirements in specific situations,
  • presentation of the data, and
  • availability aspects, which concern system-dependent properties of data quality.
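
To make the master data example above a little more concrete, here is a minimal sketch in Python of such a check. It is not part of any LEDS tooling; the record structure and field names (customer_id, address) are assumptions chosen purely for illustration. The check flags records that violate two simple quality rules: an address must be present at all (intrinsic quality), and it must agree across the two systems that store it (consistency).

```python
# Minimal sketch of a master-data quality check (illustrative only):
# it flags customer records whose addresses are missing or out of sync
# between two hypothetical data sources. Field names are assumptions.

def check_master_data(crm_records, billing_records):
    """Return a list of quality issues found in customer master data."""
    issues = []
    billing_by_id = {r["customer_id"]: r for r in billing_records}

    for record in crm_records:
        cid = record["customer_id"]
        # Intrinsic quality: the address must be present at all.
        if not record.get("address"):
            issues.append((cid, "missing address in CRM"))
            continue
        # Consistency across systems: both databases should agree.
        billing = billing_by_id.get(cid)
        if billing is None:
            issues.append((cid, "customer unknown to billing system"))
        elif billing.get("address") != record["address"]:
            issues.append((cid, "address differs between CRM and billing"))
    return issues


if __name__ == "__main__":
    crm = [
        {"customer_id": 1, "address": "Strasse der Nationen 62, Chemnitz"},
        {"customer_id": 2, "address": ""},
    ]
    billing = [
        {"customer_id": 1, "address": "Reichenhainer Str. 70, Chemnitz"},
    ]
    for cid, problem in check_master_data(crm, billing):
        print(f"customer {cid}: {problem}")
```

In practice such rules would be part of the agreed quality standards and run continuously, rather than as a one-off script.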

What is the status quo when it comes to quality control / quality assurance?

Over the last ten years, the problem of data quality has been discussed a lot. The slogan "fitness for use" has become established in the data quality literature and has been examined from many angles. A study from 2002 estimates that the United States loses around 600 billion US dollars annually due to poor data quality. Poor data quality has a direct impact on business success. In the scenario outlined above it is easy to see that customer satisfaction decreases when customer service cannot repair the customer's Internet connection simply because the address is wrong. Additionally, unsuccessful business trips cost a lot of money.

Poor data quality is also typical on web pages where customers cannot find the right products because they use a different terminology. For example, a few years ago customers could not filter for cars that run on E10 fuel on an automobile manufacturer's website, because the engineers had entered the fuels under their formal DIN designation, in this case RON 95 E DIN 51626-1 instead of E10.
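
A minimal sketch, assuming a very simplified product catalogue, of how a synonym mapping could have bridged the gap between the customer's term and the DIN designation; the field names and product data are made up for the example and have nothing to do with the manufacturer's actual system.

```python
# Illustrative only: map everyday fuel terms onto the formal designation
# stored in a hypothetical catalogue, so that a filter for "E10" still works.

FUEL_SYNONYMS = {
    "e10": "RON 95 E DIN 51626-1",
}

def filter_cars(cars, fuel_query):
    """Return cars whose fuel designation matches the query or a known synonym."""
    normalized = fuel_query.strip().lower()
    # Translate the customer's term into the catalogue's formal designation.
    target = FUEL_SYNONYMS.get(normalized, fuel_query)
    return [car for car in cars if car["fuel"] == target]

cars = [
    {"model": "Compact A", "fuel": "RON 95 E DIN 51626-1"},
    {"model": "Sedan B", "fuel": "Diesel DIN EN 590"},
]

print(filter_cars(cars, "E10"))  # finds "Compact A" even though the user typed "E10"
```
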
Poor data quality - caused by weak or often entirely absent quality control / quality assurance - not only hurts customer satisfaction and creates unnecessary costs, but also leads to serious wrong decisions about markets and long-term business strategies. Worryingly, the situation has not changed significantly since 2002, while the technology has become more complex and its penetration even more intense. In the past, many companies relied on their hierarchical structures to implement quality control / quality assurance. But the trend today is decentralization, i.e. data is widely distributed - for example stored in clouds or processed by means of Software as a Service. This requires additional considerations and regulations, for example on the legal issues of data storage.

Furthermore, the amount of data increases significantly through the Internet of Things and the Internet of Services. In the past you might have had one data value of a specified quality grade. Now you are dealing with big data, which often means a lot of data of varying quality. That is not necessarily a bad thing - quite the contrary. But it requires conscious rethinking as well as real changes in processes and strategy to establish data quality as a "competitive advantage".

What challenges do we need to tackle?

The key challenge for many companies is to continue, and to accelerate, their transformation into the information age. Companies need to understand how to collect data about their products, services and value chains. This can be achieved, for example, through the integration of information and communication technology into their products: the internet connection in the coffee machine, the internet-based logging of all relevant parameters while a service is delivered, or the nearly complete linkage of industrial production with Information and Communication Technologies (ICT), as described by the concept of Industry 4.0. This is a considerable expense if data usage is restricted to optimizing customer understanding, sales and production. The real challenge and opportunity is to use the data and the knowledge gained from it to transform the company's current business model, so that the data itself becomes part of the central value proposition and can be exploited in its own right.

The 2015 Data Management Industry Benchmark Report of the EDM Council paints a rather bleak picture of the implementation of quality control in (financial) companies. Even basic data management is mainly implemented because of regulations (e.g. BCBS 239) and not because of its usefulness and necessity for business success.

Why are companies so reluctant?

The general perception of regulations is still mostly that they dictate what to do (usually in rather obscure language) - and that this causes costs. The "why" behind such regulations rarely becomes clear, namely the improvement of market conditions, transparency and so on - which, properly implemented, can lead to cost reductions or market advantages. In addition, many companies see regulations as restrictions on their development. They have been dealing with digital transformation only for a short period of time and are just beginning to discover and experiment with the new "possibilities of the information age". In this situation, regulations are initially a hindrance, as they complicate experimentation and learning.


I think, overall, this can be explained by the fact that we (data experts included) are only slowly beginning to understand how precious our data is in different contexts - that is, which data quality can be very useful and valuable for which industries. Data quality is multi-dimensional, and therefore elusive and hard to compare.

How are these challenges addressed in the LEDS project?

Within LEDS, an entire work package deals exclusively with the issue of data quality. The concept of data quality and the common understanding of it have been extensively researched and analyzed. The focus of the work is, of course, how data quality can be described, controlled and assured in the field of Linked Data. For this purpose, we develop models and procedures that support the development of new software enabling the use and high-quality distribution of data by means of Linked Data technologies. This technical support is complemented by components covering the necessary aspects of quality control, classification and link discovery.
These models, procedures and components are the supporting instruments for quality assurance in the LEDS framework. This should not only enable companies to adapt their business models to linked enterprise data and put that data to use. Realizing the idea of comprehensive quality management should also keep their enterprise data "fit" for whatever business scenario may arise.
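
To illustrate what link discovery means in this context, here is a minimal sketch of a naive, label-based matcher in Python. It is not one of the LEDS components; the similarity measure, threshold and example data are assumptions chosen purely to show the idea: candidate links between two datasets are proposed wherever labels are sufficiently similar, and such candidates would then be passed on to quality control.

```python
# Minimal sketch of label-based link discovery (illustrative only, not a
# LEDS component): propose candidate links between entities of two datasets
# whose labels are sufficiently similar. Threshold and data are assumptions.

from difflib import SequenceMatcher

def discover_links(source, target, threshold=0.85):
    """Yield (source_id, target_id, score) for sufficiently similar labels."""
    for s_id, s_label in source.items():
        for t_id, t_label in target.items():
            score = SequenceMatcher(None, s_label.lower(), t_label.lower()).ratio()
            if score >= threshold:
                yield s_id, t_id, score

crm = {"c1": "Technische Universität Chemnitz", "c2": "Ontos GmbH"}
reference = {"r1": "TU Chemnitz", "r2": "Technische Universitaet Chemnitz"}

for link in discover_links(crm, reference):
    print(link)  # candidate links would then go through quality control
```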

Partners

LEDS is a joint research project addressing the evolution of classic enterprise IT infrastructures into semantically linked data services. The research partners are Leipzig University and Technical University Chemnitz as well as the semantic technology providers Netresearch, Ontos, brox IT-Solutions, Lecos and eccenca.

brox IT-Solutions GmbH

Leipzig University

Ontos GmbH

TU Chemnitz

Netresearch GmbH & Co. KG

Lecos GmbH

eccenca GmbH

Supported by the German Federal Ministry of Education and Research (BMBF)