Exchanging and bringing together overlapping or complementary information from various sources, applications, and perspectives has been a major issue in both commercial and scientific domains. Several aspects complicate the realization of a solution to this problem, one of the largest being the way in which information is offered. Differences in the format or structure of information, as well as differences in vocabulary, create hurdles for the interoperability and integration of information.
Over the past 5-10 years, Semmtech has been working on a so-called SEMMweb Data Cloud solution, making full use of Semantic Web technology in an advanced and innovative manner. A Data Cloud is a coherent set of information that can be used by different software from different suppliers, regardless of where the information is stored. A Data Cloud can be a single autonomous set of information or a collection of different sets that link information into a coherent whole. A Data Cloud can describe any business domain, such as civil engineering products throughout their life cycle, while different parts (subsets) of the data are managed by different parties in the supply chain, using different software from different vendors. Information in a Data Cloud is retrievable via the Internet and can be (re)used by different parties to add data about any object, contributing to one big ‘cloud’ of data that covers part of, or even the whole range of, references and attributes related to an object.
An interesting case that combines some of the most relevant components of the Data Cloud solution in a working prototype is the V-Con project. The primary focus of this prototype is to tackle a set of interoperability challenges posed by two National Road Authorities (Rijkswaterstaat and Trafikverket) in a European project named Virtual Construction for Roads (V-Con). In this project, Semmtech is partnering with the global engineering firm Arcadis.
The Open PHACTS project (http://www.openphacts.org/) has built a platform for drug discovery that integrates diverse sets of public chemical and biological data. It currently connects linked open data from 12 different data sources, covering chemical compounds, protein targets, biological pathways, tissues, and diseases. The diversity and size of the Open PHACTS data are growing rapidly, and it currently contains more than 3 billion triples. The Open PHACTS project is a unique collaboration between European academic groups, small businesses, and large pharmaceutical companies, partially funded by the EU. The driver for the project is to enable scientists to easily access and process data from multiple sources to solve real-world drug discovery problems that were previously very difficult to solve. These drug discovery problems formed the basis for selecting which public data sources were integrated in the Open PHACTS project. Anyone can freely access the Open PHACTS data through a well-documented API, and numerous workflows that answer specific biomedical questions have been developed and published using the KNIME and Pipeline Pilot pipelining tools. In addition, several custom applications have been built using the API. Open PHACTS has shown that Linked Open Data in the form of RDF triples can be used effectively by the scientific community, and that it allows queries that were previously very difficult or impossible to run. Future directions include the integration of additional public data sources, the integration of internal company data with Open PHACTS data, and the continued development of workflows for scientific questions that can only be answered using linked data.
This work was done as part of MSc thesis research.