We are happy to introduce another speaker at SEMANTiCS 2015. Fabian Heinemann from Roche will share his professional insights on semantic technologies in the pharma sector. Get a first impression of his talk in this interview conducted by Nika Mizerski.
You are a distinguished semantic web expert and currently work as a data scientist at Roche. Could you please describe how semantic technologies enable Big Data projects?
Fabian Heinemann: First of all, I'd like to say that we often tend to avoid the term 'Big Data', since, in my opinion, it has been overused. Probably only a minority of projects with the 'Big Data' label fulfil its strict definition. However, the underlying problems (and opportunities) are of course real, and the term 'Big Data' certainly was (and still is) useful for drawing attention to these issues. One of my current use cases for semantic technologies is data integration and normalization. This deals with the problem that data in a large corporation is often scattered over various tools and comes in different formats and with different levels of quality. For data integration tasks, an advantage of semantic technology, and of RDF in particular, is that it expresses information in a very simple and standardized manner. It is therefore very useful for combining information from various relational databases with different table schemas. Also, in many cases the content is not normalized but comes in the variety of natural language: simple things such as "Diabetes mellitus" can be expressed in ten different spelling variants. Here, semantic technologies are well suited for data normalization tasks and allow mapping the ten different spelling variants to a single concept.
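As a rough illustration of the normalization idea described above, the following minimal Python sketch maps several spelling variants to one concept identifier and then expresses records from two differently structured sources as uniform subject-predicate-object triples. It is not Roche's implementation: the `example.org` identifiers, the variant list, the field names and the helper functions are all assumptions made up for this example.

```python
import re

# Hypothetical identifiers; a real project would use IRIs from an
# established vocabulary or ontology.
DIABETES = "http://example.org/concept/diabetes-mellitus"
HAS_DIAGNOSIS = "http://example.org/property/hasDiagnosis"

# Illustrative spelling variants (assumed, not taken from a real dataset).
LABELS = {
    DIABETES: ["Diabetes mellitus", "Diabetes-Mellitus", "diabetes mellitus (DM)"],
}


def normalize(text):
    """Lower-case, turn punctuation into spaces, collapse whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


# Index from normalized label to concept identifier.
INDEX = {
    normalize(label): concept
    for concept, labels in LABELS.items()
    for label in labels
}


def lookup(raw_label):
    """Return the concept for a raw label, or None if it is unknown."""
    return INDEX.get(normalize(raw_label))


def to_triples(rows, id_field, dx_field):
    """Express rows from one source as (subject, predicate, object) triples."""
    triples = []
    for row in rows:
        concept = lookup(row[dx_field])
        if concept is not None:
            subject = "http://example.org/patient/" + row[id_field]
            triples.append((subject, HAS_DIAGNOSIS, concept))
    return triples


# Two sources with different schemas and different spellings (made up).
SOURCE_A = [{"patient_id": "p1", "diagnosis": "Diabetes-Mellitus"}]
SOURCE_B = [{"subj": "p2", "dx_text": "DIABETES  MELLITUS"}]

triples = (to_triples(SOURCE_A, "patient_id", "diagnosis")
           + to_triples(SOURCE_B, "subj", "dx_text"))

for triple in triples:
    print(triple)
```

Both records end up pointing at the same concept identifier even though the source schemas and spellings differ. A production system would more likely rely on an RDF store and a curated vocabulary than on a hand-written dictionary.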
What makes it challenging to be a data scientist today? What would perfect working conditions look like for delivering business-relevant value from data?
Fabian Heinemann: In a field where you are surrounded by many buzzwords and high expectations, it is very important to separate the realistic technological promises from the unrealistic ones and to manage customers' expectations. As a data scientist, you have to understand a large variety of fields. In my role, I have to deal with varying amounts of project management, programming, machine learning, semantic technology, text analytics and visualization. With respect to perfect working conditions, it is important to have easy access to a variety of tools. Here, we prefer solution providers that apply standards, such as the semantic web standards by the W3C, in order to allow an easy transfer of data and reduce familiarization times. More generally, an open, discussion-friendly atmosphere is important, as well as a good mix of people ranging from technological experts to people who are good at understanding the business needs.
The pharma industry is knowledge-intensive, data-driven and very diverse. How can semantic technologies provide corporations with a competitive advantage?
Fabian Heinemann: In my opinion, the competitive advantage still needs to be shown, but the potential is definitely there. One possibility is to use semantic technologies to support text analytics systems. The healthcare industry requires automated tools to cope with the fast growth in publications in order to retrieve the relevant information written in natural language. Data enrichment is also an important topic for the healthcare industry, for example to annotate target molecule data with information from various sources. In large healthcare corporations like Roche, semantic technologies may also help build tools that bridge organizational units such as Roche Pharmaceuticals, Diagnostics and the pre-clinical unit Roche pRED. And besides these, there are of course more visionary (but probably also more risky) topics such as ontology reasoning in biological networks.
Many thanks for these insights. We are looking forward to your full talk at the SEMANTiCS conference from 15th to 17th September in Vienna. Register now!