Michele Pasin


Linked data experience at Springer Nature

Building discovery services for scientific and scholarly content on top of a semantic data model

This talk summarizes and reflects on why we think Semantic Technologies are an effective way to do enterprise metadata management at web scale: essentially, a way to bring some order to the chaos that results from multiple applications working on similar data domains.

Springer Nature is a leading publisher in the Science & Scholarly area, whose portfolio includes flagship journals like Nature and Scientific American, several titles under the Nature and Nature Reviews brands, plus other products such as the Springer Book collections, Springer Journals and Springer Corporate Databases. It is a very diversified landscape that includes more than 3000 journals plus, of course, many other publication types like books, blogs or podcasts.

As a result of the digital revolution and the internet, new products are being created at a much faster rate than ever before. This has led the company in recent years to recognize the need for an integration layer that can bring together data from any of the applications that power the specific products we have on offer. In particular, we need interoperability both at the level of naming conventions, so as to facilitate communication within the enterprise when people talk about articles or subject areas, and at a more formal semantic level, via a shared metadata model implemented as a set of ontologies.

To this end, we have been using Semantic and Linked Data technologies since 2012, when we launched a prototype open platform called data.nature.com. Subsequently, we have worked on various other projects, including the nature ontologies portal (http://www.nature.com/ontologies, 2015) and the springer conferences portal (http://lod.springer.com, 2015). More generally, our focus has increasingly shifted from external data publishing to our internal systems: in particular, we have aimed to create an architecture where RDF is as core to the publishing workflow as XML is. In this talk we would like to provide an overview of this journey, which is now taking us to the creation of a new platform that combines content, science and people data from across Springer Nature and that will be launched in late 2016.


The challenges and lessons learned we will touch on include:   

  • how to build knowledge models and data architectures that leverage semantic technologies within a traditionally XML-based publishing workflow: in other words, how to introduce these new technologies in such a way that they solve real problems without disrupting established workflows?
  • the importance a coherent metadata management and semantic integration solution can have for any enterprise looking to maintain its competitive advantage in the knowledge society.
  • the value of making open data available to the scientific community (and the larger public) as a way to promote innovation and make it easier for scientists to do research; at the same time, the difficulty of getting non-specialists involved with data that are encoded using Linked Data standards.
  • challenges involved in identifying, indexing and transforming data coming from heterogeneous sources into a flexible yet coherent web-scalable metadata management layer.
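
To make the idea of carrying RDF alongside XML more concrete, here is a minimal sketch of how a piece of XML article metadata could be mapped to RDF triples. Everything in it is an assumption made for illustration: the `http://example.org/` namespace, the XML element names, the property names and the helper functions are hypothetical and do not reflect our actual ontologies or pipelines.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace and vocabulary, used only for this illustration.
BASE = "http://example.org/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"

# Hypothetical article record, standing in for a publishing-workflow XML file.
ARTICLE_XML = """
<article id="a1">
  <title>Graphene at scale</title>
  <journal>Example Journal</journal>
  <subject>materials-science</subject>
  <subject>nanotechnology</subject>
</article>
"""


def literal(value):
    """Render a plain string as an N-Triples literal (escaping \\ and ")."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def xml_to_triples(xml_text):
    """Map one <article> element to a list of (subject, predicate, object)."""
    root = ET.fromstring(xml_text.strip())
    article = f"<{BASE}articles/{root.get('id')}>"
    triples = [(article, RDF_TYPE, f"<{BASE}terms/Article>")]

    # Simple text fields become literal-valued properties.
    for field in ("title", "journal"):
        value = root.findtext(field)
        if value:
            triples.append((article, f"<{BASE}terms/{field}>", literal(value)))

    # Subject areas become links to shared subject URIs, so that different
    # applications can refer to the same topic with the same identifier.
    for subject in root.findall("subject"):
        triples.append(
            (article, f"<{BASE}terms/hasSubject>", f"<{BASE}subjects/{subject.text}>")
        )
    return triples


def to_ntriples(triples):
    """Serialize triples as N-Triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(xml_to_triples(ARTICLE_XML)))
```

The point of the sketch is that the XML record stays untouched while the same facts become addressable as URIs, which is what lets data from different applications be joined on shared identifiers.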



Michele Pasin is an information and data architect with a focus on enterprise metadata management and semantic technologies. Michele currently works for Springer Nature, a publishing company resulting from the May 2015 merger of Springer Science+Business Media and Holtzbrinck Publishing Group's Nature Publishing Group, Palgrave Macmillan, and Macmillan Education.

He has recently taken up the role of lead data architect for the knowledge graph project, an initiative whose goal is to bring together various preexisting linked data repositories, plus a number of other structured and unstructured data sources, into a unified, highly integrated knowledge discovery platform. Before that, he worked on projects like nature.com’s subject pages (a dynamic section of the website that allows users to navigate content by topic) and the nature.com ontologies portal (a public repository of linked open data).

He holds a PhD in semantic web technologies from the Knowledge Media Institute (The Open University, UK) and advanced degrees in logic and philosophy of language from the University of Venice (Italy). Previously, he was a research associate at King's College Department of Digital Humanities (London), where he worked on a number of cultural informatics projects such as the People of Medieval Scotland and the Art of Making in Antiquity.


Online Portfolio: http://www.michelepasin.org/projects/


Lead data architect