How to integrate user data and semantic information into product data - 3+1 Feature with Keynote Speaker Cathy Dolbear

June 24, 2016

Cathy Dolbear has been around for some time. Holding a DPhil in Information Engineering, she has helped Sharp as well as Motorola to develop patented personalisation and multimedia technologies. In between, she developed semantic web technologies in the geospatial domain for the British mapping agency Ordnance Survey, where she also researched ontology design and multi-database interoperability. In early 2012 she joined the well-established Oxford University Press (OUP) as Senior Link Architect, where she works on the semantic enrichment of journal and book content to optimise OUP’s linking and search engine results. Search engine optimisation of OUP’s services has become crucial over the years, since access analysis has shown that most OUP products are discovered through search engines rather than through library discovery services (only 0.7 %).

Hence, at SEMANTiCS 2016, Cathy will deliver her keynote on the prevailing topic of “Enriching content with user data and semantic information”.

Q1: Which application areas for semantic technologies do you perceive as most promising?

I think semantic technologies have a really important role to play in data integration, which of course is a key issue for many different applications. Making explicit the assumptions that have been made, and the context in which the joined-up data will be used, reduces misunderstandings and improves accuracy when combining data from different sources. One promising area of semantic integration in the publishing industry is the intersection between bibliographic and product data, the content semantics and the data generated by user interaction with the content. Combined, this information can drive more meaningful and personalised search, more targeted advertising, and for publishers themselves, a better understanding of what really holds our users’ interest.

In my view, the other promising application in journals publishing, particularly in the sciences, is the ability to understand and harness not only what the article is and where it’s been published (bibliographic data) but what it’s about (hypotheses, methods, findings, conclusions, and supporting research data). Rather than discovering an article based on its title, authors, and keywords in the traditional way, giving a journal article some semantic structure enables researchers to search for those articles that support a particular conclusion, or have used a particular scientific method. The challenge is to capture that structure accurately and in a way that doesn’t put too much burden on the author whilst writing.
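A minimal sketch of that idea, assuming each article carries its methods and conclusions as structured fields alongside the bibliographic data. The `Article` record, the `find_articles` helper, and the sample entries are all hypothetical and invented for illustration; they do not describe any actual publishing system.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical record: a title plus a simple semantic structure."""
    title: str
    methods: set[str] = field(default_factory=set)
    conclusions: set[str] = field(default_factory=set)


def find_articles(articles, method=None, conclusion=None):
    """Return titles of articles that match the given method and/or conclusion."""
    return [
        a.title
        for a in articles
        if (method is None or method in a.methods)
        and (conclusion is None or conclusion in a.conclusions)
    ]


# Invented sample data, for illustration only.
ARTICLES = [
    Article("Study A", {"randomised controlled trial"}, {"drug X lowers blood pressure"}),
    Article("Study B", {"cohort study"}, {"drug X lowers blood pressure"}),
    Article("Study C", {"randomised controlled trial"}, {"drug X has no effect"}),
]

# Search by scientific method rather than by title, author, or keyword.
print(find_articles(ARTICLES, method="randomised controlled trial"))
```

A real system would more likely express this structure in RDF and query it with SPARQL; plain Python objects are used here only to keep the sketch self-contained.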

Q2: What is your vision of semantic technologies and artificial intelligence?

Certain AI techniques such as machine learning and natural language processing are really useful for semantic technologies, for example to extract concepts from textual data. However, since we are dealing with such high volumes of data in the applications I work with, the role of logic and reasoning is very limited. So I see the two disciplines of semantic technologies and AI as overlapping, but distinct.

Q3: How do you personally contribute to the advancement of semantic technologies?

My current role is really about moving semantic technologies into the mainstream of Oxford University Press’ publishing process, which has included a lot of outreach to explain the business benefits, as well as more technical development work to realise those benefits. For example, I’ve been working on an ontology of the concepts that Oxford University Press publishes about, and contributed to the RDFa tags embedded in many of the content pages of our online products. We’ve also just installed a PoolParty taxonomy manager, which has really helped us integrate the 70-odd taxonomies being used in our business, and enabled our team to map between them and, in several cases, consolidate them.
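Mapping between taxonomies can be pictured, in a much simplified form, as pointing terms from different vocabularies at one shared concept. The sketch below is an assumption-laden illustration: the taxonomy names, terms, concept identifiers, and the `equivalent_terms` helper are hypothetical, and it does not use or represent the PoolParty API.

```python
# Hypothetical mapping from (taxonomy, local term) to a shared concept identifier.
MAPPINGS = {
    ("journals", "Cardiology"): "concept:cardiovascular-medicine",
    ("books", "Heart Medicine"): "concept:cardiovascular-medicine",
    ("books", "Botany"): "concept:plant-science",
}


def equivalent_terms(taxonomy, term):
    """Return terms in other entries that map to the same shared concept."""
    concept = MAPPINGS.get((taxonomy, term))
    if concept is None:
        return []  # unmapped term: nothing to report
    return sorted(
        key
        for key, mapped in MAPPINGS.items()
        if mapped == concept and key != (taxonomy, term)
    )


# Terms from two invented taxonomies that denote the same concept.
print(equivalent_terms("journals", "Cardiology"))
```

Terms that resolve to the same concept are candidates for consolidation, while terms with no counterpart stay specific to their own taxonomy.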

Q4: You've been actively developing solutions for multimedia & personalisation. You even hold some patents. Which semantic multimedia application would you love to see become reality?

I started off my career in multimedia applications research, later moving into semantic technologies. Right now, I’d love to see uptake of semantic technologies with Digital Asset Management systems – current systems can struggle with the need to annotate media assets in different ways depending on the context, which is something that semantics are tailor-made for!