FREME: Services for Multilingual Enrichment of Digital Content

Time: Monday, September 12, 2016 - 09:00 to 17:00
Place: Campus Augustinum

Contact Person: Milan Dojchinovski, milan.dojchinovski@informatik.uni-leipzig.de

 

Motivation and Topics of Interest

Linked Data (LD) and Language Technologies (LT) have gained great attention in the last decade. Solutions such as machine translation, terminology extraction and entity recognition have already achieved a level of maturity that allows their exploitation in different business scenarios. Despite this increased popularity, several problems with the current solutions are starting to surface: integration cost is normally high due to a lack of interoperable solutions, support for various content formats is lacking, adaptability to real-world scenarios is limited and, last but not least, localization of the current solutions is not well understood or met. FREME addresses these challenges and provides a framework for multilingual and semantic enrichment of digital content. The FREME framework comprises a set of e-services designed from the perspective of real-world business needs and validated in four business cases.

In this tutorial, we guide you through the design and architecture of the framework and its e-services, and provide the basis for semantic and multilingual processing of digital content in real-world scenarios. In particular, the tutorial will tackle the following questions:

  • What is FREME and what are the key advantages of using FREME?
  • What is NIF, what role does NIF play in FREME, and what are the latest developments in NIF 2.1?
  • How to consume each FREME e-service and process digital content?
  • How to adapt FREME to your specific business scenario?
  • How to contribute to FREME?

We will answer these questions in a practical way, by means of examples and hands-on exercises. The tutorial will be organized in several sessions, each covering one of these topics. Each session will be divided into a theoretical introduction and a practical part. The practical work will consist of completing some short guided examples proposed by the speakers. All the instructional material, data and software required to follow the sessions will be available online beforehand on the tutorial webpage.

Familiarity with the basic notions of RDF is welcome, but not a requirement. No previous experience with Linked Data and no prior knowledge of NLP techniques or computational linguistics is required.

Schedule

Time Slot | Title | Scope
09:00 - 09:30 | Session 0: Welcome and general introduction to FREME, by Milan Dojchinovski (InfAI) and Felix Sasaki (DFKI) | Welcome and brief introduction to the FREME project and the framework.
09:30 - 10:00 | Session 1: The NIF format - General overview and its role in FREME, by Milan Dojchinovski (InfAI) | The NIF format plays an important role in the FREME framework. In this session we will briefly introduce NIF and show how it is used across the FREME services.
10:00 - 10:30 | Session 2: FREME NER and its adaptability, by Milan Dojchinovski (InfAI) | In this session we will present FREME NER. We will describe its main features and show how it can be adapted to dataset and domain needs. During the session the participants will get a chance to try its core features.
10:30 - 11:00 | Coffee break |
11:00 - 11:30 | Session 3: e-Translation and e-Terminology, by Andis Lagzdiņš (Tilde) | The session will provide an introduction to enriching data with e-Terminology and e-Translation. We will show use cases, introduce the potential benefits of these e-services and cover the technical details of using both. As translations can benefit from terminology, the session will demonstrate how to improve the e-Translation result with the e-Terminology service. Alongside the theoretical part there will also be practical tasks: preparing and posting data for enrichment with e-Translation and e-Terminology.
11:30 - 12:30 | Session 4: FREME Ease of Integration, by Jan Nehring (DFKI) | We will talk about FREME’s capabilities for rapidly developing NLP workflows. FREME applies various approaches to reduce the complexity of NLP workflows, which allows easy integration. Furthermore, it implements methods that allow a clear separation of roles between NLP experts and API users, reducing complexity for NLP newcomers. The presentation also shows how FREME’s flexibility proved valuable in applications outside of the FREME consortium. Participants of this session will also create their own FREME pipeline during a hands-on session.
12:30 - 13:30 | Lunch break |
13:30 - 14:00 | Session 5: e-Internationalization and Ocelot, by Phil Ritchie and Katia Iacomussi (Vistatec) | This session will explain and show how FREME uses a localization best practice, internationalization, to provide an easy way to translate and enrich structured web document formats such as HTML and XML. The session will demonstrate how this can be done in an informal setting using the FREME API directly, and then how the same can be achieved in a commercial, industrial setting through an integration with the open source editor Ocelot and an industry-standard, interoperable format called XLIFF.
14:00 - 15:00 | Session 6: e-Publishing, CKEditor and Batch Processing, by Gerald Haesendonck (iMinds) | This hands-on session focuses on three applications built on top of FREME that arose from the publishing world. We will create an e-book from web pages using e-Publishing. We will demonstrate a FREME plugin for CKEditor, an online text editor, that allows you to translate the text using e-Translation or enrich it using e-Entity and e-Link. Finally, we will use a tool that applies FREME services to files, without you having to call the services yourself.
15:00 - 15:30 | Coffee break |
15:30 - 16:30 | Session 7: NIF 2.1 - latest developments and standardization discussion, by Markus Ackermann (InfAI) | During the lifetime of FREME, NIF has undergone an extension which makes it more flexible. This session will walk through the improvements and conclude with a discussion on the standardization of NIF.
16:30 - 17:00 | Final words and wrap-up | Feedback, summary and wrap-up of the tutorial.

 

Tutorial Speakers

Milan Dojchinovski works as a senior researcher at the Institute for Applied Informatics (InfAI) in Germany and as an assistant professor at the Czech Technical University in Prague (CTU). Milan has strong experience in the computer industry in the Czech Republic, Germany and Slovenia. His research interests are in Semantic Web, Web services and NLP technologies. He has worked on the FP7 EU projects LinkedTV and LOD2 and the H2020 FREME project. Milan is a member of the W3C LD4LT and Open Annotation groups. He holds an MSc degree in Computer Science from the University of Maribor in Slovenia.

Felix Sasaki joined the W3C in 2005 to work in the Internationalization Activity until March 2009. In 2012 he rejoined the W3C team as a fellow on behalf of DFKI (German Research Center for Artificial Intelligence). He was co-chair of the MultilingualWeb-LT Working Group and co-editor of the Internationalization Tag Set (ITS) 2.0 specification. He is currently engaged in the DKT and FREME projects. His main field of interest is the application of Web technologies for the representation and processing of multilingual information.

Phil Ritchie is Chief Technology Officer at Vistatec, an indigenous, privately owned Language Services Provider headquartered in Dublin. Phil directs and drives all Language Technology and Research and Development activities. Phil has a Bachelor of Science degree and 20 years of industry experience at senior management and director levels. Phil is a frequent speaker at industry conferences and an active member of several industry associations. Phil was a founding industrial member of the ADAPT Centre and serves as Chairman of its Industrial Advisory Board. Phil is a partner in the European Commission-funded FREME project. Phil is the lead architect of, and lead contributor to, the Ocelot open source project. Current priority research topics for Phil are Natural Language Parsing, Linked Data and Semantics.

Markus Ackermann works as a research assistant at the AKSW - KILT group in Leipzig. Before joining the AKSW group, he was involved as a student assistant in several Digital Humanities projects of the NLP department and the DH department at Leipzig University, and also supplemented an introductory NLP lecture with hands-on sessions. He received his BSc in Computer Science at Leipzig University in 2013. Markus' research interests are in the area of NLP and Knowledge Representation & Reasoning. In the course of his activities he also developed interests in Functional Programming Approaches, Machine Learning, Software-Container Virtualisation (Docker) and Comparative Morphology and Syntax.

Katia Iacomussi graduated from the University La Sapienza in Aerospace Engineering with a dissertation in Computational Fluid Dynamics. She then attended a postgraduate course in Space Transportation Systems at La Sapienza and collaborated with the company ELV (European Launch Vehicle), designing and implementing algorithms for simulating the behaviour of engineering systems. She went on to work as a Java developer in Rome, Milan and Munich, specializing in J2EE technologies, and later moved to Dublin, Ireland, where she works as a Java developer at Vistatec and is engaged in the e-Internationalization work of FREME.

Gerald Haesendonck works as a researcher in the Data Science Lab at Ghent University - iMinds in Belgium. His activities focus on Semantic Web, NLP technologies and knowledge processing in general. Gerald holds an MSc degree in Computer Science and worked for several years as a software engineer in private companies before joining the lab. He is currently involved in several projects, including the European Commission-funded FREME project.

Jan Nehring works as a senior software developer at DFKI (German Research Center for Artificial Intelligence), where he coordinates the software development efforts in the FREME project. He is interested in Natural Language Processing, software development and big data processing. Before coming to DFKI, Jan worked as a researcher at the Technical University of Berlin and as a freelance software developer. Jan holds an M.Sc. in Computer Science from the Technical University of Berlin and a B.Sc. in Applied System Sciences from the University of Osnabrück.

Andis Lagzdiņš has been a developer at Tilde for more than 10 years, working on multilingual language technology projects. For the last 4 years he has been a lead developer in the terminology group, leading the development of terminology services. He is also responsible for ensuring the availability of Tilde’s terminology and machine translation services via the FREME server. Andis has an M.Sc. in Computer Science from the Riga Technical University.

 

Intended audience

Developers and industry representatives. The tutorial provides insight into how language technologies can be used in business use cases and how the FREME framework can support such use cases.