Semantic enrichment of Volunteered Geographic Information using Linked Data: a use case scenario for disaster management

This work was carried out as part of MSc thesis research.
The Web 2.0 data deluge provoked by the development of collaborative tools has affected numerous domains. In the context of crowdsourced geographic information, the concept of Volunteered Geographic Information (VGI) has emerged. However, the quality and usability of VGI are a subject of debate. Data often come unstructured, with unknown accuracy and questionable reliability. Semantic integration of VGI with relevant entities in the Linked Open Data (LOD) cloud has been seen as a remedy for the weaknesses of crowdsourced data. The LOD cloud makes it possible to semantically enrich unstructured user-generated content with structured information from LOD resources.
This project investigates the extent to which the Linked Open Data cloud can help to semantically enrich volunteered geographic information in order to better answer queries in the context of crisis and disaster relief operations.
Data produced by the Ushahidi project during the Chilean earthquake of 2011 were chosen as an example of disaster-related VGI. The data were plagued with drawbacks common to most user-generated content. The messiness and inconsistency inherent in VGI, together with ambiguous georeferencing, hampered consistent data management. Because of this, valuable emergency management (EM) information was locked in the initial data set.
In general, the work involved constructing a proof of concept. The first two steps were the conversion of the data into the Resource Description Framework (RDF) using vocabularies and the establishment of semantic links to relevant LOD entities. A workflow diagram for these steps can be seen in Figures.pdf attached to the submission.
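The conversion and linking steps can be illustrated with a minimal sketch. It is not the workflow used in the thesis: the report fields, the `example.org` base URI, the MOAC namespace URI, the use of the report category as an RDF class and the choice of `dcterms:spatial` as the linking predicate are all assumptions made for illustration. Only the WGS84 `geo` and Dublin Core namespaces are standard.

```python
# Sketch: turn one crowdsourced report into N-Triples and link it to a place.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
DCT = "http://purl.org/dc/terms/"
XSD = "http://www.w3.org/2001/XMLSchema#"
MOAC = "http://observedchange.com/moac/ns#"  # assumed MOAC namespace
BASE = "http://example.org/report/"          # hypothetical base URI


def literal(value, datatype=None):
    """Serialise a Python value as an N-Triples literal."""
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"' + (f"^^<{datatype}>" if datatype else "")


def report_to_ntriples(report, place_uri=None):
    """Describe a report in RDF; optionally link it to an LOD place entity."""
    subject = f"<{BASE}{report['id']}>"
    triples = [
        # Assumption: the report category is modelled as an RDF class.
        (subject, f"<{RDF_TYPE}>", f"<{MOAC}{report['category']}>"),
        (subject, f"<{DCT}description>", literal(report["text"])),
        (subject, f"<{GEO}lat>", literal(report["lat"], XSD + "decimal")),
        (subject, f"<{GEO}long>", literal(report["lon"], XSD + "decimal")),
    ]
    if place_uri:
        # Assumption: dcterms:spatial is used as the semantic link.
        triples.append((subject, f"<{DCT}spatial>", f"<{place_uri}>"))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
```

Calling `report_to_ntriples(report, place_uri)` with the URI of a matched LinkedGeoData or DBpedia entity yields one extra triple that attaches the report to the LOD cloud; how that match is found (by name, by distance, or manually) is outside this sketch.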
The third step in the workflow dealt with the evaluation of the data management techniques that emerged as a result of the semantic enrichment. The use of the Management of a Crisis (MOAC) vocabulary increased the semantic interoperability of the original data. As a result, it became possible to apply multi-criteria filtering to the Ushahidi reports based on their semantics.
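As an illustration of multi-criteria filtering, the sketch below assembles a SPARQL query that combines a category constraint, a keyword and a bounding box. The data model (reports typed with a category class and carrying `dct:description`, `geo:lat` and `geo:long`), the MOAC namespace URI and the category names passed in are assumptions, not the schema used in the thesis.

```python
def build_filter_query(categories, keyword=None, bbox=None):
    """Build a SPARQL SELECT combining category, keyword and bounding-box criteria.

    categories: local names of category classes (hypothetical MOAC terms)
    keyword:    optional case-insensitive substring of the report text
    bbox:       optional (south, west, north, east) in decimal degrees
    """
    lines = [
        "PREFIX moac: <http://observedchange.com/moac/ns#>",  # assumed namespace
        "PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>",
        "PREFIX dct: <http://purl.org/dc/terms/>",
        "SELECT ?report ?text WHERE {",
        "  ?report a ?category ;",
        "          dct:description ?text ;",
        "          geo:lat ?lat ;",
        "          geo:long ?long .",
        "  VALUES ?category { " + " ".join(f"moac:{c}" for c in categories) + " }",
    ]
    if keyword:
        # Escape backslashes and quotes so the keyword stays inside the literal.
        safe = keyword.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  FILTER(CONTAINS(LCASE(STR(?text)), "{safe.lower()}"))')
    if bbox:
        south, west, north, east = bbox
        lines.append(
            f"  FILTER(?lat >= {south} && ?lat <= {north} && "
            f"?long >= {west} && ?long <= {east})"
        )
    lines.append("}")
    return "\n".join(lines)
```

Each criterion adds one clause to the same graph pattern, so criteria can be combined freely once the reports share a vocabulary.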
Semantic enrichment achieved via links to LinkedGeoData entities helped to overcome the ambiguous georeferencing of the data, thus giving the data a robust spatial dimension. These improvements made it possible to access DBpedia entities via spatial relations. As a result, comprehensive queries could be constructed. For instance, it became possible to prioritize the reports based on population density or to extract useful information about local amenities, official names, infrastructural objects, etc. In other words, integration of VGI with LOD provides a mechanism to access contextual information, thus increasing situational awareness.
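The prioritization by population density could look roughly like the following sketch. It assumes that each report already carries the URI of a linked DBpedia place under a hypothetical `place` key and that densities are fetched with the `dbo:populationDensity` property; sending the query to an endpoint and parsing the response are left out.

```python
# Query template: look up population density for a set of DBpedia places.
DENSITY_QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?place ?density WHERE {
  VALUES ?place { %s }
  ?place dbo:populationDensity ?density .
}
"""


def density_query(place_uris):
    """Fill the template with the places the reports are linked to."""
    return DENSITY_QUERY % " ".join(f"<{uri}>" for uri in place_uris)


def prioritise(reports, density_by_place):
    """Order reports by the density of their linked place, densest first.

    Reports whose place has no known density are placed last.
    """
    return sorted(
        reports,
        key=lambda r: density_by_place.get(r["place"], float("-inf")),
        reverse=True,
    )
```

Here `density_by_place` stands for the parsed query result, a mapping from place URI to a numeric density.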
In addition, several software products were tested to investigate which tools can assist in constructing and validating a SPARQL query for a SPARQL endpoint. Many SPARQL endpoint implementations offer query validation as a default capability, for instance SPARQLer in Apache Jena. It was very useful to know which part of a query contained errors. However, although some endpoint implementations also provide a GUI (iSPARQL, for instance) that allows the use of predefined types, query forms and predicates from common ontologies, such assistance was found to be unnecessary. This conclusion was drawn from the experience of mastering SPARQL queries. The learning curve of SPARQL was quite steep, but once the author had understood the structure and mechanism of SPARQL, manipulating the needed prefixes and triple patterns without any assistance was not an issue.
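One of the simplest mistakes a validator reports is a prefix that is used but never declared. The sketch below is a rough lint check for that single case, written for illustration only: it is not how SPARQLer works, it does not parse SPARQL, and it ignores single-quoted strings and the empty prefix.

```python
import re

# Patterns are deliberately simple; this is a lint heuristic, not a parser.
PREFIX_DECL = re.compile(r"\bPREFIX\s+([A-Za-z][\w.-]*)\s*:\s*<[^<>\s]*>", re.IGNORECASE)
IRI = re.compile(r"<[^<>\s]*>")
STRING = re.compile(r'"(?:[^"\\\n]|\\.)*"')
COMMENT = re.compile(r"#[^\n]*")
PREFIXED_NAME = re.compile(r"(?<![\w?$:])([A-Za-z][\w-]*):(?=\w)")


def undeclared_prefixes(query):
    """Return the sorted prefixes that are used in the query but not declared."""
    declared = {m.group(1) for m in PREFIX_DECL.finditer(query)}
    body = PREFIX_DECL.sub(" ", query)  # drop the declarations themselves
    body = IRI.sub(" ", body)           # full IRIs may contain ':' and '#'
    body = STRING.sub(" ", body)        # literals may contain anything
    body = COMMENT.sub(" ", body)       # comments are not part of the query
    used = set(PREFIXED_NAME.findall(body))
    return sorted(used - declared)
```

A query that declares `geo:` but also uses `dbo:populationDensity` would be flagged for `dbo`; a full validator additionally checks the grammar of the query.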
Nevertheless, it was clear that interaction with SPARQL endpoints would be a problem for non-experts. For this reason, SPEX (Spatio-temporal content explorer) was tested. SPEX (Scheider et al., 2015) is a software prototype that helps non-experts explore the content of SPARQL endpoints, or their syntax, along the dimensions of space, time and theme. In contrast to existing query assistants such as IsaViz (SPARQLViz), ViziQuer, Sgvizler and SparqlFilterFlow, SPEX does not require any a priori knowledge of SPARQL or of the content of the data set to be queried.
The user interface of SPEX consists of three parts: the query pane (upper left), which is used to construct query patterns; two display filters (upper right), which set spatio-temporal constraints on nodes; and the results pane (lower left). The user interacts only with graphical objects and receives immediate feedback on each manipulation. As a result, data exploration and querying are done in parallel, so the user learns about the concepts and the data while querying.
The work has shown that the LOD cloud can be perceived as a giant informational skeleton. Scattered and disconnected blobs of unstructured data, once attached to this skeleton, form an integrated dataspace in which standardized methods of data access and manipulation, such as SPARQL, can be used. Moreover, it was concluded that non-experts require additional assistance when interacting with SPARQL endpoints. SPEX provides a good example of a system that integrates exploration and querying of a data set in a tightly closed loop.
Although the work dealt with disaster-related VGI, the demonstrated approach can be applied to any VGI. The workflow developed in this work can be automated or semi-automated in order to ensure efficient data conversion and further semantic enrichment.

Stanislav Ronzhin

ITC
Hengelosestraat 99
7514AE Enschede
Netherlands