URank: A Prolog System for Collecting and Merging University Rankings for Comparison Using Web Extraction Techniques and Entity Linking through DBpedia / Wikipedia


URank is a Prolog application that collects University ranking data from various ranking lists on the web, uniquely identifies the University entities using DBpedia, merges the ranking data so that the lists can be compared, and exports all of the resulting datasets as linked open data.
University rankings are conducted by various organizations, such as news media, websites, governments, academics and private corporations, and have gained increasing attention due to the large financial and political interests involved. The rankings are based on different criteria and collect data in different ways. As a result, the rankings of individual institutions diverge widely across lists. To compare rankings and draw safe conclusions about their reliability, data from the sites of the different ranking lists must be collected and fused.
In this project we have developed such a Prolog application, called URank, using SWI-Prolog (http://www.swi-prolog.org/). URank a) extracts the data from the various ranking list web sites using web data extraction techniques, with the extraction rules built in the DEiXTo (http://deixto.com/) web data extraction tool, b) uniquely identifies the University entities within these lists by linking them to the DBpedia (http://wiki.dbpedia.org/) linked open data set, using a combination of the DBpedia lookup service, the DBpedia SPARQL endpoint and the Wikipedia text search, along with various domain-dependent geographical, temporal and University naming heuristics, and c) constructs a combined data set by merging the individual ranking list data sets, using the DBpedia URI of each University as the primary key. The data extracted from each individual ranking list and the merged data set are all exported as linked open data (RDF).
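As a rough illustration of step c) and the RDF export, the sketch below merges two ranking lists on their DBpedia URI and prints the result as N-Triples. It is written in Python for brevity and is not URank's Prolog code; the list names, the ranks, the function names and the http://example.org/urank# vocabulary are all assumptions made for this example only.

```python
from collections import defaultdict

# Hypothetical per-list results after entity linking: (DBpedia URI, rank).
# List names and ranks are illustrative placeholders, not URank output.
list_a = [
    ("http://dbpedia.org/resource/Harvard_University", 1),
    ("http://dbpedia.org/resource/Stanford_University", 2),
]
list_b = [
    ("http://dbpedia.org/resource/Stanford_University", 1),
    ("http://dbpedia.org/resource/Harvard_University", 3),
    ("http://dbpedia.org/resource/University_of_Oxford", 2),
]


def merge_rankings(named_lists):
    """Merge {list_name: [(uri, rank), ...]} into {uri: {list_name: rank}}.

    The DBpedia URI acts as the primary key, so the same university is
    matched across lists even if each list spells its name differently.
    """
    merged = defaultdict(dict)
    for list_name, rows in named_lists.items():
        for uri, rank in rows:
            merged[uri][list_name] = rank
    return dict(merged)


def to_ntriples(merged, vocab="http://example.org/urank#"):
    """Serialise merged ranks as N-Triples lines.

    `vocab` is a made-up namespace for this sketch, not URank's vocabulary.
    """
    xsd_integer = "http://www.w3.org/2001/XMLSchema#integer"
    lines = []
    for uri in sorted(merged):
        for list_name in sorted(merged[uri]):
            rank = merged[uri][list_name]
            lines.append(
                f'<{uri}> <{vocab}rankIn_{list_name}> "{rank}"^^<{xsd_integer}> .'
            )
    return lines


merged = merge_rankings({"listA": list_a, "listB": list_b})
for line in to_ntriples(merged):
    print(line)
```

A University that appears in only one list (University_of_Oxford here) simply has a single entry in the merged record, so gaps between lists remain visible when the rankings are compared.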
More details about the project can be found in the following publication (also attached):
N. Bassiliades, “Collecting University Rankings for Comparison Using Web Extraction and Entity Linking Techniques”, ICT in Education, Research and Industrial Applications, V. Ermolayev et al. (Ed.), Springer-Verlag, CCIS, Vol. 469, pp. 23-46, 2014.
(Available at: http://link.springer.com/chapter/10.1007%2F978-3-319-13206-8_2#)
The system is available for testing at: http://lpis.csd.auth.gr/systems/URank.rar
Extract the RAR archive into a folder and read README.txt. A SWI-Prolog installation is required (http://www.swi-prolog.org/Download.html).

Nick Bassiliades

Aristotle University of Thessaloniki
Department of Informatics
54124 Thessaloniki