The 12th International Semantic Web Conference
and the 1st Australasian Semantic Web Conference
21-25 October 2013, Sydney, Australia

PigSPARQL: A SPARQL Query Processing Baseline for Big Data

Alexander Schätzle, Martin Przyjaciel-Zablocki, Thomas Hornung and Georg Lausen
In this paper we discuss PigSPARQL, a competitive yet easy to use SPARQL query processing system on MapReduce that allows adhoc SPARQL query processing on large RDF graphs out of the box. Instead of a direct mapping, PigSPARQL uses the query language of Pig, a data analysis platform on top of Hadoop MapReduce, as an intermediate layer between SPARQL and MapReduce. This additional level of abstraction makes our approach independent of the actual Hadoop version and thus ensures the compatibility to future changes of the Hadoop framework as they will be covered by the underlying Pig layer. We revisit PigSPARQL and demonstrate the performance improvement when simply switching the underlying version of Pig from 0.5.0 to 0.11.0 without any changes to PigSPARQL itself. Because of this sustainability, PigSPARQL is an attractive long-term baseline for comparing various MapReduce based SPARQL implementations which is also underpinned by its competitiveness with existing systems, e.g. HadoopRDF.
Voting ID: 
Paper Download: 
Poster Download: