SPARQL endpoint for SUNCAT

As we explored how to extend access to the metadata contributed by a set of libraries through the SUNCAT service, in order to promote discovery and reuse of the data, it soon became clear that Linked Data was one of the preferred formats for doing so.

The previous phase of this project developed a transformation to express the information on holdings in an RDF model. The resulting XSLT converts MARC-XML into RDF/XML. We used it to process over 1,000,000 holdings records made available by the British Library, the National Library of Scotland, the University of Bristol Library, the University of Nottingham Library, the University of Glasgow Library and the library of the Society of Antiquaries of London, in order to make them available through a Linked Data SPARQL endpoint.

Setting up the Triplestore

We built on previous experience at EDINA of providing SPARQL endpoints to set up the interface for the SUNCAT Linked Data.

We chose the 4Store application, which is fully open source, efficient and scalable, and provides a stable RDF database. In our experience it is also simpler to install than other products. We installed 4Store on an independent host to keep it separate from other services, for security and ease of maintenance.

Loading the data

The data contributed by each library was processed separately. First, the data was extracted from SUNCAT, respecting any restrictions placed by the specific library. It was then transformed into RDF/XML and finally loaded into the triplestore. Each of these steps can be fairly time-consuming, depending on the size of the data file. Once the data from each library has been added to the triplestore, queries can be made across the whole RDF database.
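
As a rough illustration of the transform and load steps for one library, the sketch below builds the two commands involved. It is not the project's actual tooling: the use of xsltproc to run the XSLT, the file names, the knowledge base name "suncat" and the graph URI are all assumptions made for the example, and the 4s-import options shown should be checked against the installed 4Store version before use.

```python
from pathlib import Path


def build_load_commands(marcxml_path, stylesheet, kb_name, graph_uri):
    """Return the transform and load commands (as argument lists) for one file.

    All arguments are hypothetical placeholders chosen by the caller.
    """
    # The RDF/XML output sits next to the MARC-XML input, with a .rdf suffix.
    rdf_path = str(Path(marcxml_path).with_suffix(".rdf"))

    # Step 1: apply the XSLT stylesheet to the MARC-XML export.
    transform = ["xsltproc", "-o", rdf_path, stylesheet, marcxml_path]

    # Step 2: import the RDF/XML into a named graph of the 4Store knowledge base.
    # --add keeps any triples already present in that graph (assumed option).
    load = ["4s-import", kb_name, "--add", "--model", graph_uri, rdf_path]

    return [transform, load]


commands = build_load_commands(
    "bl_holdings.xml",               # hypothetical MARC-XML export
    "marc2rdf.xsl",                  # hypothetical stylesheet name
    "suncat",                        # hypothetical knowledge base name
    "http://example.org/graph/bl",   # hypothetical graph URI
)
for command in commands:
    print(" ".join(command))
```

Each argument list can be passed to subprocess.run(command, check=True) so that a failed transformation stops the load for that library.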

APIs

An HTTP server is required to provide external access and allow querying of the triplestore. 4Store includes a simple SPARQL HTTP protocol server which answers SPARQL 1.1 queries. Once the server is running, you can query the triplestore using:

  1. A machine-to-machine API at http://sparql1.edina.ac.uk:8181/sparql/.
  2. A basic GUI at http://sparql1.edina.ac.uk:8181/test/.
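
For readers who want to call the machine-to-machine API from a script, here is a minimal sketch using only the Python standard library. It follows the standard SPARQL protocol (a GET request with a "query" parameter). That the endpoint returns JSON when asked through the Accept header is an assumption, and the function names are illustrative.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://sparql1.edina.ac.uk:8181/sparql/"


def build_query_url(query, endpoint=ENDPOINT):
    """Return the GET URL for a SPARQL query, with the query text URL-encoded."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_select(query, endpoint=ENDPOINT, timeout=30):
    """Send a SELECT query and return the parsed JSON results.

    Assumes the server honours the Accept header for JSON output.
    """
    request = urllib.request.Request(
        build_query_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# A generic query that needs no prefix declarations.
SAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"
url = build_query_url(SAMPLE_QUERY)
print(url)
```

Calling run_select(SAMPLE_QUERY) performs the actual request. It is left out of the listing so that the example can be read and tested without network access.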

GUI

The functionality of the basic test GUI is rather limited: it only enables SELECT, CONSTRUCT, ASK and DESCRIBE operations. To customise the interface and provide additional information such as example queries, we used SPARQLfront, an open source SPARQL frontend designed by Dave Challis and available on GitHub. SPARQLfront is a PHP and JavaScript based frontend that can be installed on top of a default Apache2/PHP server. It supports SPARQL 1.0.

An improved GUI is available at: http://sparql1.edina.ac.uk:8181/endpoint/.

The DiscoverEDINA SUNCAT SPARQL endpoint GUI provides four sample queries to help the user with the format and syntax required to compose correct SPARQL queries. For example, one of the queries is:

Is the following title (i.e. archaeological reports) held anywhere in the UK? 

SELECT ?title ?holder
WHERE {
        ?j foaf:primaryTopic ?pt.
        ?pt dc:title ?title;
            lh:held ?h.
        ?h lh:holder ?holder.

        FILTER regex(str(?title), "archaeological reports", "i")
}
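
The same query can be generated for any search phrase. The sketch below is one possible way to do it, not part of the SUNCAT service. The function names are hypothetical, and the PREFIX declarations for foaf, dc and lh are deliberately left to the caller because their namespace URIs are not given here.

```python
def escape_sparql_string(text):
    """Escape backslashes and double quotes for use inside a SPARQL "..." literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def title_search_query(phrase, prefixes=""):
    """Build the title/holder query above for an arbitrary search phrase.

    `prefixes` should contain the PREFIX lines for foaf, dc and lh; they are
    supplied by the caller and not invented here.
    """
    pattern = escape_sparql_string(phrase)
    return (
        f"{prefixes}\n"
        "SELECT ?title ?holder\n"
        "WHERE {\n"
        "    ?j foaf:primaryTopic ?pt .\n"
        "    ?pt dc:title ?title ;\n"
        "        lh:held ?h .\n"
        "    ?h lh:holder ?holder .\n"
        "\n"
        f'    FILTER regex(str(?title), "{pattern}", "i")\n'
        "}\n"
    )


query = title_search_query("archaeological reports")
print(query)
```

Note that the phrase is still interpreted as a regular expression by the FILTER, so characters such as "." or "(" keep their regex meaning unless the caller escapes them.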

The user is provided with a box in which to enter queries. Syntax highlighting is provided to help with composition. The user can also choose whether or not to display the namespaces in the box. A range of output formats can be selected:

  • SPARQL XML (the default)
  • JSON
  • Plain text
  • Serialized PHP
  • Turtle
  • RDF/XML
  • Query structure
  • HTML table
  • Tab Separated Values (TSV)
  • Comma Separated Values (CSV)
  • SQLite database
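
When the JSON format is chosen, the results follow the W3C SPARQL Query Results JSON layout, with the variable names under "head" and one entry per result row under "results". As a small sketch, the function below flattens such a document into plain rows. The sample data is made up for illustration and is not real SUNCAT output.

```python
import json


def bindings_to_rows(results):
    """Flatten SPARQL JSON results into a list of {variable: value} dictionaries.

    Variables that are unbound in a given row are reported as None.
    """
    variables = results["head"]["vars"]
    rows = []
    for binding in results["results"]["bindings"]:
        row = {}
        for var in variables:
            row[var] = binding[var]["value"] if var in binding else None
        rows.append(row)
    return rows


# Illustrative sample only: titles and holder URI are invented placeholders.
SAMPLE = json.loads("""
{
  "head": {"vars": ["title", "holder"]},
  "results": {"bindings": [
    {"title": {"type": "literal", "value": "Example archaeological reports"},
     "holder": {"type": "uri", "value": "http://example.org/holder/1"}},
    {"title": {"type": "literal", "value": "Another title"}}
  ]}
}
""")

rows = bindings_to_rows(SAMPLE)
for row in rows:
    print(row["title"], "|", row["holder"])
```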

The SPARQL endpoint GUI is ideal for running interactive queries, and for developing or troubleshooting queries to be run through the machine-to-machine SPARQL API or used in conjunction with the SRU target.