SUNCAT open data

First problem: getting permission from contributing libraries to redistribute their data.  Fortunately for me that’s not my problem, and some sterling work from other members of the team has allowed some data to be released with no strings attached.

Libraries that allow some of their data out into the wild usually stipulate which records qualify: any record they’ve contributed that doesn’t originate from such-and-such a source, or that they created themselves, or similar.

In practice, this means using only those records from a given library that have a particular library code in 040$a, or that don’t have a particular code in 035$a.  Rules like these could be applied automatically at a live filtering stage, but to be utterly sure nothing untoward is being released we have chosen to extract the permitted records and build a separate database from those alone.
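As a rough illustration of that kind of extraction rule, here is a minimal Python sketch that works on MARC-XML. It is not our actual extraction code: the library code LIB-A, the source prefix (SRC-X) and the function names are all made up for the example, and a real rule set would be agreed per library.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Sample data with made-up codes (LIB-A, LIB-B, SRC-X), for illustration only.
SAMPLE = """<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <controlfield tag="001">rec1</controlfield>
    <datafield tag="040" ind1=" " ind2=" "><subfield code="a">LIB-A</subfield></datafield>
  </record>
  <record>
    <controlfield tag="001">rec2</controlfield>
    <datafield tag="035" ind1=" " ind2=" "><subfield code="a">(SRC-X)12345</subfield></datafield>
    <datafield tag="040" ind1=" " ind2=" "><subfield code="a">LIB-A</subfield></datafield>
  </record>
  <record>
    <controlfield tag="001">rec3</controlfield>
    <datafield tag="040" ind1=" " ind2=" "><subfield code="a">LIB-B</subfield></datafield>
  </record>
</collection>"""


def subfield_values(record, tag, code):
    """Return every value of subfield `code` in datafields with the given tag."""
    path = f"marc:datafield[@tag='{tag}']/marc:subfield[@code='{code}']"
    return [(sf.text or "").strip() for sf in record.findall(path, MARC_NS)]


def is_releasable(record, required_040a=None, forbidden_035a=None):
    """Apply a hypothetical per-library rule to one MARC-XML record."""
    # No rule supplied means no release: err on the side of caution.
    if required_040a is None and forbidden_035a is None:
        return False
    # Rule type 1: the record must carry this library code in 040$a.
    if required_040a is not None:
        if required_040a not in subfield_values(record, "040", "a"):
            return False
    # Rule type 2: no 035$a may start with this source prefix.
    if forbidden_035a is not None:
        if any(v.startswith(forbidden_035a) for v in subfield_values(record, "035", "a")):
            return False
    return True


def releasable_ids(marcxml_text, **rule):
    """Return the 001 control numbers of the records that pass the rule."""
    root = ET.fromstring(marcxml_text)
    ids = []
    for record in root.findall("marc:record", MARC_NS):
        if is_releasable(record, **rule):
            ids.append(record.findtext("marc:controlfield[@tag='001']",
                                       default="", namespaces=MARC_NS))
    return ids
```

Calling releasable_ids(SAMPLE, required_040a="LIB-A", forbidden_035a="(SRC-X)") keeps only the first sample record. Running a filter like this once, at extraction time, is what lets the open database contain nothing but permitted records.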

So, once you get past the problem of libraries allowing their data to be distributed freely (which we haven’t 😉 ) you then need to allow clients to usefully connect and retrieve the data.  Two approaches are being taken for this.

The first is to produce an SRU target onto the database of (permitted) records.  We have a lot of experience with IndexData’s open source Zebra product, which is a database and Z39.50/SRU frontend all in one.  It can be quite fiddly to configure (which is where the experience comes in handy!) but its performance, in both speed and reliability, is excellent.  It also allows multiple output formats for the records using XSLT.
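To give a flavour of what a client request to an SRU target looks like, here is a small Python sketch that builds a searchRetrieve URL. The parameter names are standard SRU, but the base URL, the schema name "marcxml" and the CQL index dc.title are placeholders I have assumed for the example; the real values will depend on how the target ends up being configured.

```python
from urllib.parse import urlencode

# Hypothetical address: the real target has not been announced yet.
SRU_BASE = "http://sru.example.org/suncat-open"


def sru_search_url(cql_query, record_schema="marcxml", start=1, maximum=10,
                   base=SRU_BASE):
    """Build an SRU searchRetrieve URL for a CQL query."""
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql_query,              # CQL, e.g. dc.title = "nature"
        "startRecord": start,            # SRU record positions start at 1
        "maximumRecords": maximum,
        "recordSchema": record_schema,   # selects the output format (assumed name)
    }
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    print(sru_search_url('dc.title = "nature"', record_schema="rdf", maximum=5))
```

Switching the recordSchema value is how a client would ask for one of the other output formats, assuming the target advertises them under those names.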

One of the most useful outcomes from the Linked Data Focus project was an XSLT produced by Will Waites that converts MARC-XML into RDF/XML.  We can use this as one of the outputs from the SRU target, alongside MARC-XML and a rudimentary MODS transformation.  A JSON transformation might be a possibility too.  Some libraries have a requirement that their records not be released in MARC-XML, in which case the XSLT just blanks these records when they are requested in MARC-XML.

Perhaps more usefully for the RDF/XML data, the second approach is to feed the converted records into a SPARQL endpoint.  This should allow anyone interested in the linked data RDF to query it in a language more familiar to the linked data world.
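For anyone who hasn’t met SPARQL, here is a hedged Python sketch of the sort of query a client might send. The endpoint address is a placeholder, and I am assuming the RDF describes titles with the Dublin Core dct:title property; the vocabulary actually produced by the XSLT may well differ.

```python
from urllib.parse import urlencode

# Hypothetical address: the real endpoint has not been announced yet.
SPARQL_ENDPOINT = "http://sparql.example.org/suncat"


def title_search_query(keyword, limit=10):
    """Build a SPARQL SELECT that finds records whose title contains keyword."""
    # Escape backslashes and double quotes so the keyword is a safe literal.
    safe = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?record ?title WHERE {\n"
        "  ?record dct:title ?title .\n"   # assumed predicate
        f'  FILTER(CONTAINS(LCASE(STR(?title)), LCASE("{safe}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def sparql_get_url(query, endpoint=SPARQL_ENDPOINT):
    """Encode the query as a SPARQL Protocol GET request URL."""
    return f"{endpoint}?{urlencode({'query': query})}"


if __name__ == "__main__":
    print(sparql_get_url(title_search_query("nature", limit=5)))
```

The query text travels in the standard query parameter, so the same URL should work against any endpoint that follows the SPARQL Protocol.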

We’ll be providing more information on how to connect to the SRU target and the SPARQL endpoint once we’ve polished them up a bit for you.