SUNCAT Open Data – SRU

As part of the /open/ data strand of the SUNCAT part of Discover EDINA, we have made available the individual library records that we have agreement to release.  At the time of writing these come from:

National Library of Scotland, Glasgow University, British Library, Bristol University, Society of Antiquaries of London, Nottingham University.

To make these records available, we’ve opted for an SRU target, which is RESTful.  In the first instance we expect users to run searches through the SPARQL interface (see other post) and work with the linked part of the data in the RDF incarnation of the records.  Should the full MARC record be needed, the SUNCAT ID can then be used to link through to the SRU target and extract it (in most cases).

Since the target is a full-blown SRU server, there is actually a plethora of indexes built over the MARC-XML records, but the one we anticipate being used most is the SUNCAT ID index.  However, users are welcome to use the other indexes, which are detailed below.

In the first instance, the DiscoverEDINA SUNCAT SRU target can be found at

http://suncatdev.edina.ac.uk:31001/de_suncat

[EDIT 2014-05-13. The above URL should work but it is now preferred to use

http://m2m.edina.ac.uk/sru/de_suncat ]

So, to get the MARC-XML format of the record with the SUNCAT ID “SC00374927310”, send a CQL query of sc.id=SC00374927310, which goes into an SRU searchRetrieve request as:

http://suncatdev.edina.ac.uk:31001/de_suncat?operation=searchRetrieve&version=1.1&startRecord=1&maximumRecords=1&query=sc.id%3DSC00374927310
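If you would rather build that request in code than by hand, here is a minimal Python sketch. The function name build_sru_url and its defaults are our own illustrative choices, not part of the service; only the base URLs and the parameter names come from this post.

```python
from urllib.parse import urlencode

# Preferred base URL from the edit note above; the older suncatdev URL
# can be passed in explicitly instead.
SRU_BASE = "http://m2m.edina.ac.uk/sru/de_suncat"


def build_sru_url(suncat_id, base=SRU_BASE, start_record=1, maximum_records=1):
    """Return a searchRetrieve URL for a single SUNCAT ID (hypothetical helper)."""
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        # CQL query on the SUNCAT ID index; urlencode escapes "=" as %3D.
        "query": f"sc.id={suncat_id}",
    }
    return f"{base}?{urlencode(params)}"


url = build_sru_url(
    "SC00374927310",
    base="http://suncatdev.edina.ac.uk:31001/de_suncat",
)
print(url)
```

Called this way, the helper reproduces the example URL above character for character; fetching it is then a job for whatever HTTP client you prefer.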

Remember that the number of records released under the Open Data umbrella is limited, so you won’t find every SUNCAT ID here, but you will find every one that’s in the SPARQL endpoint.

The response will be a bunch of XML that is an SRU Response, and it may contain records (about the same item) from multiple libraries. These records can be found at the XPath zs:searchRetrieveResponse/zs:records/zs:record/zs:recordData. The number of records found is always sent in the zs:searchRetrieveResponse/zs:numberOfRecords element, and you can specify which and how many records to retrieve by varying the startRecord and maximumRecords parameters in the HTTP query string.
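As a rough illustration of pulling those two things out of a response, here is a Python sketch using only the standard library. It assumes the zs prefix is bound to the usual SRU 1.1 namespace, http://www.loc.gov/zing/srw/, and it runs against a tiny hand-made sample rather than real server output; the placeholder elements inside zs:recordData are invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: zs: is bound to the standard SRU 1.1 response namespace.
NS = {"zs": "http://www.loc.gov/zing/srw/"}

# Hand-made sample, NOT real output from the SUNCAT target.
SAMPLE = """<zs:searchRetrieveResponse xmlns:zs="http://www.loc.gov/zing/srw/">
  <zs:version>1.1</zs:version>
  <zs:numberOfRecords>2</zs:numberOfRecords>
  <zs:records>
    <zs:record><zs:recordData><placeholder>first</placeholder></zs:recordData></zs:record>
    <zs:record><zs:recordData><placeholder>second</placeholder></zs:recordData></zs:record>
  </zs:records>
</zs:searchRetrieveResponse>"""


def parse_sru_response(xml_text):
    """Return (total hit count, list of zs:recordData elements)."""
    root = ET.fromstring(xml_text)  # root is zs:searchRetrieveResponse
    count = int(root.findtext("zs:numberOfRecords", default="0", namespaces=NS))
    records = root.findall("zs:records/zs:record/zs:recordData", NS)
    return count, records


count, records = parse_sru_response(SAMPLE)
print(count, len(records))
```

Note that the hit count can be larger than the number of records actually returned, since maximumRecords caps each response; stepping startRecord forward is how you would page through the rest.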

By default, records will be returned in MARC-XML, with the exception of British Library records, which (due to licensing issues) will always be returned in the RDF transformed version of the record.

Okay, so that’s the basics of grabbing a full MARC-XML record with a SUNCAT ID.  Now for the fun stuff (I’m using ‘fun’ in quite a broad sense of the word).

You can grab a (non-BL) record in five (yes, five) different XML schemata!  To do so, just append the parameter recordSchema=X, where X is one of marc (also the default), rdf, mods, mads, dc.  The server then converts the MARC-XML into the requested format using an XSLT transform.  The rdf transform was created in our previous project, and the mods, mads and dc ones are from Index Data’s Zebra software (freely available from http://www.indexdata.com/zebra).  These are relatively simple but might be useful.
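Here is a small Python sketch of adding recordSchema to a request. The helper name record_url and the decision to reject unknown schema names on the client side are our own illustrative choices; the five schema names and the parameter name are the ones listed above.

```python
from urllib.parse import urlencode

# The five schema names listed in this post.
SCHEMAS = ("marc", "rdf", "mods", "mads", "dc")


def record_url(suncat_id, schema="marc",
               base="http://m2m.edina.ac.uk/sru/de_suncat"):
    """Build a searchRetrieve URL asking for one record schema (hypothetical helper)."""
    if schema not in SCHEMAS:
        raise ValueError(f"unknown recordSchema: {schema!r}")
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "startRecord": 1,
        "maximumRecords": 1,
        "query": f"sc.id={suncat_id}",
        # British Library records come back as RDF whatever is asked for here.
        "recordSchema": schema,
    }
    return f"{base}?{urlencode(params)}"


for schema in SCHEMAS:
    print(record_url("SC00374927310", schema))
```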

Even more fun: obviously we’re making the records search-and-retrieve-able on the SUNCAT ID, since the expected workflow is to query the SPARQL endpoint, obtain the links in the RDF records (including a SUNCAT ID), and then use that SUNCAT ID to obtain the full records of anything you’re interested in from the SRU server.  However, since this is a full-blown SRU server, we’ve actually got a full set of indexes, and you can use any valid CQL query combining the lot of them!

These indexes are designed to be as close as possible to the existing SUNCAT service Z39.50 target indexes.  In the SRU server some are prefixed with the “bib1” namespace and the rest with the “sc” namespace.  Here is a table of the bib1 indexes and their equivalent Z39.50 BIB-1 indexes:

bib1.date/time-last-modified = Date/time-last-modified
bib1.lc-card-number = LC-card-number
bib1.isbn = ISBN
bib1.number-music-publisher = Number-music-publisher
bib1.name = Name
bib1.author = Author
bib1.author-name-personal = Author-name-personal
bib1.dewey-classification = Dewey-classification
bib1.issn = ISSN
bib1.lc-call-number = LC-call-number
bib1.nlm-call-number = NLM-call-number
bib1.place-publication = Place-publication
bib1.publisher = Publisher
bib1.title-series = Title-series
bib1.identifier-standard = Identifier-standard
bib1.subject-heading = Subject-heading
bib1.number-govt-pub = Number-govt-pub
bib1.title = Title
bib1.any = Any
bib1.server-choice = Server-choice
bib1.date = Date
bib1.date-of-publication = Date-of-publication
bib1.title-uniform = Title-uniform
bib1.code-institution = Code-institution
bib1.note = Note
bib1.code-language = Code-language
bib1.code-geographic = Code-geographic

Next come the sc indexes, each mapped to its equivalent SUNCAT service index.  These are not well documented here, and some will duplicate the bib1 indexes, but you’re free to play!  Almost certainly the two most useful are the SUNCAT ID index, SC_ID, and the contributing library code index, SC_WIS.  The values for SC_WIS can be:

StEdNL (National Library of Scotland)
StGlU (Glasgow University)
Uk (British Library)
UkBrU-I (Bristol University)
UkLSAL (Society of Antiquaries of London)
UkNtU (Nottingham University)
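To give a flavour of combining indexes, here is a Python sketch that builds a CQL query restricting a title search to one contributing library. The helpers cql_clause and cql_and are hypothetical, the search term is made up, and we have not checked how this particular combination behaves on the server, so treat it as a starting point for experiments.

```python
from urllib.parse import urlencode

BASE = "http://m2m.edina.ac.uk/sru/de_suncat"


def cql_clause(index, term):
    """Return index="term", escaping backslashes and double quotes in the term."""
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f'{index}="{escaped}"'


def cql_and(*clauses):
    """Join clauses with the CQL boolean operator 'and'."""
    return " and ".join(clauses)


# Glasgow University records (code StGlU) matching an invented title search.
query = cql_and(
    cql_clause("sc.wis", "StGlU"),
    cql_clause("bib1.title", "journal of physics"),
)
url = BASE + "?" + urlencode({
    "operation": "searchRetrieve",
    "version": "1.1",
    "startRecord": 1,
    "maximumRecords": 10,
    "query": query,
})
print(query)
print(url)
```

Quoting every term is a deliberately cautious choice: CQL requires quotes only around terms containing spaces or special characters, but quoting everything avoids having to decide case by case.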

Here is the full list of sc indexes:

sc.id = SC_ID
sc.005 = SC_005
sc.010 = SC_010
sc.020 = SC_020
sc.022 = SC_022
sc.028 = SC_028
sc.035 = SC_035
sc.049 = SC_049
sc.aut = SC_AUT
sc.awt = SC_AWT
sc.ddc = SC_DDC
sc.gvd = SC_GVD
sc.ismn = SC_ISMN
sc.issn = SC_ISSN
sc.lcc = SC_LCC
sc.nlm = SC_NLM
sc.pla = SC_PLA
sc.pub = SC_PUB
sc.sbd = SC_SBD
sc.sgn = SC_SGN
sc.sici = SC_SICI
sc.sid = SC_SID
sc.srs = SC_SRS
sc.ssn = SC_SSN
sc.stidn = SC_STIDN
sc.stmd = SC_STMD
sc.sub = SC_SUB
sc.sud = SC_SUD
sc.sul = SC_SUL
sc.sum = SC_SUM
sc.tit = SC_TIT
sc.ttl = SC_TTL
sc.wrd = SC_WRD
sc.wyr = SC_WYR
sc.wti = SC_WTI
sc.wau = SC_WAU
sc.wut = SC_WUT
sc.wur = SC_WUR
sc.wnc = SC_WNC
sc.wfm = SC_WFM
sc.wtp = SC_WTP
sc.wgo = SC_WGO
sc.wct = SC_WCT
sc.wid = SC_WID
sc.wsd = SC_WSD
sc.ntl = SC_NTL
sc.wis = SC_WIS
sc.wst = SC_WST
sc.wuc = SC_WUC
sc.wucx = SC_WUCX
sc.wuco = SC_WUCO
sc.wno = SC_WNO
sc.wln = SC_WLN
sc.wpu = SC_WPU
sc.wpl = SC_WPL
sc.wsrs1 = SC_WSRS1
sc.wsrs2 = SC_WSRS2
sc.wga = SC_WGA
sc.wsu = SC_WSU
sc.wsm = SC_WSM