As part of the UK vision for supporting open discovery principles in relation to education materials, JISC has sponsored a number of projects to assist in discovering and enriching existing resources. Tagger (previously referred to variously as GTR or geotagger) was one strand of the JISC-funded umbrella DiscoverEDINA project, which comprised two other strands alongside this work.
The primary purpose of Tagger is to assist in enriching and exposing ‘hidden’ metadata within resources – primarily images and multimedia files. Images, for example, embed a great deal of descriptive and technical metadata within the file itself, and it is often not obvious that the main focus of interest – the image – is carrying a ‘secret’ payload of information, some of which may be potentially compromising. Consider the recent embarrassment suffered by Dell, when a family member uploaded images with embedded location information to social media sites, frustrating the efforts of a multi-million pound security operation. Or take the case of the US military, when an innocently uploaded photograph of a newly assigned group of Apache helicopters allowed insurgents to use the location information embedded in the image to precisely locate and destroy them.
There are many other instances of people being innocently or naively caught out by these ‘hidden’ signposts in resources they distribute or curate. Tagger helps by providing tools that expose those hidden features and make it easy to review, edit and manage the intrinsic metadata routinely bundled in resources. It has concentrated on, but is not limited to, geotags.
Tagger has delivered three main things:
- A basic web service API built around ExifTool, suitable for third-party use to geo-tag/geo-code image, audio and video metadata;
- A demo website enabling user upload, metadata parsing (from the resource) and metadata enrichment (map-based geo-tagging/geo-coding);
- An Open Metadata corpus of geo-tagged/geo-coded enriched records with a REST-based query interface. Currently this corpus consists of approximately a quarter of a million Creative Commons-licensed geotagged images, mainly bootstrapped from Geograph.
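To make the first item concrete, here is a minimal sketch (not the project's actual code) of the kind of ExifTool invocation such an API would wrap to read a file's embedded metadata. The function name is invented; the flags are standard ExifTool options:

```python
def read_metadata_cmd(path):
    """Build an exiftool command that dumps a file's embedded metadata.

    -json emits the tags as JSON for easy parsing; -G prefixes each tag
    with its group (EXIF, XMP, IPTC, ...); -n reports values such as GPS
    coordinates as signed decimal numbers rather than formatted strings.
    """
    return ["exiftool", "-json", "-G", "-n", path]
```

A service would pass this list to a subprocess call and parse the JSON on stdout; the same pattern extends to writing tags back.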
Tagger supports the open discovery metadata principles and has made extensive use of open licensing models.
Along the way we started thinking about specific use cases. ‘Anonymise my location’ seemed an obvious one, and Tagger’s API and website reflect that thinking. Additionally, in talking to colleagues involved in field trips it became clear that there was potential in providing integrated tooling, so we experimented with Dropbox integration.
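As an illustration of what ‘anonymise my location’ amounts to underneath, the sketch below builds the ExifTool command that deletes the GPS tag group from an image (again a hypothetical helper, not the project's code):

```python
def strip_location_cmd(path):
    """Build an exiftool command that removes location metadata in place.

    -gps:all= deletes every tag in the GPS group. -overwrite_original
    stops exiftool keeping a backup copy of the file, which would
    otherwise still carry the location.
    """
    return ["exiftool", "-gps:all=", "-overwrite_original", path]
```

A thorough scrub would also need to clear any location tags duplicated into the XMP and IPTC groups.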
Taking this further, and building on EDINA’s more general mobile development work, we then started to think about how Tagger could assist and enrich in-field data capture and post-trip reflective learning. We continue to explore this beyond the project funding, as the enrichment facilities Tagger provides allow for flexible integration into third-party services and projects.
Of course, Tagger will never be a complete panacea for all the ills of metadata, nor should it aim to be. However, by building on best-of-breed open-source tools (ExifTool), Tagger – or more accurately the Tagger API – provides a facility that other service providers and projects can use to better manipulate and manage that ‘hidden’ metadata.
Therein lies the rub – the perennial question of embedding and take up.
That’s are next challenge.
First problem: getting permission from contributing libraries to allow their data to be re-distributed. Fortunately for me that’s not my problem, and some sterling work from other members of the team has allowed some data to be released without strings.
Libraries that allow some of their data out into the wild usually attach a stipulation: only records they have contributed that don’t originate from such-and-such a source, or that they created themselves, or similar, may be released.
In practice, this means using records from particular libraries that have a particular library code in 040$a, or that don’t have a particular code in 035$a. These rules could be applied automatically at a live filtering stage, but to be utterly sure nothing untoward is being released we have chosen to extract those data and build a separate database from them alone.
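The filtering logic is simple enough to sketch. Below, each record is reduced to a dict mapping "tag$subfield" to a list of values (a deliberate simplification – real code would use a proper MARC library), and the library and source codes are invented placeholders, not actual contributor agreements:

```python
# Hypothetical codes: an 040$a cataloguing-agency code we may release,
# and an 035$a control-number prefix identifying a restricted source.
ALLOWED_040A = {"AbC"}
BLOCKED_035A_PREFIX = "(SomeSrc)"

def releasable(record):
    """Decide whether a (simplified) MARC record may be redistributed."""
    created_here = any(code in ALLOWED_040A
                       for code in record.get("040$a", []))
    from_blocked = any(val.startswith(BLOCKED_035A_PREFIX)
                       for val in record.get("035$a", []))
    return created_here and not from_blocked
```

Running every contributed record through a predicate like this once, and loading only the survivors, is what building the separate database amounts to.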
So, once you get past the problem of libraries allowing their data to be distributed freely (which we haven’t 😉 ), you then need to allow clients to usefully connect and retrieve the data. Two approaches are being taken for this.
The first is to produce an SRU target onto the database of (permitted) records. We have a lot of experience with Index Data’s open source Zebra product, which is a database and Z39.50/SRU frontend all in one. It can be quite fiddly to configure (which is where the experience comes in handy!), but its performance (speed and reliability) is excellent. It also allows multiple output formats for the records using XSLT.
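For a flavour of what connecting to an SRU target looks like from the client side, a searchRetrieve request is just an HTTP GET carrying a CQL query. This sketch builds such a URL (the endpoint and query are placeholders):

```python
from urllib.parse import urlencode

def sru_search_url(base, cql, schema="marcxml", max_records=10):
    """Build an SRU 1.1 searchRetrieve URL for a CQL query.

    recordSchema selects the server-side output format – for Zebra this
    is where the XSLT-driven formats (MARC-XML, MODS, RDF/XML) plug in.
    """
    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": cql,
        "recordSchema": schema,
        "maximumRecords": str(max_records),
    }
    return base + "?" + urlencode(params)
```

A client then simply GETs the URL and parses the XML response.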
One of the most useful outcomes from the Linked Data Focus project was an XSLT stylesheet produced by Will Waites that converts MARC-XML into RDF/XML. We can use this as one of the outputs from the SRU target, alongside MARC-XML itself (although some libraries require that their records not be released in MARC-XML, in which case the XSLT simply blanks those records when they are requested in that format) and a rudimentary MODS transformation; a JSON transformation might be a possibility too.
Perhaps more usefully for the RDF/XML data, the second approach is to feed these into a SPARQL endpoint. This should allow anyone interested in the linked data RDF to query in a language more familiar to the linked data world.
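To illustrate, a client-side query against such an endpoint might look like the sketch below. The endpoint URL is a placeholder, and the Dublin Core predicate is a guess at the sort of terms the MARC-to-RDF transform would emit; the SPARQL Protocol itself allows queries to be sent as a simple GET with a `query` parameter:

```python
from urllib.parse import urlencode

# Illustrative query: list ten record/title pairs from the triple store.
QUERY = """\
PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?record ?title
WHERE { ?record dc:title ?title . }
LIMIT 10
"""

def sparql_query_url(endpoint):
    """Build a SPARQL Protocol GET request URL for the query above."""
    return endpoint + "?" + urlencode({"query": QUERY})
```

Requesting the URL with an `Accept: application/sparql-results+json` header would return the bindings in JSON form.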
We’ll be providing more information on how to connect to the SRU target and the SPARQL endpoint once we’ve polished them up a bit for you.
After a little hiatus caused by the vacation season we are happy to announce a few updates:
- we have a little demo site that allows users to upload their images and to expose/view/edit the embedded metadata. You can write any writeable metadata tags back into the image or export them as a sidecar XMP file. The website also exposes previously uploaded images, which can be browsed on the map by right-clicking and choosing ‘Nearby Images’. We’ve also been experimenting with Dropbox integration, allowing users to save their images into their own Dropbox accounts. This is still rough around the edges but flags up the direction of travel. As always, we are happy to receive feedback.
- the website is ultimately powered by a backend web service. The API for that is now at version 1 and includes the new Dropbox methods.
Our primary goal with the demo site has been to showcase the middleware API behind it: the site is not intended to be a GeoFlickr competitor, rather an illustration of what you can do using the API. Our Dropbox thinking has been influenced by talking to people in the community who run fieldwork courses, where students often need to take photographs of things in the field and then have access to them later for annotation or reflective study.
One last change of note: the name. We have now settled on the more generic name ‘Tagger’. It’s taken a while (the full project!) but I think we are all mostly happy with the choice – it’s not too explicitly ‘geo’, yet it remains descriptive of its capabilities.