Posts Tagged ‘Jerome’

It’s son of Jerome! (Basic BibJSON in data.lincoln.ac.uk)

Posted on February 1st, 2013 by Paul Stainthorp

It’s been a while since we ran the Jerome project at the University of Lincoln, but it’s far from dead, and thanks to the recent leaps forward in establishing a proper data.lincoln.ac.uk (and data.ac.uk) portal, you can now access a permanent copy of our open catalogue data, at:

Just as in the original Jerome application, this data is constantly harvested from our catalogue over a number of days, one record at a time in an endless cycle.

It’s a ‘minimally invasive’ method that doesn’t put too heavy a load on the catalogue itself, or require us to run any additional software on our catalogue server – and it means that, on average, no record in the open data is more than a couple of days out of date. The data harvested is stored in Nucleus before being processed and published to data.lincoln.ac.uk.
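As a rough illustration of that endless cycle, here is a minimal Python sketch. It is not the real harvester: `harvest_cycle`, `fetch` and `seconds_between_fetches` are hypothetical names, and the fake catalogue is a stand-in for the real one. The arithmetic helper assumes that ‘a couple of days’ means two, and that the catalogue holds something like the 240,000 records released by the original Jerome project; under those assumptions the harvester would need to fetch one record roughly every 0.72 seconds.

```python
from itertools import cycle, islice


def harvest_cycle(record_ids, fetch):
    """Yield (record_id, record) pairs forever, looping over the ids in order.

    `fetch` is a hypothetical callable that retrieves one record; in a real
    harvester it would be the catalogue lookup, followed by a short pause.
    """
    for record_id in cycle(record_ids):
        yield record_id, fetch(record_id)


def seconds_between_fetches(total_records, max_age_days):
    """Pause needed between fetches so that one full pass takes max_age_days."""
    return max_age_days * 24 * 60 * 60 / total_records


# Stand-in for the catalogue: three fake records keyed by a made-up id.
fake_catalogue = {"a1": "Record A", "b2": "Record B", "c3": "Record C"}

# Take the first five fetches; the ids wrap around after the third record.
first_five = list(islice(harvest_cycle(list(fake_catalogue), fake_catalogue.get), 5))
```

Because every record is revisited once per pass, the worst-case staleness of any record is simply the length of one pass, which is why slowing the loop down trades freshness for a lighter load on the catalogue.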

If you have any technical questions about the process, it’s worth contacting LNCD (specifically, Nick Jackson).

The biggest difference between the original Jerome and this new process is that Jerome scraped XML views of catalogue records from our web OPAC, while son-of-Jerome harvests the records one at a time over Z39.50, using the YAZ PHP extension. We’re also publishing the data this time as BibJSON, rather than MakeItUpAsWeGoAlongJSON.
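To give a flavour of what publishing as BibJSON involves, here is a small Python sketch that maps a flat, already-parsed record onto the general BibJSON conventions (a `type`, a `title`, a list of `author` objects with a `name`, and a list of `identifier` objects). The input field names (`authors`, `isbn` and so on) and the function `to_bibjson` are assumptions made up for this example; the exact fields published on data.lincoln.ac.uk are not reproduced here.

```python
import json


def to_bibjson(raw):
    """Map a flat dict of catalogue fields onto a BibJSON-style record.

    `raw` uses hypothetical keys: title, authors (list of names), year, isbn.
    Only fields that are actually present are emitted.
    """
    record = {"type": "book", "title": raw["title"]}
    if raw.get("authors"):
        record["author"] = [{"name": name} for name in raw["authors"]]
    if raw.get("year"):
        record["year"] = str(raw["year"])
    if raw.get("isbn"):
        record["identifier"] = [{"type": "isbn", "id": raw["isbn"]}]
    return record


# A made-up record, purely for illustration.
example = to_bibjson({
    "title": "An Example Book",
    "authors": ["A. N. Author"],
    "year": 2013,
    "isbn": "9780000000002",
})
example_json = json.dumps(example, indent=2)
```

Keeping the output to a shared convention like this, rather than an ad hoc structure, is what lets other people's tools consume the records without reading our documentation first.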

There’s a lot more data to come, including:

  1. Richer bibliographic data on each item (it’s somewhat bare-bones at the minute!)
  2. Library item data (i.e. copies of particular works)
  3. Reading lists
  4. Repository records
  5. Usage and activity data

 

On open data licensing and sustainability

Posted on May 17th, 2012 by Paul Stainthorp
Last week I attended a free ‘licensing clinic’ in Birmingham, organised by the Discovery programme – mainly as a means of kick-starting my brain into considering the copyright/licensing issues around the CLOCK project. Here are my notes.
  1. The Jerome project addressed licensing in April, 2011, and the situation hasn’t really changed for us: we’re still intending to expose as much of our bibliographic data as possible using a properly open licence such as CC0.
    • “The licensing of data is an interesting one, since we run into a whole bunch of questions around who actually owns the information in our catalogue. Since it’s all factual information (and you can’t copyright a fact) then surely it’s a free for all – except that EU law introduces a curve ball in the form of database right. Broadly speaking this provides specific protection for collections of records, but not the records themselves.”
  2. Ed Chamberlain and the COMET project also addressed licensing and the ownership of MARC records: work that we should revisit.
  3. The JISC Open Bibliographic Data Guide (obd.jisc.ac.uk) provides very clear advice and information useful in creating an open data business case. E.g.:
    • “[…]if we presume that the rationale for publication is to ensure the widest possible dissemination then adoption of a generic open data license (such as Open Data Commons or CC0) is the most effective way to make the set of potential uses unambiguous. Restrictive licenses are counter-productive[…]”
  4. There is some very helpful guidance coming out of the Discovery project around building a business case for open discovery. This was summarised at the recent Discovery programme meeting (also in Birmingham) by David Kay –
    • N.B. I’ll revisit this in a future blog post. I’m getting almost surprisingly interested in the problem of ‘selling’ the idea of open bib data to an institution, and I’ve found the Discovery work on business cases increasingly useful.
  5. At Lincoln in March, 2012, we had a very useful visit from Sander van der Waal of OSS Watch where we discussed the University of Lincoln’s approach to openness (Open Source, Open Access, as well as Open Data). Joss Winn is following this work up with the University’s IP manager with a view to writing a University policy on open licensing of our IP.
  6. Related to the ‘business case’ aspect is the work of LNCD (and also discussions I’ve had with Ed Chamberlain recently) about how to ensure sustainability of open services in a technical sense – what sort of systems architecture and processes do we need in place, and how do we work with university ICT support departments to ensure that projects become institutionally-supported services when it’s important for them to do so?
  7. At this Birmingham event, Chris Banks of the University of Aberdeen presented on the benefits and challenges of sharing from a library director’s perspective. I was particularly interested in the metaphor of “metadata as currency”: how are aggregators creating value based on the mass accumulation of metadata, and how are they selling that value back to libraries? See Chris’s blog for more. Aberdeen are clearly doing a lot around the analysis of e-resources usage and relating it back to their library strategy / information literacy, etc.
  8. Paul Miller (Cloud of Data). One key quote: “amateurs tend to do a better job of aggregating content than institutions” (e.g. collections of images on Flickr). This may be in part because individuals don’t have the same risk-averse approach, but whatever the reason
  9. Barrister Frances Davey gave us a quick run-through of IP law as it relates to data. Key quote: “the legal repercussions of publishing data openly are pretty much nil”. Fear and uncertainty poison initiative! Frances also touched on the business / reputation-management arguments for having an active approach to open data: people might well be getting bad copies of your data already (via screenscraping) – release it yourself and take control of the quality. Example of the British Library choosing a CC0 licence precisely because of the lack of an attribution clause – then any subsequent re-use is “nothing more to do with us”.
  10. Then, after lunch, copyright consultant Naomi Korn ran a workshop on the practical aspects of choosing a licence for your data. Naomi spoke about the need to start by deciding how open you want to be as an institution (noting that institutions with a dedicated © person tend to have a greater appetite for risk) – then consider whether you have the resources in place to get where you want to be. Key quote: “Let’s do some attribution mapping!” Some links from Naomi’s workshop:
  11. At the Birmingham clinic we also discussed the risks (including the risk of doing nothing) and benefits of taking an open approach. My contribution: open bibliographic data enables high-level services to be sold back to universities (cf. Chris Banks’ notes on metadata aggregation, above). We shouldn’t be scared of this or see it as a reason to not open up our data (we can’t compete with those companies; we want their services and we’re prepared to pay for them!); but we can build lower-level, locally-relevant services as a result of releasing our own open data, and play on the web by web rules – if we don’t make our data open for re-use on the web, we can’t even have the conversation. Lincoln’s approach is entirely around open data as a means to an end: it’s the best and most natural way of sparking off new, innovative services based on unexpected combinations of our own and other people’s data.
    • The best example of this so far is the new data-driven staff profiles at Lincoln: but we’re going to need more and more convincing examples if we’re going to make a convincing business case.
  12. Final overall quote of the day: “Writing your own open licence is an unpleasant form of vanity”.

Imminent domain

Posted on May 4th, 2012 by Paul Stainthorp

With various new services arising out of the ongoing Library ICT systems review, we’re amassing a nice little collection of library-related 2nd-level subdomains. Here’s a list, which I’ll edit as they become live.

  1. http://library.lincoln.ac.uk/ (i.e. the ‘bare’ library subdomain: this isn’t used at the moment, but we intend that it will become the Library’s ‘root’ web presence)
  2. http://www.library.lincoln.ac.uk/ (currently used for our SirsiDynix Horizon Information Portal OPAC, which we intend to move to catalogue.library… in order to free up www for our web pages hosted on WordPress)
  3. http://catalogue.library.lincoln.ac.uk/ (the future home of the library catalogue)
  4. http://catalog.library.lincoln.ac.uk/ (an alternative/US spelling of catalogue)
  5. http://findit.library.lincoln.ac.uk/ (a launch point for our new Discovery system, still to be announced, and with a name yet to be decided!)
  6. http://lists.library.lincoln.ac.uk/ (Talis Aspire reading lists, currently being developed)
  7. http://archives.library.lincoln.ac.uk/ (Axiell Calm archives and special collections software)
  8. http://jerome.library.lincoln.ac.uk/ (Jerome is our innovation platform and a home for experimental search services, being re-developed as part of the CLOCK project)
  9. http://auth.library.lincoln.ac.uk/ (OpenAthens LA v2.1 authentication software)
  10. http://proxy.library.lincoln.ac.uk/ (EZProxy authentication software)
  11. http://guides.library.lincoln.ac.uk/ (LibGuides software)

We also have two core systems which aren’t on the library subdomain:

  1. http://eprints.lincoln.ac.uk/ (the Lincoln Repository on EPrints – it’s appropriate that this isn’t on library, as we’ve always managed the Repository as a shared/collaborative project between CERD, ICT services, the Library, and the Research Office)
  2. http://ill.lincoln.ac.uk/ (CLIO inter-library loans software)

The technical approach: a CLOCK dev stack

Posted on May 2nd, 2012 by Paul Stainthorp

A note on technical development:

We’re beginning to make some progress towards a framework for development in the CLOCK project. Project developers Trevor Jones and Andrew Beeken, with the support of the other developers in LNCD, now have the following at their fingertips:

That list should give you an idea of LNCD’s approach to development. [N.B. some links may not be publicly accessible.]

Tick tock we don’t stop. Introducing CLOCK, a new JISC-funded resource discovery project at the universities of Lincoln and Cambridge

Posted on December 10th, 2011 by Paul Stainthorp

The title says it all, really. The University of Lincoln, working in consortium with Cambridge University Library and Owen Stephens Consulting, has been awarded £49,877 by JISC to investigate ways of driving innovation in libraries’ interactions with Open Bibliographic Data, through a project we’re calling CLOCK (Cambridge-Lincoln Open Catalogue Knowledgebase).

CLOCK is a continuation of and elaboration upon the work of two recent JISC Discovery projects—Jerome at the University of Lincoln and COMET at the University of Cambridge—via a programme of development work shared between the two institutions, and with library consultant Owen Stephens. JISC were impressed enough with the work of both projects, and sufficiently interested in the potential for collaboration, that they encouraged our joint bid for follow-up funding.

Between now and the end of July, 2012, the CLOCK project will provide us with a framework to:

…[1] exploit through real-world applications the significant amount of data released openly by Cambridge University Library; [2] apply the Jerome database architecture, iterative development methodology, and API framework to a bibliographic dataset an order of magnitude greater than the University of Lincoln’s; and [3] to build and enable a new set of tools and demonstrator services which will enable the future development of public Open Bib Data web applications of practical utility to libraries and end-users.

You can read the full bid document, here.

I’m very much looking forward to working with Ed Chamberlain, Systems Librarian in the University Library at the University of Cambridge, along with Owen Stephens, veteran of a number of campaigns to open up access to library data, and Chris Leach (Systems Librarian) and Ian Snowley (University Librarian) from the University of Lincoln. Thanks are due to all of them for their help in writing the successful bid; to the Research & Enterprise Development office at Lincoln for their invaluable assistance in putting together the project budget; and to the LNCD group at the University of Lincoln for providing the kind of supportive development platform that makes these kinds of projects possible.

Finally, a big thank you to Andy McGregor and the JISC Digital Infrastructure: Information and library infrastructure: Resource discovery programme, for this opportunity to further explore the blossoming environment of open bibliographic data/open discovery in libraries. If you haven’t done so already, you might like to take a look at the following websites:

As with all our projects, we’ll be blogging it comprehensively (so stand by for a steady stream of awful clock-related puns used as blog post titles). Although there’s little to see there yet, the CLOCK project blog is at: http://clock.blogs.lincoln.ac.uk/ – along with its own RSS feed. Watch that space!

It’s the end of Jerome as we know it (but I feel fine)

Posted on November 28th, 2011 by Paul Stainthorp

The University of Lincoln’s Jerome project finished in August with the successful release of more than 240,000 openly-licensed bibliographic records, available over developer APIs, and a joint hack day with Cambridge University Library’s COMET project.

Now, encouraged by positive JISC feedback, both institutions (Cambridge and Lincoln jointly) have applied for follow-up project funding under the project title CLOCK. If our bid is successful, the new project will run from December 2011 to July 2012, employing a web developer based at the University of Lincoln, and distilling the work of both institutions into the development of new innovative library metadata discovery services for the scholarly community.

You can read the project proposal for CLOCK at http://lncn.eu/ijt4 – the introductory section is below.

The University of Lincoln and Cambridge University Library both delivered successful projects (Jerome and COMET) for the JISC Infrastructure for Resource Discovery Programme in 2011. This is a proposal for the continuation of and elaboration upon the work of both projects, via a programme of development work shared between the two institutions.

Throughout both projects (COMET-Jerome), parallel approaches in technology and data structure were noted and commented upon. A ‘mash day’ workshop event held in Cambridge in August aimed to explore these differences as well as areas of potential synergy. Here project members identified several points of interest to take forward.

Both projects produced outputs of interest to researchers, students, librarians, developers, and designers of bibliographic discovery environments. The CLOCK project will harness the success of these two complementary initiatives and investigate new approaches to data creation and discovery in the library domain. In particular, it will investigate, propose, and develop new, web-based bibliographic tools/APIs which will make it easier for developers, academic libraries and library end-users (esp. researchers) to find Open Bibliographic Data and incorporate that data into systems and workflows.

This project is an opportunity to [1] exploit through real-world applications the significant amount of data released openly by Cambridge University Library; [2] apply the Jerome database architecture, iterative development methodology, and API framework to a bibliographic dataset an order of magnitude greater than the University of Lincoln’s; and [3] to build and enable a new set of tools and demonstrator services which will enable the future development of public Open Bib Data web applications of practical utility to libraries and end-users.

The project will be supported by library consultant Owen Stephens, who will help to put the work into a national context, relating CLOCK to the wider movement toward Open Bib Data and the work of the JISC Discovery initiative. It will take place in an environment (Lincoln/Cambridge) where a culture of developer inquiry and experimentation is encouraged and nurtured. It is also endorsed by senior library management at both universities.

Both universities are involved in complementary development work which will both inform and be informed by CLOCK: at Cambridge, Ed Chamberlain is guiding the development of the JISC Open Bibliography 2 project; in Lincoln, Paul Stainthorp is lead researcher on the #jiscmrd Orbital project, which is investigating the management of research data, with some areas of overlap.

CLOCK will operate as part of the wider JISC Digital Infrastructure: Information and library infrastructure: Resource discovery, and support the recent concerted effort to move toward openly licensed library discovery in UK Higher Education and beyond.

Technology in the Library’s annual review 2011

Posted on November 23rd, 2011 by Paul Stainthorp

The Library has published its 2011 Annual Review, including short reports on the following techie items:

You can read the annual review here.

The Library: Annual Review 2011

Electronic Resources Librarian: priorities 2011/2012

Posted on November 17th, 2011 by Paul Stainthorp

I’ve had a useful meeting with my new boss to agree my priorities for the next 12 months of development work in the Library. Here are my top 4, in order of importance.

  1. Discovery selection & implementation;
  2. JISC Orbital project (0.3FTE) – based mainly in CERD until March 2013;
  3. Possible JISC-funded Jerome follow-on work;
  4. Development of the Lincoln Repository – working closely with the Library Institutional Repository Officer (BJ), the Research & Enterprise Office + the subject librarians on the following areas:
    • Metadata workflow and service development
    • Advocacy/training
    • Building a “Research Showcase”
    • CRIS-like development, bibliometrics, and supporting the REF
    • Developing staff profiles on the University’s website
    • E-theses
    • Helpdesk integration (…possibly)

The following are projects—part of the current Library I.T. strategy—that I’ll contribute to but probably won’t lead, and/or work that’s going on in the background that I need to stay abreast of:

  1. Reading list development (project);
  2. Authentication (project);
  3. Participation in various JISC working groups as well as UKCoRR and LISN;
  4. Working with the Acquisitions team on new team rôles/areas of work;
  5. Monitoring and guiding e-resource management (ERM), authentication, and responding to user problems (this area of work will be looked after day-to-day by the Library (E-resources) Assistant (EV), supported by other staff, as part of the cover for my JISC project work);
  6. Supporting the subject librarian for technology in a review of the Library’s presence on the University Portal;
  7. Supporting the subject librarians in promoting and supporting the use of RefWorks 2.0;
  8. Supporting the HELS in administering copyright/digitisation services and the use of Blackboard.
  9. Initiating a new CALM user group.
  10. Co-ordinating LIG (the Library Innovation Group).
  11. Participating in the work of LNCD.

G’won then: what have I forgotten about?

Jerome/COMET hack day: Fun in the Fens

Posted on August 10th, 2011 by Paul Stainthorp

Here’s a photo of the CARET (Centre for Applied Research in Educational Technologies) offices at the University of Cambridge, where we held our long-awaited joint Jerome/COMET hack day, on Monday 8 August. Actually, in the end, it turned out to be a kind of Jerome/COMET/SALDA/synthesis/OUseful mashup-AH!

Jerome/COMET

In attendance (for the record):

Train mayhem aside (in the end the Lincoln contingent didn’t arrive until nearly midday), it was a really useful day and well worth doing. Particular thanks to Ed Chamberlain and his colleagues for hosting the event and for arranging the food and refreshments. Thanks also to everyone who travelled from afar for no other reason than they love a good mashup.

Typically, the ever-prolific Tony Hirst has already managed to write up not one, but two blog posts about ideas that came out of the day:

  • Getting Library Catalogue Searches Out There…
  • Open Data Processes: the Open Metadata Laundry (N.B. this one relates specifically to Jerome – in particular, our notion of ‘scrubbing’ dodgy MARC records by taking only the identifiers plus the bare citation-only fields, and using that minimal set to grab additional free and Open data from the web, automatically creating new full versions of records that are inherently Open. ‘Metadata laundry’, me like.)
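The ‘scrubbing’ step of the metadata laundry can be sketched in a few lines of Python. This is an illustration under assumptions, not Jerome’s actual code: the field names and the two allow-lists are made up for the example, since the exact set of identifier and citation fields we keep is not spelled out here.

```python
# Hypothetical allow-lists: which fields count as identifiers and which as
# bare citation data is an assumption made for this sketch.
IDENTIFIER_FIELDS = ("isbn", "issn")
CITATION_FIELDS = ("title", "author", "publisher", "year")


def scrub(record):
    """Return a copy of `record` holding only non-empty allow-listed fields."""
    keep = IDENTIFIER_FIELDS + CITATION_FIELDS
    return {field: value for field, value in record.items() if field in keep and value}


# A made-up source record with a field we would not want to republish.
dodgy = {
    "isbn": "9780000000002",
    "title": "An Example Book",
    "author": "A. N. Author",
    "vendor_summary": "Text of uncertain provenance",
    "publisher": "",
}
clean = scrub(dodgy)
```

Using an allow-list rather than a block-list means that any field nobody has thought about is dropped by default, which seems the safer failure mode when the point is to end up with a record that is inherently Open.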

Here are three more ideas/conversations we had in Cambridge that I thought were going somewhere interesting. Yeah, we might get around to actually doing these, sometime…

1. Using COMET data to enhance Jerome

The idea is similar to the ‘metadata laundry’ above, and to the way Jerome already uses data from the Open Library, JournalTOCs, LibraryThing, etc., to enhance its book records with additional metadata. Jerome constructs a URL in the form http://data.lib.cam.ac.uk/isbn/_______, with the ISBN from the Jerome record dropped in at the end. COMET responds with a link to an open record in RDF and/or JSON, which Jerome gladly sucks in, adding any additional fields to its original source record. Enrichment ensues.
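Here is a minimal Python sketch of the two halves of that idea: building the lookup URL, and folding the returned fields into the original record. It makes no network calls, and several details are assumptions: the function names are hypothetical, stripping hyphens from the ISBN is a guess at what the service would want, and ‘adding any additional fields’ is interpreted as never overwriting a value the source record already has.

```python
COMET_ISBN_BASE = "http://data.lib.cam.ac.uk/isbn/"


def comet_url(isbn):
    """Build the lookup URL for one ISBN (hyphen-stripping is an assumption)."""
    cleaned = "".join(ch for ch in isbn if ch.isdigit() or ch in "xX").upper()
    return COMET_ISBN_BASE + cleaned


def enrich(source, extra):
    """Return a copy of `source` with fields from `extra` added where missing.

    Existing non-empty values in `source` always win; `extra` only fills gaps.
    """
    merged = dict(source)
    for field, value in extra.items():
        if merged.get(field) in (None, "", []):
            merged[field] = value
    return merged


# Made-up records standing in for a Jerome record and a COMET response.
jerome_record = {"isbn": "978-0-14-044913-6", "title": "An Example Book", "subjects": []}
comet_fields = {"title": "A Different Title", "subjects": ["History"], "publisher": "Example Press"}

lookup = comet_url(jerome_record["isbn"])
enriched = enrich(jerome_record, comet_fields)
```

Letting the source record win on conflicts keeps the enrichment purely additive; the disagreements themselves are a separate and arguably more interesting question.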

2. Using Jerome search to ‘skin’ COMET

I called this one “Jerome Scholar” ;-) …we make use of the search aspects of Jerome (in particular, the speed of Sphinx, the ‘mixing desk’ idea, and the neat record presentation) to provide a really smooth way of interacting with the much more well-structured (hence “Scholar”) data that resides in COMET.

3. Using the differences between the two datasets to tell us something interesting

I have a notion that there’s something inherently useful about being able to compare two versions of a record for the ‘same’ object. If we could use Jerome+COMET to generate a web application/data feed – one that other discovery services could themselves consume, we’d have ways of ‘sparking off’ whole new avenues of discovery: from misspelled names, variant titles, different subject terms assigned by different cataloguing practices, etc. Like xISBN, but for non-standardised data(?). All right, that’s the fuzziest of the three ideas. And as the eminently sensible Owen Stephens kept asking me, “…what’s the use case?”.
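For what it’s worth, the comparison itself is the easy part. A rough Python sketch, assuming both records have already been flattened into dictionaries with the same (made-up) field names, and with `record_differences` as a hypothetical helper:

```python
def record_differences(ours, theirs):
    """Map each field that differs to a (ours, theirs) pair of values.

    A field missing from one record shows up with None on that side.
    """
    differences = {}
    for field in sorted(set(ours) | set(theirs)):
        if ours.get(field) != theirs.get(field):
            differences[field] = (ours.get(field), theirs.get(field))
    return differences


# Two made-up versions of the 'same' record.
jerome_version = {"title": "An Example Book", "author": "Author, A."}
comet_version = {"title": "An Example Book", "author": "Author, Anne", "subject": "History"}

diff = record_differences(jerome_version, comet_version)
```

The hard part, which this does nothing to solve, is deciding that two records describe the same object in the first place, and then working out which of the differences are actually interesting.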

And then we went to the pub.

This is the end: Jerome project

Posted on August 1st, 2011 by Paul Stainthorp

The JISC-funded Jerome project ended on 31 July 2011. Here are the final few project blog posts:

The Jerome search portal itself is [still] at http://jerome.library.lincoln.ac.uk/, and the open data APIs are all being documented on http://data.lincoln.ac.uk/ – we’ll not be switching any of it off any time soon :-)