Posts Tagged ‘Jerome blog’

It’s the end of Jerome as we know it (but I feel fine)

Posted on November 28th, 2011 by Paul Stainthorp

The University of Lincoln’s Jerome project finished in August with the successful release of more than 240,000 openly-licensed bibliographic records, available over developer APIs, and a joint hack day with Cambridge University Library’s COMET project.

Now, encouraged by positive JISC feedback, both institutions, Cambridge and Lincoln jointly, have applied for follow-up project funding under the project title CLOCK. If our bid is successful, the new project will run from December 2011 to July 2012, employing a web developer based at the University of Lincoln, and distilling the work of both institutions into the development of new, innovative library metadata discovery services for the scholarly community.

You can read the project proposal for CLOCK at http://lncn.eu/ijt4 – the introductory section is below.

The University of Lincoln and Cambridge University Library both delivered successful projects (Jerome and COMET) for the JISC Infrastructure for Resource Discovery Programme in 2011. This is a proposal for the continuation of and elaboration upon the work of both projects, via a programme of development work shared between the two institutions.

Throughout both projects (COMET-Jerome), parallel approaches in technology and data structure were noted and commented upon. A ‘mash day’ workshop event held in Cambridge in August aimed to explore these differences as well as areas of potential synergy. Here project members identified several points of interest to take forward.

Both projects produced outputs of interest to researchers, students, librarians, developers, and designers of bibliographic discovery environments. The CLOCK project will harness the success of these two complementary initiatives and investigate new approaches to data creation and discovery in the library domain. In particular, it will investigate, propose, and develop new, web-based bibliographic tools/APIs which will make it easier for developers, academic libraries and library end-users (esp. researchers) to find Open Bibliographic Data and incorporate that data into systems and workflows.

This project is an opportunity to [1] exploit through real-world applications the significant amount of data released openly by Cambridge University Library; [2] apply the Jerome database architecture, iterative development methodology, and API framework to a bibliographic dataset an order of magnitude greater than the University of Lincoln’s; and [3] build a new set of tools and demonstrator services which will enable the future development of public Open Bib Data web applications of practical utility to libraries and end-users.

The project will be supported by library consultant Owen Stephens, who will help to put the work into a national context, relating CLOCK to the wider movement toward Open Bib Data and the work of the JISC Discovery initiative. It will take place in an environment (Lincoln/Cambridge) where a culture of developer inquiry and experimentation is encouraged and nurtured. It is also endorsed by senior library management at both universities.

Both universities are involved in complementary development work which will both inform and be informed by CLOCK: at Cambridge, Ed Chamberlain is guiding the development of the JISC Open Bibliography 2 project; in Lincoln, Paul Stainthorp is lead researcher on the #jiscmrd Orbital project, which is investigating the management of research data, with some areas of overlap.

CLOCK will operate as part of the wider JISC Digital Infrastructure: Information and library infrastructure: Resource discovery, and support the recent concerted effort to move toward openly licensed library discovery in UK Higher Education and beyond.

Jerome/COMET hack day: Fun in the Fens

Posted on August 10th, 2011 by Paul Stainthorp

Here’s a photo of the CARET (Centre for Applied Research in Educational Technologies) offices at the University of Cambridge, where we held our long-awaited joint Jerome/COMET hack day, on Monday 8 August. Actually, in the end, it turned out to be a kind of Jerome/COMET/SALDA/synthesis/OUseful mashup-AH!


In attendance (for the record):

Train mayhem aside (in the end the Lincoln contingent didn’t arrive until nearly midday), it was a really useful day and well worth doing. Particular thanks to Ed Chamberlain and his colleagues for hosting the event and for arranging the food and refreshments. Thanks also to everyone who travelled from afar for no other reason than they love a good mashup.

Typically, the ever-prolific Tony Hirst has already managed to write up not one, but two blog posts about ideas that came out of the day:

  • Getting Library Catalogue Searches Out There…
  • Open Data Processes: the Open Metadata Laundry (N.B. this one relates specifically to Jerome – in particular, our notion of ‘scrubbing’ dodgy MARC records by taking only the identifiers plus the bare citation-only fields, and using that minimal set to grab additional free and Open data from the web, automatically creating new full versions of records that are inherently Open. ‘Metadata laundry’, me like.)
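The ‘metadata laundry’ idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the field names and the shape of the `open_sources` lookups are hypothetical, and the posts do not say exactly which fields Jerome treats as ‘citation-only’.

```python
# Hypothetical field names: the post does not list the fields Jerome keeps.
IDENTIFIER_FIELDS = {"isbn", "issn"}
CITATION_FIELDS = {"title", "author", "publisher", "year"}


def scrub(record):
    """Keep only identifiers and bare citation fields that have a value."""
    keep = IDENTIFIER_FIELDS | CITATION_FIELDS
    return {key: value for key, value in record.items() if key in keep and value}


def launder(record, open_sources):
    """Rebuild a fuller record from open data, starting from the scrubbed one.

    open_sources is a list of callables (an assumption for this sketch);
    each takes the scrubbed record and returns a dict of open metadata.
    """
    clean = scrub(record)
    for lookup in open_sources:
        for key, value in lookup(clean).items():
            # Never overwrite what we already hold; only add missing fields.
            clean.setdefault(key, value)
    return clean
```

Anything not on the allow-list (a vendor-supplied summary, say) is dropped before the open lookups run, so the rebuilt record only ever contains the minimal citation plus data that was open to begin with.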

Here are three more ideas/conversations we had in Cambridge that I thought were going somewhere interesting. Yeah, we might get around to actually doing these, sometime…

1. Using COMET data to enhance Jerome

The idea: Similar to the ‘metadata laundry’, above, and to the way Jerome already uses data from the Open Library, JournalTOCs, LibraryThing, etc., to enhance its book records with additional metadata. Jerome constructs a URL in the form http://data.lib.cam.ac.uk/isbn/_______, with the ISBN from the Jerome record dropped in at the end. COMET responds with a link to an open record in RDF and/or JSON, which Jerome gladly sucks in, adding any additional fields to its original source record. Enrichment ensues.
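A minimal sketch of that enrichment step, assuming both records have already been parsed into plain dictionaries. The URL prefix is the one quoted above; the function names, the hyphen-stripping, and the rule for what counts as an ‘empty’ field are assumptions made for illustration, and the actual HTTP fetch is left out.

```python
COMET_ISBN_BASE = "http://data.lib.cam.ac.uk/isbn/"  # pattern quoted in the post


def comet_url(isbn):
    """Build the COMET lookup URL for an ISBN.

    Stripping hyphens and spaces is an assumption; the post only says the
    ISBN is dropped in at the end of the URL.
    """
    return COMET_ISBN_BASE + isbn.replace("-", "").replace(" ", "")


def enrich(jerome_record, comet_record):
    """Return a copy of the Jerome record with gaps filled from COMET."""
    enriched = dict(jerome_record)
    for field, value in comet_record.items():
        # Only add fields Jerome lacks or holds as empty; keep its own data.
        if field not in enriched or enriched[field] in ("", None, []):
            enriched[field] = value
    return enriched
```

The source record always wins where both sides have a value, which matches the description of ‘adding any additional fields’ rather than replacing existing ones.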

2. Using Jerome search to ‘skin’ COMET

I called this one “Jerome Scholar” ;-) …we make use of the search aspects of Jerome (in particular, the speed of Sphinx, the ‘mixing desk’ idea, the neat record presentation) to provide a really smooth way of interacting with the much more well-structured (hence “Scholar”) data that resides in COMET.

3. Using the differences between the two datasets to tell us something interesting

I have a notion that there’s something inherently useful about being able to compare two versions of a record for the ‘same’ object. If we could use Jerome+COMET to generate a web application/data feed – one that other discovery services could themselves consume, we’d have ways of ‘sparking off’ whole new avenues of discovery: from misspelled names, variant titles, different subject terms assigned by different cataloguing practices, etc. Like xISBN, but for non-standardised data(?). All right, that’s the fuzziest of the three ideas. And as the eminently sensible Owen Stephens kept asking me, “…what’s the use case?”.
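To make the fuzzy idea slightly less fuzzy, here is a rough sketch of what comparing two versions of the ‘same’ record might look like. Everything in it is hypothetical: the field names, the normalisation rule, and the assumption that both records arrive as flat dictionaries.

```python
def normalise(value):
    """Fold case and collapse whitespace so trivial variants compare equal."""
    if isinstance(value, str):
        return " ".join(value.lower().split())
    return value


def record_differences(a, b):
    """Return {field: (value_in_a, value_in_b)} where the records disagree.

    A field present in only one record is reported with None on the
    other side.
    """
    diffs = {}
    for field in sorted(set(a) | set(b)):
        value_a, value_b = a.get(field), b.get(field)
        if normalise(value_a) != normalise(value_b):
            diffs[field] = (value_a, value_b)
    return diffs
```

The output is exactly the raw material mentioned above: variant names, variant titles and differing subject terms, keyed by field, which another service could then consume as a feed.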

And then we went to the pub.


#discodev: worldwide software development competition using open library data

Posted on July 5th, 2011 by Paul Stainthorp

Copied verbatim (and under licence!) from the UK Discovery website:


UK Discovery (http://discovery.ac.uk/) and the Developer Community Supporting Innovation (DevCSI) project based at UKOLN are running a global Developer Competition throughout July 2011 to build open source software applications / tools, using at least one of our 10 open data sources collected from libraries, museums and archives.

…and one of the 10 open data sources is the Jerome API we announced last week!

Enter simply by blogging about your application and emailing the blog post URI to joy.palmer@manchester.ac.uk by the deadline of 23:59 (your local time) on Monday 1 August 2011.

Full details of the competition, the data sets and how to enter are at http://discovery.ac.uk/developers/competition/

Follow #discodev on Twitter to see what people are up to.

Is that a Jerome open data API I spy?

Posted on June 28th, 2011 by Paul Stainthorp

Yes. Yes, it is.

http://data.online.lincoln.ac.uk/documentation.html#bib

This is only the initial, bare-bones JSON-only service. A complete (and fully-documented) API will be released in stages over the next month, providing data in a range of output formats. We’re keeping all API and open institutional data documentation in the one place, on our open data site.

Jerome writeup in Discovery newsletter

Posted on June 8th, 2011 by Paul Stainthorp

This article appears in the current (May) issue of the Discovery newsletter, along with a nice photo of the GCW. Thanks to Helen Harrop of SERO for the writeup!


The stated purpose of the Jerome project is an ambitious one: to “develop a sustainable, institutional service for open bibliographic metadata, complemented with well documented APIs and an intelligent personalised interface for library users.” Not much there then!

The project started life as an internal ‘un-project’ which aimed to deliver “an amazing way to interact” with the University of Lincoln’s library services in the wider context of the University’s user services and in the face of limited resources.

The funding as a JISC RDTF project has enabled the team to make much swifter progress with their aspirations and to document achievements so that they can share their expertise and developments with the wider community.

The key outputs for this current, JISC-funded, phase of Jerome are:

  • A developers’ toolkit which will include APIs, web services, a technical ‘cook book’, user journeys and other documentation which will allow other developers to build and implement their own search tools.
  • Bibliographic records of books, journals and e-prints released as open data.
  • A user-controlled, personalised search interface.

The project has already gone live with the first implementation of a Jerome search interface [http://jerome.library.lincoln.ac.uk/] at the end of March.

In the background at Discovery event

Posted on May 26th, 2011 by Paul Stainthorp

A few of the Jerome project team are at the JISC/RLUK event in London: ‘Discovery – building a UK metadata ecology’. Our slides are running on a screen in the foyer; I’ll be hanging around to talk about them.

How commercial next-generation library discovery tools have *nearly* got it right

Posted on May 17th, 2011 by Paul Stainthorp

In Huddersfield (again – I’m barely away from the place!), yesterday, at a CILIP UC&R (University, College and Research Group) Yorkshire & Humberside [catchy name] training event on ‘Discovering Discovery Tools’. Librarians from four different UK universities gave practical, pros-and-cons descriptions of how they implemented and are now running four different commercial next-gen resource-discovery tools:

Five (count ‘em!) people from Lincoln were in the audience. I was wearing two hats: one for project Jerome for thinking about design concepts in resource discovery tools; the other for my day job – Lincoln is in the middle of a strategic review of Library ICT systems, which may well end up recommending that we buy one of these products.

It was all good stuff. First off, libraries need to hear the honest, warts-and-all counterpoint to the glowing terms in which each discovery product is described by its vendor. Secondly, it’s useful to subject all four* resource discovery platforms to the same amount of daylight, and see where the common problems lie, as well as where one tool outperforms another. Thirdly, and even though there’s a lot of resource discovery hyperbole to be heard, this is still a big shift for academic libraries, and I think we should discuss implications that are wider than the costs/benefits for an individual institution.

(*Yes, I know there are a few other tools. But they weren’t in the room yesterday.)

What’s stopping us? (Canal lock gate at the University of Huddersfield.)

Things that jumped out at me:

Commercial resource discovery has reached a level of maturity that was absent a couple of years ago. That’s not to say that all next-gen resource discovery tools are perfect (because they aren’t), or that there aren’t any problems (because there are; see below), but academic libraries do now have a genuine choice between several different, viable commercial products.

Here’s a heresy: the differences between these four products are not that significant. I think that anyone who went away from yesterday’s event thinking that out of the four discovery tools on display there are some ‘good’ and some ‘bad’ …is probably wrong. It’s not really about the product, it’s about the willingness of the vendor to overcome problems, and about their attitude to their customers. Do you buy a slightly-less slick product, but from a company you feel you can have a more productive relationship with?

In fact, most of the real problems with resource discovery seem to be common to all four of the products on show yesterday. De-duping via FRBR reckons to be a bit of an Achilles’ heel. (A shame. FRBRisation is one of those things you either need to get right, or not do at all. A half-arsed attempt is worse than not bothering.)

Also broken: known-item search. This ought to be trivial to fix, and it needs to be sorted now now now. I find it particularly sinister that some commercial resource-discovery tools rank their search results according to secret, proprietary algorithms that can’t be inspected or challenged by their users, let alone altered/improved. This is a problem. What’s the point of a library that can’t justify how its resource discovery system actually works? Are we just here to sign the cheques?

Libraries still have a tendency to overcomplicate things for their users. Sometimes they do this because they have no choice (perhaps their shiny new discovery tool doesn’t quite work the way it should); but often they seem just too ready to accept a situation where users are inconvenienced sooner than address an underlying problem. Lincoln included in this sweeping generalisation.

There’s no point pretending that a library can make two independent decisions to purchase [a] a next-gen resource discovery platform, and [b] a journals knowledgebase/link resolver. The two things are all tied up together. To pick a random example: you want Summon, you’d better want 360.

Why can’t we just buy access to a search index? If I want to pay to provide my users with the benefits of a lovely big central index of content, why do I have to buy into your discovery algorithm and web front-end as well? (Whither JISC collections?)

Related, and finally – we really shouldn’t have to replace our search and discovery interfaces every time we want/need to use a different content provider, and we shouldn’t be placed in the situation of having to make collection/subscription decisions in order to ‘feed’ our discovery tool. It may be temptingly easy, cost aside, to pick up and put down different next-gen discovery products (“…it’s just a subscription!”) but there’s too much at stake for our users.

Three quarks for Muster MARC!

Posted on April 21st, 2011 by Paul Stainthorp

My esteemed, gracious and talented colleague Mr. Jackson is not happy.

He’s not happy because I’ve asked him to do something which he thinks is an awful, depressing, retrograde step. I’ve asked him to add a MARC export function to Jerome.

Nick’s argument in a nutshell (he won’t mind me paraphrasing):

  • MARC is awful: truly awful. It’s holding back humanity’s (and libraries’) progress. We shouldn’t be doing anything to prolong its life. #marcmustdie

My argument in a nutshell:

  • For better or worse, libraries still use MARC, and this will be a useful facility for libraries who want to consume our open data straight into their existing Library Management Systems.

What does the studio audience think? Should Jerome serve up MARC (actually, MARCXML. I’m not a monster.) because someone, somewhere might want to consume it, or should we take a stand and insist on providing only decent, sane data formats from now on?
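For the undecided, here is roughly what a bare-bones MARCXML export could look like. It is a sketch, not Jerome’s actual export: the input field names are hypothetical, the record has no leader or control fields, and the indicators are left blank, so a real Library Management System would probably want more than this. The tags used are the standard MARC 21 ones (020 for ISBN, 100 for main author, 245 for title).

```python
import xml.etree.ElementTree as ET

MARCXML_NS = "http://www.loc.gov/MARC21/slim"  # Library of Congress MARCXML namespace

# (MARC tag, subfield code, hypothetical Jerome field name)
FIELD_MAP = (
    ("020", "a", "isbn"),
    ("100", "a", "author"),
    ("245", "a", "title"),
)


def to_marcxml(record):
    """Serialise a flat record dict as a minimal MARCXML <record> string."""
    ET.register_namespace("", MARCXML_NS)
    root = ET.Element(f"{{{MARCXML_NS}}}record")
    for tag, code, key in FIELD_MAP:
        value = record.get(key)
        if not value:
            continue  # skip fields the source record does not have
        datafield = ET.SubElement(
            root,
            f"{{{MARCXML_NS}}}datafield",
            {"tag": tag, "ind1": " ", "ind2": " "},  # blank indicators: a simplification
        )
        subfield = ET.SubElement(datafield, f"{{{MARCXML_NS}}}subfield", {"code": code})
        subfield.text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `to_marcxml({"isbn": "9780141439600", "title": "A Tale of Two Cities"})` yields a single `record` element holding an 020 and a 245 datafield.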

For anyone who’s blissfully unaware of MARC (MAchine-Readable Cataloging) formats, read this. Then read this, this, and this. Then go and have a lie down in a darkened room.

I don’t love MARC. More than anything, I don’t really understand it (I have a cataloguer to do that for me). But it still has currency in libraries. #shouldmarcdie?

The Jerome mind map

Posted on April 20th, 2011 by Paul Stainthorp

We’re using a free, preview version of a web-based mind mapping tool called MindMeister to plan and make notes for Jerome. Each week, the notes are copied across to our project tracking app (Pivotal Tracker) to form the development iteration for the week.

It’s very rough and ready, but you’re more than welcome to take a look at the Jerome mind map at: http://www.mindmeister.com/92308610/jerome