- The Jerome project addressed licensing in April, 2011, and the situation hasn’t really changed for us: we’re still intending to expose as much of our bibliographic data as possible using a properly open licence such as CC0.
- “The licensing of data is an interesting one, since we run into a whole bunch of questions around who actually owns the information in our catalogue. Since it’s all factual information (and you can’t copyright a fact) then surely it’s a free for all – except that EU law introduces a curve ball in the form of database right. Broadly speaking this provides specific protection for collections of records, but not the records themselves.”
- Ed Chamberlain and the COMET project also addressed licensing and the ownership of MARC records: work that we should revisit.
- The JISC Open Bibliographic Data Guide (obd.jisc.ac.uk) provides very clear advice and information useful in creating an open data business case. E.g.:
- “[…]if we presume that the rationale for publication is to ensure the widest possible dissemination then adoption of a generic open data license (such as Open Data Commons or CC0) is the most effective way to make the set of potential uses unambiguous. Restrictive licenses are counter-productive[…]“
- There is some very helpful guidance coming out of the Discovery project around building a business case for open discovery. This was summarised at the recent Discovery programme meeting (also in Birmingham) by David Kay –
- N.B. I’ll revisit this in a future blog post. I’m getting almost surprisingly interested in the problem of ‘selling’ the idea of open bib data to an institution, and I’ve found the Discovery work on business cases increasingly useful.
- At Lincoln in March, 2012, we had a very useful visit from Sander van der Waal of OSS Watch where we discussed the University of Lincoln’s approach to openness (Open Source, Open Access, as well as Open Data). Joss Winn is following this work up with the University’s IP manager with a view to writing a University policy on open licensing of our IP.
- Related to the ‘business case’ aspect is the work of LNCD (and also discussions I’ve had with Ed Chamberlain recently) about how to ensure sustainability of open services in a technical sense – what sort of systems architecture and processes do we need in place, and how do we work with university ICT support departments to ensure that projects become institutionally-supported services when it’s important for them to do so?
- At this Birmingham event, Chris Banks of the University of Aberdeen presented about the benefits and challenges of sharing from a library director’s perspective. I was particularly interested in the metaphor of “metadata as currency”: how are aggregators creating value based on the mass accumulation of metadata, and how are they selling that value back to libraries? See Chris’s blog for more. Aberdeen are clearly doing a lot around the analysis of e-resources usage and relating it back to their library strategy / information literacy, etc.
- Paul Miller (Cloud of Data): one key quote “amateurs tend to do a better job of aggregating content than institutions” (e.g. collections of images on Flickr). This may be in part because individuals don’t have the same risk-averse approach, but whatever the reason
- Barrister Frances Davey gave us a quick run-through of IP law as it relates to data. Key quote: “the legal repercussions of publishing data openly are pretty much nil”. Fear and uncertainty poison initiative! Frances also touched on the business / reputation-management arguments for having an active approach to open data: people might well be getting bad copies of your data already (via screenscraping) – release it yourself and take control of the quality. One example was the British Library choosing a CC0 licence precisely because of the lack of an attribution clause – then any subsequent re-use is “nothing more to do with us”.
- Then, after lunch, copyright consultant Naomi Korn ran a workshop on the practical aspects of choosing a licence for your data. Naomi spoke about the need to start by deciding how open you want to be as an institution (noting that institutions with a dedicated © person tend to have a greater appetite for risk) – then consider whether you have the resources in place to get where you want to be. Key quote: “Let’s do some attribution mapping!” Some links from Naomi’s workshop:
- Paper by Naomi Korn and Charles Oppenheim: Licensing Open Data: A Practical Guide [PDF] available from the Discovery website: http://lncn.eu/efgk
- Useful tool: ‘will it blend?’ – Creative Commons compatibility wizard: http://www.web2rights.com/creativecommons/
- Another wizard: ‘how open are you?’ – help in placing your institution along the openness scale! http://www.web2rights.com/OERIPRSupport/howopenareyou/
- The IWM (Imperial War Museum)’s open content licence was also mentioned in passing: the IWM is significant as the first national museum to release significant amounts of content openly.
- At the Birmingham clinic we also discussed the risks (including the risk of doing nothing) and benefits of taking an open approach. My contribution: open bibliographic data enables high-level services to be sold back to universities (c.f. Chris Banks’ notes on metadata aggregation, above). We shouldn’t be scared of this or see it as a reason to not open up our data (we can’t compete with those companies; we want their services and we’re prepared to pay for them!); but we can build lower-level, locally-relevant services as a result of releasing our own open data, and play on the web by web rules – if we don’t make our data open for re-use on the web, we can’t even have the conversation. Lincoln’s approach is entirely around open data as a means to an end: it’s the best and most natural way of sparking off new, innovative services based on unexpected combinations of our own and other people’s data.
- The best example of this so far is the new data-driven staff profiles at Lincoln: but we’re going to need more and more convincing examples if we’re going to make a convincing business case.
- Final overall quote of the day: “Writing your own open licence is an unpleasant form of vanity“.
Posts Tagged ‘COMET’
On open data licensing and sustainability
Posted on May 17th, 2012 by Paul Stainthorp
Tick tock we don’t stop. Introducing CLOCK, a new JISC-funded resource discovery project at the universities of Lincoln and Cambridge
Posted on December 10th, 2011 by Paul Stainthorp
The title says it all, really. The University of Lincoln, working in consortium with Cambridge University Library and Owen Stephens Consulting, has been awarded £49,877 by JISC to investigate ways of driving innovation in libraries’ interactions with Open Bibliographic Data, through a project we’re calling CLOCK (Cambridge-Lincoln Open Catalogue Knowledgebase).
CLOCK is a continuation of and elaboration upon the work of two recent JISC Discovery projects—Jerome at the University of Lincoln and COMET at the University of Cambridge—via a programme of development work shared between the two institutions, and with library consultant Owen Stephens. JISC were impressed enough with the work of both projects, and sufficiently interested in the potential for collaboration, that they encouraged our joint bid for follow-up funding.
Between now and the end of July, 2012, the CLOCK project will provide us with a framework to:
…[1] exploit through real-world applications the significant amount of data released openly by Cambridge University Library; [2] apply the Jerome database architecture, iterative development methodology, and API framework to a bibliographic dataset an order of magnitude greater than the University of Lincoln’s; and [3] to build and enable a new set of tools and demonstrator services which will enable the future development of public Open Bib Data web applications of practical utility to libraries and end-users.
You can read the full bid document, here.
I’m very much looking forward to working with Ed Chamberlain, Systems Librarian in the University Library at the University of Cambridge, along with Owen Stephens, veteran of a number of campaigns to open up access to library data, and Chris Leach (Systems Librarian) and Ian Snowley (University Librarian) from the University of Lincoln. Thanks are due to all of them for their help in writing the successful bid; to the Research & Enterprise Development office at Lincoln for their invaluable assistance in putting together the project budget; and to the LNCD group at the University of Lincoln for providing the kind of supportive development platform that makes this kind of project possible.
Finally, a big thank you to Andy McGregor and the JISC Digital Infrastructure: Information and library infrastructure: Resource discovery programme, for this opportunity to further explore the blossoming environment of open bibliographic data/open discovery in libraries. If you haven’t done so already, you might like to take a look at the following websites:
- Discovery: a metadata ecology for UK education and research
- Open Bibliographic Data Guide
- data.lib.cam.ac.uk (beta)
- data.lincoln.ac.uk
- JISC-funded Open Bibliography 2 (Ed Chamberlain is also involved in this project)
- LNCD projects
- JISC resource discovery programme
As with all our projects, we’ll be blogging it comprehensively (so stand by for a steady stream of awful clock-related puns used as blog post titles). Although there’s little to see there yet, the CLOCK project blog is at: http://clock.blogs.lincoln.ac.uk/ – along with its own RSS feed. Watch that space!
It’s the end of Jerome as we know it (but I feel fine)
Posted on November 28th, 2011 by Paul Stainthorp
The University of Lincoln’s Jerome project finished in August with the successful release of more than 240,000 openly-licensed bibliographic records, available over developer APIs, and a joint hack day with Cambridge University Library’s COMET project.
Now, encouraged by positive JISC feedback, both institutions—Cambridge and Lincoln jointly—have applied for follow-up project funding under the project title CLOCK. If our bid is successful, the new project will run between December 2011–July 2012, employing a web developer based at the University of Lincoln, and distilling the work of both institutions into the development of new innovative library metadata discovery services for the scholarly community.
You can read the project proposal for CLOCK at http://lncn.eu/ijt4 – the introductory section is below.
The University of Lincoln and Cambridge University Library both delivered successful projects (Jerome and COMET) for the JISC Infrastructure for Resource Discovery Programme in 2011. This is a proposal for the continuation of and elaboration upon the work of both projects, via a programme of development work shared between the two institutions.
Throughout both projects (COMET-Jerome), parallel approaches in technology and data structure were noted and commented upon. A ‘mash day’ workshop event held in Cambridge in August aimed to explore these differences as well as areas of potential synergy. Here project members identified several points of interest to take forward.
Both projects produced outputs of interest to researchers, students, librarians, developers, and designers of bibliographic discovery environments. The CLOCK project will harness the success of these two complementary initiatives and investigate new approaches to data creation and discovery in the library domain. In particular, it will investigate, propose, and develop new, web-based bibliographic tools/APIs which will make it easier for developers, academic libraries and library end-users (esp. researchers) to find Open Bibliographic Data and incorporate that data into systems and workflows.
This project is an opportunity to [1] exploit through real-world applications the significant amount of data released openly by Cambridge University Library; [2] apply the Jerome database architecture, iterative development methodology, and API framework to a bibliographic dataset an order of magnitude greater than the University of Lincoln’s; and [3] to build and enable a new set of tools and demonstrator services which will enable the future development of public Open Bib Data web applications of practical utility to libraries and end-users.
The project will be supported by library consultant Owen Stephens, who will help to put the work into a national context, relating CLOCK to the wider movement toward Open Bib Data and the work of the JISC Discovery initiative. It will take place in an environment (Lincoln/Cambridge) where a culture of developer inquiry and experimentation is encouraged and nurtured. It is also endorsed by senior library management at both universities.
Both universities are involved in complementary development work which will both inform and be informed by CLOCK: at Cambridge, Ed Chamberlain is guiding the development of the JISC Open Bibliography 2 project; in Lincoln, Paul Stainthorp is lead researcher on the #jiscmrd Orbital project, which is investigating the management of research data, with some areas of overlap.
CLOCK will operate as part of the wider JISC Digital Infrastructure: Information and library infrastructure: Resource discovery, and support the recent concerted effort to move toward openly licensed library discovery in UK Higher Education and beyond.
Jerome/COMET hack day: Fun in the Fens
Posted on August 10th, 2011 by Paul Stainthorp
Here’s a photo of the CARET (Centre for Applied Research in Educational Technologies) offices at the University of Cambridge, where we held our long-awaited joint Jerome/COMET hack day, on Monday 8 August. Actually, in the end, it turned out to be a kind of Jerome/COMET/SALDA/synthesis/OUseful mashup-AH!
In attendance (for the record):
- Alex Bilbie (University of Lincoln)
- Ed Chamberlain (University of Cambridge)
- Nick Jackson (University of Lincoln)
- Chris Keene (University of Sussex)
- Phillip Heels (University of Lincoln)
- Tony Hirst (The Open University)
- Huw Jones (University of Cambridge)
- Chris Leach (University of Lincoln)
- Dan Sheppard (University of Cambridge)
- Paul Stainthorp (University of Lincoln)
- Owen Stephens (Owen Stephens Consulting)
- Laura Waldoch (University of Cambridge)
- Lihua Zhu (University of Cambridge)
Train mayhem aside (in the end the Lincoln contingent didn’t arrive until nearly midday), it was a really useful day and well worth doing. Particular thanks to Ed Chamberlain and his colleagues for hosting the event and for arranging the food and refreshments. Thanks also to everyone who travelled from afar for no other reason than they love a good mashup.
Typically, the ever-prolific Tony Hirst has already managed to write up not one, but two blog posts about ideas that came out of the day:
- Getting Library Catalogue Searches Out There…
- Open Data Processes: the Open Metadata Laundry (N.B. this one relates specifically to Jerome – in particular, our notion of ‘scrubbing’ dodgy MARC records by taking only the identifiers plus the bare citation-only fields, and using that minimal set to grab additional free and Open data from the web, automatically creating new full versions of records that are inherently Open. ‘Metadata laundry’, me like.)
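The ‘scrubbing’ step of the metadata laundry can be sketched in a few lines. This is only an illustration, not Jerome’s actual code: the field names in `KEEP_FIELDS` and the flat dictionary shape of a record are assumptions made for the example (real MARC records are a good deal messier).

```python
# Hypothetical field names: identifiers plus bare citation-only fields.
KEEP_FIELDS = ("isbn", "issn", "title", "author", "publisher", "year")


def scrub(record):
    """Return a minimal copy of a record, keeping only non-empty
    identifier and citation fields and dropping everything else."""
    return {
        field: record[field]
        for field in KEEP_FIELDS
        if field in record and record[field]
    }


# Example: the local note and the empty author field are both dropped.
dodgy = {"isbn": "9780141036144", "title": "Nineteen Eighty-Four",
         "author": "", "local_note": "copy 2 missing"}
minimal = scrub(dodgy)
```

The minimal record that comes out is what would then be used to look up additional free and Open data on the web.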
Here are three more ideas/conversations we had in Cambridge that I thought were going somewhere interesting. Yeah, we might get around to actually doing these, sometime…
1. Using COMET data to enhance Jerome
Similar to the ‘metadata laundry’, above, and to the way Jerome already uses data from the Open Library, JournalTOCs, LibraryThing, etc., to enhance its book records with additional metadata. Jerome constructs a URL in the form http://data.lib.cam.ac.uk/isbn/_______, with the ISBN from the Jerome record dropped in at the end. COMET responds with a link to an open record in RDF and/or JSON, which Jerome gladly sucks in, adding any additional fields to its original source record. Enrichment ensues.
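A rough sketch of that enrichment step, under some loud assumptions: the base URL is the pattern quoted above, but the helper names (`normalise_isbn`, `comet_url`, `enrich`) are made up for illustration, and the COMET response is assumed to have already been fetched and parsed into a plain dictionary. No network call is made here.

```python
import re

# URL pattern as quoted in the post; the ISBN is dropped in at the end.
COMET_ISBN_BASE = "http://data.lib.cam.ac.uk/isbn/"


def normalise_isbn(raw):
    """Strip hyphens and spaces, keeping digits and a possible X check digit."""
    return re.sub(r"[^0-9Xx]", "", raw).upper()


def comet_url(isbn):
    """Build the lookup URL for a given ISBN."""
    return COMET_ISBN_BASE + normalise_isbn(isbn)


def enrich(jerome_record, comet_record):
    """Return a copy of the Jerome record with any fields it lacks
    (or holds empty) filled in from the COMET record.
    Existing non-empty Jerome values are never overwritten."""
    enriched = dict(jerome_record)
    for field, value in comet_record.items():
        if enriched.get(field) in (None, "", []):
            enriched[field] = value
    return enriched


url = comet_url("978-0-14-103614-4")
merged = enrich(
    {"title": "Nineteen Eighty-Four", "subjects": []},
    {"title": "1984", "subjects": ["Dystopias"], "publisher": "Penguin"},
)
```

Keeping the original Jerome value whenever there is one is a design choice for this sketch; a real service might prefer the better-structured source instead.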
2. Using Jerome search to ‘skin’ COMET
I called this one “Jerome Scholar”
…we make use of the search aspects of Jerome (in particular, the speed of Sphinx, the ‘mixing desk’ idea, and the neat record presentation) to provide a really smooth way of interacting with the much more well-structured (hence “Scholar”) data that resides in COMET.
3. Using the differences between the two datasets to tell us something interesting
I have a notion that there’s something inherently useful about being able to compare two versions of a record for the ‘same’ object. If we could use Jerome+COMET to generate a web application/data feed – one that other discovery services could themselves consume – we’d have ways of ‘sparking off’ whole new avenues of discovery: from misspelled names, variant titles, different subject terms assigned by different cataloguing practices, etc. Like xISBN, but for non-standardised data(?). All right, that’s the fuzziest of the three ideas. And as the eminently sensible Owen Stephens kept asking me, “…what’s the use case?”.
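For what it’s worth, the comparison itself is the easy part. A minimal sketch, assuming both records have already been flattened into dictionaries with matching field names (the function name and the sample fields are hypothetical):

```python
def record_differences(record_a, record_b):
    """Map each field whose value differs between the two records
    to a (value_in_a, value_in_b) pair. A field missing from one
    record shows up with None on that side."""
    differences = {}
    for field in sorted(set(record_a) | set(record_b)):
        value_a = record_a.get(field)
        value_b = record_b.get(field)
        if value_a != value_b:
            differences[field] = (value_a, value_b)
    return differences


jerome = {"title": "Middlemarch", "author": "Eliot, G."}
comet = {"title": "Middlemarch", "author": "Eliot, George",
         "subject": "English fiction"}
diffs = record_differences(jerome, comet)
```

In this example `diffs` holds the variant author name and the subject term that only one catalogue assigned, which is the kind of raw material a feed could expose. It doesn’t answer the use-case question, of course.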
And then we went to the pub.












