Archive for the ‘Uncategorized’ Category

CLOCK and a summary of 2 other Discovery projects

Posted on May 17th, 2012 by Paul Stainthorp

Ed Chamberlain, who is on the CLOCK project team as a researcher, is involved in two other projects under the Discovery strand: OEM-UK and Open Bibliography 2. We’re looking for ways in which CLOCK can re-use data, code, processes and ideas from these projects (and elsewhere), and at what CLOCK could offer in return.

Notes:

  • The Open Biblio project has run over the last few years, with the aim of aggregating large amounts of bibliographic data for scientific discovery.
  • Data collected from Cambridge University, the BL and PubMed were held as RDF and used to power an open catalogue called “Bibliographica”.
  • Problems around scaling the data/system led to the current JISC-funded Open Biblio 2 project (in the meantime, Cambridge and the BL had started to publish their data openly).
  • Open Biblio 2 started by looking at a NoSQL approach (CouchDB, Lucene/Solr), eventually settling on Elasticsearch.
  • The approach of Open Biblio is to build bottom-up, community tools: BibServer and BibSoup (“Like Wikimedia for bib data”). This raises interesting questions about data quality in an open, community-driven system.
  • Also looking at JSON as a lightweight way of sharing bib data: the emerging BibJSON convention represents a bibliographic record as a JSON object (Ed wrote a MARC-to-BibJSON parser in Perl). N.B. BibJSON is not a million miles away from the JSON that Jerome spits out! There are three hack days taking place next month in London to look specifically at BibJSON.
  • Open Biblio 2 is also looking at JSON-LD (JSON for Linking Data), a ‘real’ JSON standard which does a lot of the things that RDF does.

tl;dr = use their JSON standards and BibSoup as a data source.
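Ed’s parser was written in Perl and isn’t reproduced here. As a rough illustration of the idea only, the Python sketch below maps a handful of MARC 21 fields (245$a title, 100$a main author, 260$b publisher, 260$c date, 020$a ISBN) onto a BibJSON-style object. The input shape (a dict of tag to subfield dict), the function names and the exact output keys are assumptions based on our reading of the BibJSON convention, not a copy of Ed’s code or a complete mapping.

```python
import re


def _subfield(record, tag, code):
    """Return one subfield value with trailing ISBD punctuation removed."""
    value = record.get(tag, {}).get(code, "")
    return value.strip(" /:;,.")


def marc_to_bibjson(record):
    """Map a few MARC 21 fields to a BibJSON-style dict (illustrative only).

    record: dict mapping a MARC tag to a dict of subfield code -> value,
            e.g. {"245": {"a": "An example title /"}}.
            Repeated fields and subfields are not handled.
    """
    bib = {"type": "book"}  # assumption: every record is a book
    title = _subfield(record, "245", "a")
    if title:
        bib["title"] = title
    author = _subfield(record, "100", "a")
    if author:
        bib["author"] = [{"name": author}]
    publisher = _subfield(record, "260", "b")
    if publisher:
        bib["publisher"] = publisher
    year = re.search(r"\d{4}", _subfield(record, "260", "c"))
    if year:
        bib["year"] = year.group(0)
    isbn = _subfield(record, "020", "a")
    if isbn:
        # Keep only the ISBN itself, dropping qualifiers such as "(pbk.)".
        bib["identifier"] = [{"type": "isbn", "id": isbn.split()[0]}]
    return bib
```

For example, a record whose 245$a is “An example title /” and whose 020$a is “9780000000002 (pbk.)” comes out with a title of “An example title” and a single ISBN identifier, and `json.dumps` turns the dict into BibJSON text. A real converter would need to cope with repeated fields and many more tags.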

  • The second project, OEM-UK (Open Education Metadata UK), based at the IoE in London, is focusing on cataloguing workflows.
  • Data from the IoE’s SirsiDynix catalogue and from EPrints are drawn into a Drupal framework, where forms (with autopopulation) are used to create records: “cataloguing the Drupal way”.
    • Thought from Andrew Beeken: could we replicate this approach, using WordPress custom post types to store and display structured content? Shades of the OPACPress project, which Joss Winn and I proposed several years ago but which was not funded.
  • There is some evidence that this approach can speed up the cataloguing process considerably: the more data you put in, the faster it gets! Ed has some screen-capture videos from OEM-UK showing the workflow, including grabbing data via Zotero.

tl;dr = OEM-UK are also successfully disrupting cataloguing workflows.

Hack da Fens: open bib hack day objectives!

Posted on May 17th, 2012 by Paul Stainthorp

Most of the CLOCK project team (AB, EC, CL, TJ, PS) are at CARET in Cambridge today and tomorrow (17-18 May 2012) to hack bibliographic data and to try to point the way for the remaining two months of technical development on the CLOCK project.

After coffee on day 1 we agreed our objectives for the next two days. They are:

  1. To review what we’ve done so far and what we need to do. To play with the SPARQL and JSON-parsing search tools that Andrew Beeken has started to develop and to incorporate more data (BL, etc.)
  2. To think about the user interface for CLOCK: how do we present open bib data from multiple sources (Lincoln, Cambridge, Harvard, BL, OpenLibrary, other) in a single UI in a way which helps our users (cataloguers, researchers) solve problems?
  3. What’s the high-level architecture for CLOCK? How does data flow through the system, and can we draw a meaningful diagram?
  4. A comparison of open data / Discovery projects that Ed Chamberlain is involved in! What can we take and re-use from OpenBiblio2 and the OEM-UK project? What might those projects be able to take and re-use from CLOCK?
  5. What are we going to do with all this data? A plan for http://data.lincoln.ac.uk/, http://data.lib.cam.ac.uk/ and http://data.ac.uk/library (or http://library.data.ac.uk/).
  6. To run interviews and live cognitive walkthroughs with cataloguers in Cambridge and Lincoln.

On open data licensing and sustainability

Posted on May 17th, 2012 by Paul Stainthorp
Last week I attended a free ‘licensing clinic’ in Birmingham, organised by the Discovery programme, mainly as a means of kick-starting my brain into considering the copyright/licensing issues around the CLOCK project. Here are my notes.
  1. The Jerome project addressed licensing in April 2011, and the situation hasn’t really changed for us: we’re still intending to expose as much of our bibliographic data as possible using a properly open licence such as CC0.
    • “The licensing of data is an interesting one, since we run into a whole bunch of questions around who actually owns the information in our catalogue. Since it’s all factual information (and you can’t copyright a fact) then surely it’s a free for all – except that EU law introduces a curve ball in the form of database right. Broadly speaking this provides specific protection for collections of records, but not the records themselves.”
  2. Ed Chamberlain and the COMET project also addressed licensing and the ownership of MARC records: work that we should revisit.
  3. The JISC Open Bibliographic Data Guide (obd.jisc.ac.uk) provides very clear advice and information useful in creating an open data business case. E.g.:
    • “[…] if we presume that the rationale for publication is to ensure the widest possible dissemination then adoption of a generic open data license (such as Open Data Commons or CC0) is the most effective way to make the set of potential uses unambiguous. Restrictive licenses are counter-productive […]”
  4. There is some very helpful guidance coming out of the Discovery programme around building a business case for open discovery. This was summarised by David Kay at the recent Discovery programme meeting (also in Birmingham).
    • N.B. I’ll revisit this in a future blog post. I’m becoming surprisingly interested in the problem of ‘selling’ the idea of open bib data to an institution, and I’ve found the Discovery work on business cases increasingly useful.
  5. At Lincoln in March 2012, we had a very useful visit from Sander van der Waal of OSS Watch, where we discussed the University of Lincoln’s approach to openness (Open Source and Open Access, as well as Open Data). Joss Winn is following this work up with the University’s IP manager with a view to writing a University policy on open licensing of our IP.
  6. Related to the ‘business case’ aspect is the work of LNCD (and also discussions I’ve had with Ed Chamberlain recently) about how to ensure the sustainability of open services in a technical sense: what sort of systems architecture and processes do we need in place, and how do we work with university ICT support departments to ensure that projects become institutionally supported services when it’s important for them to do so?
  7. At this Birmingham event, Chris Banks of the University of Aberdeen spoke about the benefits and challenges of sharing from a library director’s perspective. I was particularly interested in the metaphor of “metadata as currency”: how are aggregators creating value based on the mass accumulation of metadata, and how are they selling that value back to libraries? See Chris’s blog for more. Aberdeen are clearly doing a lot around analysing e-resources usage and relating it back to their library strategy, information literacy, etc.
  8. Paul Miller (Cloud of Data): one key quote “amateurs tend to do a better job of aggregating content than institutions” (e.g. collections of images on Flickr). This may be in part because individuals don’t have the same risk-averse approach, but whatever the reason
  9. Barrister Frances Davey gave us a quick run-through of IP law as it relates to data. Key quote: “the legal repercussions of publishing data openly are pretty much nil”. Fear and uncertainty poison initiative! Frances also touched on the business and reputation-management arguments for taking an active approach to open data: people might well be getting bad copies of your data already (via screen-scraping), so release it yourself and take control of the quality. Example: the British Library chose a CC0 licence precisely because it has no attribution clause, so any subsequent re-use is “nothing more to do with us”.
  10. Then, after lunch, copyright consultant Naomi Korn ran a workshop on the practical aspects of choosing a licence for your data. Naomi spoke about the need to start by deciding how open you want to be as an institution (noting that institutions with a dedicated © person tend to have a greater appetite for risk), then to consider whether you have the resources in place to get where you want to be. Key quote: “Let’s do some attribution mapping!” Some links from Naomi’s workshop:
  11. At the Birmingham clinic we also discussed the risks (including the risk of doing nothing) and benefits of taking an open approach. My contribution: open bibliographic data enables high-level services to be sold back to universities (cf. Chris Banks’ notes on metadata aggregation, above). We shouldn’t be scared of this or see it as a reason not to open up our data (we can’t compete with those companies; we want their services and we’re prepared to pay for them!), but we can build lower-level, locally relevant services as a result of releasing our own open data, and play on the web by web rules: if we don’t make our data open for re-use on the web, we can’t even have the conversation. Lincoln’s approach treats open data entirely as a means to an end: it’s the best and most natural way of sparking off new, innovative services based on unexpected combinations of our own and other people’s data.
    • The best example of this so far is the new data-driven staff profiles at Lincoln, but we’ll need many more examples like it if we’re going to make a convincing business case.
  12. Final overall quote of the day: “Writing your own open licence is an unpleasant form of vanity”.

1.8 million library loans from the University of Lincoln under CC0 – Copac Activity Data/SALT2 project

Posted on May 16th, 2012 by Paul Stainthorp

Today we published data on approximately 1.8 million items loaned from the University of Lincoln’s libraries since 2001. The data is available to re-use under a CC0 licence, and can be downloaded from:

We’ve done this as part of our involvement in the Copac Activity Data Project, a.k.a. SALT2. Along with data from the universities of Manchester, Sussex, Cambridge and Huddersfield, our circulation data will be used to power a ‘recommender API’, which libraries will be able to use to build “People who borrowed X also borrowed Y”-type services. The API will benefit from the power of aggregated data from multiple institutions of different types, containing tens of millions of circulation events.
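The post doesn’t describe how the SALT recommender API calculates its suggestions. Purely as an illustration of the “people who borrowed X also borrowed Y” idea, the simplest approach is a co-occurrence count: find every borrower of X, then count the other works those borrowers took out. The function name `also_borrowed` and the (borrower, work) input format are hypothetical.

```python
from collections import Counter, defaultdict


def also_borrowed(loans, work_id, top_n=5):
    """Rank works by how many borrowers of work_id also borrowed them.

    loans: iterable of (borrower_id, work_id) pairs, e.g. hashed IDs
           taken from the circulation data.
    Returns up to top_n (other_work_id, borrower_count) pairs.
    """
    # Collect the set of distinct works each borrower has taken out.
    works_by_borrower = defaultdict(set)
    for borrower, work in loans:
        works_by_borrower[borrower].add(work)

    # For every borrower of work_id, count each other work once.
    counts = Counter()
    for works in works_by_borrower.values():
        if work_id in works:
            counts.update(w for w in works if w != work_id)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = [("u1", "X"), ("u1", "Y"), ("u2", "X"), ("u2", "Y"),
              ("u2", "Z"), ("u3", "Z")]
    print(also_borrowed(sample, "X"))  # [('Y', 2), ('Z', 1)]
```

Each borrower contributes at most one count per work, so someone borrowing the same book repeatedly doesn’t skew the results. In this sketch, pooling data from several institutions would just mean feeding in more pairs, provided borrower IDs from different systems can’t collide (prefixing them with an institution code would be one way to ensure that).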

You’ll notice as well that we’ve chosen to host the data on our brand-new Orbital (v0.1) research data management application. Each dataset has a persistent citable URI. We’ll be keeping the data up-to-date, and generating a new activity data file from our library circulation logs shortly after the end of each academic year.

The data consists of a number of CSV files (one for each academic year since 2000-01, plus a huge file of all the data), containing the following fields:

Field index, field name and description:

  • Field 0, CREATE_DATE: The date and time of the loan event, in the format dd/mm/yyyy hh:mm.
  • Field 1, BORROWER_ID: A cryptographic hash of the internal system ID associated with the borrower of the item, as used in the University of Lincoln’s library system.
  • Field 2, WORK_ID: A cryptographic hash of the internal system ID associated with the bibliographic work borrowed, as used in the University of Lincoln’s library system.
  • Field 3, CONTROL_NUMBER: The ISBN of the work borrowed (10 or 13 digits).
  • Field 4, AUTHOR_DISPLAY: The main author of the work borrowed.
  • Field 5, TITLE_DISPLAY: The title of the work.
  • Field 6, PUB_DATE: The publication year of the work, in the form yyyy.
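A minimal Python sketch for reading one of these files, assuming it is ordinary comma-separated text with the seven fields in the index order listed above. The post doesn’t say whether the files carry a header row, so the reader skips one if present. The `Loan` and `read_loans` names are hypothetical.

```python
import csv
import io
from datetime import datetime
from typing import Iterator, NamedTuple, TextIO


class Loan(NamedTuple):
    create_date: datetime  # field 0
    borrower_id: str       # field 1, hashed
    work_id: str           # field 2, hashed
    control_number: str    # field 3, ISBN-10 or ISBN-13
    author_display: str    # field 4
    title_display: str     # field 5
    pub_date: str          # field 6, yyyy


def read_loans(stream: TextIO) -> Iterator[Loan]:
    """Yield one Loan per CSV row, in the field order listed above."""
    for row in csv.reader(stream):
        if not row or row[0] == "CREATE_DATE":
            continue  # skip blank lines and a header row, if the file has one
        when = datetime.strptime(row[0], "%d/%m/%Y %H:%M")
        yield Loan(when, *row[1:7])


if __name__ == "__main__":
    # Made-up sample row; for a real file use open(path, newline="", encoding=...).
    sample = io.StringIO('01/01/2001 10:00,ab12,cd34,9780000000002,"Smith, John",A title,1999\n')
    for loan in read_loans(sample):
        print(loan.create_date.isoformat(), loan.control_number, loan.title_display)
```

Keeping CONTROL_NUMBER as a string rather than a number preserves ISBN-10 check digits such as X. The file encoding isn’t stated in the post, so pass whatever encoding the download actually uses when opening it.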

I’ll blog in detail another time about exactly how we created the data extracts. In short:

  1. There is a table in the SirsiDynix Horizon library management system called circ_tran which records every instance of item number X borrowed by user number Y at time Z. [#1]
  2. There is another table which provides a lookup between item numbers and the numbers of the bibliographic works of which they are a copy. [#2]
  3. Dave Pattern at the University of Huddersfield wrote a Perl script which scrapes all the bibliographic data (title, author, ISBN) for each work from our OPAC (Horizon Information Portal) and writes it to a text file. [#3]
  4. Developer Jamie Mahoney of CERD/LNCD then stepped in, using some pretty heavy SQL on the original three data extracts, to:
    • Hash the internal Horizon user and work ID numbers to provide anonymity;
    • Convert the internal Horizon date and time stamps in extract [#1] from a version of Unix time into a readable datestamp (formula hint: cko_date*86400 + cko_time*60);
    • Use the item/work lookup table [#2] to pull in the bibliographic details for each loan in [#1] from the bibliographic table [#3] (an epic SQL JOIN query), removing items which are no longer represented in our library system;
    • Remove any items without an ISBN, which are of no use to the SALT recommender API;
    • Tweak the punctuation and formatting;
    • Split the data into separate files for each year.
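The real extraction was done in SQL against Horizon. The Python sketch below only re-expresses the same steps for a handful of in-memory rows, to show the logic. Several details are assumptions, labelled in the comments: that `cko_date` counts days since 1 January 1970 and `cko_time` counts minutes since midnight (which is what the formula hint implies), that the hash is salted SHA-256 (the post says only “a cryptographic hash”), and that academic years start on 1 August. The function names, the salt and the input shapes are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta

# Assumption: Horizon's cko_date counts days since 1 January 1970 and
# cko_time counts minutes since midnight (cko_date*86400 + cko_time*60).
EPOCH = datetime(1970, 1, 1)

# Hypothetical secret salt: the post only says "a cryptographic hash".
SALT = b"replace-with-a-long-random-secret"

# Assumption: academic years run from 1 August to 31 July.
ACADEMIC_YEAR_START_MONTH = 8


def horizon_to_datetime(cko_date, cko_time):
    """Convert Horizon day and minute counters to a (naive) datetime."""
    return EPOCH + timedelta(seconds=cko_date * 86400 + cko_time * 60)


def pseudonymise(internal_id):
    """Replace an internal user or work ID with a salted SHA-256 hex digest."""
    return hashlib.sha256(SALT + str(internal_id).encode("ascii")).hexdigest()


def academic_year(when):
    """Label a datetime with its academic year, e.g. '2000-01'."""
    start = when.year if when.month >= ACADEMIC_YEAR_START_MONTH else when.year - 1
    return f"{start}-{(start + 1) % 100:02d}"


def build_rows(loans, item_to_work, bib):
    """Join loans to works and bibliographic details, grouped by academic year.

    loans:        iterable of (borrower_id, item_id, cko_date, cko_time)  [#1]
    item_to_work: dict of item_id -> work_id                              [#2]
    bib:          dict of work_id -> (isbn, author, title, pub_year)      [#3]
    """
    by_year = {}
    for borrower_id, item_id, cko_date, cko_time in loans:
        work_id = item_to_work.get(item_id)
        if work_id is None or work_id not in bib:
            continue  # item no longer represented in the library system
        isbn, author, title, pub_year = bib[work_id]
        if not isbn:
            continue  # no ISBN: of no use to the recommender API
        when = horizon_to_datetime(cko_date, cko_time)
        row = (
            when.strftime("%d/%m/%Y %H:%M"),  # CREATE_DATE
            pseudonymise(borrower_id),         # BORROWER_ID
            pseudonymise(work_id),             # WORK_ID
            isbn, author, title, pub_year,
        )
        by_year.setdefault(academic_year(when), []).append(row)
    return by_year
```

Salting matters here: internal IDs are small integers, so an unsalted hash could be reversed simply by hashing every possible ID and comparing. As a worked example of the formula, `horizon_to_datetime(11323, 600)` gives 10:00 on 1 January 2001, which this sketch files under the 2000-01 academic year.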

Once again, the data is at:

Thanks are due to Chris Leach and Dave Pattern for Horizon-fu, and to Jamie Mahoney for his patient wrangling of several millions of lines of data!

You can find out more about the Copac Activity Data Project/SALT2 at: http://copac.ac.uk/innovations/activity-data/

Over the moon (again): Orbital v0.1 released

Posted on May 16th, 2012 by Paul Stainthorp

Joss Winn has blogged this morning about a significant milestone in the Orbital project. Today we released Orbital v0.1, and from today, invited researchers at the University of Lincoln have access to an alpha ‘minimum viable product’ environment for managing their research data.

For the time being, sign-in access to Orbital (http://orbital.lincoln.ac.uk/) is restricted to invited individuals only.

Orbital v0.1 allows a researcher to:

  • sign in using their University of Lincoln credentials
  • create and describe a project
  • upload their data to the project
  • choose a licence for the data
  • add a Google Analytics code to measure project analytics
  • publish data at a persistent URI (id.lincoln.ac.uk)
  • leave feedback on the Orbital application

You can read more about this first release on the Orbital blog. We’ve also written a basic development roadmap for Orbital which gives an idea of the kind of features you should see becoming available between now and Christmas 2012.

User testing with Orbital v0.1

Posted on May 16th, 2012 by Paul Stainthorp

Orbital v0.1 was released on 16 May 2012. Every two weeks, staff working on Orbital meet with Dr Bingo Wing-Kuen Ling and Dr Chunmei Qing to discuss their research and RDM practice. Until now these meetings have been all about requirements-gathering – today was the first opportunity for some real, hands-on user testing with the alpha release of Orbital.

The notes below have been turned into tasks on the Orbital project Pivotal Tracker site.

BL = Bingo Wing-Kuen Ling.

  1. BL successfully viewed Orbital v0.1 in Internet Explorer 7 on the UoL corporate desktop and was able to sign in and grant access to the application using his UoL credentials. BL was able to create and describe a new project.
  2. BL tried to upload a file from his desktop to Orbital using IE7 and received an error (this is a known bug with Orbital in Internet Explorer). He was then unable to delete this file.
  3. Switching to Firefox, BL uploaded multiple files from his desktop to his project in Orbital (it wasn’t clear from the page that this was possible). This completed successfully, but because the file sizes were small, he did not receive any feedback on his upload.
  4. Returning to the original file upload screen, BL had to refresh the page manually to see the changes made (files uploaded). Files scheduled for processing are marked as ‘queued’, but this status does not update unless the page is refreshed.
  5. Joss Winn demonstrated the file and project metadata pages, citable URLs for files, and Google Analytics on projects. The display of file metadata needs to be more complete, and G.A. needs a better explanation and links to sources of help.
  6. The group discussed BL’s requirements around project calendars/timelines. BL wants to be able to view project events (meetings, deadlines, etc.) for each project (but not aggregated) and is not particularly concerned about notifications on activity/changes to files. The group discussed this and will explore ways of presenting timelines made up of three sorts of events (project events, activity stream, and comments) with each type of event suppressible in the timeline. A timeline overview will be displayed on the Orbital ‘front page’ once a user has logged in.
  7. BL would also like to be able to organise project and data files in all Orbital workspaces using folders/tags, and to download files as a bundle by organising them into collections.

You can read about Orbital v0.1 in this blog post, and about the roadmap for development and release of future versions, here.
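The timeline idea discussed in item 6 of the testing notes can be sketched as a simple data structure: each event carries one of the three types, and the view filters out whichever types the user has chosen to suppress. This is a hypothetical Python illustration (`EventType`, `TimelineEvent` and `project_timeline` are made-up names), not Orbital’s code or the design the team eventually chose.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class EventType(Enum):
    """The three kinds of timeline event discussed in the testing session."""
    PROJECT_EVENT = "project event"  # meetings, deadlines, etc.
    ACTIVITY = "activity"            # activity stream, e.g. file uploads
    COMMENT = "comment"


@dataclass(frozen=True)
class TimelineEvent:
    when: datetime
    kind: EventType
    summary: str


def project_timeline(events, hidden=frozenset()):
    """Return one project's events, newest first, leaving out suppressed types."""
    visible = (e for e in events if e.kind not in hidden)
    return sorted(visible, key=lambda e: e.when, reverse=True)
```

Building the timeline per project matches BL’s wish to see events for each project rather than aggregated; a front-page overview could call the same function once per project.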

Quick update on the Lincoln Repository

Posted on May 15th, 2012 by Paul Stainthorp

I wrote this short update on the Lincoln Repository for a University committee this week: you might as well see it.

  1. The Repository now contains more than 3,700 research output records, with 30% available as Open Access full text. A report on outputs published during Q3 of 2011 was submitted to the RIEC committee in March 2012.
  2. The list of University departments on the Repository has recently been updated to reflect the new college structure[1]: future quarterly research output reports will reflect this change.
  3. An outline plan for technical development of the Repository (to ensure a ‘REF ready’, stable platform for integration with other University Research & Enterprise services) has been submitted to the PVC for Research.
  4. The Library will be taking a subscription to SciVerse Scopus[2], the citation data service which is to be used in the REF[3]. Scopus data will be integrated into Repository records.
  5. Individual authors’ Repository publication lists are now displayed on individual web profiles in the staff directory[4] and on the University website. This is helping to improve the quality of Repository metadata as authors request corrections and additions to their web profiles.
  6. There have been a number of recent public statements in support of Open Access to published research outputs, from: the Department for BIS (speech by David Willetts)[5]; the Guardian[6]; Harvard University[7]; RCUK[8]; STM[9]; UK Council of Research Repositories[10].
  7. The Lincoln Repository team can be contacted on: eprints@lincoln.ac.uk

Article finder form on the e-journals A-to-Z

Posted on May 11th, 2012 by Paul Stainthorp

The e-journals A-to-Z website now includes an article finder.

Fill in the form with the details of the article you are trying to locate, and the A-to-Z will display links to available electronic full-text copies (or, if the full text isn’t available at the University of Lincoln, information about inter-library loans and other services).
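The post doesn’t say how the article finder passes the form to the link resolver. Assuming it works like most OpenURL-based finders (the Library’s Find it @ Lincoln resolver is an OpenURL link resolver), the form fields would end up as an OpenURL 1.0 (Z39.88-2004) key/encoded-value query. The Python sketch below builds such a link; `RESOLVER_BASE` is a placeholder address and `article_openurl` is a hypothetical name.

```python
from urllib.parse import urlencode

# Placeholder address: substitute the real link resolver's base URL.
RESOLVER_BASE = "https://resolver.example.ac.uk/openurl"


def article_openurl(atitle, jtitle, issn=None, volume=None, issue=None,
                    spage=None, date=None, aulast=None):
    """Build an OpenURL 1.0 (Z39.88-2004) KEV link for a journal article."""
    params = {
        "url_ver": "Z39.88-2004",
        "ctx_ver": "Z39.88-2004",
        "rft_val_fmt": "info:ofi/fmt:kev:mtx:journal",
        "rft.genre": "article",
        "rft.atitle": atitle,   # article title
        "rft.jtitle": jtitle,   # journal title
        "rft.issn": issn,
        "rft.volume": volume,
        "rft.issue": issue,
        "rft.spage": spage,     # start page
        "rft.date": date,
        "rft.aulast": aulast,   # first author's surname
    }
    # Leave out any fields the user did not fill in.
    query = urlencode({k: v for k, v in params.items() if v})
    return f"{RESOLVER_BASE}?{query}"
```

Empty fields are dropped rather than sent as blank values, so the resolver works with whatever partial citation the user has supplied.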

Screenshot of the A-to-Z article finder

If you are presented with a login screen and the message “We could not authenticate your request. Please sign in”, please click on the ‘ATHENS Login’ link to see the links to available full-text copies. If you access the A-to-Z via the University Portal, you should not see this message.

Screenshot of the login page

If you have any problems accessing or using the article finder service, please let the Library know.

Find it @ Lincoln: looking forward to a new EBSCO discovery service in the Library

Posted on May 11th, 2012 by Paul Stainthorp

Following long, looong discussions, we have finally chosen a next-generation library discovery service for the University of Lincoln Library.

After reviewing the four major commercially-available discovery products (from EBSCO, Ex Libris, OCLC and Serials Solutions), and after making several reference visits to see the various products in action in UK university libraries…

(…drum roll…)

…we decided upon, and have now bought access to, the EBSCO Discovery Service. Over the summer we’ll be configuring and testing the new system, and in September 2012 it will be launched as the new front-end search and discovery platform for the Library at the University of Lincoln.

This new service will provide a single point of search and discovery across nearly all of the Library’s collections, including our ‘traditional’ library catalogue, e-books & e-journals, the Lincoln Repository, archives & special collections, reading lists, and a wide range of specialist and general electronic databases. (N.B. it might not search all of these collections right from day one!) We hope that, along with some of the other new and improved services being introduced as part of the Library’s review of ICT systems, it will make it significantly easier to find and use the University’s library resources.

According to the SCONUL HE Library Technology wiki, the EBSCO Discovery Service is also used by:

We decided that EBSCO Discovery Service provided us with a familiar (yet flexible, powerful and ‘serious’) research interface, as well as a good fit with our existing and planned electronic database collections. We were also influenced by EBSCO’s plans to develop and integrate the A-to-Z e-journals knowledgebase and link resolver into the discovery environment.

We’ll be spending the next month or so configuring the system to search all of our collections, designing/branding the interface, training library staff, and working with other University departments on getting the most out of the new tools. We anticipate that early access to the system will be possible from the end of July onwards (though this is subject to change), with a ‘soft’ launch in time for student induction in September, and a formal launch/discovery party with free coffee for all, later in the year.

We have also decided that the service will be branded under the title “Find it @ Lincoln”. (Eagle-eyed readers will spot that this is the name we’ve been using for a while for our EBSCO LinkSource OpenURL link resolver.) Information about the new Find it @ Lincoln service, and about the project to develop and launch it at the University of Lincoln, will soon be available at: http://findit.library.lincoln.ac.uk/

I’d like to thank the staff of all four discovery software companies for their presentations, demonstrations and visits, and for the information and supporting materials they provided to the University of Lincoln over the past few months, all of which were of great use in informing this first selection phase of our discovery project.

Many thanks also to the universities that hosted Library staff on discovery-themed visits, patiently described how they use their own search tools, and answered our many questions, profound and otherwise.

Now watch this space :-)