LAMP project: a let’s-pretend Post-it post-mortem

Posted on July 12th, 2013 by Paul Stainthorp

[Photo of the Post-it post-mortem exercise]

I was in Birmingham on Monday for the second meeting of the Jisc LAMP project community group. I missed the first meeting.

LAMP (Library Analytics and Metrics project) is a partnership between Jisc, Mimas (at the University of Manchester) and the University of Huddersfield. The aim of the project is to spec’ and develop a prototype shared library analytics service for UK academic libraries.

It’s carrying on the work of previous library activity data projects, including several to which the University of Lincoln contributed data:

Lincoln intends to contribute some of our data to the LAMP project, too.

At Monday’s meeting, David Kay of SERO Consulting spoke about some work being done to categorise the use cases for the use of analytics data in HE, and identified tools using library analytics data in stages from proposals [many] through pilots [some] to production services [few. In fact, probably one].

He also discussed the legal / ethical position in using student activity data to build [shared] services, and how we need to move beyond a position of fear of making use of this data in case “someone” objects. In fact, as Richard Nurse (Open University) highlighted, the attitude from universities (edit: and the attitude of students!) is more likely to be:

“You’ve had all of this data for ages, so why haven’t you used it to improve the student experience?”

A number of people from MIMAS spoke about some of the technical and data challenges. Some useful recent blog posts:

There are a lot of issues around data normalisation (combining and comparing data from multiple HE institutions), and around the problems of dividing user data into a large enough number of categories to allow for meaningful use of the data, but few enough categories to permit statistical analysis.

Some potential problems for Lincoln are going to be:

  1. How do we convince sceptical parts of the University that this is a legitimate use of student data? What are the ‘norms’ for anonymisation and sharing? Is there a role for some targeted advice from Jisc Legal?
  2. Related: how much should we be normalising our own data before we release/publish/share it? It needs to be genericised (is that a word?) enough to reassure the University and individuals, but detailed and ‘real’ enough to allow normalisation to be done centrally, and to permit useful comparative uses of the data across institutions.
  3. Is there a standard for generating anonymised unique IDs? If so, where’s it documented? How do we make sure that the same processes for extracting and processing data are used year in, year out, so that individual (anonymised) users can be identified, and comparisons made, across multiple years?
  4. We’re currently suffering a (temporary, I hope) developer ‘dry patch’ at Lincoln. How do we make sure we get and keep the technical skills to maintain this data and the extraction/manipulation/publishing processes?

I fed some of my concerns into a useful and interesting mock ‘post-mortem’ exercise after lunch.

I’ll let the LAMP blog publish the detail of the exercise, but in summary we all imagined ourselves in the year 2015, looking back on a LAMP project which had failed catastrophically. Then we tried to capture as many reasons as possible why it went (hypothetically!) wrong.

Everything from data quality/UI problems to legal challenges against our use of the data, and alien invasions disrupting Jisc’s plans… ideas all went onto Post-it notes and were sorted into the following broad categories on a wall (see photo):

  • Technical
  • Data quality
  • Commercial
  • Sustainability
  • Costs & processes
  • Buy-in
  • Usability
  • Legal
  • Doomsday (the aforementioned alien invasion)

These will presumably feed into the project management, to try to design out or mitigate as many problems as possible, and to have plans for dealing with them if they do occur.

EMALINK seminar on activity data and the LIDP

Posted on June 15th, 2012 by Paul Stainthorp

Here are the details of a workshop I’m speaking at in July. It relates to the University of Huddersfield-led Library Impact Data Project (LIDP), in which Lincoln and DMU participated last year.


Library Data Impact Project

De Montfort University
Kimberlin Library
Lecture Theatre, 00.11
Tuesday 17th July 2012
2.00 – 4.00 pm
(Light refreshments available from 1.45pm)

The JISC Library Data Impact Project proved a statistically significant correlation between library usage and student attainment. Two universities in the region, De Montfort and Lincoln, participated in the project and will present on their approaches to the collection of library activity data and the analysis and dissemination of the results. There will also be an opportunity for participants to discuss the practicalities and value of gathering and using such data, within our libraries and the wider institution.

Session leaders:
Phil Adams, Senior Assistant Librarian, De Montfort University

Marie Letzgus, Senior Assistant Librarian, De Montfort University

Paul Stainthorp, Electronic Resources Librarian, University of Lincoln

Contact your EMALINK representative to book a place by Wednesday 11th July. There are three places available per institution.

The DMU campus is a 15-20 minute walk from Leicester train station. Limited visitor parking is available on campus – please advise your EMALINK rep on booking if you wish to request a parking space. Campus maps available from:

1.8 million library loans from the University of Lincoln under CC0 – Copac Activity Data/SALT2 project

Posted on May 16th, 2012 by Paul Stainthorp

Today we published data on approximately 1.8 million items loaned from the University of Lincoln’s libraries since 2001. The data is available to re-use under a CC0 licence, and can be downloaded from:

We’ve done this as part of our involvement in the Copac Activity Data Project, a.k.a. SALT2. Along with data from the universities of Manchester, Sussex, Cambridge and Huddersfield, our circulation data will be used to power a ‘recommender API’, which libraries will be able to use to build “People who borrowed X also borrowed Y”-type services. The API will benefit from the power of aggregated data from multiple institutions of different types, containing tens of millions of circulation events.
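To make the “people who borrowed X also borrowed Y” idea a bit more concrete, here is a minimal Python sketch of how co-borrowing counts could be derived from loan records. It is not the SALT recommender itself, whose internals aren’t described here; the function name `also_borrowed`, the input shape and the sample IDs are all made up for illustration.

```python
from collections import Counter, defaultdict


def also_borrowed(loans, work_id, top_n=5):
    """Return the works most often borrowed by people who borrowed work_id.

    loans: iterable of (borrower_id, work_id) pairs (a hypothetical input shape).
    """
    # Build the set of works borrowed by each (anonymised) borrower.
    works_by_borrower = defaultdict(set)
    for borrower, work in loans:
        works_by_borrower[borrower].add(work)

    # Count how many borrowers of work_id also borrowed each other work.
    counts = Counter()
    for works in works_by_borrower.values():
        if work_id in works:
            counts.update(works - {work_id})

    # Sort by count (descending), then by ID so that ties come out in a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


# Invented sample data: three borrowers, three works.
sample_loans = [
    ("b1", "X"), ("b1", "Y"), ("b1", "Z"),
    ("b2", "X"), ("b2", "Y"),
    ("b3", "Y"), ("b3", "Z"),
]
print(also_borrowed(sample_loans, "X"))
```

A real service would presumably need to weight or filter these raw counts (very popular textbooks would otherwise be recommended alongside everything), but the basic shape is just counting overlaps between borrowers.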

You’ll notice as well that we’ve chosen to host the data on our brand-new Orbital (v0.1) research data management application. Each dataset has a persistent citable URI. We’ll be keeping the data up-to-date, and generating a new activity data file from our library circulation logs shortly after the end of each academic year.

The data consists of a number of CSV files (one for each academic year since 2000-01, plus a huge file of all the data), containing the following fields:

Field index | Field name | Description
0 | CREATE_DATE | The date and time of the loan event, in the format: dd/mm/yyyy hh:mm
1 | BORROWER_ID | A cryptographic hash of the internal system ID associated with the borrower of the item, as used in the University of Lincoln’s library system.
2 | WORK_ID | A cryptographic hash of the internal system ID associated with the bibliographic work borrowed, as used in the University of Lincoln’s library system.
3 | CONTROL_NUMBER | The ISBN of the work borrowed (10 or 13 digits).
4 | AUTHOR_DISPLAY | The main author of the work borrowed.
5 | TITLE_DISPLAY | The title of the work.
6 | PUB_DATE | The publication year of the work in the form: yyyy
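As a rough guide to working with these files, the Python sketch below parses one row using the field order listed above. It assumes the files are comma-delimited with standard quoting and have no header row, neither of which is confirmed here; the sample row is invented, and `read_loans` is just an illustrative name.

```python
import csv
import io
from datetime import datetime

# Field order as documented for the data files.
FIELDS = ["CREATE_DATE", "BORROWER_ID", "WORK_ID", "CONTROL_NUMBER",
          "AUTHOR_DISPLAY", "TITLE_DISPLAY", "PUB_DATE"]


def read_loans(handle):
    """Yield one dict per loan, with CREATE_DATE parsed into a datetime."""
    for row in csv.DictReader(handle, fieldnames=FIELDS):
        # CREATE_DATE is documented as dd/mm/yyyy hh:mm
        row["CREATE_DATE"] = datetime.strptime(row["CREATE_DATE"], "%d/%m/%Y %H:%M")
        yield row


# Invented sample row: the hashes and bibliographic details are placeholders.
sample = io.StringIO(
    '16/05/2012 09:30,abc123,def456,9780000000002,"Smith, J.",An example title,2005\n'
)
loans = list(read_loans(sample))
print(loans[0]["CREATE_DATE"].year, loans[0]["AUTHOR_DISPLAY"])
```

To read a real file, you would pass an open file handle instead of the `io.StringIO` sample (opened with `newline=""`, as the csv module documentation recommends).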

I’ll blog in detail another time about exactly how we created the data extracts. In short:

  1. There is a table in the SirsiDynix Horizon library management system called circ_tran which records every instance of item number X borrowed by user number Y at time Z. [#1]
  2. There is another table which provides a lookup between item numbers and the numbers of the bibliographic works of which they are a copy. [#2]
  3. Dave Pattern at the University of Huddersfield wrote a Perl script which scrapes all the bibliographic data (title, author, ISBN) for each work from our OPAC (Horizon Information Portal) and writes it to a text file. [#3]
  4. Developer Jamie Mahoney of CERD/LNCD then stepped in, using some pretty heavy SQL on the original 3 data extracts, to:
    • Hash the internal Horizon user and work ID numbers to provide anonymity;
    • Convert the internal Horizon date and time stamps in extract [#1] from a version of Unix time into a readable datestamp (formula hint: cko_date*86400 + cko_time*60);
    • Use the item/work lookup table [#2] to pull in the bibliographic details for each loan in [#1] from the bibliographic table [#3] (an epic SQL JOIN query), removing items which are no longer represented in our library system;
    • Remove any items without an ISBN, which are of no use to the SALT recommender API;
    • Tweak the punctuation and formatting;
    • Split the data into separate files for each year.
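
The hashing and date-conversion steps can be sketched roughly as follows. This is a Python illustration, not the SQL Jamie actually wrote. It assumes, going by the formula hint, that cko_date counts days since 1 January 1970 and cko_time counts minutes since midnight; it uses SHA-256 with a made-up salt because the hash actually used isn’t stated here; and it ignores time zones. The function names are mine.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def horizon_to_datestamp(cko_date, cko_time):
    """Convert Horizon date/time integers to 'dd/mm/yyyy hh:mm'.

    Assumes cko_date is days since 1970-01-01 and cko_time is minutes
    since midnight, as implied by cko_date*86400 + cko_time*60.
    """
    seconds = cko_date * 86400 + cko_time * 60
    return (EPOCH + timedelta(seconds=seconds)).strftime("%d/%m/%Y %H:%M")


def anonymise(internal_id, salt="example-salt"):
    """Hash an internal ID. SHA-256 and the salt are illustrative choices only."""
    return hashlib.sha256(f"{salt}:{internal_id}".encode("utf-8")).hexdigest()


print(horizon_to_datestamp(15476, 570))       # day 15476, 570 minutes after midnight
print(anonymise(12345) == anonymise(12345))   # same input always gives the same hash
```

Because the same internal ID always hashes to the same value, loans by one borrower can still be linked together in the published data without exposing the original ID, provided the salt and method stay the same from one extract to the next.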

Once again, the data is at:

Thanks are due to Chris Leach and Dave Pattern for Horizon-fu, and to Jamie Mahoney for his patient wrangling of several millions of lines of data!

You can find out more about the Copac Activity Data Project/SALT2, at:

The data! The data!

Posted on October 3rd, 2011 by Paul Stainthorp

The Library Impact Data Project (LIDP), which ran from February-July this year, and in which the University of Lincoln took part, has now released a subset of the library activity data used in the analysis (which, you’ll remember, showed a statistically significant correlation across a number of universities between library activity data and student attainment).

Lincoln’s data is included in the release, which is available for re-use under an open licence, from:

This data set is made available under the Open Data Commons Attribution License

The data contains final grade and library usage figures for 33,074 students studying undergraduate degrees at UK universities. More information on the data, and on how it’s been generalised in order to preserve students’ anonymity, is on the LIDP project blog.

  • There’s also a detailed report about the statistical breakdown of Lincoln’s own share of the data (this wasn’t published as part of the project reports, as it was down to each individual institution whether to make it public or not) – I’ve made the report available here [PDF].

The LIDP blog also contains information about the project ‘toolkit’, developed to assist other institutions who may want to test their own data against the LIDP’s hypothesis, here and here.

Thanks again to Graham, Bryony and Dave at the University of Huddersfield for inviting Lincoln to take part in the project, and for their help along the way!

On to the next one…

LIDP: end of project. Using libraries = good.

Posted on July 28th, 2011 by Paul Stainthorp

I was in Huddersfield last week for the final project meeting of the Library Impact Data Project (LIDP).

LIDP was successful in proving that:

There is a statistically significant relationship between both book loans and e-resources use and student attainment. And this is true across all of the universities in the study that provided data in these areas.

“We want to stress here again that we realise THIS IS NOT A CAUSAL RELATIONSHIP!  Other factors make a difference to student achievement, and there are always exceptions to the rule, but we have been able to link use of library resources to academic achievement.”

An initial (outline) report on how the University of Lincoln’s own activity-attainment holds up to this same statistical inspection is available to download from here [PDF]. As much as possible of the library activity data used in the project will be released under an Open Data Commons Attribution License in the near future, and hosted on the project blog.

Thanks are due to Graham Stone, Dave Pattern, Bryony Ramsden, and all the project partners for the opportunity for Lincoln to participate in this project. We had fun getting our data together. The end-of-project blog post for LIDP is here – it suggests some very interesting areas for further investigation.

Personally, I’m very interested in looking for cross-institutional comparisons – perhaps trying to explain particular levels of activity-attainment attached to individual subject areas, irrespective of which university the student is at (i.e. does a Lincoln computing student have more in common with a Lincoln business student, or with a Huddersfield computing student?). I’d also be interested in looking particularly at those students whose library activity behaviour changes through the life of their course, and who then go on to get a better degree than they might have been predicted based on their library activity in their first year.

“Finally, we have been astonished by how much interest there has been in our project. To date we have two articles ready for publication imminently and have another 2 in the pipeline. In addition by the end of October we will have delivered 11 conference papers on the project. All articles and conference presentations are accessible at:

I can see this project getting cited, and cited again, simply every time anyone wants to argue that academic libraries are A Good Thing.

What I been up to

Posted on July 7th, 2011 by Paul Stainthorp

Apologies: this is one of those generic catch-all blog posts. I attended four separate events last week: here’s a short report from each one.


1. CILIP UC&R Members’ Day: Making an Impact

De Montfort University, Leicester. 28 June, 2011

This workshop for CILIP members was looking at various ways in which libraries can have (and can measure) their ‘impact’. I spoke first about Lincoln’s involvement in the University of Huddersfield’s Library Impact Data Project (LIDP), and how that project is trying (successfully, it seems) to measure the relationship between students’ library use and their degree ‘success’.

Then DMU subject librarian Jason Eyre talked about his PITSTOP project, which built a mediated forum for online discussion between Social Work students on placement, their lecturers, and their practice educators (in the NHS and local authorities). Jason explained that while the online discussion forum itself was not very well used, the impact of the project was that it acted as a catalyst for building a better relationship between students, academics, practice educators, and the library.

After a very well-run World Café session, where we moved around between different tables, each themed with a different aspect of ‘impact’ in libraries – and then lunch, information management consultant David Streatfield presented on the difficulties of measuring and evaluating the impact that academic libraries can have. He outlined some of the different approaches that have been taken in the past, and how those approaches can be less than successful in an environment of government pressure to control public service provision.

Lastly, Maria Cotera, former president of the CILIP Career Development Group, told us several anecdotes about the ways she has seen library workers make an impact themselves, through their involvement in staff development, social, and extra-professional activities. In an exercise, all the delegates came up with an example of a shared pressure or circumstance in our home institutions that could be turned into an opportunity for staff development.

Thanks to Marie Nicholson and the UC&R East Midlands committee for inviting me to speak! Twitter hashtag: #UCREMimpact.


2. EMALINK event on collection development

University of Lincoln. 29 June, 2011

This was another East Midlands event, and the first EMALINK event held in Lincoln since we joined that network. It was organised, jointly, by the University of Lincoln, our neighbours Bishop Grosseteste University College, and Nottingham Trent University (NTU). The theme was the lifecycle of collection management: from selection and acquisition, through analysis and review of collections, and finally disposal.

NTU kicked off with a look at their work to incorporate Talis Aspire into the DNA of their library: they’re building a set of resource selection and allocation processes that are strongly driven by the resource lists built by academics using Aspire. Lincoln responded with two short presentations about collection analysis: our project to compare the strengths and weaknesses (in size, breadth, and age) of the various subject collections in our physical bookstock with the relative sizes of the student body in different subject areas; and our work to determine value for money in ‘Big Deal’ database subscriptions. Finally, Susan Rodda from Bishop Grosseteste talked about the options for disposing of unwanted physical library stock, and how BG have managed, for several years, to weed their collection without sending any paper to landfill.


3. JISC Managing Research Data Programme (#jiscmrd) community briefing event

Goodenough College, London. 1 July, 2011

On Friday, I attended this briefing event for the current JISC research data funding call for proposals, on Joss Winn’s behalf. The JISC programme manager ran through the requirements and expectations for the various strands of this current call. Kevin Ashley of the Digital Curation Centre also presented on how the DCC can support and work with institutions that are running research data management projects. See the hashtag #jiscmrd for information about the programme.


4. JISC Innovations in Activity Data workshop

The Open University, Milton Keynes. 4 July, 2011

After a long, Sunday-afternoon train journey to Milton Keynes, I paid my first ever visit to the OU’s Walton Hall campus for another activity data-related event, this time organised and hosted by the team behind the JISC-funded RISE (“Recommendations Improve the Search Experience”) project.

The day began with three presentations from projects funded under the current JISC activity data strand:

  1. Joy Palmer of MIMAS and the SALT project (“Surfacing the Academic Long Tail”: MIMAS working with the John Rylands University Library of the University of Manchester);
  2. RISE themselves (Richard Nurse of the OU) talking about how they are using EZProxy log data to power a recommendation service (“…users who looked at this, also looked at these…”);
  3. Via video link, live from Huddersfield: Dave Pattern talking about LIDP.

Then, another World Café-type exercise (two in one week!). We moved about the room, scribbling on the tablecloths, making notes about: [a] what activity data universities have at their disposal; [b] what use we might put it to; and [c] what barriers are in our way.

In the afternoon: two more presentations. The OU’s Tony Hirst (a.k.a. @psychemedia), rattling and rambling through various techniques for visualising activity data. This is really valuable stuff… what I’m less clear about is: where’s the first rung of the dataviz ladder? How does a muggle start thinking about data visualisation? Tony says that many of the techniques he writes about are things he “didn’t know how to do a couple of hours before…”, but that doesn’t necessarily mean that the rest of us will find them as easy to pick up! Tony’s coming to Lincoln soon, so I’m going to try and talk to him about data visualisation a bit more then.

Last of all, David Kay (of SERO and the JISC activity data Synthesis Project: kind of an umbrella for all of these separate activity data initiatives) summed things up nicely, with an excellent slide listing the kinds of skills library workers are going to have to develop in order to do justice to activity data (including data visualisation, again!). I’ll post that slide here, if and when I can find it.

There was a little bit of activity on Twitter for this workshop: look for the hashtag #iad11.


I’ll be making an impact in Leicester tomorrow

Posted on June 27th, 2011 by Paul Stainthorp

I’m at De Montfort University in Leicester tomorrow, giving a presentation at the CILIP UC&R East Midlands members’ event: Making an Impact. My presentation is about our involvement in the JISC-funded Library Impact Data Project (LIDP) with the University of Huddersfield. My slides are online.

If you want to skip the monkey and head straight for the organ-grinders, my presentation borrows fairly heavily from two documents produced by the LIDP project team at Huddersfield:

Library Impact Data Project: good news, everybody!

Posted on June 18th, 2011 by Paul Stainthorp

I think this is worth re-posting from the LIDP blog:

We are very pleased to report that we have now received all of the data from our partner organisations and have processed all but two already!

Early results are looking positive and our next step is to report back with a brief analysis to each institution. We are planning to give them our data and a general set of data so that they can compare and contrast. There have been some issues with the data, some of which has been described in previous blogs, however, we are confident we have enough to prove the hypothesis one way or another!

In our final project meeting in July we hope to make a decision on what form the data will take when released under an Open Data Commons Licence. If all the partners agree, we will release the data individually; otherwise we will release the general set for others to analyse further.

I submitted Lincoln’s data on 13 June. It consists of fully anonymised entries for 4,268 students who graduated from the University of Lincoln with a named award, at all levels of study, at the end of the academic year 2009/10 – along with a selection of their library activity over three* years (2007/08, 2008/09, 2009/10).

The library activity data represents:

  1. The number of library items (book loans etc.) issued to each student in each of the three years; taken from the circ_tran (“circulation transactions”, presumably) table within our SirsiDynix Horizon Library Management System (LMS). We also needed a copy of Horizon’s borrower table to associate each transaction with an identifiable student.
  2. The number of times each student visited our main GCW University Library, using their student ID card to pass through the Library’s access control gates in each of the three* years; taken directly from our ‘Sentry’ access control/turnstile system. These data apply only to the main GCW University Library: there is no access control at the University of Lincoln’s other four campus libraries, so many students have ‘0’ for these data. Thanks are due to my colleague Dave Masterson from the Hull Campus Library, who came in early one day, well before any students arrived, in order to break in to the Sentry system and extract this data!
  3. The number of times each student was authenticated against an electronic resource via AthensDA; taken from our Portal server access logs. Although by no means all of our e-resources go via Athens, we’re relying on it as a sort of proxy for e-resource usage more generally. Thanks to Tim Simmonds of the Online Services Team (ICT) for recovering these logs from the UL data archive.

I had also hoped to provide numbers of PC/network logins for the same students for the same three years (as Huddersfield themselves have done), but this proved impossible. We do have network login data from 2007 onwards, but while we can associate logins with PCs in the Library for our current PCs, we can’t say with any confidence whether a login to the network in 2007-2010 occurred within the Library or elsewhere: PCs have just been moved around too much in the last four years.

Student data itself—including the ‘primary key’ of the student account ID—was kindly supplied by our Registry department from the University’s QLS student records management system.

Once we’d gathered all these various datasets together, I prevailed upon Alex Bilbie to collate them into one huge .csv file: this he did by knocking up a quick SQL database on his laptop (he’s that kind of developer), rather than the laborious Excel-heavy approach using nested COUNTIF statements which would have been my solution. (I did have a go at this method, which clearly worked well for at least one of the other LIDP partners, but my PC nearly melted under the strain.)
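
For anyone curious what the “quick SQL database” approach might look like, here is a minimal sketch using Python’s built-in sqlite3 module. It is not Alex’s actual code: the table and column names (`students`, `loans`, `student_id`, `year`) and the sample rows are invented, and only the loan counts are shown.

```python
import csv
import io
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id TEXT PRIMARY KEY);
    CREATE TABLE loans (student_id TEXT, year TEXT);
""")
# Invented sample data: s1 has three loans, s2 has none.
conn.executemany("INSERT INTO students VALUES (?)", [("s1",), ("s2",)])
conn.executemany(
    "INSERT INTO loans VALUES (?, ?)",
    [("s1", "2008/09"), ("s1", "2008/09"), ("s1", "2009/10")],
)

# One row per student; the LEFT JOIN keeps students with no loans (count 0).
# Each SUM(CASE ...) does the job of one COUNTIF column in a spreadsheet.
rows = conn.execute("""
    SELECT s.student_id,
           SUM(CASE WHEN l.year = '2008/09' THEN 1 ELSE 0 END) AS loans_0809,
           SUM(CASE WHEN l.year = '2009/10' THEN 1 ELSE 0 END) AS loans_0910
    FROM students s
    LEFT JOIN loans l ON l.student_id = s.student_id
    GROUP BY s.student_id
    ORDER BY s.student_id
""").fetchall()

# Write the collated counts out as CSV (to a string here, for demonstration).
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["student_id", "loans_0809", "loans_0910"])
writer.writerows(rows)
print(out.getvalue())
```

The database does the counting in a single pass per table, which is why this scales so much better than thousands of nested spreadsheet formulas; gate visits and e-resource logins could be added as further tables joined in the same way.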

The final .csv data has gone to Huddersfield for analysis and a copy is lodged in our Repository for safe keeping. Once the agreement has been made to release the LIDP data under an open licence, I’ll make the Repository copy publicly accessible.

*N.B. In the end, there was no visitor data for the year 2007/08: the access control / visitor data for that year was missing for almost all students. This may correspond to a re-issuing of library access cards for all users around that time, or the data may be missing for some other reason.

Options for reading list management: LIG

Posted on June 18th, 2011 by Paul Stainthorp

At our Library Innovation Group (LIG) meeting this coming Monday (20 June), we’re going to be taking a fresh look at how we support the use of online reading lists in the University of Lincoln.

At the moment, we use a reading list product called LearnBuild LibraryLink, which integrates nicely with our Blackboard VLE and allows subject librarians to keep on top of multiple lists. However, it’s fair to say it’s not always the easiest software to use. Here are my instructions on maintaining reading lists in LibraryLink [PDF].

When I gave a presentation about our experiences of using reading list software at the second ‘Innovations in Reference Management’ event last year (#irm10), Owen Stephens, the event organiser, liveblogged our situation quite nicely:

Paul reflecting that Lincoln only partially successful in implementing ‘reading lists’.

University of Lincoln – bought reading list system, funds were only available for short period, so had limited time to assess full requirements and how far chosen product met their requirements.


  • filled a void
  • improved consistency
  • gave library an ‘in’ on launch of new VLE (Blackboard)
  • hundreds of modules linked in by 2000
  • students are using them – have usage stats from both LearnBuild and Blackboard
  • some simple stock-demand prediction

Unfortunately there were quite a few areas not so successful:

  • not intuitive; time-consuming
  • software not being developed
  • no community of users
  • competing developments (EPrints, digitisation, OPAC, RefWorks)
  • too closely linked to Blackboard module system
  • Subject librarians don’t like it, but lack of uptake from academics means that it is the subject librarians who end up doing the work.

However, unless library can demonstrate success, unlikely to get money to buy better system… So library putting more effort into make it work.

So: on Monday, I’m hoping to kick off a discussion by giving a quick run-through of the various online reading list management options available to UK Higher Education libraries. These screenshot slides (which are a visual aid / aide mémoire rather than a proper presentation) list the various products and approaches to reading list management. Some are commercial software projects; others are Open Source projects; still others are being developed in-house at various universities (and are not necessarily available for the University of Lincoln to use – e.g. the University of Huddersfield’s MyReading Project); there are a couple of wildcard solutions in there too.

Here are the slides:

10 practical & accessible library technology blogs

Posted on June 17th, 2011 by Paul Stainthorp

