Library Impact Data Project
De Montfort University
Lecture Theatre, 00.11
Tuesday 17th July 2012
2.00 – 4.00 pm
(Light refreshments available from 1.45pm)
The JISC Library Impact Data Project proved a statistically significant correlation between library usage and student attainment. Two universities in the region, De Montfort and Lincoln, participated in the project and will present on their approaches to the collection of library activity data and the analysis and dissemination of the results. There will also be an opportunity for participants to discuss the practicalities and value of gathering and using such data, within our libraries and the wider institution.
Phil Adams, Senior Assistant Librarian, De Montfort University
Marie Letzgus, Senior Assistant Librarian, De Montfort University
Paul Stainthorp, Electronic Resources Librarian, University of Lincoln
Contact your EMALINK representative to book a place by Wednesday 11th July. There are three places available per institution.
The DMU campus is a 15-20 minute walk from Leicester train station. Limited visitor parking is available on campus – please advise your EMALINK rep on booking if you wish to request a parking space. Campus maps available from: www.dmu.ac.uk/about-dmu/how-to-find-us.aspx
Posts Tagged ‘University of Huddersfield’
1.8 million library loans from the University of Lincoln under CC0 – Copac Activity Data/SALT2 project
Posted on May 16th, 2012 by Paul Stainthorp
Today we published data on approximately 1.8 million items loaned from the University of Lincoln’s libraries since 2001. The data is available to re-use under a CC0 licence, and can be downloaded from:
We’ve done this as part of our involvement in the Copac Activity Data Project, a.k.a. SALT2. Along with data from the universities of Manchester, Sussex, Cambridge and Huddersfield, our circulation data will be used to power a ‘recommender API’, which libraries will be able to use to build “People who borrowed X also borrowed Y”-type services. The API will benefit from the power of aggregated data from multiple institutions of different types, containing tens of millions of circulation events.
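To give a feel for what a “People who borrowed X also borrowed Y” service does with circulation data, here is a minimal sketch in Python. It is not the SALT2 recommender API, whose internals aren't described here; the function names and the simple co-occurrence counting are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_co_borrow_counts(loans):
    """Count how often two works were borrowed by the same person.

    `loans` is an iterable of (borrower_id, work_id) pairs, e.g. the
    hashed IDs from a circulation extract. (Hypothetical helper.)
    """
    works_by_borrower = defaultdict(set)
    for borrower_id, work_id in loans:
        works_by_borrower[borrower_id].add(work_id)

    co_counts = defaultdict(Counter)
    for works in works_by_borrower.values():
        # Every pair of works sharing a borrower counts once per borrower.
        for a, b in combinations(sorted(works), 2):
            co_counts[a][b] += 1
            co_counts[b][a] += 1
    return co_counts


def also_borrowed(co_counts, work_id, limit=5):
    """Return up to `limit` works most often co-borrowed with `work_id`."""
    ranked = sorted(co_counts.get(work_id, {}).items(),
                    key=lambda item: (-item[1], item[0]))
    return [work for work, _ in ranked[:limit]]
```

A real service would presumably weight and filter these counts (very popular titles co-occur with everything), which is one reason aggregated data from several institutions is useful.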
You’ll notice as well that we’ve chosen to host the data on our brand-new Orbital (v0.1) research data management application. Each dataset has a persistent citable URI. We’ll be keeping the data up-to-date, and generating a new activity data file from our library circulation logs shortly after the end of each academic year.
The data consists of a number of CSV files (one for each academic year since 2000-01, plus a huge file of all the data), containing the following fields:
| Field index | Field name | Description |
|---|---|---|
| 0 | CREATE_DATE | The date and time of the loan event, in the format: dd/mm/yyyy hh:mm |
| 1 | BORROWER_ID | A cryptographic hash of the internal system ID associated with the borrower of the item, as used in the University of Lincoln’s library system. |
| 2 | WORK_ID | A cryptographic hash of the internal system ID associated with the bibliographic work borrowed, as used in the University of Lincoln’s library system. |
| 3 | CONTROL_NUMBER | The ISBN of the work borrowed (10 or 13 digits). |
| 4 | AUTHOR_DISPLAY | The main author of the work borrowed. |
| 5 | TITLE_DISPLAY | The title of the work. |
| 6 | PUB_DATE | The publication year of the work in the form: yyyy |
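If you want to load the files, something like the following Python sketch should be close. It assumes the files are comma-separated with quoted fields and have no header row (neither is confirmed above), and the `Loan` type and `read_loans` function are names made up for this example.

```python
import csv
import io
from datetime import datetime
from typing import NamedTuple


class Loan(NamedTuple):
    """One loan event, in the field order of the table above."""
    created: datetime   # CREATE_DATE
    borrower_id: str    # BORROWER_ID (hashed)
    work_id: str        # WORK_ID (hashed)
    isbn: str           # CONTROL_NUMBER
    author: str         # AUTHOR_DISPLAY
    title: str          # TITLE_DISPLAY
    pub_year: str       # PUB_DATE


def read_loans(handle):
    """Yield a Loan for each well-formed row of an open CSV file."""
    for row in csv.reader(handle):
        if len(row) != 7:
            continue  # skip rows that don't have the seven expected fields
        created = datetime.strptime(row[0], "%d/%m/%Y %H:%M")
        yield Loan(created, *row[1:])


# An invented sample row, not real data from the release.
sample = '16/05/2012 14:30,ab12,cd34,9780000000002,"Author, A.",An example title,2010\n'
loans = list(read_loans(io.StringIO(sample)))
```

With a real file you would pass `open(path, newline="", encoding="utf-8")` instead of the `StringIO` object; the encoding is another assumption to check against the actual download.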
I’ll blog in detail another time about exactly how we created the data extracts. In short:
- There is a table in the SirsiDynix Horizon library management system called circ_tran which records every instance of item number X borrowed by user number Y at time Z. [#1]
- There is another table which provides a lookup between item numbers and the numbers of the bibliographic works of which they are a copy. [#2]
- Dave Pattern at the University of Huddersfield wrote a Perl script which scrapes all the bibliographic data (title, author, ISBN) for each work from our OPAC (Horizon Information Portal) and writes it to a text file. [#3]
- Developer Jamie Mahoney of CERD/LNCD then stepped in, using some pretty heavy SQL on the original three data extracts, to:
  - Hash the internal Horizon user and work ID numbers to provide anonymity;
  - Convert the internal Horizon date and time stamps in extract [#1] from a version of Unix time into a readable datestamp (formula hint: cko_date*86400 + cko_time*60);
  - Use the item/work lookup table [#2] to pull in the bibliographic details for each loan in [#1] from the bibliographic table [#3] (an epic SQL JOIN query), removing items which are no longer represented in our library system;
  - Remove any items without an ISBN, which are of no use to the SALT recommender API;
  - Tweak the punctuation and formatting;
  - Split the data into separate files for each year.
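The date conversion and the hashing steps can be sketched in a few lines of Python. This is an illustration only: the real work was done in SQL, and the sketch assumes that cko_date counts days since 1 January 1970 and cko_time counts minutes past midnight (which is what the formula hint implies), that times are treated as UTC, and that the hash is a salted SHA-256. The post doesn't say which hash function was actually used, and both function names are invented.

```python
import hashlib
from datetime import datetime, timezone


def horizon_to_datestamp(cko_date, cko_time):
    """Turn Horizon's day and minute counts into 'dd/mm/yyyy hh:mm'.

    Assumes cko_date is days since 1970-01-01 and cko_time is minutes
    past midnight, so the formula hint gives seconds since the Unix epoch.
    """
    seconds = cko_date * 86400 + cko_time * 60
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)  # UTC is an assumption
    return moment.strftime("%d/%m/%Y %H:%M")


def anonymise_id(internal_id, salt):
    """Replace an internal ID with a hex digest (salted SHA-256, an assumption)."""
    digest = hashlib.sha256(f"{salt}:{internal_id}".encode("utf-8"))
    return digest.hexdigest()
```

Using the same salt for every row keeps each borrower and work consistently identifiable across the files without exposing the original Horizon numbers.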
Once again, the data is at:
Thanks are due to Chris Leach and Dave Pattern for Horizon-fu, and to Jamie Mahoney for his patient wrangling of several millions of lines of data!
You can find out more about the Copac Activity Data Project/SALT2, at: http://copac.ac.uk/innovations/activity-data/
The Library Impact Data Project (LIDP), which ran from February-July this year, and in which the University of Lincoln took part, has now released a subset of the library activity data used in the analysis (which, you’ll remember, showed a statistically significant correlation across a number of universities between library activity data and student attainment).
Lincoln’s data is included in the release, which is available for re-use under an open licence, from:
This data set is made available under the Open Data Commons Attribution License
The data contains final grade and library usage figures for 33,074 students studying undergraduate degrees at UK universities. More information on the data, and how it’s been generalised in order to preserve students’ anonymity, is available on the LIDP project blog.
- There’s also a detailed report about the statistical breakdown of Lincoln’s own share of the data (this wasn’t published as part of the project reports, as it was down to each individual institution whether to make it public or not) – I’ve made the report available here [PDF].
Thanks again to Graham, Bryony and Dave at the University of Huddersfield for inviting Lincoln to take part in the project, and for their help along the way!
On to the next one…
LIDP was successful in proving that:
“There is statistically significant relationship between both book loans and e-resources use and student attainment. And this is true across all of the universities in the study that provided data in these areas.
“We want to stress here again that we realise THIS IS NOT A CAUSAL RELATIONSHIP! Other factors make a difference to student achievement, and there are always exceptions to the rule, but we have been able to link use of library resources to academic achievement.”
An initial (outline) report on how the University of Lincoln’s own activity-attainment holds up to this same statistical inspection is available to download from here [PDF]. As much as possible of the library activity data used in the project will be released under an Open Data Commons Attribution License in the near future, and hosted on the project blog.
Thanks are due to Graham Stone, Dave Pattern, Bryony Ramsden, and all the project partners for the opportunity for Lincoln to participate in this project. We had fun getting our data together. The end-of-project blog post for LIDP is here – it suggests some very interesting areas for further investigation.
Personally, I’m very interested in looking for cross-institutional comparisons – perhaps trying to explain particular levels of activity-attainment attached to individual subject areas, irrespective of which university the student is at (i.e. does a Lincoln computing student have more in common with a Lincoln business student, or with a Huddersfield computing student?). I’d also be interested in looking particularly at those students whose library activity behaviour changes through the life of their course, and who then go on to get a better degree than they might have been predicted based on their library activity in their first year.
“Finally, we have been astonished by how much interest there has been in our project. To date we have two articles ready for publication imminently and have another 2 in the pipeline. In addition by the end of October we will have delivered 11 conference papers on the project. All articles and conference presentations are accessible at: http://library.hud.ac.uk/blogs/projects/lidp/articles-and-conference-papers/”
I can see this project getting cited, and cited again, simply every time anyone wants to argue that academic libraries are A Good Thing.
I’m at De Montfort University in Leicester tomorrow, giving a presentation at the CILIP UC&R East Midlands members’ event: Making an Impact. My presentation is about our involvement in the JISC-funded Library Impact Data Project (LIDP) with the University of Huddersfield. My slides are online.
If you want to skip the monkey and head straight for the organ-grinders, my presentation borrows fairly heavily from two documents produced by the LIDP project team at Huddersfield:
- Stone, G. (2011) Looking for the link between library usage and student attainment. In: CILIPS Annual Conference, 7 June 2011, University of Glasgow. (Unpublished)
- Pattern, D. (2011) If you want to get laid, go to college… In: Welsh Libraries, Archives and Museums Conference, 12-13 May 2011, The Metropole, Llandrindod, Wales. (Unpublished)
I think this is worth re-posting from the LIDP blog:
We are very pleased to report that we have now received all of the data from our partner organisations and have processed all but two already!
Early results are looking positive and our next step is to report back with a brief analysis to each institution. We are planning to give them our data and a general set of data so that they can compare and contrast. There have been some issues with the data, some of which has been described in previous blogs, however, we are confident we have enough to prove the hypothesis one way or another!
In our final project meeting in July we hope to make a decision on what form the data will take when released under an Open Data Commons Licence. If all the partners agree, we will release the data individually; otherwise we will release the general set for other to analyse further.
I submitted Lincoln’s data on 13 June. It consists of fully anonymised entries for 4,268 students who graduated from the University of Lincoln with a named award, at all levels of study, at the end of the academic year 2009/10 – along with a selection of their library activity over three* years (2007/08, 2008/09, 2009/10).
The library activity data represents:
- The number of library items (book loans etc.) issued to each student in each of the three years; taken from the circ_tran (“circulation transactions”, presumably) table within our SirsiDynix Horizon Library Management System (LMS). We also needed a copy of Horizon’s borrower table to associate each transaction with an identifiable student.
- The number of times each student visited our main GCW University Library, using their student ID card to pass through the Library’s access control gates in each of the three* years; taken directly from our ‘Sentry’ access control/turnstile system. These data apply only to the main GCW University Library: there is no access control at the University of Lincoln’s other four campus libraries, so many students have ‘0’ for these data. Thanks are due to my colleague Dave Masterson from the Hull Campus Library, who came in early one day, well before any students arrived, in order to break into the Sentry system and extract this data!
- The number of times each student was authenticated against an electronic resource via AthensDA; taken from our Portal server access logs. Although by no means all of our e-resources go via Athens, we’re relying on it as a sort of proxy for e-resource usage more generally. Thanks to Tim Simmonds of the Online Services Team (ICT) for recovering these logs from the UL data archive.
I had also hoped to provide numbers of PC/network logins for the same students for the same three years (as Huddersfield themselves have done), but this proved impossible. We do have network login data from 2007 onwards, but while we can associate logins with PCs in the Library for our current PCs, we can’t say with any confidence whether a login to the network in 2007-2010 occurred within the Library or elsewhere: PCs have just been moved around too much in the last four years.
Student data itself—including the ‘primary key’ of the student account ID—was kindly supplied by our Registry department from the University’s QLS student records management system.
Once we’d gathered all these various datasets together, I prevailed upon Alex Bilbie to collate them into one huge .csv file: this he did by knocking up a quick SQL database on his laptop (he’s that kind of developer), rather than the laborious Excel-heavy approach using nested COUNTIF statements which would have been my solution. (I did have a go at this method, which clearly worked well for at least one of the other LIDP partners, but my PC nearly melted under the strain.)
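For the curious, the general shape of the “quick SQL database” approach can be sketched with Python’s built-in sqlite3 module. This is not Alex’s actual code or schema: the table and column names are invented, and for brevity it produces one total per student rather than separate counts for each of the three years.

```python
import csv
import io
import sqlite3


def collate(students, loans, visits, logins):
    """Return one row per student with counts of each kind of activity.

    `students` holds (student_id, award) pairs; the other arguments hold
    one (student_id, academic_year) pair per event. All names here are
    hypothetical.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE students (student_id TEXT PRIMARY KEY, award TEXT)")
    db.executemany("INSERT INTO students VALUES (?, ?)", students)
    for table, events in (("loans", loans), ("visits", visits), ("logins", logins)):
        db.execute(f"CREATE TABLE {table} (student_id TEXT, year TEXT)")
        db.executemany(f"INSERT INTO {table} VALUES (?, ?)", events)

    # Correlated subqueries do the job of the nested COUNTIFs, and students
    # with no activity of a given kind simply get 0.
    rows = db.execute("""
        SELECT s.student_id, s.award,
               (SELECT COUNT(*) FROM loans  l WHERE l.student_id = s.student_id),
               (SELECT COUNT(*) FROM visits v WHERE v.student_id = s.student_id),
               (SELECT COUNT(*) FROM logins e WHERE e.student_id = s.student_id)
        FROM students s
        ORDER BY s.student_id
    """).fetchall()
    db.close()
    return rows


def to_csv(rows):
    """Serialise the collated rows as CSV text with a header line."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["student_id", "award", "loans", "visits", "e_resource_logins"])
    writer.writerows(rows)
    return out.getvalue()
```

Getting the per-year figures would mean adding a `year` condition to each subquery, or grouping by year; either way the database does the counting in one pass instead of a spreadsheet recalculating thousands of formulas.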
The final .csv data has gone to Huddersfield for analysis and a copy is lodged in our Repository for safe keeping. Once the agreement has been made to release the LIDP data under an open licence, I’ll make the Repository copy publicly accessible.
*N.B. In the end, there was no visitor data for the year 2007/08: the access control / visitor data for that year was missing for almost all students. This may correspond to a re-issuing of library access cards for all users around that time, or the data may be missing for some other reason.
Here are ten of the best practical library tech blogs that I follow. They’re all about technology (ish), but they’re not geeky or inaccessible. Most, but not all, are written by people in UK Higher Education libraries. In case you want to subscribe to them en masse, I’ve bundled them up into an OPML file which you should be able to import into a feed reader (e.g. Google Reader).
Q. Have you got a good library technology blog? Care to share?
- Copac Developments
What’s happening behind the scenes at Copac
- Electronic Resources Blog
Library Services, University of Huddersfield
eLibrary team, Birmingham City University
- Fulup’s blog
A librarian at De Montfort University
- Musings around librarianship
Aaron Tay, a librarian at the National University of Singapore
- NewT Bham – where technology and libraries meet
New Technologies Group at the University of Birmingham Library
- Phil Bradley’s Weblog
Internet consultant and (2011) CILIP Vice-president
- ResourceShelf ResourceBlog
“We find the sources; you get the credit!”
- “Self-plagiarism is style” – Dave Pattern’s blog
Library Systems Manager at the University of Huddersfield
- UoL Library Blog – develop, debate, innovate
University of Leicester
I’m speaking at a CILIP UC&R East Midlands members’ event called “Making an Impact”, on Tuesday, 28 June, at De Montfort University in Leicester, about our involvement in Huddersfield’s JISC-funded Library Impact Data Project (LIDP).
Making an impact: The JISC Library Impact Data Project
Paul Stainthorp will give an overview of the JISC-funded Library Impact Data Project (LIDP). This project, led by the University of Huddersfield, is testing the hypothesis that there is ‘a statistically significant correlation across a number of universities between library activity data and student attainment’. To do this, the team is gathering and analysing library activity data (book loans, gate count figures, e-resource accesses, PC logins) from eight UK university libraries, and comparing that data with student attainment. Paul is the electronic resources librarian at the University of Lincoln and currently project manager for the JISC-funded resource discovery project ‘Jerome’.