Library Impact Data Project
De Montfort University
Lecture Theatre, 00.11
Tuesday 17th July 2012
2.00 – 4.00 pm
(Light refreshments available from 1.45pm)
The JISC Library Impact Data Project demonstrated a statistically significant correlation between library usage and student attainment. Two universities in the region, De Montfort and Lincoln, participated in the project and will present their approaches to collecting library activity data and to analysing and disseminating the results. There will also be an opportunity for participants to discuss the practicalities and value of gathering and using such data, within our libraries and the wider institution.
Phil Adams, Senior Assistant Librarian, De Montfort University
Marie Letzgus, Senior Assistant Librarian, De Montfort University
Paul Stainthorp, Electronic Resources Librarian, University of Lincoln
Contact your EMALINK representative to book a place by Wednesday 11th July. There are three places available per institution.
The DMU campus is a 15-20 minute walk from Leicester train station. Limited visitor parking is available on campus – please advise your EMALINK rep on booking if you wish to request a parking space. Campus maps available from: www.dmu.ac.uk/about-dmu/how-to-find-us.aspx
Posts Tagged ‘activity data’
1.8 million library loans from the University of Lincoln under CC0 – Copac Activity Data/SALT2 project
Posted on May 16th, 2012 by Paul Stainthorp
Today we published data on approximately 1.8 million items loaned from the University of Lincoln’s libraries since 2001. The data is available to re-use under a CC0 licence, and can be downloaded from:
We’ve done this as part of our involvement in the Copac Activity Data Project, a.k.a. SALT2. Along with data from the universities of Manchester, Sussex, Cambridge and Huddersfield, our circulation data will be used to power a ‘recommender API’, which libraries will be able to use to build “People who borrowed X also borrowed Y”-type services. The API will benefit from the power of aggregated data from multiple institutions of different types, containing tens of millions of circulation events.
You’ll notice as well that we’ve chosen to host the data on our brand-new Orbital (v0.1) research data management application. Each dataset has a persistent citable URI. We’ll be keeping the data up-to-date, and generating a new activity data file from our library circulation logs shortly after the end of each academic year.
The data consists of a number of CSV files (one for each academic year since 2000-01, plus a huge file of all the data), containing the following fields:
| Field index | Field name | Description |
|---|---|---|
| 0 | CREATE_DATE | The date and time of the loan event, in the format: dd/mm/yyyy hh:mm |
| 1 | BORROWER_ID | A cryptographic hash of the internal system ID associated with the borrower of the item, as used in the University of Lincoln’s library system. |
| 2 | WORK_ID | A cryptographic hash of the internal system ID associated with the bibliographic work borrowed, as used in the University of Lincoln’s library system. |
| 3 | CONTROL_NUMBER | The ISBN of the work borrowed (10 or 13 digits). |
| 4 | AUTHOR_DISPLAY | The main author of the work borrowed. |
| 5 | TITLE_DISPLAY | The title of the work. |
| 6 | PUB_DATE | The publication year of the work in the form: yyyy |
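For anyone wanting to re-use the files, here is a minimal sketch of reading rows in the field order given in the table. It assumes comma-delimited files with standard CSV quoting and tolerates a possible header row; neither detail is confirmed above. The function name `read_loans` and the sample row are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Field names in the order given in the table (index 0 to 6).
FIELDS = [
    "CREATE_DATE", "BORROWER_ID", "WORK_ID", "CONTROL_NUMBER",
    "AUTHOR_DISPLAY", "TITLE_DISPLAY", "PUB_DATE",
]

def read_loans(handle):
    """Yield one dict per loan event from an open CSV file handle.

    Assumption: comma-delimited with standard quoting. Rows that do not
    have seven fields, or whose first field is not a dd/mm/yyyy hh:mm
    datestamp (for example a header row), are skipped.
    """
    for row in csv.reader(handle):
        if len(row) != len(FIELDS):
            continue
        record = dict(zip(FIELDS, row))
        try:
            record["CREATE_DATE"] = datetime.strptime(
                record["CREATE_DATE"], "%d/%m/%Y %H:%M"
            )
        except ValueError:
            continue
        yield record

# Illustrative row only: these values are invented, not taken from the data.
sample = io.StringIO(
    '16/05/2012 10:15,ab12,cd34,9780000000002,"Author, An",An Example Title,2010\n'
)
loans = list(read_loans(sample))
```

With a real file, the same function would be called as `read_loans(open(path, newline="", encoding="utf-8"))`, where the encoding is again an assumption.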
I’ll blog in detail another time about exactly how we created the data extracts. In short:
- There is a table in the SirsiDynix Horizon library management system called circ_tran which records every instance of item number X borrowed by user number Y at time Z. [#1]
- There is another table which provides a lookup between item numbers and the numbers of the bibliographic works of which they are a copy. [#2]
- Dave Pattern at the University of Huddersfield wrote a Perl script which scrapes all the bibliographic data (title, author, ISBN) for each work from our OPAC (Horizon Information Portal) and writes it to a text file. [#3]
- Developer Jamie Mahoney of CERD/LNCD then stepped in, using some pretty heavy SQL on the original 3 data extracts, to:
  - hash the internal Horizon user and work ID numbers to provide anonymity;
  - convert the internal Horizon date and time stamps in extract [#1] from a version of Unix time into a readable datestamp (formula hint: cko_date*86400 + cko_time*60);
  - use the item/work lookup table [#2] to pull in the bibliographic details for each loan in [#1] from the bibliographic table [#3] (an epic SQL JOIN query), removing items which are no longer represented in our library system;
  - remove any items without an ISBN, which are of no use to the SALT recommender API;
  - tweak the punctuation and formatting;
  - split the data into separate files for each year.
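The real work was done in SQL, which isn't reproduced here. As a rough illustration of two of the steps above, this Python sketch applies the formula hint and hashes an ID. It assumes that cko_date counts days since 1 January 1970 and cko_time counts minutes since midnight, which is what the formula implies but is not spelled out above. The choice of SHA-256, the salt, and both function names are assumptions for illustration, not a description of what was actually used.

```python
import hashlib
from datetime import datetime, timezone

def horizon_to_datestamp(cko_date, cko_time):
    """Convert Horizon-style date/time integers to 'dd/mm/yyyy hh:mm'.

    Assumption: cko_date is days since 1970-01-01 and cko_time is
    minutes since midnight, so the formula hint yields Unix seconds.
    """
    seconds = cko_date * 86400 + cko_time * 60
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("%d/%m/%Y %H:%M")

def anonymise_id(internal_id, salt):
    """Return a hex digest standing in for an internal system ID.

    Assumption: SHA-256 with a secret salt; the hash function used for
    the published data is not stated.
    """
    digest = hashlib.sha256(f"{salt}:{internal_id}".encode("utf-8"))
    return digest.hexdigest()

# Day 15476 after the epoch is 16 May 2012; 615 minutes is 10:15.
print(horizon_to_datestamp(15476, 615))
```

The same ID always maps to the same digest for a given salt, so loans by one borrower can still be grouped together without exposing the internal number.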
Once again, the data is at:
Thanks are due to Chris Leach and Dave Pattern for Horizon-fu, and to Jamie Mahoney for his patient wrangling of several millions of lines of data!
You can find out more about the Copac Activity Data Project/SALT2, at: http://copac.ac.uk/innovations/activity-data/
Four students from the University of Liverpool calling themselves Team Ook Nog took the prize for the best use of library activity data at last weekend’s DevXS student hackathon in Lincoln. Their application took the openly-licensed national OpenURL router data from EDINA and used it to build a search/recommendation tool for scholarly journal articles. You can see the fruits of their labour here…
Jude-Thaddeus Ojiaku, Andrew Collins, Arnoud Pastink and Thomas Gorry built the Ook Nog site in a marathon development session over 30 hours in the Engine Shed. A simple Google-like search box (very Google-like!) displays results of articles and books derived solely from the OpenURL router data (example); each result has context-sensitive links out to dx.doi.org, OCLC firstsearch, CORE repository search, and Google Scholar. Clicking on any search result shows a chart of activity for that article, along with “See Also…” suggestions for other articles accessed by the same user in a similar timeframe. Take a look at the results.
From the DevXS wiki:
“Ook Nog is an interface for the data provided by openurl allowing you to search all of the data for any term and find search terms within their archive. By selecting any prior search term, you can then browse all search terms that were also performed by that user(s) within a small time period.
“All publications/searches are nodes. A node shares an edge with another node if a user has searched both nodes. We try to increase the chance of relevance by only showing neighbours of a node that were formed +- 90 days (a semester!).
“Despite no further tests of relevancy, the searches/publications found can be surprisingly similar (or amusing).”
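The node-and-edge idea in that description can be sketched in a few lines of Python. This is not Team Ook Nog's code: the function name `build_neighbours`, the `(user, item, timestamp)` event shape and the sample events are all hypothetical, and the only detail taken from the quote is that two items are linked when the same user accessed both within 90 days of each other.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from itertools import combinations

WINDOW = timedelta(days=90)  # the "+- 90 days" from the wiki description

def build_neighbours(events, window=WINDOW):
    """Map each item to the set of items linked to it.

    events: iterable of (user, item, timestamp) tuples (hypothetical shape).
    Two items share an edge if one user accessed both within `window`.
    """
    by_user = defaultdict(list)
    for user, item, when in events:
        by_user[user].append((item, when))

    neighbours = defaultdict(set)
    for accesses in by_user.values():
        # Compare every pair of accesses made by the same user.
        for (item_a, when_a), (item_b, when_b) in combinations(accesses, 2):
            if item_a != item_b and abs(when_a - when_b) <= window:
                neighbours[item_a].add(item_b)
                neighbours[item_b].add(item_a)
    return neighbours

# Invented events, for illustration only.
events = [
    ("u1", "article-A", datetime(2011, 1, 10)),
    ("u1", "article-B", datetime(2011, 2, 1)),
    ("u1", "article-C", datetime(2011, 9, 1)),   # too far from A and B
    ("u2", "article-B", datetime(2011, 3, 1)),
    ("u2", "article-D", datetime(2011, 3, 15)),
]
graph = build_neighbours(events)
```

Here `graph["article-B"]` contains article-A and article-D, while article-C has no neighbours because it falls outside the window. Comparing every pair per user is quadratic in that user's activity, which is fine for a sketch but would need more care on the full router data.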
The team from Liverpool pipped their traditional regional rivals to the library prize – Team MCR, made up of student developers from 3 different Manchester universities (University of Manchester, Manchester Metropolitan University, University of Salford). Team MCR built a working DevXS library app based around course reading lists with some interesting social ranking features, designed with great care using the Balsamiq wireframe UI tool, and making use of several open bibliographic datasets including the MOSAIC project data and Cambridge University Library’s search APIs. For their trouble, they picked up the #DevXS ‘social’ prize, awarded by the University of Lincoln Social Research Centre (LiSC).
Library activity data
The University of Lincoln Library (http://www.library.lincoln.ac.uk/) are sponsoring a £250 Amazon voucher prize (5 x £50 vouchers), which will be awarded to the team making the best use of library activity data as part of the application(s) they develop over the weekend. See the Data page on the wiki for examples of freely-available library activity data.
(DevXS is a developer marathon spread across three days, where students from across the UK and beyond are encouraged to team up and build cool things that contribute to university life. DevXS is about students sharing their ideas, mashing up data and building prototypes that improve, challenge and positively disrupt the research, teaching and learning landscapes of further and higher education.)
The Library Impact Data Project (LIDP), which ran from February-July this year, and in which the University of Lincoln took part, has now released a subset of the library activity data used in the analysis (which, you’ll remember, showed a statistically significant correlation across a number of universities between library activity data and student attainment).
Lincoln’s data is included in the release, which is available for re-use under an open licence, from:
This data set is made available under the Open Data Commons Attribution License
The data contains final grade and library usage figures for 33,074 students studying undergraduate degrees at UK universities. More information on the data, and on how it’s been generalised in order to preserve students’ anonymity, is available on the LIDP project blog.
- There’s also a detailed report about the statistical breakdown of Lincoln’s own share of the data (this wasn’t published as part of the project reports, as it was down to each individual institution whether to make it public or not) – I’ve made the report available here [PDF].
Thanks again to Graham, Bryony and Dave at the University of Huddersfield for inviting Lincoln to take part in the project, and for their help along the way!
On to the next one…
At the suggestion of the University Librarian Ian Snowley, the University of Lincoln Library are sponsoring a £250 developer prize at the DevXS student developer hackathon in November. The moolah will go to the winners of a library-flavoured developer competition at DevXS, based around the best use of activity data (details tba).
DevXS is free! It’s open to all undergraduate and postgraduate students, and it’s taking place in Lincoln on the 11th, 12th and 13th of November. Registration is now open. Find out more at devxs.org or by following @devxsconf on Twitter.
DevXS is a developer marathon spread across three days, where students from across the UK and beyond are encouraged to team up and build cool things that contribute to university life.
DevXS is about students sharing their ideas, mashing up data and building prototypes that improve, challenge and positively disrupt the research, teaching and learning landscapes of further and higher education.
We’re going to award prizes to the best ideas, prototypes and collaborations and there are going to be developers from universities around the country hanging around to help you out.
Sound awesome? Register now! It’s free!
LIDP was successful in proving that:
“There is statistically significant relationship between both book loans and e-resources use and student attainment. And this is true across all of the universities in the study that provided data in these areas.
“We want to stress here again that we realise THIS IS NOT A CAUSAL RELATIONSHIP! Other factors make a difference to student achievement, and there are always exceptions to the rule, but we have been able to link use of library resources to academic achievement.”
An initial (outline) report on how the University of Lincoln’s own activity-attainment holds up to this same statistical inspection is available to download from here [PDF]. As much as possible of the library activity data used in the project will be released under an Open Data Commons Attribution License in the near future, and hosted on the project blog.
Thanks are due to Graham Stone, Dave Pattern, Bryony Ramsden, and all the project partners for the opportunity for Lincoln to participate in this project. We had fun getting our data together. The end-of-project blog post for LIDP is here – it suggests some very interesting areas for further investigation.
Personally, I’m very interested in looking for cross-institutional comparisons – perhaps trying to explain particular levels of activity-attainment attached to individual subject areas, irrespective of which university the student is at (i.e. does a Lincoln computing student have more in common with a Lincoln business student, or with a Huddersfield computing student?). I’d also be interested in looking particularly at those students whose library activity behaviour changes through the life of their course, and who then go on to get a better degree than might have been predicted from their library activity in their first year.
“Finally, we have been astonished by how much interest there has been in our project. To date we have two articles ready for publication imminently and have another 2 in the pipeline. In addition by the end of October we will have delivered 11 conference papers on the project. All articles and conference presentations are accessibly at: http://library.hud.ac.uk/blogs/projects/lidp/articles-and-conference-papers/“
I can see this project getting cited, and cited again, simply every time anyone wants to argue that academic libraries are A Good Thing.
Anonymised library activity data for the academic years 2007/08, 2008/09 and 2009/10: collected for the JISC Library Impact Data Project
Posted on June 13th, 2011 by Paul Stainthorp
These data consist of entries for 4,268 anonymised students who graduated from the University of Lincoln with a named award at the end of the academic year 2009/10, along with a selection of their library activity over three years (2007/08, 2008/09, 2009/10): library item circulation, visits to the main GCW University Library, and e-resources usage represented by authentication against AthensDA.
View this item on the University Repository: http://eprints.lincoln.ac.uk/4540/