Posts Tagged ‘recommendation’

1.8 million library loans from the University of Lincoln under CC0 – Copac Activity Data/SALT2 project

Posted on May 16th, 2012 by Paul Stainthorp

Today we published data on approximately 1.8 million items loaned from the University of Lincoln’s libraries since 2001. The data is available to re-use under a CC0 licence, and can be downloaded from:

We’ve done this as part of our involvement in the Copac Activity Data Project, a.k.a. SALT2. Along with data from the universities of Manchester, Sussex, Cambridge and Huddersfield, our circulation data will be used to power a ‘recommender API’, which libraries will be able to use to build “People who borrowed X also borrowed Y”-type services. The API will benefit from the power of aggregated data from multiple institutions of different types, containing tens of millions of circulation events.
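
To make the “People who borrowed X also borrowed Y” idea concrete, here is a minimal sketch of co-borrowing counts in Python. It is not the SALT2 API or its algorithm, just one simple way such a service could work; the function names and the toy loan pairs are invented for illustration.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_borrow_counts(loans):
    """Count how often two works were borrowed by the same borrower.

    `loans` is an iterable of (borrower_id, work_id) pairs.
    """
    works_by_borrower = defaultdict(set)
    for borrower, work in loans:
        works_by_borrower[borrower].add(work)

    counts = defaultdict(Counter)
    for works in works_by_borrower.values():
        # Every pair of works on one borrower's record counts once.
        for a, b in combinations(sorted(works), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def also_borrowed(counts, work_id, top_n=5):
    """Return the works most often borrowed alongside `work_id`."""
    return [w for w, _ in counts.get(work_id, Counter()).most_common(top_n)]


# Toy data (invented): three borrowers, three works.
toy_loans = [
    ("u1", "X"), ("u1", "Y"),
    ("u2", "X"), ("u2", "Y"), ("u2", "Z"),
    ("u3", "X"), ("u3", "Y"),
]
toy_counts = co_borrow_counts(toy_loans)
print(also_borrowed(toy_counts, "X"))
```

Aggregating loans from several institutions simply means feeding more pairs into the same counts, which is why a larger pool of circulation events should give better suggestions.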

You’ll notice as well that we’ve chosen to host the data on our brand-new Orbital (v0.1) research data management application. Each dataset has a persistent citable URI. We’ll be keeping the data up-to-date, and generating a new activity data file from our library circulation logs shortly after the end of each academic year.

The data consists of a number of CSV files (one for each academic year since 2000-01, plus a huge file of all the data), containing the following fields:

Field index | Field name | Description
0 | CREATE_DATE | The date and time of the loan event, in the format: dd/mm/yyyy hh:mm
1 | BORROWER_ID | A cryptographic hash of the internal system ID associated with the borrower of the item, as used in the University of Lincoln’s library system.
2 | WORK_ID | A cryptographic hash of the internal system ID associated with the bibliographic work borrowed, as used in the University of Lincoln’s library system.
3 | CONTROL_NUMBER | The ISBN of the work borrowed (10 or 13 digits).
4 | AUTHOR_DISPLAY | The main author of the work borrowed.
5 | TITLE_DISPLAY | The title of the work.
6 | PUB_DATE | The publication year of the work in the form: yyyy
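
As a rough guide to reading the files, the sketch below parses rows by field index using Python’s standard csv module. It assumes comma-separated values with standard quoting, and it skips a header row if one is present; neither detail is confirmed above, so check a real file first. The `Loan` type, the function names and the sample row are all made up for illustration.

```python
import csv
import io
from datetime import datetime
from typing import NamedTuple, Optional


class Loan(NamedTuple):
    create_date: datetime
    borrower_id: str
    work_id: str
    control_number: str
    author_display: str
    title_display: str
    pub_date: Optional[int]


def parse_row(row):
    """Turn one CSV row (a list of 7 strings) into a Loan."""
    created = datetime.strptime(row[0], "%d/%m/%Y %H:%M")
    # PUB_DATE is documented as yyyy; tolerate anything else as unknown.
    year = int(row[6]) if row[6].strip().isdigit() else None
    return Loan(created, row[1], row[2], row[3], row[4], row[5], year)


def read_loans(lines):
    """Parse an iterable of CSV lines, skipping short rows and any header."""
    return [
        parse_row(row)
        for row in csv.reader(lines)
        if len(row) >= 7 and row[0] != "CREATE_DATE"
    ]


# Invented sample row, not taken from the real dataset.
sample = (
    '16/05/2012 09:30,ab12,cd34,9780131103627,'
    '"Kernighan, Brian W.",The C programming language,1988\n'
)
sample_loans = read_loans(io.StringIO(sample))
print(sample_loans[0].title_display, sample_loans[0].pub_date)
```

For a real file, pass an open file object to `read_loans` in place of the `io.StringIO` wrapper.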

I’ll blog in detail another time about exactly how we created the data extracts. In short:

  1. There is a table in the SirsiDynix Horizon library management system called circ_tran which records every instance of item number X borrowed by user number Y at time Z. [#1]
  2. There is another table which provides a lookup between item numbers and the numbers of the bibliographic works of which they are a copy. [#2]
  3. Dave Pattern at the University of Huddersfield wrote a Perl script which scrapes all the bibliographic data (title, author, ISBN) for each work from our OPAC (Horizon Information Portal) and writes it to a text file. [#3]
  4. Developer Jamie Mahoney of CERD/LNCD then stepped in, using some pretty heavy SQL on the original 3 data extracts, to:
    • Hash the internal Horizon user and work ID numbers to provide anonymity;
    • Convert the internal Horizon date and time stamps in extract [#1] from a version of Unix time into a readable datestamp (formula hint: cko_date*86400 + cko_time*60);
    • Use the item/work lookup table [#2] to pull in the bibliographic details for each loan in [#1] from the bibliographic table [#3] (an epic SQL JOIN query), removing items which are no longer represented in our library system;
    • Remove any items without an ISBN, which are of no use to the SALT recommender API;
    • Tweak the punctuation and formatting;
    • Split the data into separate files for each year.
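
The real work was done in SQL, which isn’t reproduced here, but the main steps can be sketched in Python. Only `cko_date`, `cko_time` and the formula come from the notes above; everything else is an assumption: that `cko_date` counts days since 1 January 1970 and `cko_time` counts minutes since midnight (inferred from the formula), that times can be treated as UTC, that SHA-256 with a salt stands in for the unspecified hash, and that the extracts are available as simple dictionaries with the hypothetical keys shown.

```python
import hashlib
from datetime import datetime, timezone

SALT = b"example-salt"  # hypothetical; a real salt would be kept secret


def anonymise(internal_id, salt=SALT):
    """Hash an internal ID (SHA-256 is an assumed choice of algorithm)."""
    return hashlib.sha256(salt + str(internal_id).encode("utf-8")).hexdigest()


def horizon_to_datestamp(cko_date, cko_time):
    """Apply the formula hint: cko_date*86400 + cko_time*60 = Unix seconds."""
    seconds = cko_date * 86400 + cko_time * 60
    stamp = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return stamp.strftime("%d/%m/%Y %H:%M")


def build_extract(circ_rows, item_to_work, bib_by_work):
    """Join loans [#1] to works [#2] and bibliographic data [#3]."""
    extract = []
    for row in circ_rows:
        work = item_to_work.get(row["item"])
        bib = bib_by_work.get(work)
        # Drop loans whose item or work has gone, or which lack an ISBN.
        if bib is None or not bib.get("isbn"):
            continue
        extract.append({
            "CREATE_DATE": horizon_to_datestamp(row["cko_date"], row["cko_time"]),
            "BORROWER_ID": anonymise(row["borrower"]),
            "WORK_ID": anonymise(work),
            "CONTROL_NUMBER": bib["isbn"],
        })
    return extract


# Invented example rows: the second loan has no matching work and is dropped.
circ = [
    {"item": 1, "borrower": 42, "cko_date": 15476, "cko_time": 570},
    {"item": 2, "borrower": 42, "cko_date": 15476, "cko_time": 575},
]
demo = build_extract(circ, {1: 100}, {100: {"isbn": "9780131103627"}})
print(demo[0]["CREATE_DATE"])
```

Splitting by academic year would then be a matter of grouping the resulting rows on the year part of CREATE_DATE.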

Once again, the data is at:

Thanks are due to Chris Leach and Dave Pattern for Horizon-fu, and to Jamie Mahoney for his patient wrangling of several millions of lines of data!

You can find out more about the Copac Activity Data Project/SALT2, at: http://copac.ac.uk/innovations/activity-data/

Ook Nog! Ook Nog! University of Liverpool student team win #DevXS library activity data prize

Posted on November 19th, 2011 by Paul Stainthorp

Four students from the University of Liverpool calling themselves Team Ook Nog took the prize for the best use of library activity data at last weekend’s DevXS student hackathon in Lincoln. Their application took the openly-licensed national OpenURL router data from EDINA and used it to build a search/recommendation tool for scholarly journal articles. You can see the fruits of their labour here

#DevXS - Team Boss Ook Nog

Jude-Thaddeus Ojiaku, Andrew Collins, Arnoud Pastink and Thomas Gorry built the Ook Nog site in a marathon development session over 30 hours in the Engine Shed. A simple Google-like search box (very Google-like!) displays results of articles and books derived solely from the OpenURL router data (example); each result has context-sensitive links out to dx.doi.org, OCLC firstsearch, CORE repository search, and Google Scholar. Clicking on any search result shows a chart of activity for that article, along with “See Also…” suggestions for other articles accessed by the same user in a similar timeframe. Take a look at the results.

From the DevXS wiki:

“Ook Nog is an interface for the data provided by openurl allowing you to search all of the data for any term and find search terms within their archive. By selecting any prior search term, you can then browse all search terms that were also performed by that user(s) within a small time period.

“All publications/searches are nodes. A node shares an edge with another node if a user has searched both nodes. We try to increase the chance of relevance by only showing neighbours of a node that were formed +- 90 days (a semester!).

“Despite no further tests of relevancy, the searches/publications found can be surprisingly similar (or amusing).”
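
The graph the team describe can be sketched in a few lines of Python. This is not Ook Nog’s code: it is one reading of their description, assuming each record carries some user (or session) key, an item and a date, and that “+- 90 days” means two accesses by the same user no more than 90 days apart. The names and toy events are invented.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW = timedelta(days=90)  # "a semester", per the team's description


def build_graph(events, window=WINDOW):
    """Link items accessed by the same user within `window` of each other.

    `events` is an iterable of (user, item, date) tuples.
    """
    by_user = defaultdict(list)
    for user, item, when in events:
        by_user[user].append((when, item))

    graph = defaultdict(set)
    for accesses in by_user.values():
        accesses.sort()  # chronological order within each user
        for i, (t1, a) in enumerate(accesses):
            for t2, b in accesses[i + 1:]:
                if t2 - t1 > window:
                    break  # later accesses are even further away
                if a != b:
                    graph[a].add(b)
                    graph[b].add(a)
    return graph


# Toy events (invented): A and B fall within 90 days, C does not.
toy_events = [
    ("u1", "A", date(2011, 1, 1)),
    ("u1", "B", date(2011, 2, 1)),
    ("u1", "C", date(2011, 9, 1)),
]
toy_graph = build_graph(toy_events)
print(sorted(toy_graph["A"]))
```

The “See Also…” suggestions for an item are then just its neighbours in this graph.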

The team from Liverpool pipped their traditional regional rivals to the library prize – Team MCR, made up of student developers from 3 different Manchester universities (University of Manchester, Manchester Metropolitan University, University of Salford). Team MCR built a working DevXS library app based around course reading lists with some interesting social ranking features, designed with great care using the Balsamiq wireframe UI tool, and making use of several open bibliographic datasets including the MOSAIC project data and Cambridge University Library’s search APIs. For their trouble, they picked up the #DevXS ‘social’ prize, awarded by the University of Lincoln Social Research Centre (LiSC).

DevXS was brilliant. Thanks again to Ian Snowley for the idea of donating a University of Lincoln Library prize. £250 in Amazon vouchers are on their way to Liverpool now.

Notes on: Ex Libris Primo

Posted on July 8th, 2011 by Paul Stainthorp

Primo is library software group Ex Libris’s umbrella, “one-stop solution for the discovery and delivery of local and remote resources, such as books, journal articles, and digital objects.” It’s used by around 20 institutions in the UK, and ~800 worldwide.

Information about Primo is available at: http://www.exlibrisgroup.com/category/PrimoOverview

A couple of other useful links:

  • Slides – redacted for confidentiality
  • ‘Discovery’ on the SCONUL Higher Education Library Technology (HELibTech) wiki

The development of Primo marked a move away from the existing, Z39.50-intensive, metasearch model of unified resource discovery, to the use of a hosted, central metadata index of scholarly content (Ex Libris call this the Primo Central Index), characterised by unified discovery & delivery; faceted navigation; and usage-based recommendation.

Primo features include:

  • Import of local data sources (catalogues; repositories) to a standardised XML format to allow cross-collection searching;
  • Ranking of printed, electronic and locally born-digital or digitised content, configurable by the subscribing library;
  • Integration with the OPAC – stronger integration for libraries that use one of Ex Libris’s own Library Management Systems; less-tight integration is possible for ‘foreign’ OPACs;
  • Integration with Ex Libris’s bX usage-based journal article recommendation service, which derives recommendations from the ‘user journey’ from article-to-article;
  • FRBRised grouping of similar titles in search results;
  • Facets derived from both the Primo Central Index and from locally-harvested data: for example, a facet could be configured to allow users to limit a search to only those items which are available in the OPAC;
  • Tools to embed the Primo search box in remote web sites (VLE, intranet, etc.);
  • An ‘open’ platform for development (including a suite of Primo APIs) – the EL Commons;
  • A mobile-friendly UI (e.g. this example from Germany).

Higher Education libraries in the UK using Primo include:

…and outside the UK:

Ex Libris are also developing Alma – which does for the ‘back end’ of library systems architecture what Primo does for the front end discovery UI – i.e. provides ‘umbrella’, unified management of print, electronic, and digitised/digital resources in the one system. In the UK, the University of York are ‘early adopters’ of Alma. Information about Alma is available at: http://www.exlibrisgroup.com/category/AlmaOverview