Posts Tagged ‘data’

The data! The data!

Posted on October 3rd, 2011 by Paul Stainthorp

The Library Impact Data Project (LIDP), which ran from February-July this year, and in which the University of Lincoln took part, has now released a subset of the library activity data used in the analysis (which, you’ll remember, showed a statistically significant correlation across a number of universities between library activity data and student attainment).

Lincoln’s data is included in the release, which is available for re-use under an open licence, from:

http://eprints.hud.ac.uk/11543/

This data set is made available under the Open Data Commons Attribution License
http://opendatacommons.org/licenses/by/1.0/

The data contains final grade and library usage figures for 33,074 students studying undergraduate degrees at UK universities. More information on the data, and on how it’s been generalised in order to preserve students’ anonymity, is available on the LIDP project blog.

  • There’s also a detailed report about the statistical breakdown of Lincoln’s own share of the data (this wasn’t published as part of the project reports, as it was down to each individual institution whether to make it public or not) – I’ve made the report available here [PDF].

The LIDP blog also contains information about the project ‘toolkit’, developed to assist other institutions who may want to test their own data against the LIDP’s hypothesis, here and here.

Thanks again to Graham, Bryony and Dave at the University of Huddersfield for inviting Lincoln to take part in the project, and for their help along the way!

On to the next one…

Library Impact Data Project: good news, everybody!

Posted on June 18th, 2011 by Paul Stainthorp

I think this is worth re-posting from the LIDP blog:

LIDP graphic

We are very pleased to report that we have now received all of the data from our partner organisations and have processed all but two already!

Early results are looking positive and our next step is to report back with a brief analysis to each institution. We are planning to give them our data and a general set of data so that they can compare and contrast. There have been some issues with the data, some of which have been described in previous blogs; however, we are confident we have enough to prove the hypothesis one way or another!

In our final project meeting in July we hope to make a decision on what form the data will take when released under an Open Data Commons Licence. If all the partners agree, we will release the data individually; otherwise we will release the general set for others to analyse further.

I submitted Lincoln’s data on 13 June. It consists of fully anonymised entries for 4,268 students who graduated from the University of Lincoln with a named award, at all levels of study, at the end of the academic year 2009/10 – along with a selection of their library activity over three* years (2007/08, 2008/09, 2009/10).

The library activity data represents:

  1. The number of library items (book loans etc.) issued to each student in each of the three years; taken from the circ_tran (“circulation transactions”, presumably) table within our SirsiDynix Horizon Library Management System (LMS). We also needed a copy of Horizon’s borrower table to associate each transaction with an identifiable student.
  2. The number of times each student visited our main GCW University Library, using their student ID card to pass through the Library’s access control gates in each of the three* years; taken directly from our ‘Sentry’ access control/turnstile system. These data apply only to the main GCW University Library: there is no access control at the University of Lincoln’s other four campus libraries, so many students have ‘0’ for these data. Thanks are due to my colleague Dave Masterson from the Hull Campus Library, who came in early one day, well before any students arrived, in order to break in to the Sentry system and extract this data!
  3. The number of times each student was authenticated against an electronic resource via AthensDA; taken from our Portal server access logs. Although by no means all of our e-resources go via Athens, we’re relying on it as a sort of proxy for e-resource usage more generally. Thanks to Tim Simmonds of the Online Services Team (ICT) for recovering these logs from the UL data archive.

I had also hoped to provide numbers of PC/network logins for the same students for the same three years (as Huddersfield themselves have done), but this proved impossible. We do have network login data from 2007 onwards, but while we can associate logins with PCs in the Library for our current PCs, we can’t say with any confidence whether a login to the network in 2007-2010 occurred within the Library or elsewhere: PCs have just been moved around too much in the last four years.

Student data itself—including the ‘primary key’ of the student account ID—was kindly supplied by our Registry department from the University’s QLS student records management system.

Once we’d gathered all these various datasets together, I prevailed upon Alex Bilbie to collate them into one huge .csv file: this he did by knocking up a quick SQL database on his laptop (he’s that kind of developer), rather than the laborious Excel-heavy approach using nested COUNTIF statements which would have been my solution. (I did have a go at this method, which clearly worked well for at least one of the other LIDP partners, but my PC nearly melted under the strain.)
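For anyone wanting to try the database route, here is a minimal sketch of that kind of collation, using Python's built-in sqlite3 module. It is not Alex's actual script: the table names, column names and the "loans"/"visits"/"logins" labels are all assumptions made for illustration, and the input is assumed to be one (student ID, kind of activity, academic year) tuple per transaction.

```python
import csv
import io
import sqlite3

YEARS = ["2007/08", "2008/09", "2009/10"]
KINDS = ["loans", "visits", "logins"]  # hypothetical labels for the three sources


def collate(student_ids, events):
    """Return CSV text: one row per student, one count column per kind and year.

    events is an iterable of (student_id, kind, year) tuples, one per transaction.
    """
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE student (id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE activity (student_id TEXT, kind TEXT, year TEXT)")
    db.executemany("INSERT INTO student VALUES (?)", [(s,) for s in student_ids])
    db.executemany("INSERT INTO activity VALUES (?, ?, ?)", list(events))

    # One conditional count per (kind, year) pair; the LEFT JOIN keeps
    # students with no activity at all, and COALESCE turns their NULLs into 0.
    columns = [(k, y) for k in KINDS for y in YEARS]
    counts = ", ".join(
        "COALESCE(SUM(a.kind = ? AND a.year = ?), 0)" for _ in columns
    )
    params = [value for pair in columns for value in pair]
    rows = db.execute(
        f"SELECT s.id, {counts} FROM student s "
        "LEFT JOIN activity a ON a.student_id = s.id "
        "GROUP BY s.id ORDER BY s.id",
        params,
    ).fetchall()
    db.close()

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["student_id"] + [f"{k}_{y}" for k, y in columns])
    writer.writerows(rows)
    return out.getvalue()
```

Calling `collate(["s1", "s2"], [("s1", "loans", "2008/09")])` would give a header row followed by one line per student, with zeros wherever a student has no recorded activity (as with the many students who never passed through the GCW gates).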

The final .csv data has gone to Huddersfield for analysis and a copy is lodged in our Repository for safe keeping. Once the agreement has been made to release the LIDP data under an open licence, I’ll make the Repository copy publicly accessible.

*N.B. In the end, there was no visitor data for the year 2007/08: the access control / visitor data for that year was missing for almost all students. This may correspond to a re-issuing of library access cards for all users around that time, or the data may be missing for some other reason.

Anonymised library activity data for the academic years 2007/08, 2008/09 and 2009/10: collected for the JISC Library Impact Data Project

Posted on June 13th, 2011 by Paul Stainthorp

These data consist of entries for 4,268 anonymised students who graduated from the University of Lincoln with a named award at the end of the academic year 2009/10, along with a selection of their library activity over three years (2007/08, 2008/09, 2009/10): library item circulation, visits to the main GCW University Library, and e-resources usage represented by authentication against AthensDA.

View this item on the University Repository: http://eprints.lincoln.ac.uk/4540/

Inclusive practice, digital data, and e-books

Posted on April 7th, 2011 by Paul Stainthorp

Screenshot of the Blackboard PIP community

I attended Sue Watling’s workshop, ‘Promoting Inclusive Practice with Digital Data’, today. (I know that Sue has delivered the same workshop in the past to groups of Library staff.) There’s also a Blackboard community to accompany the workshop.

My particular interest in usability / accessibility / inclusive design, as Sue knows, is around the accessible nature (or otherwise) of Library-digitised and born-digital library subscription resources: e-books, e-journals, and material scanned and digitised under the CLA’s comprehensive HE licence.

In particular, Sue and I have had a number of conversations about the frustrations we share around digital texts: which ought to be inherently accessible and a great asset, but which in practice are often only available in a form (or via a platform) covered in barriers to accessibility. Also around the lack of importance which the University can seem to place on accessibility, usability and access issues.

A little while ago, Sue and I made a start on an e-book usability/accessibility reference guide. To my shame (because I do think it’s important, it’s something that doesn’t get a lot of attention, and it’s something I’m interested in) …I let it fall by the wayside.

I’ve made a start again! It’s made up of a table containing information about the features of the three Library e-book platforms which are available at the University of Lincoln, plus a guide to using e-books. Both parts are publicly-editable Google documents, so feel free to edit them.

L to the I to the D to the P

Posted on March 17th, 2011 by Paul Stainthorp

Representatives of the eight partner institutions in the JISC Activity Data LIDP (Library Impact Data Project) met in person (and in Huddersfield) for the first time last week.

Denby Dale from the train

From the project blog:

“In a packed agenda we discussed the project in detail – we’ll be blogging the minutes soon.

“We also approved the project plan and discussed the hypothesis in some detail – look out for our first blog on that soon too! We are now working on getting the focus group questions out to everyone in the next few days.”

The original project hypothesis bears repeating here: if we can prove that it stands up, it’s obviously of some significance to libraries in UKHE.

There is a statistically significant correlation across a number of universities between library activity data and student attainment.

N.B. that’s the first and ‘official’ version of the hypothesis, taken from the project proposal. The language may be tightened up a little bit in the final project plan – i.e. what do we mean by “student attainment” – what measurement of attainment are we taking? (It’s degree classification, btw.)

Project Partners:

  1. University of Huddersfield
  2. University of Bradford
  3. De Montfort University
  4. University of Exeter
  5. University of Lincoln
  6. Liverpool John Moores University
  7. University of Salford
  8. Teesside University


Library catalogue: Site Search analytics

Posted on March 17th, 2011 by Paul Stainthorp

A while ago (and, as with all things Horizon, with the help of Dave Pattern at Huddersfield), we enabled Google Analytics on our library OPAC (sometimes referred to as HiP, the “Horizon Information Portal”). This takes the form of a piece of Google JavaScript which lives in a ‘footer’ document common to all HiP pages.

Chris Leach gave a presentation about using Google Analytics with HiP at the last SirsiDynix Horizon User Group.

Now Nick Jackson has shown me how to enable Google’s Site Search features on our Analytics profile for the library catalogue. Site Search will allow us to ‘tease out’ the search activity within the library catalogue itself, by analysing the URL structure of HiP queries, recognising and extracting the search terms, then tracking the paths users take from those search queries to destination pages (i.e., individual bibliographic record pages on HiP).

For instance: a typical HiP search query ends up looking something like this:

http://www.library.lincoln.ac.uk/ipac20/ipac.jsp?session=1G00362045UH4.101795&menu=search&aspect=subtab13&npp=10&ipp=20&spp=20&profile=ln&ri=&index=.GW&term=journalism&x=0&y=0&aspect=subtab13

By telling Google Site Search to look in the query parameter “term” for the search keyword(s)—in this case journalism—and to ignore the “session” parameter, Google Analytics can start to group similar queries together and provide us with data about what our users are searching the catalogue for.
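To make the idea concrete, here is a rough sketch in Python of the same kind of extraction: pull the keyword out of the "term" parameter, ignore everything else (including "session"), and count how often each keyword turns up. This is only an illustration of the principle, not how Google Analytics does it internally; the function names are made up for the example.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# The example HiP query URL quoted above
HIP_EXAMPLE = (
    "http://www.library.lincoln.ac.uk/ipac20/ipac.jsp"
    "?session=1G00362045UH4.101795&menu=search&aspect=subtab13&npp=10"
    "&ipp=20&spp=20&profile=ln&ri=&index=.GW&term=journalism&x=0&y=0"
    "&aspect=subtab13"
)


def search_term(url, param="term"):
    """Return the lower-cased search keyword(s) from a HiP URL, or None."""
    values = parse_qs(urlsplit(url).query).get(param)
    return values[0].strip().lower() if values else None


def top_terms(urls):
    """Count search terms across URLs; session IDs play no part in grouping."""
    terms = (search_term(u) for u in urls)
    return Counter(t for t in terms if t)
```

Because only the "term" value is used as the grouping key, two searches for the same keyword in different sessions are counted together, which is the effect of excluding the "session" parameter.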

Screenshot of the setup page for Google Site Search

It’s been running for less than 24 hours, but already we’re starting to build up a record of the keywords people are typing into the catalogue:

Screenshot of Google Site Search top search terms on HiP

What could we [and what should we] do with this data? Are there any Google Site Search experts out there who could give me a few tips? If anyone from within the Library at the University of Lincoln would be interested in helping to analyse the search term data, please let me or Chris know.

One thing we’ve already discussed is the idea of using the HiP search term activity as test data to ‘teach’ the Jerome machine intelligence engine about the kind of things Lincoln library users are interested in… this will help us in determining how the Jerome API’s personalisation features might be used to present and relevance-rank results.

Reporting on research

Posted on March 15th, 2011 by Paul Stainthorp

The University of Lincoln is now using the Lincoln Repository as the ‘system of record’ for monitoring the University’s research activity. Internal Quarterly Research Output Reports are being produced from repository data every three months, with a full quarter in hand (to give people time to deposit any items that they weren’t able to submit before publication).

This is being advertised on the internal staff daily alert email, with a banner image:

Banner image advertising the Quarterly Research Output Reports

The next report, for Q4 2010, will be generated from the Repository on 31 March 2011.

Repository team news & report on RSP Winter School #rspws11

Posted on February 24th, 2011 by Paul Stainthorp

The latest news from the Repository team at the University of Lincoln:

RSP Winter School 2011

I was lucky enough to attend the three-day Repositories Support Project Winter School (#rspws11), which this year was held in the impressive surroundings of Armathwaite Hall near Bassenthwaite in the Lake District. As you can see from my photos, it was a real hardship.

Avenue of trees #rspws11

The programme included a keynote address by the immensely switched-on Professor Martin Hall, V-c of the University of Salford (and the first UK V-c on Twitter!), which touched on archaeology, museums, data preservation, open access, mobile learning, and the meaning of the modern university. The remaining speakers and discussions over the three days seemed to relate to two main topics:

  1. Data preservation and OA to datasets: Max Wilkinson on the work of the British Library and the BL datasets programme (bl.uk/datasets); Miggie Pickton from the University of Northampton about their ‘KeepIt’ project to preserve university research data.

The consensus about research data seems to be this: don’t rely on your existing processes for your ‘publications’ repository. Keep a clear wall between a publications repository and a data archive. The requirements for describing/cataloguing, preserving, and providing access (sensitive data, etc.) are all just too different for datasets and publications. Also, there seems to be a general agreement that a more national, shared approach is appropriate for datasets than the strongly institutional focus of publication repositories.

  2. The options for CRISes and Repositories when gathering data for the REF: presentations from Keith Jeffery; Mark Cox

It slowly emerged that there seem to be at least two different approaches to REF data-preparation that universities are taking: some [generally large, research-intensive universities] are investing heavily in a CRIS (which is impacting on the role of the Repository); others [generally the smaller HEIs, though with notable exceptions] are developing and enhancing their existing Repository systems, and relying on EPrints/DSpace to do more heavy lifting.

Bassenthwaite Lake

Interestingly, there was relatively little talk of e-theses in all this. We did however manage to slip in an advert for the UKCoRR members’ meeting (tomorrow!)

Slides and notes from the various presentations and workshops are available to download from the RSP’s website.

Tweets bearing the Winter School’s hashtag #rspws11 are preserved in a Twapper Keeper archive.

Armathwaite Hall

Meanwhile, back home in Lincoln…

We held our regular Repository team meeting on Friday, 18 February. It seems to be a particularly busy time, Repository-wise, at the moment. Welcome to David Young, who came to his first Friday team meeting.

Present: Bev Jones (BJ), Paul Stainthorp (PS), Rosaline Smith (RS), David Young (DY).

  1. We’ve hit 2,800 items on the Repository, which is a credit to Lincoln’s academic staff, as well as to the tireless efforts of RS and BJ! We’re aiming for 3,000 items by the end of April, 2011. If we hit that target, I’ll be doing some more baking.
  2. There are a number of useful training events on at the moment: some organised by the RSP (e.g. this one), as well as this extremely valuable-looking non-RSP event in Glasgow. Many of the events relate in some way to getting data in/out of repositories for REF purposes (cf. the discussions at the Winter School, above). Unfortunately, Lincoln people aren’t able to attend many of these events, so PS and DY are going to meet to discuss the possibility of running/arranging a similar event in the East Midlands.
  3. The group discussed some EPrints tweaks: publisher search, the ability to ‘bounce’ a Repository record from one owner to another, the perennial unique author IDs …all of which are possible and in place in at least one other EPrints repository. We also touched upon our succession/emergency planning (i.e. how would the Library cope if and when the volume of Repository traffic outstrips our resource to deal with it: our “Plan X”).
  4. RS updated us on the Kultivate project: there’s another workshop in London on Monday, 28 February; RS is still planning a meeting with the Faculty of Art, Architecture & Design. RS has issued her final reminder by mass email to academic staff, asking them to attend a Repository workshop and/or to get in touch to discuss depositing their items.
  5. BJ reported that all Repository records from the calendar years 2010/2011 (so far) are now identifiable to a quarter. (We need this level of specificity to produce our Quarterly Research Output Reports.) However, there’s still some confusion over exactly how we can construct date-limited queries in EPrints – BJ is going to ask on the eprints_tech and UKCoRR mailing lists to see if we can get a definitive answer.
  6. Now-quite-finally, I (PS) ran through a number of things I’m going to bring to the next Repository steering group: including technical developments and where we might need to take EPrints in the run-up to the REF, as well as improving the Repository’s presence on our corporate website. I’m also going to speak to the chair of the steering group (University Librarian, Ian Snowley) about the date of the next meeting.
  7. Did I mention it’s the UKCoRR meeting tomorrow?

Bassenthwaite morning reflection

List of UK public libraries with downloadable e-books (mashup)

Posted on February 21st, 2011 by Paul Stainthorp

This week, I spotted that my local public library service (Lincolnshire County Council) have launched an e-books service. Hooray for them – they’ve also recently upgraded all the PCs and introduced wifi in my local branch library.

With many local libraries being cut or placed under threat, and their technological relevance criticised (often ignorantly), even by the PM, it’s great to see investment going in to library technology in Lincolnshire.

The Lincolnshire county libraries e-books site is at: https://lincolnshire.libraryebooks.co.uk/

(It’s not obvious who provides this e-books platform, but it appears Warwickshire County Council—and possibly no-one else—has chosen the same provider.)

It got me wondering: how many UK public libraries currently provide an e-book download service?

To try and find out, I’ve created a (publicly-editable) Google spreadsheet wiki, containing the names of the 232 top-level local authorities in the UK, along with a column indicating whether or not they provide an e-book download service {1|0}, and columns for the URL and provider of that service.

At the time of writing, there are 48 public library e-book download services listed. If I’ve missed one that you know about, you can edit the spreadsheet yourself.

Screenshot of the public library downloadable e-books spreadsheet on Google Docs

I’ve then used a simple, 4-part Yahoo! Pipe to turn the CSV data output from that spreadsheet into an RSS feed containing only those councils that do provide downloadable e-books.
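For readers without a Yahoo! Pipes account, the same filter-and-convert step can be sketched in a few lines of Python. This is an illustration rather than a copy of the Pipe: the column names (authority, has_ebooks, url, provider) are assumptions about how the spreadsheet's CSV export might be headed, and the feed it produces is a bare-bones RSS 2.0 document that has not been run through a validator.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_rss(csv_text, feed_title, feed_link):
    """Build a minimal RSS 2.0 feed from rows whose has_ebooks column is '1'."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_title

    for row in csv.DictReader(io.StringIO(csv_text)):
        # Skip councils flagged 0 (no e-book download service)
        if row["has_ebooks"].strip() != "1":
            continue
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = row["authority"]
        ET.SubElement(item, "link").text = row["url"]
        ET.SubElement(item, "description").text = row["provider"]

    return ET.tostring(rss, encoding="unicode")
```

Feeding it the spreadsheet's CSV export, plus a title and link for the feed itself, returns the RSS as a string ready to be saved or served.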

Screenshot of my public library e-book download Yahoo! Pipe

The finished RSS feed is at: http://bit.ly/e9U2GP

Screenshot of the RSS feed of public library e-book download services

Next, if I can remember my way round the GeoNames/Nearby.org.uk/Google Maps APIs, I’ll have a go at plotting the e-book-providing libraries on a map.