Posts Tagged ‘AthensDA’

Library Impact Data Project: good news, everybody!

Posted on June 18th, 2011 by Paul Stainthorp

I think this is worth re-posting from the LIDP blog:

We are very pleased to report that we have now received all of the data from our partner organisations and have processed all but two already!

Early results are looking positive and our next step is to report back with a brief analysis to each institution. We are planning to give them our data and a general set of data so that they can compare and contrast. There have been some issues with the data, some of which have been described in previous blogs; however, we are confident we have enough to prove the hypothesis one way or another!

In our final project meeting in July we hope to make a decision on what form the data will take when released under an Open Data Commons Licence. If all the partners agree, we will release the data individually; otherwise we will release the general set for others to analyse further.

I submitted Lincoln’s data on 13 June. It consists of fully anonymised entries for 4,268 students who graduated from the University of Lincoln with a named award, at all levels of study, at the end of the academic year 2009/10 – along with a selection of their library activity over three* years (2007/08, 2008/09, 2009/10).

The library activity data represents:

  1. The number of library items (book loans etc.) issued to each student in each of the three years; taken from the circ_tran (“circulation transactions”, presumably) table within our SirsiDynix Horizon Library Management System (LMS). We also needed a copy of Horizon’s borrower table to associate each transaction with an identifiable student.
  2. The number of times each student visited our main GCW University Library, using their student ID card to pass through the Library’s access control gates in each of the three* years; taken directly from our ‘Sentry’ access control/turnstile system. These data apply only to the main GCW University Library: there is no access control at the University of Lincoln’s other four campus libraries, so many students have ‘0’ for these data. Thanks are due to my colleague Dave Masterson from the Hull Campus Library, who came in early one day, well before any students arrived, in order to break in to the Sentry system and extract this data!
  3. The number of times each student was authenticated against an electronic resource via AthensDA; taken from our Portal server access logs. Although by no means all of our e-resources go via Athens, we’re relying on it as a sort of proxy for e-resource usage more generally. Thanks to Tim Simmonds of the Online Services Team (ICT) for recovering these logs from the UL data archive.
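
To make the "count per student per year" idea concrete, here is a minimal Python sketch. It is not the method actually used for the Lincoln data: the event format (a student ID plus a date), the function names, and the assumption that an academic year runs from 1 August to 31 July are all hypothetical.

```python
from collections import Counter
from datetime import date


def academic_year(d: date, start_month: int = 8) -> str:
    """Label a date with its academic year, e.g. '2009/10'.

    Assumption (not from the project): academic years start on the 1st of
    `start_month`, here August.
    """
    start = d.year if d.month >= start_month else d.year - 1
    return f"{start}/{(start + 1) % 100:02d}"


def count_events(events):
    """Count events per (student_id, academic_year) pair."""
    return Counter((student_id, academic_year(d)) for student_id, d in events)


# Hypothetical events: loans, gate entries, or Athens authentications.
events = [
    ("s001", date(2008, 10, 3)),
    ("s001", date(2009, 2, 14)),
    ("s001", date(2009, 9, 1)),
    ("s002", date(2010, 5, 20)),
]
counts = count_events(events)
print(counts[("s001", "2008/09")])
```

A `Counter` returns 0 for any student/year pair with no events, which matches the way students with no recorded activity end up with ‘0’ in the dataset.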

I had also hoped to provide numbers of PC/network logins for the same students for the same three years (as Huddersfield themselves have done), but this proved impossible. We do have network login data from 2007 onwards, but while we can associate logins with PCs in the Library for our current PCs, we can’t say with any confidence whether a login to the network in 2007-2010 occurred within the Library or elsewhere: PCs have just been moved around too much in the last four years.

Student data itself—including the ‘primary key’ of the student account ID—was kindly supplied by our Registry department from the University’s QLS student records management system.

Once we’d gathered all these various datasets together, I prevailed upon Alex Bilbie to collate them into one huge .csv file: this he did by knocking up a quick SQL database on his laptop (he’s that kind of developer), rather than the laborious Excel-heavy approach using nested COUNTIF statements which would have been my solution. (I did have a go at this method, which clearly worked well for at least one of the other LIDP partners, but my PC nearly melted under the strain.)
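
For anyone curious what the "quick SQL database" route might look like, here is a rough sketch using Python's built-in sqlite3 module. It is not Alex's actual code: the table names, column names and sample IDs are invented for illustration, and it produces one total per student rather than one per year.

```python
import csv
import io
import sqlite3


def collate_to_csv(students, loans, visits, logins):
    """Load the separate datasets into an in-memory database, then write
    one CSV row per student with a count from each activity table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE students (student_id TEXT PRIMARY KEY)")
    db.executemany("INSERT INTO students VALUES (?)", [(s,) for s in students])
    # Hypothetical layout: each activity table holds one row per event.
    for table, rows in (("loans", loans), ("visits", visits), ("logins", logins)):
        db.execute(f"CREATE TABLE {table} (student_id TEXT)")
        db.executemany(f"INSERT INTO {table} VALUES (?)", [(s,) for s in rows])

    # Each subquery plays the role of a COUNTIF over one source table.
    cursor = db.execute(
        """
        SELECT s.student_id AS student_id,
               (SELECT COUNT(*) FROM loans  WHERE student_id = s.student_id) AS loans,
               (SELECT COUNT(*) FROM visits WHERE student_id = s.student_id) AS visits,
               (SELECT COUNT(*) FROM logins WHERE student_id = s.student_id) AS athens_logins
        FROM students AS s
        ORDER BY s.student_id
        """
    )
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
    db.close()
    return out.getvalue()


csv_text = collate_to_csv(
    students=["a1", "a2"],
    loans=["a1", "a1"],
    visits=[],
    logins=["a1", "a2", "a2"],
)
print(csv_text)
```

The student table drives the query, so students with no activity in a given table still get a row with a zero instead of dropping out of the file.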

The final .csv data has gone to Huddersfield for analysis and a copy is lodged in our Repository for safe keeping. Once the agreement has been made to release the LIDP data under an open licence, I’ll make the Repository copy publicly accessible.

*N.B. In the end, there was no visitor data for the year 2007/08: the access control / visitor data for that year was missing for almost all students. This may correspond to a re-issuing of library access cards for all users around that time, or the data may be missing for some other reason.

The joy of e-resource authentication (warning: may contain sarcasm, hyperbole, and self-indulgent whining)

Posted on May 18th, 2011 by Paul Stainthorp

(Alternative title: why I’m going bald.)

Managing the authentication of university students and staff to electronic library resources is an awful, awful pain and I wish it would all just disappear. There, I’ve said it.

My line manager (Deputy Librarian: Academic & Technical Services, Lys Ann Reiners) is very keen for me to involve as many Library staff as possible in managing our authentication régime. For most aspects of my job, I’m more than happy to spread the love around (I don’t agree with keeping knowledge—or extra work—to myself), but when it comes to authentication, I feel guilty even asking my colleagues for help, in case I expose them to some kind of toxic authentication ju-ju death rays.

I realise that I can’t shy away from explaining things merely because they’re confusing and depressing. We can only make sense of authentication for our users once I’ve stopped putting my fingers in my ears and hoping it’ll all just go away. We have already made a start on documenting the mess, but it was these [Nicole Harris] two [Dave Pattern] recent blog posts which have spurred me into writing this, in the spirit of catharsis:

Authentication to e-resources at the University of Lincoln

A trilogy. A tragedy. A travesty.

1. IP authentication (for on-campus users)

  • Some (but not all) of the 150+ e-resource publishers/providers with which we have a relationship have the University’s IP ranges on file. This allows people using on-campus computers to seamlessly access restricted content (e.g. full text) from those providers’ sites.
  • But we have no real procedure for keeping those IP ranges up to date – i.e., of informing providers of any changes. I’ve asked my colleague (Library (E-resources) Assistant) Elif Varol to do something about this.
  • In particular, we now have single ‘apparent’ external IP addresses associated 1:1 with each University building. This should mean we can [a] simplify the information we give to providers, and [b] associate usage with particular buildings.
  • So far, so simple. But the fact that on-campus authentication is so seamless (as far as our users are concerned – they needn’t even know it’s there!) does cause a problem when those same users try and access the same resource from off campus and don’t get the same seamless access.
  • Also, University ICT services occasionally look worried when I tell them about IP authentication. They just aren’t comfortable that I pass on details of our IP ranges to third parties.
  • For resources where only IP-based authentication is available, in order to provide off-campus access we make use of a CGIProxy-based application which we call ‘LibResProxy’ (see part 3, below), with mixed success.
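
The point above about one ‘apparent’ external IP address per building suggests usage could be attributed to buildings with a simple lookup. The following Python sketch shows the idea only: the addresses come from the reserved documentation ranges and the building names are placeholders, not the University's real values.

```python
from collections import Counter
from ipaddress import ip_address

# Hypothetical 1:1 mapping of external IP address to building.
BUILDING_BY_IP = {
    ip_address("192.0.2.10"): "Main library",
    ip_address("192.0.2.11"): "Science building",
}


def usage_by_building(request_ips):
    """Count requests per building; anything unrecognised is lumped together."""
    counts = Counter()
    for raw in request_ips:
        building = BUILDING_BY_IP.get(ip_address(raw), "off campus / unknown")
        counts[building] += 1
    return counts


# Hypothetical source addresses, e.g. taken from a provider's usage report.
usage = usage_by_building(
    ["192.0.2.10", "192.0.2.10", "192.0.2.11", "203.0.113.5"]
)
print(usage)
```

The same table could double as the list of addresses sent to providers, which would at least keep the two in step.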

2. [Open]Athens and “Shibboleth” (but not really)

Deep breath:

  • We are members of the UK Access Management Federation. Our nominated, outsourced Identity Provider (IdP) is Eduserv, to whom we pay an annual subscription. This means we can use their product, OpenAthens (often just referred to as “Athens”), to provide local authentication (via University Portal login using network\accountID) to both ‘traditional’ Athens-protected resources and to resources which have abandoned Athens in favour of true federated access (which lots of people refer to as “Shibboleth”, even though that’s not really the correct terminology). The Eduserv software we’re running on the Portal is called ‘AthensDA’; we probably ought to upgrade this to a newer version called ‘OpenAthensLA 2.0’, but we haven’t really discussed it yet.
  • As far as the user is concerned, this means we can create a link to an e-resource which will work both on- and off-campus. These URLs are generally in the form: http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=XXXXXXX, where the first part of the URL sets an Athens ‘preferred organisation’ cookie, associating the user’s computer with the University of Lincoln, and “XXXXXXX” is the percent-encoded URL of either: [a] the defined Athens authentication point for resources that use the ‘old’, traditional Athens protocol (these have to be activated first, or “cascaded to permission sets” in Eduserv terminology, by an administrator); or [b] a WAYFless URL for a resource which uses the ‘new’ federated access. The format of this last category of WAYFless URLs is unpredictable and very difficult to build, and for some resources such a URL can’t be created at all, leaving the user with no choice but to navigate a horrible “Where Are You From?” form where they have to select their institution from a list before they’re allowed to log in.
  • What the user sees when they click on this link is a blue-and-orange login page with a link to ‘Go to the University of Lincoln login page »’. Clicking on that link displays a pop-up HTTP login box (unless they are on campus using IE, in which case they’re logged in automatically), in which the user must enter some variation of network\accountID and their University network password. The exact form required is highly variable, depending on the user’s operating system and browser.
    Screenshot of the OpenAthens login point
  • This is fine for situations where we can control exactly where the user is going and what links they are clicking on, and where we have a chance to set the Athens cookie: this happy state of affairs applies to the University Portal, and almost nowhere else; certainly not to the open web and users coming via Google Scholar.
  • Problems: and they are legion:
    1. We’ve not been systematic about migrating resources from the ‘old’ Athens login to the ‘new’ federated access. (We deliberately didn’t want to stop using ‘old’ Athens links to resources if they were working. If it ain’t broke…) For the user, there’s no difference between the two, hence the lack of urgency – for the Library, it’s become rather confused and difficult to manage.
    2. If, for whatever reason, the user doesn’t end up with (or loses) the Athens cookie which sets their preferred organisation, then they don’t see the link to ‘Go to the University of Lincoln login page »’, and instead have to follow the rigmarole of setting their preferred institution again. Needless to say, most students and staff are entirely mystified by this arcane process.
    3. Related to point 2: a student or member of staff who has a relationship with more than one UK institution (e.g. two universities/colleges, or a university and the NHS) tends to run into problems, because you can’t easily have two Athens ‘preferred organisation’ cookies set at the same time on the same machine. I know, I know: it doesn’t sound very “federated”, does it?
    4. Sometimes… it …Just. Doesn’t. Work. (Because of pop-up blockers, trusted sites, peculiarities of various versions of Windows, bugs in Google Chrome, leaves on the line, etc.) When this happens, when we can’t solve the problem and the user is getting very frustrated, I have to grit my teeth and generate a separate, “classic” Athens username and password for that user. This gets around the access problem in the short term, but tends only to increase confusion in the longer term.
    5. Finally, and most frustratingly: all of this is completely blown out of the water if the user encounters a resource (a journal article, say) on the open web: via Google, or even via our own Electronic Journals A-to-Z. They don’t automatically see the OpenAthens login point, so they have to hunt down a link to “login to Athens here” (or similar). Each provider deals with this differently, so a user can’t necessarily apply what they’ve learnt from one resource to any other. Some providers (‘SPs’ in access-speak) allow libraries to construct complex ‘masked’ deep-linking authentication URLs. These make it easier for us to automate the login process from the A-to-Z to an individual journal. Others just don’t work that way – so we write help guides instead. Eduserv have a web page about creating deep links for authentication.
  • If you’re not utterly, hopelessly confused by all of the above, then I bow down to your machine-like intelligence.
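
For what it’s worth, building the cookie-setting links described above is the easy part. A minimal Python sketch follows, assuming that the whole target URL should be percent-encoded (reserved characters included) and using a made-up target address; the `athens_link` helper is hypothetical, not an Eduserv tool.

```python
from urllib.parse import parse_qs, quote, urlsplit

SETORG = "http://auth.athensams.net/setorg.php"


def athens_link(target_url: str, org_id: str = "LINCUNI") -> str:
    """Wrap a target URL (an Athens authentication point or a WAYFless URL)
    in the link form that sets the 'preferred organisation' cookie."""
    # safe="" also encodes ':' '/' '?' and '=', so the target survives
    # intact as a single query-string value.
    return f"{SETORG}?id={org_id}&ath_returl={quote(target_url, safe='')}"


# Hypothetical target; real authentication points vary by provider.
link = athens_link("http://www.example.com/journal?id=42")
print(link)

# Decoding the query string should give back the original target.
returned = parse_qs(urlsplit(link).query)["ath_returl"][0]
print(returned)
```

The hard part, as noted above, is working out what the target URL should be for each provider in the first place.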

3. The grab-bag approach: everything else

  • For e-resources that don’t work with OpenAthens, we have a number of tricks of last resort. Some of these tricks have been built for us by Tim Simmonds of the Online Services Team (ICT). When they work, they’re brilliant. But we have no control over whether they’ll work or not with a particular resource. They tend to use the Portal-esque network\accountID and password as login, which is at least consistent with OpenAthens.
  • This includes our form capture tool, which we use to create ‘faked’ URLs for resources that have their own username and password (in effect, it pastes the login details into an HTML login form on the user’s behalf and hides the authentication from public view). The popular business database Factiva works like this.
  • It also includes LibResProxy, which provides off-campus access to certain (IP-authenticated) library e-resources. We fall back on it where no other method of off-campus authentication exists. It’s a bit hit-and-miss whether it will work with any given website, depending greatly on how the site is constructed and particularly on how heavily the site makes use of scripts (e.g. JavaScript) rather than ‘vanilla’ HTML: for instance, it’s fine with the ACM Digital Library, but spits its dummy out over the IEEE Computer Society Digital Library.
  • Last of all – if all else fails, we give a username and password out to the student and tell them to get on with it. We change these passwords once a year as a security measure.

4. Whatchagonnadoaboutit?

We can’t go on living like this. In a future blog post, I’m going to map out a possible way forward for authentication. It’ll probably involve thinking about some of the plans my colleagues in ICT have for single sign-on and OAuth, and what those plans mean in a library context.