Posts Tagged ‘Horizon’

Ebook URLs: bodge upon bodge upon bodge

Posted on October 3rd, 2012 by Paul Stainthorp

From the Oxford English Dictionary:

† bodge, v.

Etymology:  An altered form of botch v.1; compare grudge < grutch.
Obs. or dial. 1. trans. To patch or mend clumsily.

Chris Leach and I have had to bodge a fix for ebook URLs in our library catalogue, for a third time. I’m getting that feeling that we’ve bodged our way into a corner. (N.B. we’re going to upgrade Athens quite soon – I hope that once we can build our own WAYFless URLs to UK Federation-authenticated resources, on a *.lincoln.ac.uk root, we should be able to fix this problem ‘properly’. Until then…)

Here’s the problem and a list of our bodges to date:

We import MARC records for ebooks from Ingram’s MyiLibrary platform. They contain perfectly good, honest URLs (stored in MARC field 856$u), tweaked for Athens in the form (e.g.):

  • http://www.myilibrary.com?Ref=Athens&id=115106

Next, to make sure our users see the correct Athens login option for the University of Lincoln…
Screenshot from Athens

…and not a generic Athens username and password box (from where the user would have to click on “Alternative login” and generally go round the houses to proceed)…
Screenshot from MyiLibrary

…we use the MARC field mapping feature in our LMS (SirsiDynix Horizon – a feature which operates not unlike the e-journals A-to-Z’s “proxy mask” tool) to prefix every URL stored in MARC 856$u with our standard Athens cookie-setting prefix URL (N.B. this prefix is applied to all ebooks in the catalogue–in fact, any URL in 856$u–not just MyiLibrary ebooks):

  • http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=

This prefix combines with the contents of 856$u to give a compound URL, which is presented to the user as a hyperlink in HiP, our OPAC/web catalogue (e.g.):

  • http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=http://www.myilibrary.com?Ref=Athens&id=115106

Problem #1 – that compound URL doesn’t work. It returns an Athens error (presumably because Athens can’t tell whether the variables at the end of the URL belong to auth.athensams.net, or to www.myilibrary.com).
Screenshot from Athens

Bodge #1 – To avoid this error, the second part of the compound URL ought to be %-encoded (the A-to-Z’s proxy mask feature allows for this using {startencode}{endencode} pseudotags, but the Horizon MARC field processor doesn’t have anything like this afaik). So, we changed our import processes/record specification for the MARC records we get from MyiLibrary, %-encoding the contents of 856$u:

  • http%3A%2F%2Fwww.myilibrary.com%3FRef%3DAthens%26id%3D115106

…giving a compound URL (including the field prefix) of:

  • http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=http%3A%2F%2Fwww.myilibrary.com%3FRef%3DAthens%26id%3D115106

This worked fine for users accessing ebooks from HiP.
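For anyone wanting to reproduce Bodge #1 outside of an import routine, here is a minimal Python sketch of the %-encoding step. It is not how our import process actually does it; the function names are made up for illustration, and only the prefix and the example URL come from the post above.

```python
from urllib.parse import quote

# The Athens cookie-setting prefix applied by the Horizon MARC field mapping.
ATHENS_PREFIX = "http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl="


def encode_target(url: str) -> str:
    """Percent-encode every reserved character in the target URL."""
    # safe="" is needed because quote() leaves "/" unencoded by default.
    return quote(url, safe="")


def compound_url(target: str) -> str:
    """Mimic the field prefix: Athens prefix + encoded contents of 856$u."""
    return ATHENS_PREFIX + encode_target(target)


if __name__ == "__main__":
    print(compound_url("http://www.myilibrary.com?Ref=Athens&id=115106"))
```

Encoding the whole target means Athens sees a single ath_returl value, rather than having to guess which of the trailing parameters belong to it.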

Problem #2 – didn’t occur until we started using Talis Aspire as reading list software. When a user bookmarked a catalogue record from HiP, the %-encoded contents of 856$u were causing an error. See explanation here.
Screenshot from Talis Aspire

Bodge #2 – to fix this Talis Aspire error, we downloaded all of our MyiLibrary MARC records (using an SQL query to identify every record where 856$u contained ‘myilibrary.com’) and used MarcEdit to partially undo the %-encoding of the URL, to give:

  • http://www.myilibrary.com%3FRef%3DAthens%26id%3D115106

…before re-uploading the doctored records into Horizon. This was enough to fool Talis Aspire into accepting the URL as valid, and as the reading lists prefix each online resource URL with a redirection URL of their (Talis’s) own, the net result is that users can link from a reading list to an ebook. (However, because the URL-as-stored-in-Aspire doesn’t contain the Athens cookie-setting prefix, some users will inevitably be sent to the wrong, generic Athens login page instead of the correct, University of Lincoln-specific one.)
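We did the partial decoding in MarcEdit, but the transformation itself is simple enough to sketch in Python. This is only an illustration of the string change (the function name is hypothetical), assuming every affected URL starts with the encoded form of "http://".

```python
# The %-encoded form of "http://" at the start of a fully encoded URL.
ENCODED_SCHEME = "http%3A%2F%2F"


def partially_decode(encoded_url: str) -> str:
    """Restore only the leading 'http://'; leave the query string encoded."""
    if encoded_url.startswith(ENCODED_SCHEME):
        return "http://" + encoded_url[len(ENCODED_SCHEME):]
    # Anything that doesn't start with the encoded scheme is left untouched,
    # so running this twice over the same record does no further damage.
    return encoded_url


if __name__ == "__main__":
    print(partially_decode(
        "http%3A%2F%2Fwww.myilibrary.com%3FRef%3DAthens%26id%3D115106"))
```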

Problem #3 – Most recently, when we started weekly exports of MARC records from Horizon into our new discovery service Find it at Lincoln (our name for the EBSCO Discovery Service) we discovered that the partial-%-encoding still wasn’t enough to produce a valid URL. Find it at Lincoln doesn’t prefix the ebook URLs in any way, and when users clicked on the ‘raw’, partially-encoded URL in a book record within the EBSCO service, they were getting a browser error.
Screenshot of a browser error

Bodge #3 – this is where it gets very messy. As a short-term fix to stop users seeing the browser error every time they tried to access a MyiLibrary ebook from within Find it at Lincoln, we again exported all 1,300 or so MyiLibrary-matching MARC records from Horizon, and again edited the 856$u URLs using MarcEdit.

This time, we added the Athens cookie-setting prefix to each MyiLibrary URL, before re-uploading. We also then ran a separate export of the same records to a .csv file, which makes it easy to do a visual/formula-driven inspection of all 1,300-odd records to make sure there aren’t any duplicates/oddities/crud. This is a useful trick we’ll be using again!

So, the contents of 856$u now look like:

  • http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=http://www.myilibrary.com%3FRef%3DAthens%26id%3D115106

…as such, they should work fine in both the reading lists system, and in Find it at Lincoln (once the most recent weekly MARC dump has been processed by EBSCO). In HiP, however, they still get the MARC field prefix applied, and they end up with a double Athens prefix:

  • http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=http://www.myilibrary.com%3FRef%3DAthens%26id%3D115106

This double-dose of Athens cookie-setting doesn’t seem to do any harm, although I do know that Athens throws a wobbly if a user is referred to an authentication point too many times in quick succession – so I’m wary of leaving things as they are.
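HiP itself can't be told to skip the prefix for records that already carry one, but the doubled-up URLs are at least easy to spot in an export. A rough Python sketch, with hypothetical function names, of counting and collapsing repeated prefixes:

```python
ATHENS_PREFIX = "http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl="


def count_prefixes(url: str) -> int:
    """Count how many times the Athens prefix is stacked at the front."""
    count = 0
    while url.startswith(ATHENS_PREFIX):
        url = url[len(ATHENS_PREFIX):]
        count += 1
    return count


def with_single_prefix(url: str) -> str:
    """Strip any stacked prefixes, then apply the prefix exactly once."""
    while url.startswith(ATHENS_PREFIX):
        url = url[len(ATHENS_PREFIX):]
    return ATHENS_PREFIX + url
```

Something like this could be run over the .csv export as a sanity check; it does nothing to change what HiP displays.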

There’s also the problem that other ebooks (on our other main platform, Dawsonera) are still being pulled into Find it at Lincoln and the reading lists without an Athens prefix, so unless users have already encountered an Athens institutional cookie, they’re getting the ‘wrong’ Athens authentication point. To get technical, they will see the HTML login form for users with an OpenAthens MD = Managed Directory account. Otherwise known—though it’s not a term approved of by Eduserv—as ‘classic Athens’. At Lincoln we only create classic Athens accounts (with usernames beginning hum_______) in an emergency.

We could perform the same trick with all our other ebook records (several tens of thousands of records, for Dawsonera and a few odds and sods): identify and download them, incorporate the Athens cookie-setting prefix within 856$u, re-upload them, and ditch the Horizon field prefix rule entirely. But: if and when we change our methods of authentication we’d have to process all the records all over again (though to be honest, we’re getting used to it…), and I’m loath to hard-code authentication ‘noise’ into our MARCs.

Other options: we could look at alternatives to Athens authentication (UK Federation or IP/EZproxy) in the case of MyiLibrary; we could speak to Ingram to see if there’s anything that can be done about their slightly odd Athens session behaviour, and/or we could just get on with setting up a new OpenAthens environment that allows us to create proper WAYFless URLs instead of using the cookie-setting method, which is itself a bit of a bodge. We could also see if it’s possible to add proxy-mask-style behaviour to links in EDS (Find it at Lincoln) and Talis Aspire.

For the time being, it’s holding together with sticky tape. Don’t breathe on it too hard.

Reading lists problem with unnecessarily encoded e-book URLs in Horizon: temporary fix

Posted on July 31st, 2012 by Paul Stainthorp

There’s a problem with a small number of e-book URLs in our library catalogue (held in MARC records, field 856$u, for some—but not all—e-books from Coutts MyiLibrary). For complex historical reasons, the normal URL, e.g.:

http://www.myilibrary.com?Ref=Athens&id=115106

has been percent-encoded like this:

http%3A%2F%2Fwww.myilibrary.com%3FRef%3DAthens%26id%3D115106%0A

This causes an error (“Invalid Web Address”) when you try to import the details into “My Bookmarks” in the reading lists system:
Screenshot of an error message in Talis Aspire

We’re working to eradicate these unnecessarily-encoded URLs from the catalogue. In the meantime, here’s a temporary fix.

  1. Import the record into the reading lists system as normal using the bookmarklet.
  2. Before you click on “Create”, copy the e-book URL from the “Web address” field.
  3. Go to this website: http://meyerweb.com/eric/tools/dencoder/
  4. Paste the URL you copied into the large box on the screen, and hit “Decode”.
    Screenshot of the URL decoder
  5. Copy-and-paste the normal, decoded URL back into the reading lists system.
  6. Click on “Create” as normal and the e-book will be added to “My Bookmarks”.
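If you have Python to hand, the decoding step can also be done without the website. A small sketch using the standard library (the function name is just for illustration):

```python
from urllib.parse import unquote


def decode_ebook_url(encoded: str) -> str:
    """Undo the percent-encoding and drop the stray trailing newline (%0A)."""
    return unquote(encoded).strip()


if __name__ == "__main__":
    print(decode_ebook_url(
        "http%3A%2F%2Fwww.myilibrary.com%3FRef%3DAthens%26id%3D115106%0A"))
```

The strip() matters here: the encoded example ends in %0A, which decodes to a line break that the reading lists system is unlikely to want.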

Getting intimate with the YAZ client

Posted on May 31st, 2012 by Paul Stainthorp

Screenshot of the YAZ client

As part of our discovery and reading list projects, I’ve had to get even more familiar with Z39.50* as a method of retrieving MARC records from our Horizon catalogue.

(*I can hear the groans even now.)

I had been using the Library of Congress’s Z39.50 test search form along with (Basedow Information Systems’) Mercury client software to test our Z setup, but I was beginning to find that we needed more flexibility. Enter YAZ, a “programmers’ toolkit supporting the development of Z39.50/SRW/SRU clients and servers”.

YAZ comes with a(n initially scary but actually very useful) command-line client for dialling up databases over Z39.50, querying them, and displaying the results retrieved. Here are my notes on how to use it with our own catalogue.

open tcp:www.library.lincoln.ac.uk:210/lincoln

format 1.2.840.10003.5.102

find @attr 1=1016 "<search_phrase>"

  • Searches the catalogue for <search_phrase> using attribute code number 1016 (“Any”), which is paired with our HiP keyword index .GW (General Keyword). Here’s a list of some Z39.50 attribute codes, and the University of Lincoln HiP indexes with which they correspond (N.B. we can tweak these using HiP admin):
    • 4 (Title) = .TW (Title Keyword)
    • 7 (ISBN) = ISBNEX (ISBN/ISSN Exact Match)
    • 8 (ISSN) = ISBNEX (ISBN/ISSN Exact Match)
    • 12 (Local number) = .BI (Bib#)
    • 27 (LC subject heading) = .SW (Subject Keyword)
    • 1003 (Author) = .AW (Author Keyword)
    • 1016 (Any) = .GW (General Keyword)

Boolean searches can be constructed by using @and, @or, etc. For example:

find @and @attr 1=4 "newspapers" @attr 1=1003 "Keeble"
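Typing these prefix-notation queries by hand gets fiddly, so here is a small Python sketch that builds the 'find' command strings from the attribute codes listed above. The dictionary keys and function names are my own invention; the code only assembles text to paste into the YAZ client and doesn't talk to the Z server. It also makes no attempt to escape quotation marks inside a search phrase.

```python
# Z39.50 use attributes, as mapped to our HiP indexes in the list above.
USE_ATTRIBUTES = {
    "title": 4,
    "isbn": 7,
    "issn": 8,
    "local_number": 12,
    "lc_subject": 27,
    "author": 1003,
    "any": 1016,
}


def term(index: str, phrase: str) -> str:
    """Build a single '@attr 1=N "phrase"' search term."""
    return f'@attr 1={USE_ATTRIBUTES[index]} "{phrase}"'


def find_and(*terms: str) -> str:
    """Combine terms with @and; n terms need n-1 operators in front."""
    if not terms:
        raise ValueError("at least one search term is required")
    query = terms[-1]
    for earlier in reversed(terms[:-1]):
        query = f"@and {earlier} {query}"
    return "find " + query


if __name__ == "__main__":
    print(find_and(term("title", "newspapers"), term("author", "Keeble")))
```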

~~~~~~~~~~

show

  • This displays the first record matching the most recent ‘find’ command. Repeating ‘show’ returns each subsequent record in turn.

close

  • Closes the connection to the Z server.

Couple of final links:

1.8 million library loans from the University of Lincoln under CC0 – Copac Activity Data/SALT2 project

Posted on May 16th, 2012 by Paul Stainthorp

Today we published data on approximately 1.8 million items loaned from the University of Lincoln’s libraries since 2001. The data is available to re-use under a CC0 licence, and can be downloaded from:

We’ve done this as part of our involvement in the Copac Activity Data Project, a.k.a. SALT2. Along with data from the universities of Manchester, Sussex, Cambridge and Huddersfield, our circulation data will be used to power a ‘recommender API’, which libraries will be able to use to build “People who borrowed X also borrowed Y”-type services. The API will benefit from the power of aggregated data from multiple institutions of different types, containing tens of millions of circulation events.

You’ll notice as well that we’ve chosen to host the data on our brand-new Orbital (v0.1) research data management application. Each dataset has a persistent citable URI. We’ll be keeping the data up-to-date, and generating a new activity data file from our library circulation logs shortly after the end of each academic year.

The data consists of a number of CSV files (one for each academic year since 2000-01, plus a huge file of all the data), containing the following fields:

Field index | Field name | Description
0 | CREATE_DATE | The date and time of the loan event, in the format: dd/mm/yyyy hh:mm
1 | BORROWER_ID | A cryptographic hash of the internal system ID associated with the borrower of the item, as used in the University of Lincoln’s library system.
2 | WORK_ID | A cryptographic hash of the internal system ID associated with the bibliographic work borrowed, as used in the University of Lincoln’s library system.
3 | CONTROL_NUMBER | The ISBN of the work borrowed (10 or 13 digits).
4 | AUTHOR_DISPLAY | The main author of the work borrowed.
5 | TITLE_DISPLAY | The title of the work.
6 | PUB_DATE | The publication year of the work in the form: yyyy

I’ll blog in detail another time about exactly how we created the data extracts. In short:

  1. There is a table in the SirsiDynix Horizon library management system called circ_tran which records every instance of item number X borrowed by user number Y at time Z. [#1]
  2. There is another table which provides a lookup between item numbers and the numbers of the bibliographic works of which they are a copy. [#2]
  3. Dave Pattern at the University of Huddersfield wrote a Perl script which scrapes all the bibliographic data (title, author, ISBN) for each work from our OPAC (Horizon Information Portal) and writes it to a text file. [#3]
  4. Developer Jamie Mahoney of CERD/LNCD then stepped in, using some pretty heavy SQL on the original 3 data extracts, to:
    • Hash the internal Horizon user and work ID numbers to provide anonymity;
    • Convert the internal Horizon date and time stamps in extract [#1] from a version of Unix time into a readable datestamp (formula hint: cko_date*86400 + cko_time*60);
    • Use the item/work lookup table [#2] to pull in the bibliographic details for each loan in [#1] from the bibliographic table [#3] (an epic SQL JOIN query), removing items which are no longer represented in our library system;
    • Remove any items without an ISBN, which are of no use to the SALT recommender API;
    • Tweak the punctuation and formatting;
    • Split the data into separate files for each year.
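To make the formula hint a bit more concrete, here is a Python sketch of the date conversion and the hashing step. Jamie did the real work in SQL, so treat this purely as an illustration. It assumes (going by the formula) that cko_date is a count of whole days since 1 January 1970 and cko_time is minutes after midnight, with no timezone adjustment. The post doesn't say which hash was used, so SHA-256 with a salt is my assumption, as are the function names.

```python
import hashlib
from datetime import datetime, timezone


def horizon_to_datestamp(cko_date: int, cko_time: int) -> str:
    """Convert Horizon's day and minute counters to dd/mm/yyyy hh:mm."""
    # Formula from the post: days -> seconds, plus minutes -> seconds.
    seconds = cko_date * 86400 + cko_time * 60
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.strftime("%d/%m/%Y %H:%M")


def anonymise(internal_id: int, salt: str) -> str:
    """Hash an internal ID. SHA-256 and the salt are assumptions."""
    return hashlib.sha256(f"{salt}:{internal_id}".encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(horizon_to_datestamp(15476, 630))
    print(anonymise(12345, "example-salt"))
```

Because the same ID always produces the same hash, loans by one borrower can still be grouped together without exposing the Horizon number itself.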

Once again, the data is at:

Thanks are due to Chris Leach and Dave Pattern for Horizon-fu, and to Jamie Mahoney for his patient wrangling of several millions of lines of data!

You can find out more about the Copac Activity Data Project/SALT2, at: http://copac.ac.uk/innovations/activity-data/

On the “Z” list

Posted on April 26th, 2012 by Paul Stainthorp

Tshirt "Bad Decision Mr Z"

My colleagues in e-Library Services at Lincoln have been spending the last few weeks updating our Library Management System (LMS) – SirsiDynix Horizon. This work included upgrading from v7.34 to v7.51b of the Horizon software itself (and from v3.08 to v3.21 of our library catalogue HiP) as well as moving Horizon off an internal Lincoln server to external SaaS, and re-connecting all the associated systems (access control; Keystone; the 2CQR Lucid self-service touchscreen software, etc.).

We’ve also changed our connection details for remote searching of our library catalogue via the Z39.50 protocol. Our new Z39.50 URL is z3950s://www.library.lincoln.ac.uk:210/lincoln (replacing the old z3950s://194.80.48.4:210/horizon).

A couple of people on Twitter asked why I was bothering. Z39.50 is a national and international (ISO 23950) standard defining a protocol for computer-to-computer information retrieval – and is pretty much the definition of dinosaur library tech:

 

But secretly I ❤ Z39.50. Also, a few services listed below—most notably RefWorks—still make use of it. The full details of our new Z39.50 setup are:

And here’s a very short list of registries and services that list and/or make use of our Z39.50 profile. I’ll add to this list if any more come to light.

  1. Copac, the UK union catalogue, uses Z39.50 in order to include results from Lincoln in Copac “@yourlibrary” searches.
  2. IESR, the MIMAS-run free and machine-readable catalogue of electronic resources.
  3. RefWorks’ “Search Online Catalog or Database” feature uses Z39.50 to import results from our catalogue. (We also list a small number of e-databases that can be searched via Z39.50 in RefWorks – I wonder if anyone uses these?)
  4. The Library of Congress’s Z39.50 gateway list of library catalogues accessible via Z39.50.
  5. The Z39.50 Target Directory (IRSpy) —Edit: 24 May 2012

For testing Z39.50 in the past, I have used the free-to-download Mercury Z39.50 Client from Basedow Information Systems. Other client software is available.

Library Impact Data Project: good news, everybody!

Posted on June 18th, 2011 by Paul Stainthorp

I think this is worth re-posting from the LIDP blog:

LIDP graphic

We are very pleased to report that we have now received all of the data from our partner organisations and have processed all but two already!

Early results are looking positive and our next step is to report back with a brief analysis to each institution. We are planning to give them our data and a general set of data so that they can compare and contrast. There have been some issues with the data, some of which have been described in previous blogs; however, we are confident we have enough to prove the hypothesis one way or another!

In our final project meeting in July we hope to make a decision on what form the data will take when released under an Open Data Commons Licence. If all the partners agree, we will release the data individually; otherwise we will release the general set for others to analyse further.

I submitted Lincoln’s data on 13 June. It consists of fully anonymised entries for 4,268 students who graduated from the University of Lincoln with a named award, at all levels of study, at the end of the academic year 2009/10 – along with a selection of their library activity over three* years (2007/08, 2008/09, 2009/10).

The library activity data represents:

  1. The number of library items (book loans etc.) issued to each student in each of the three years; taken from the circ_tran (“circulation transactions”, presumably) table within our SirsiDynix Horizon Library Management System (LMS). We also needed a copy of Horizon’s borrower table to associate each transaction with an identifiable student.
  2. The number of times each student visited our main GCW University Library, using their student ID card to pass through the Library’s access control gates in each of the three* years; taken directly from our ‘Sentry’ access control/turnstile system. These data apply only to the main GCW University Library: there is no access control at the University of Lincoln’s other four campus libraries, so many students have ‘0’ for these data. Thanks are due to my colleague Dave Masterson from the Hull Campus Library, who came in early one day, well before any students arrived, in order to break in to the Sentry system and extract this data!
  3. The number of times each student was authenticated against an electronic resource via AthensDA; taken from our Portal server access logs. Although by no means all of our e-resources go via Athens, we’re relying on it as a sort of proxy for e-resource usage more generally. Thanks to Tim Simmonds of the Online Services Team (ICT) for recovering these logs from the UL data archive.

I had also hoped to provide numbers of PC/network logins for the same students for the same three years (as Huddersfield themselves have done), but this proved impossible. We do have network login data from 2007-, but while we can associate logins with PCs in the Library for our current PCs, we can’t say with any confidence whether a login to the network in 2007-2010 occurred within the Library or elsewhere: PCs have just been moved around too much in the last four years.

Student data itself—including the ‘primary key’ of the student account ID—was kindly supplied by our Registry department from the University’s QLS student records management system.

Once we’d gathered all these various datasets together, I prevailed upon Alex Bilbie to collate them into one huge .csv file: this he did by knocking up a quick SQL database on his laptop (he’s that kind of developer), rather than the laborious Excel-heavy approach using nested COUNTIF statements which would have been my solution. (I did have a go at this method, which clearly worked well for at least one of the other LIDP partners, but my PC nearly melted under the strain.)
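I don't have Alex's actual database to show, but the general idea of letting SQL do the counting instead of COUNTIF looks something like the following Python/SQLite sketch. Every table, column and function name here is hypothetical, and it only counts two of the activity types to keep things short.

```python
import sqlite3


def collate(students, loans, visits):
    """Return (student_id, loan_count, visit_count) rows, one per student."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE student (id TEXT PRIMARY KEY)")
    con.execute("CREATE TABLE loan (student_id TEXT, year TEXT)")
    con.execute("CREATE TABLE visit (student_id TEXT, year TEXT)")
    con.executemany("INSERT INTO student VALUES (?)", [(s,) for s in students])
    con.executemany("INSERT INTO loan VALUES (?, ?)", loans)
    con.executemany("INSERT INTO visit VALUES (?, ?)", visits)
    # One correlated COUNT per activity type; students with no activity get 0.
    rows = con.execute(
        """
        SELECT s.id,
               (SELECT COUNT(*) FROM loan l WHERE l.student_id = s.id),
               (SELECT COUNT(*) FROM visit v WHERE v.student_id = s.id)
        FROM student s
        ORDER BY s.id
        """
    ).fetchall()
    con.close()
    return rows
```

Each row could then be written straight out to the .csv file with Python's csv module.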

The final .csv data has gone to Huddersfield for analysis and a copy is lodged in our Repository for safe keeping. Once the agreement has been made to release the LIDP data under an open licence, I’ll make the Repository copy publicly accessible.

*N.B. In the end, there was no visitor data for the year 2007/08: the access control / visitor data for that year was missing for almost all students. This may correspond to a re-issuing of library access cards for all users around that time, or the data may be missing for some other reason.

Library catalogue: Site Search analytics

Posted on March 17th, 2011 by Paul Stainthorp

A while ago (and, as with all things Horizon, with the help of Dave Pattern at Huddersfield), we enabled Google Analytics on our library OPAC (sometimes referred to as HiP, the “Horizon Information Portal”). This takes the form of a piece of Google JavaScript which lives in a ‘footer’ document common to all HiP pages.

Chris Leach gave a presentation about using Google Analytics with HiP at the last SirsiDynix Horizon User Group.

Now Nick Jackson has shown me how to enable Google’s Site Search features on our Analytics profile for the library catalogue. Site Search will allow us to ‘tease out’ the search activity within the library catalogue itself, by analysing the URL structure of HiP queries, recognising and extracting the search terms, then tracking the paths users take from those search queries to destination pages (i.e., individual bibliographic record pages on HiP).

For instance: a typical HiP search query ends up looking something like this:

http://www.library.lincoln.ac.uk/ipac20/ipac.jsp?session=1G00362045UH4.101795&menu=search&aspect=subtab13&npp=10&ipp=20&spp=20&profile=ln&ri=&index=.GW&term=journalism&x=0&y=0&aspect=subtab13

By telling Google Site Search to look in the query parameter “term” for the search keyword(s)—in this case journalism—and to ignore the “session” parameter, Google Analytics can start to group similar queries together and provide us with data about what our users are searching the catalogue for.
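To show what that amounts to, here is a rough Python sketch that pulls the "term" parameter out of a HiP URL and drops "session". It only mimics the effect of the Site Search settings; it is not how Google Analytics works internally, and the function names are made up.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode, urlsplit, urlunsplit


def search_term(url: str):
    """Return the value of the 'term' query parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get("term")
    return values[0] if values else None


def without_session(url: str) -> str:
    """Rebuild the URL with the 'session' parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "session"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

With the session ID stripped out, two users searching for the same keyword produce the same URL, which is what allows similar queries to be grouped.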

Screenshot of the setup page for Google Site Search

It’s been running for less than 24 hours, but already we’re starting to build up a record of the keywords people are typing into the catalogue:

Screenshot of Google Site Search top search terms on HiP

What could we [and what should we] do with this data? Are there any Google Site Search experts out there who could give me a few tips? If anyone from within the Library at the University of Lincoln would be interested in helping to analyse the search term data, please let me or Chris know.

One thing we’ve already discussed is the idea of using the HiP search term activity as test data to ‘teach’ the Jerome machine intelligence engine about the kind of things Lincoln library users are interested in… this will help us in determining how the Jerome API’s personalisation features might be used to present and relevance-rank results.

Total ReCal magazine article

Posted on November 10th, 2010 by Paul Stainthorp

This article appeared in the last University of Lincoln staff magazine (September 2010, issue 5, p.5). I’m representing the Library on the group managing this JISC project.

Total ReCal will help students to keep track

A new Joint Information Systems Committee (JISC) funded research project has been launched to merge calendars around the University.

Total ReCal, led by the staff from the Centre for Educational Research and Development, the Library, and ICT, will improve the student experience by collating data from systems across the support departments.

Joss Winn, Technology Officer, said: “All student calendars will comprise data from timetabling, Blackboard and the Library’s Horizon system. One of the big motivations behind the project is that students have no easy way of finding out hand-in deadlines for assignments, being informed if the deadlines change, and seeing the deadlines marked on a calendar alongside academic timetables. This service will provide the solution.”

The combined data can also be used in a host of applications such as Google Calendars and Facebook. The software will be offered to the JISC community for use by other institutions when the project has been completed.

For further information visit http://totalrecal.blogs.lincoln.ac.uk

Anonymised library book circulation data for the academic year 2008/2009: collected for the JISC MOSAIC project

Posted on August 17th, 2010 by Paul Stainthorp

mosaic.2008.level1.1265378452.0000001.xml

The University of Lincoln collected one academic year’s worth of its own library book circulation data (“user activity data”) for the JISC-funded MOSAIC project, which set out to investigate the technical feasibility, service value and issues around exploiting user activity data. Data was collected for the period 1 September 2008 – 31 August 2009. Lincoln’s data was processed according to a data schema common to all participants in the MOSAIC project; any data that might be used to identify an individual library user was removed or anonymised.

View this item on the University Repository: http://eprints.lincoln.ac.uk/2164/