Posts Tagged ‘MARC’

Ebook URLs: bodge upon bodge upon bodge

Posted on October 3rd, 2012 by Paul Stainthorp

From the Oxford English Dictionary:

† bodge, v.

Etymology:  An altered form of botch v.1; compare grudge < grutch.
Obs. or dial. 1. trans. To patch or mend clumsily.

Chris Leach and I have had to bodge a fix for ebook URLs in our library catalogue, for a third time. I’m getting that feeling that we’ve bodged our way into a corner. (N.B. we’re going to upgrade Athens quite soon – I hope that once we can build our own WAYFless URLs to UK Federation-authenticated resources, on a *.lincoln.ac.uk root, we should be able to fix this problem ‘properly’. Until then…)

Here’s the problem and a list of our bodges to date:

We import MARC records for ebooks from Ingram’s MyiLibrary platform. They contain perfectly good, honest URLs (stored in MARC field 856$u), tweaked for Athens in the form (e.g.):

  • http://www.myilibrary.com?Ref=Athens&id=115106

Next, to make sure our users see the correct Athens login option for the University of Lincoln…
Screenshot from Athens

…and not a generic Athens username and password box (from where the user would have to click on “Alternative login” and generally go round the houses to proceed)…
Screenshot from MyiLibrary

…we use the MARC field mapping feature in our LMS, SirsiDynix Horizon (a feature which operates not unlike the e-journals A-to-Z’s “proxy mask” tool), to prefix every URL stored in MARC 856$u with our standard Athens cookie-setting prefix URL (N.B. this prefix is applied to all ebooks in the catalogue, and in fact to any URL in 856$u, not just MyiLibrary ebooks):

  • http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=

This prefix combines with the contents of 856$u to give a compound URL, which is presented to the user as a hyperlink in HiP, our OPAC/web catalogue (e.g.):

  • http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=http://www.myilibrary.com?Ref=Athens&id=115106

Problem #1 – that compound URL doesn’t work. It returns an Athens error (presumably because Athens can’t tell whether the variables at the end of the URL belong to auth.athensams.net, or to www.myilibrary.com).
Screenshot from Athens

Bodge #1 – To avoid this error, the second part of the compound URL ought to be %-encoded (the A-to-Z’s proxy mask feature allows for this using {startencode}{endencode} pseudotags, but the Horizon MARC field processor doesn’t have anything like this afaik). So, we changed our import processes/record specification for the MARC records we get from MyiLibrary, %-encoding the contents of 856$u:

  • http%3A%2F%2Fwww.myilibrary.com%3FRef%3DAthens%26id%3D115106

…giving a compound URL (including the field prefix) of:

  • http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=http%3A%2F%2Fwww.myilibrary.com%3FRef%3DAthens%26id%3D115106

This worked fine for users accessing ebooks from HiP.
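A minimal Python sketch of the encoding step in Bodge #1, for anyone wanting to reproduce it outside Horizon. It is an illustration only, not the import process itself: the function name athens_link is made up, and only the prefix and the example URL come from the text above.

```python
from urllib.parse import quote

# Athens cookie-setting prefix, as quoted above.
ATHENS_PREFIX = "http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl="


def athens_link(target_url: str) -> str:
    """Return the compound URL with the target %-encoded in full (hypothetical helper)."""
    # safe="" makes quote() encode ':', '/', '?', '=' and '&' as well,
    # so Athens sees the whole target as a single parameter value.
    return ATHENS_PREFIX + quote(target_url, safe="")


print(athens_link("http://www.myilibrary.com?Ref=Athens&id=115106"))
```

The key detail is safe="": by default quote() leaves "/" unencoded, which would not match the fully encoded form shown above.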

Problem #2 – didn’t occur until we started using Talis Aspire as reading list software. When a user bookmarked a catalogue record from HiP, the %-encoded contents of 856$u were causing an error. See explanation here.
Screenshot from Talis Aspire

Bodge #2 – to fix this Talis Aspire error, we downloaded all of our MyiLibrary MARC records (using an SQL query to identify every record where 856$u contained ‘myilibrary.com’) and used MarcEdit to partially undo the %-encoding of the URL, to give:

  • http://www.myilibrary.com%3FRef%3DAthens%26id%3D115106

…before re-uploading the doctored records into Horizon. This was enough to fool Talis Aspire into accepting the URL as valid, and as the reading lists prefix each online resource URL with a redirection URL of their (Talis’s) own, the net result is that users can link from a reading list to an ebook. (However, because the URL-as-stored-in-Aspire doesn’t contain the Athens cookie-setting prefix, some users will inevitably be sent to the wrong, generic Athens login page instead of the correct, University of Lincoln-specific one.)
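The partial decoding in Bodge #2 was done in MarcEdit, but the transformation itself can be sketched in a few lines of Python. This is an assumption-laden illustration (the function name partially_decode is invented here): it decodes the scheme and host, and leaves the “?” and the query string %-encoded.

```python
from urllib.parse import quote, unquote


def partially_decode(encoded_url: str) -> str:
    """Decode scheme and host, but keep '?' and the query string %-encoded."""
    decoded = unquote(encoded_url)
    # Split at the first '?': everything before it stays readable.
    base, sep, query = decoded.partition("?")
    # Re-encode the '?' and whatever follows, as in the doctored records.
    return base + quote(sep + query, safe="")


print(partially_decode("http%3A%2F%2Fwww.myilibrary.com%3FRef%3DAthens%26id%3D115106"))
```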

Problem #3 – Most recently, when we started weekly exports of MARC records from Horizon into our new discovery service Find it at Lincoln (our name for the EBSCO Discovery Service) we discovered that the partial-%-encoding still wasn’t enough to produce a valid URL. Find it at Lincoln doesn’t prefix the ebook URLs in any way, and when users clicked on the ‘raw’, partially-encoded URL in a book record within the EBSCO service, they were getting a browser error.
Screenshot of a browser error

Bodge #3 – this is where it gets very messy. As a short-term fix to stop users seeing the browser error every time they tried to access a MyiLibrary ebook from within Find it at Lincoln, we again exported all 1,300 or so MyiLibrary-matching MARC records from Horizon, and again edited the 856$u URLs using MarcEdit.

This time, we added the Athens cookie-setting prefix to each MyiLibrary URL, before re-uploading. We also then ran a separate export of the same records to a .csv file, which makes it easy to do a visual/formula-driven inspection of all 1,300-odd records to make sure there aren’t any duplicates/oddities/crud. This is a useful trick we’ll be using again!
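The .csv inspection can also be partly automated. Here is a rough Python sketch that lists any URL appearing more than once; the column names (“bib” and “url”) and the sample rows are invented for the example, since the real export layout isn’t described here.

```python
import csv
import io
from collections import Counter


def find_duplicate_urls(csv_text: str, url_column: str = "url") -> list[str]:
    """Return, in sorted order, every URL that occurs in more than one row."""
    rows = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row[url_column].strip() for row in rows)
    return sorted(url for url, n in counts.items() if n > 1)


# Invented sample data: two records share the same URL.
sample = (
    "bib,url\n"
    "1,http://www.myilibrary.com%3FRef%3DAthens%26id%3D1\n"
    "2,http://www.myilibrary.com%3FRef%3DAthens%26id%3D2\n"
    "3,http://www.myilibrary.com%3FRef%3DAthens%26id%3D1\n"
)
print(find_duplicate_urls(sample))
```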

So, the contents of 856$u now look like:

  • http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=http://www.myilibrary.com%3FRef%3DAthens%26id%3D115106

…as such, they should work fine in both the reading lists system, and in Find it at Lincoln (once the most recent weekly MARC dump has been processed by EBSCO). In HiP, however, they still get the MARC field prefix applied, and they end up with a double Athens prefix:

  • http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl=http://www.myilibrary.com%3FRef%3DAthens%26id%3D115106

This double-dose of Athens cookie-setting doesn’t seem to do any harm, although I do know that Athens throws a wobbly if a user is referred to an authentication point too many times in quick succession – so I’m wary of leaving things as they are.
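If the doubled prefix ever did need collapsing (in a display layer, say), the check is simple enough. A hedged Python sketch, with a made-up function name, that strips every leading copy of the Athens prefix and puts exactly one back:

```python
# Athens cookie-setting prefix, as quoted above.
ATHENS_PREFIX = "http://auth.athensams.net/setorg.php?id=LINCUNI&ath_returl="


def ensure_single_prefix(url: str) -> str:
    """Return url with exactly one leading Athens prefix (hypothetical helper)."""
    # Remove however many copies of the prefix are stacked at the front...
    while url.startswith(ATHENS_PREFIX):
        url = url[len(ATHENS_PREFIX):]
    # ...then add a single copy back.
    return ATHENS_PREFIX + url


doubled = ATHENS_PREFIX + ATHENS_PREFIX + "http://www.myilibrary.com%3FRef%3DAthens%26id%3D115106"
print(ensure_single_prefix(doubled))
```

Nothing in Horizon or HiP is known to offer a hook for this; it only shows the string handling.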

There’s also the problem that other ebooks (on our other main platform, Dawsonera) are still being pulled into Find it at Lincoln and the reading lists without an Athens prefix, so unless users have already encountered an Athens institutional cookie, they’re getting the ‘wrong’ Athens authentication point. To get technical, they will see the HTML login form for users with an OpenAthens MD = Managed Directory account. Otherwise known—though it’s not a term approved of by Eduserv—as ‘classic Athens’. At Lincoln we only create classic Athens accounts (with usernames beginning hum_______) in an emergency.

We could perform the same trick with all our other ebook records (several tens of thousands of records, for Dawsonera and a few odds and sods): identify and download them, incorporate the Athens cookie-setting prefix within 856$u, re-upload them, and ditch the Horizon field prefix rule entirely. But: if and when we change our methods of authentication we’d have to process all the records all over again (though to be honest, we’re getting used to it…), and I’m loath to hard-code authentication ‘noise’ into our MARCs.

Other options: we could look at alternatives to Athens authentication (UK Federation or IP/EZproxy) in the case of MyiLibrary; we could speak to Ingram to see if there’s anything that can be done about their slightly odd Athens session behaviour, and/or we could just get on with setting up a new OpenAthens environment that allows us to create proper WAYFless URLs instead of using the cookie-setting method, which is itself a bit of a bodge. We could also see if it’s possible to add proxy-mask-style behaviour to links in EDS (Find it at Lincoln) and Talis Aspire.

For the time being, it’s holding together with sticky tape. Don’t breathe on it too hard.

Reading lists problem with unnecessarily encoded e-book URLs in Horizon: temporary fix

Posted on July 31st, 2012 by Paul Stainthorp

There’s a problem with a small number of e-book URLs in our library catalogue (held in MARC records, field 856$u, for some—but not all—e-books from Coutts MyiLibrary). For complex historical reasons, the normal URL, e.g.:

http://www.myilibrary.com?Ref=Athens&id=115106

has been percent-encoded like this:

http%3A%2F%2Fwww.myilibrary.com%3FRef%3DAthens%26id%3D115106%0A

This causes an error (“Invalid Web Address”) when you try to import the details into “My Bookmarks” in the reading lists system:
Screenshot of an error message in Talis Aspire

We’re working to eradicate these unnecessarily-encoded URLs from the catalogue. In the meantime, here’s a temporary fix.

  1. Import the record into the reading lists system as normal using the bookmarklet.
  2. Before you click on “Create”, copy the e-book URL from the “Web address” field.
  3. Go to this website: http://meyerweb.com/eric/tools/dencoder/
  4. Paste the URL you copied into the large box on the screen, and hit “Decode”.
    Screenshot of the URL decoder
  5. Copy-and-paste the normal, decoded URL back into the reading lists system.
  6. Click on “Create” as normal and the e-book will be added to “My Bookmarks”.
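For the curious, the decoding that the website performs in steps 3 and 4 can be reproduced with Python’s standard library. This is just a sketch of the same operation, not part of the fix itself; the function name decode_ebook_url is invented for the example.

```python
from urllib.parse import unquote


def decode_ebook_url(encoded_url: str) -> str:
    """Undo the percent-encoding and drop any trailing newline (the %0A)."""
    return unquote(encoded_url).strip()


print(decode_ebook_url("http%3A%2F%2Fwww.myilibrary.com%3FRef%3DAthens%26id%3D115106%0A"))
```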

Getting intimate with the YAZ client

Posted on May 31st, 2012 by Paul Stainthorp

Screenshot of the YAZ client

As part of our discovery and reading list projects, I’ve had to get even more familiar with Z39.50* as a method of retrieving MARC records from our Horizon catalogue.

(*I can hear the groans even now.)

I had been using the Library of Congress’s Z39.50 test search form along with (Basedow Information Systems’) Mercury client software to test our Z setup, but I was beginning to find that we needed more flexibility. Enter YAZ, a “programmers’ toolkit supporting the development of Z39.50/SRW/SRU clients and servers”.

YAZ comes with a(n initially scary but actually very useful) command-line client for dialling up databases over Z39.50, querying them, and displaying the results retrieved. Here are my notes on how to use it with our own catalogue.

open tcp:www.library.lincoln.ac.uk:210/lincoln

format 1.2.840.10003.5.102

find @attr 1=1016 "<search_phrase>"

  • Searches the catalogue for <search_phrase> using attribute code number 1016 (“Any”), which is paired with our HiP keyword index .GW (General Keyword). Here’s a list of some Z39.50 attribute codes, and the University of Lincoln HiP indexes to which they correspond (N.B. we can tweak these using HiP admin):
    • 4 (Title) = .TW (Title Keyword)
    • 7 (ISBN) = ISBNEX (ISBN/ISSN Exact Match)
    • 8 (ISSN) = ISBNEX (ISBN/ISSN Exact Match)
    • 12 (Local number) = .BI (Bib#)
    • 27 (LC subject heading) = .SW (Subject Keyword)
    • 1003 (Author) = .AW (Author Keyword)
    • 1016 (Any) = .GW (General Keyword)

Boolean searches can be constructed by using @and, @or, etc. For example:

find @and @attr 1=4 "newspapers" @attr 1=1003 "Keeble"
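If you end up generating these queries from a script rather than typing them, the prefix notation is easy to assemble. A small Python sketch follows; the helper names pqf_term and pqf_and are made up, and it deliberately refuses terms containing a double quote rather than guessing at escaping rules.

```python
def pqf_term(use_attribute: int, term: str) -> str:
    """Build one '@attr 1=N "term"' clause for a given use attribute."""
    if '"' in term:
        # This sketch does not attempt to escape embedded quotes.
        raise ValueError("terms containing double quotes are not handled")
    return f'@attr 1={use_attribute} "{term}"'


def pqf_and(left: str, right: str) -> str:
    """Combine two clauses with @and (prefix notation: operator first)."""
    return f"@and {left} {right}"


# Title keyword (4) AND author keyword (1003), as in the example above.
query = pqf_and(pqf_term(4, "newspapers"), pqf_term(1003, "Keeble"))
print("find " + query)
```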

~~~~~~~~~~

show

  • This displays the first record matching the most recent ‘find’ command. Repeating ‘show’ returns each subsequent record in turn.

close

  • Closes the connection to the Z server.

Couple of final links:

EBSCO information day / Discovery update

Posted on May 24th, 2012 by Paul Stainthorp

I was south of the river (Thames, not Witham) yesterday for an EBSCO information day. As I blogged recently, we’ve just signed up for the EBSCO Discovery Service (which we’re branding as “Find it @ Lincoln”). A couple of useful things came out of the event:

Elsewhere, Chris Leach and I have been making some changes to Horizon/HiP, to enable us to get our catalogue records and holdings represented in Find it @ Lincoln, as well as within our new reading lists system (more about which soon).

In particular:

  1. All MARC records in the catalogue now include the internal Horizon bibliographic record number, in field 999$a.
    Screenshot of a MARC record
  2. This MARC field has been mapped to a new searchable index in HiP (with the code .BI), e.g. http://www.library.lincoln.ac.uk/ipac20/ipac.jsp?index=.BI&term=134439
  3. Finally, records can now be retrieved by searching by this bib number over Z39.50. It’s also now possible to search for and retrieve records by ISBN/ISSN over Z39.50.
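As an illustration of point 2, here is a short Python sketch that builds the HiP link for a given bib number. The base URL and the index/term parameters are taken from the example above; the function name bib_url is invented.

```python
from urllib.parse import urlencode

# Base address of the HiP search page, from the example link above.
HIP_SEARCH = "http://www.library.lincoln.ac.uk/ipac20/ipac.jsp"


def bib_url(bib_number: int) -> str:
    """Return a HiP link that searches the .BI (Bib#) index for one record."""
    return HIP_SEARCH + "?" + urlencode({"index": ".BI", "term": bib_number})


print(bib_url(134439))
```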

Links to Blackwell’s from the library catalogue (MARC 020 $a Considered Harmful)

Posted on September 26th, 2011 by Paul Stainthorp

We’ve added a link to Blackwell’s online bookshop to every book record in our catalogue.

Example here.

Screenshot of the library catalogue

Books bought via these links (or via http://bookshop.blackwell.co.uk/lincoln) attract a 5% discount, and free delivery on orders over £20. The Blackwell Connect shop at the University (located in the foyer of the GCW University Library) is open 10am-4pm Monday-Friday from Monday 19th September until Friday 11th November 2011.

Unfortunately, quite a few of the links don’t work, because the catalogue record contains additional, parenthetical trailing free text [examples: …(pbk.) …(hbk.) …(ebk.)] after the 10- or 13-digit ISBN itself, all within the MARC 020 $a field – and this additional trailing text breaks Blackwell’s URL structure.

This sort of thing might be standard cataloguing practice: but unfortunately it’s a practice that leads to unusable (and certainly not “MAchine-Readable”) data, especially in a field that contains such a useful unique ID. See the Robot Librarian’s blog post on why …(pbk.) drives him “absolutely batshit crazy”.

Without recataloguing or mass-MARC-editing every record in our collection, I’m not sure how we can fix this. Possibly some kind of Yahoo! Pipes hack is in order: honestly, I’m not sure I have the energy.

Here’s an example of a ‘bad’ link to Blackwell’s, as generated by our catalogue. And, just to prove it’s not just Blackwell’s that suffers, here’s a link to the same book in Amazon: again generated by HiP; again broken.

(Technical note: the links to Blackwell’s which appear on each catalogue record page are in the form “http://bookshop.blackwell.co.uk/jsp/id/{{x}}/?alumni=A1041” …where {{x}} is the contents of the first MARC 020 $a field – which should be an ISBN of the book. The link will only appear if the MARC record contains an 020 field.)
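If the contents of 020 $a could be cleaned before being dropped into the link, the fix would be small. Here is a Python sketch of the idea; it is not something HiP can be assumed to do. The function names are invented, the ISBNs in the example are made-up values, and no check-digit validation is attempted.

```python
import re
from typing import Optional

# A 10- or 13-character ISBN at the very start of the field, ignoring
# whatever trails it, such as "(pbk.)" or "(hbk.)".
ISBN_AT_START = re.compile(r"^\s*((?:97[89])?\d{9}[\dXx])\b")


def clean_isbn(field_020a: str) -> Optional[str]:
    """Return just the ISBN from an 020 $a value, or None if there isn't one."""
    match = ISBN_AT_START.match(field_020a)
    return match.group(1).upper() if match else None


def blackwells_link(field_020a: str) -> Optional[str]:
    """Build the Blackwell's link in the form described in the technical note."""
    isbn = clean_isbn(field_020a)
    if isbn is None:
        return None
    return f"http://bookshop.blackwell.co.uk/jsp/id/{isbn}/?alumni=A1041"


print(blackwells_link("0415455839 (pbk.)"))
```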

Smartening up the catalogue for September

Posted on August 4th, 2011 by Paul Stainthorp

We’re making a few changes to the home page of our library catalogue in time for the new academic year. Changes include:

  • Reduced ‘tabset’ browsing to only the most important elements of the catalogue.
  • Use of the newest version of the University’s Minerva logo and colour scheme.
  • Home page used for ‘top 10’ (…ish) links to Library services elsewhere on the web – these are served up using an RSS feed via Feed2JS (so that we can display the same links in other environments such as Blackboard). All placed in one of HiP’s lovely XSL stylesheets.

Very many thanks to the new LNCD intern Jamie Mahoney for help with styling this!

Here’s the current, ‘old’ front page:

Screenshot of the old catalogue home page

And here’s the new, redesigned page – still in development!

Screenshot of the new catalogue home page

You can have a look at it, if you like, at:

This isn’t intended as a long-term solution for the question of the Library’s web presence. There’s a lot more we need to do to consolidate and simplify the information we present to users across different environments (open web, intranet/Portal, Blackboard VLE, etc.). But it’s a good short-to-medium-term fix which makes the most of the tools we have available at the moment, and recognises the value of establishing www.library.lincoln.ac.uk as the home of our ‘primary’ presence on the web. If nowt else, that’s the address we’re printing on our induction materials :-)

We also had to work out a way of testing this on one of our public-access OPAC kiosks. I was particularly proud of this little MARC hack which allowed us to navigate to the test version of the home page without having to use the browser navigation bar (which is disabled on the kiosks).

Notes on: WorldCat Local

Posted on June 17th, 2011 by Paul Stainthorp

WorldCat Local is a commercial ‘next-generation’ library resource discovery platform, produced by “the world’s largest library co-operative”, OCLC. Its tagline: “Single-search access to 800+ million items from your library and the world’s library collections”.

As of June 2011, it is capable of providing access to more than 1,400 databases through a single search interface, via a mixture of ‘centrally indexed’ content, and remote databases retrieved by Z39.50. There’s a list of content sources on OCLC’s website.

Libraries that purchase WorldCat Local can then mesh their own library collections with WorldCat (adding to the whole), via a mixture of batch upload-then-nightly synchronisation with their traditional library catalogue, OAI-PMH import, and use of OCLC’s own e-resources knowledgebase tool (alone or in synchronisation with an existing knowledgebase).

Records include both bibliographic and ‘evaluative’ (e.g. ToCs, summaries, book cover image) content, links to detailed authority records on named individuals etc., as well as some social features (tagging/commenting). Users can create a WorldCat account and log in to build their own lists of content (with the possibility that these could be used as formal or informal reading lists).

Higher Education libraries in the UK using WorldCat Local include:

…though there are some more well-developed implementations in the USA: [1] [2] [3]

A few links about WorldCat Local:

New features coming soon include the ability to limit searches to ‘available full-text only’, as well as to ‘peer-reviewed articles only’, and a new periodicals A-Z listing tool.

More information on WorldCat Local at: http://www.oclc.org/worldcatlocal/

Three quarks for Muster MARC!

Posted on April 21st, 2011 by Paul Stainthorp

My esteemed, gracious and talented colleague Mr. Jackson is not happy.

He’s not happy because I’ve asked him to do something which he thinks is an awful, depressing, retrograde step. I’ve asked him to add a MARC export function to Jerome.

Nick’s argument in a nutshell (he won’t mind me paraphrasing):

  • MARC is awful: truly awful. It’s holding back humanity’s (and libraries’) progress. We shouldn’t be doing anything to prolong its life. #marcmustdie

My argument in a nutshell:

  • For better or worse, libraries still use MARC, and this will be a useful facility for libraries who want to consume our open data straight into their existing Library Management Systems.

What does the studio audience think? Should Jerome serve up MARC (actually, MARCXML. I’m not a monster.) because someone, somewhere might want to consume it, or should we take a stand and insist on providing only decent, sane data formats from now on?

For anyone who’s blissfully unaware of MARC (MAchine-Readable Cataloging) formats, read this. Then read this, this, and this. Then go and have a lie down in a darkened room.

I don’t love MARC. More than anything, I don’t really understand it (I have a cataloguer to do that for me). But it still has currency in libraries. #shouldmarcdie?

EMALINK reimagine the OPAC

Posted on November 25th, 2010 by Paul Stainthorp

Chris Leach and I took Jerome to Loughborough University yesterday (24 November 2010), to an EMALINK seminar on next-generation OPACs. Here’s a copy of our presentation slides.

It was a particularly useful event, especially so for being packed into 2½ hours (and worth learning to drive an automatic in order to get there!), with a presentation from Loughborough about their project to select a next-generation OPAC system; group discussions around some of the factors involved in launching such services; and our own contribution, which led to some interesting conversations about the benefits and risks of experimentation in libraries.

Jerome itself passed something of a milestone this week: having finally crawled its way round the whole of Lincoln’s catalogue, it now contains a full set of our MARC records (all 214,006 of them!); each work with its own stable, persistent URL (/work/<bibnumber>). Nick Jackson has also started to play around with pulling in additional data and services from external APIs (e.g., book cover images).

Screenshot of a Jerome work record

(Yes, there’s a problem with authors being attached to the wrong records. We’re on it. In fact, Jerome will self-heal its “leaky array” problem over the course of the next week.)

Bring it on home, Jerome

Posted on November 5th, 2010 by Paul Stainthorp

Our blue-skies library ‘un-project’ (which is still codenamed Jerome) took a significant step forward this week, as Nick Jackson has described on the Jerome blog. Thanks to some clever Horizon-wrangling code (courtesy of Dave Pattern at the University of Huddersfield), Jerome will soon provide searchable access to the whole library catalogue of the University of Lincoln ~ some 300,000 bibliographic records.

Then, hopefully, things will start to get interesting:

Our own catalogue MARC records aren’t the only sources of data that we’re throwing Jerome’s way. We’re also going to tell it to pull records from the Lincoln Repository, through the OAI-PMH* metadata-harvesting protocol. And, via the JournalTOCs API, we can give Jerome access to RSS feeds of the tables of contents for many of our full-text subscription and open access electronic journals. For all resources, we’ll then take a look at what open data and record-enrichment (e.g. book cover images) we can grab from elsewhere on the Web to bolster search results.

Hey presto: cross-collection metasearch; cheap and quick. This cross-collection search will be made available through a dedicated Jerome portal, a search API, and an iPad app.
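To give a flavour of the OAI-PMH harvesting mentioned above: a harvester simply requests URLs such as the one built below and then parses the XML that comes back. This Python sketch only constructs the request URL. The verb and metadataPrefix arguments are standard OAI-PMH, but the repository address is a placeholder and none of this is Jerome’s actual code.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(base_url: str,
                     metadata_prefix: str = "oai_dc",
                     from_date: Optional[str] = None) -> str:
    """Build an OAI-PMH ListRecords request URL for a repository endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date:
        # Optional: only records changed on or after this date (YYYY-MM-DD).
        params["from"] = from_date
    return base_url + "?" + urlencode(params)


# Placeholder endpoint, not the real Lincoln Repository address.
print(list_records_url("https://repository.example.org/oai"))
```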

Diagram of Jerome data inputs

Details of the Jerome API (***still very, very much in development***) are at: http://jerome.blogs.lincoln.ac.uk/api/

Also worth reading is Nick’s explanation about what we’ll do with these aggregated search results, once they’re in our clutches:

“Finally, our big new announcement for the next Really Cool And Epically Awesome bit of Jerome: the somewhat boringly named Relevancy Engine. This is something we’ve been toying with the notion of for a while, but we’ve finally worked out how to do it and how it fits into the big plan. In short, it will do its best to make sure that what you get at the top of your search results is exactly what you’re looking for. It takes variables such as the books you’ve borrowed in the past, how long they’ve been out for, which course you’re doing, what year you’re in, borrowing habits of others on your course, past borrowing trends, your physical location, how many books you currently have out, the time of day and even the weather (who wants to walk to the library when it’s raining?) and uses them to subtly adjust which resources we present to you at any given moment. If the library is closed, ebooks will drift up your search results. Everybody on your course borrowing a specific book? It’s a fair bet that’s what you want, even if there are more specific title matches for your search. Postgraduate student? You’re probably more interested in journals than a fresher. These variables will all be taken into account along with our search weighting (how ‘close’ a given item is to what you searched for) when we work out the search rankings.”
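To make the idea concrete, here is a toy Python sketch of that kind of re-ranking. It is emphatically not the Relevancy Engine: the field names, the two signals chosen and the weights are all invented, purely to show how a base match score might be nudged by context.

```python
from dataclasses import dataclass


@dataclass
class Item:
    title: str
    match_score: float     # how 'close' the item is to the search, 0.0 to 1.0
    is_ebook: bool = False
    course_loans: int = 0  # loans by other people on the searcher's course


def adjusted_score(item: Item, library_open: bool,
                   ebook_boost: float = 0.2, loan_weight: float = 0.05) -> float:
    """Nudge the base match score using two invented context signals."""
    score = item.match_score
    if item.is_ebook and not library_open:
        score += ebook_boost  # ebooks drift upwards when the library is closed
    score += loan_weight * item.course_loans
    return score


def rank(items: list[Item], library_open: bool) -> list[Item]:
    """Return items ordered from highest to lowest adjusted score."""
    return sorted(items, key=lambda i: adjusted_score(i, library_open), reverse=True)


results = [Item("Print book", 0.9), Item("Ebook", 0.8, is_ebook=True)]
print([item.title for item in rank(results, library_open=False)])
```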

~~~

*OAI-PMH = the “Open Archives Initiative – Protocol for Metadata Harvesting”. No, really.