Posts Tagged ‘resource discovery’

Setting the time (some CLOCK project admin)

Posted on February 2nd, 2012 by Paul Stainthorp

Some notes from a phone chat with Andy McGregor (JISC Discovery programme manager) about CLOCK:

  1. Just as we did for Jerome, we’ll be using the CLOCK project blog for all reporting to JISC (as well as for blog posts about the work of the project itself):
    • List of required blog post headings here
  2. We also need to produce a Project Plan, based broadly on our original proposal:
    • Required headings for the Project Plan here
  3. (As project manager) I’ll also be emailing Andy once a month with a quick update on progress;
  4. There are nine other projects in the Discovery phase two programme, plus CLOCK:
    • List of projects here
    • There’s also a mailing list for the programme
  5. The next programme meeting will take place w/c 16 April 2012, in Birmingham:
    • List of programme meetings here
  6. As in phase one, consultants will be preparing case studies on the various projects (CLOCK included) for the benefit of the wider Discovery programme.

Related: we’re planning to hold our first project team meeting on 14 February 2012. To spread the burden of travel equally, we’re going to hold it in a location convenient for Lincoln, Cambridge and the West Midlands…

Peterborough

Discovery phase two: programme launch (slides)

Posted on February 2nd, 2012 by Paul Stainthorp

JISC formally launched phase two of the Information and library infrastructure: Resource discovery programme on 11 January 2012 in Birmingham. CLOCK weren’t able to attend in person, but we sent these slides in our absence. They’re good for a quick overview of the aims of the CLOCK project.

Tick tock we don’t stop. Introducing CLOCK, a new JISC-funded resource discovery project at the universities of Lincoln and Cambridge

Posted on December 10th, 2011 by Paul Stainthorp

The title says it all, really. The University of Lincoln, working in consortium with Cambridge University Library and Owen Stephens Consulting, has been awarded £49,877 by JISC to investigate ways of driving innovation in libraries’ interactions with Open Bibliographic Data, through a project we’re calling CLOCK (Cambridge-Lincoln Open Catalogue Knowledgebase).

CLOCK is a continuation of and elaboration upon the work of two recent JISC Discovery projects—Jerome at the University of Lincoln and COMET at the University of Cambridge—via a programme of development work shared between the two institutions, and with library consultant Owen Stephens. JISC were impressed enough with the work of both projects, and sufficiently interested in the potential for collaboration, that they encouraged our joint bid for follow-up funding.

Between now and the end of July, 2012, the CLOCK project will provide us with a framework to:

…[1] exploit through real-world applications the significant amount of data released openly by Cambridge University Library; [2] apply the Jerome database architecture, iterative development methodology, and API framework to a bibliographic dataset an order of magnitude greater than the University of Lincoln’s; and [3] to build and enable a new set of tools and demonstrator services which will enable the future development of public Open Bib Data web applications of practical utility to libraries and end-users.

You can read the full bid document, here.

I’m very much looking forward to working with Ed Chamberlain, Systems Librarian in the University Library at the University of Cambridge, along with Owen Stephens, veteran of a number of campaigns to open up access to library data, and Chris Leach (Systems Librarian) and Ian Snowley (University Librarian) from the University of Lincoln. Thanks are due to all of them for their help in writing the successful bid; to the Research & Enterprise Development office at Lincoln for their invaluable assistance in putting together the project budget; and to the LNCD group at the University of Lincoln for providing the kind of supportive development platform that makes this kind of project possible.

Finally, a big thank you to Andy McGregor and the JISC Digital Infrastructure: Information and library infrastructure: Resource discovery programme, for this opportunity to further explore the blossoming environment of open bibliographic data/open discovery in libraries. If you haven’t done so already, you might like to take a look at the following websites:

As with all our projects, we’ll be blogging it comprehensively (so stand by for a steady stream of awful clock-related puns used as blog post titles). Although there’s little to see there yet, the CLOCK project blog is at: http://clock.blogs.lincoln.ac.uk/ – along with its own RSS feed. Watch that space!

Notes on: EBSCO Discovery Service (EDS)

Posted on July 22nd, 2011 by Paul Stainthorp

The EBSCO Discovery Service is EBSCO’s own next-generation resource discovery system, built on the already-very-familiar EBSCOhost database platform.

EBSCO’s particular ‘angle’ for EDS is that its content is built up out of a lot of high-quality, ‘scholarly’, subject-indexed content (similar to the individual bibliographic databases on EBSCOhost), which they are keen to push as superior to basic ‘Google-type’ keyword-indexed searching, where the quality-assured, ‘information literacy’ aspect to resource discovery may not be as strong.

(Enough scare quotes for ya?)

Features of EDS:

  • Highly customisable/‘brandable’ – logos, colours, background images, text/field labels;
  • Uses the same administrative interface (for back-end configuration) as EBSCOhost;
  • Integrates with EBSCO Electronic Journals A-to-Z and LinkSource (i.e. Find it @ Lincoln) for access to full text via OpenURL;
  • Harvests MARC records from local catalogue, and repository etc. records (via OAI-PMH, presumably, although I forgot to ask);
  • Content: as well as the library’s own local collections (above), EDS searches a central EBSCO ‘base index’ of content/metadata from ~20,000 providers, plus content from those EBSCOhost databases to which the library subscribes; it also contains a lot of enhanced book metadata (cover images, subject headings, reviews, etc.). See EBSCO’s website.
  • It’s possible to set up a public, ‘guest’ version of EDS to search catalogue, repository, and the main EBSCO index – then allow your own users to log in and search the more complete content including subscription databases (though EBSCO suggest that few libraries actually provide guest search in practice, despite asking for it to be made possible!); it’s also possible to use EDS to create custom search interfaces for groups of packages/databases (or even for individual databases) – e.g. subject clusters;
  • Users can extend their search out to remote databases (i.e. those not included in EBSCO’s central base index + local databases) via a traditional metasearch facility (related: EBSCOhost Integrated Search);
  • It’s possible to limit the default search to full-text items only (making use of the coverage information held in the A-to-Z/LinkSource knowledgebase) – however EBSCO advise that most subscribing libraries don’t do this – instead starting their users off with searches of the complete EDS collection, then later on allowing users to narrow the search results down to full-text-only, if they want to;
  • Various APIs, HTML widgets, and other extension tools available through an ‘EBSCOhost Integration Toolkit’ (http://support.ebscohost.com/eit/) – N.B. some of these can also be used with the existing EBSCOhost databases;
  • Developer community of library people extending and customising EDS – example blog posts here and here;
  • While the advanced search options and user interface are highly configurable, there’s no facility to adjust the search ranking algorithms – i.e. the relative placing of items/collections against each other in search results (as is possible in e.g. Ex Libris Primo);
  • FRBRising of search results will be introduced in 2012;
  • EBSCO will offer libraries free trial access to EDS, including MARC record harvest where possible.
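
Since I only presumed that the repository harvest runs over OAI-PMH, here is a minimal sketch of what such a harvest request looks like in general. It is not EBSCO's implementation: the endpoint URL and the token value are hypothetical, and only the request parameters (`verb`, `metadataPrefix`, `set`, `resumptionToken`) come from the OAI-PMH protocol itself.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (no network access here)."""
    if resumption_token:
        # resumptionToken is an exclusive argument in OAI-PMH:
        # it is sent with the verb and nothing else.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint and token, for illustration only.
BASE = "https://repository.example.ac.uk/oai"
first_page = list_records_url(BASE)
next_page = list_records_url(BASE, resumption_token="abc123")
```

A harvester would fetch `first_page`, read any `resumptionToken` out of the XML response, and keep requesting pages until the repository stops returning one.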

UK HE libraries using EDS include:

Notes on: Ex Libris Primo

Posted on July 8th, 2011 by Paul Stainthorp

Primo is library software group Ex Libris’s umbrella, “one-stop solution for the discovery and delivery of local and remote resources, such as books, journal articles, and digital objects.” It’s used by around 20 institutions in the UK, and ~800 worldwide.

Information about Primo is available at: http://www.exlibrisgroup.com/category/PrimoOverview

A couple of other useful links:

  • Slides – redacted for confidentiality
  • ‘Discovery’ on the SCONUL Higher Education Library Technology (HELibTech) wiki

The development of Primo marked a move away from the existing, Z39.50-intensive, metasearch model of unified resource discovery, to the use of a hosted, central metadata index of scholarly content (Ex Libris call this the Primo Central Index), characterised by unified discovery & delivery; faceted navigation; and usage-based recommendation.

Primo features include:

  • Import of local data sources (catalogues; repositories) to a standardised XML format to allow cross-collection searching;
  • Ranking of printed, electronic and locally born-digital or digitised content, configurable by the subscribing library;
  • Integration with the OPAC – stronger integration for libraries that use one of Ex Libris’s own Library Management Systems; less-tight integration is possible for ‘foreign’ OPACs;
  • Integration with Ex Libris’s bX usage-based journal article recommendation service, which derives recommendations from the ‘user journey’ from article-to-article;
  • FRBRised grouping of similar titles in search results;
  • Facets derived from both the Primo Central Index and from locally-harvested data: for example, a facet could be configured to allow users to limit a search to only those items which are available in the OPAC;
  • Tools to embed the Primo search box in remote web sites (VLE, intranet, etc.);
  • An ‘open’ platform for development (including a suite of Primo APIs) – the EL Commons;
  • A mobile-friendly UI (e.g. this example from Germany).
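
To make the ‘FRBRised grouping’ idea concrete, here is a rough sketch of the general technique: collapse records that look like the same work under one key, so that print and electronic copies show up as one result. This is emphatically not Primo's algorithm (which I haven't seen); the matching rule (author surname plus normalised title) and the sample records are assumptions made up for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def normalise(text):
    """Lower-case, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z0-9 ]", " ", text.lower())
    return " ".join(text.split())


def work_key(record):
    """Crude 'work' key: author surname plus normalised title (an assumption)."""
    surname = normalise(record["author"].split(",")[0])
    return (surname, normalise(record["title"]))


def group_by_work(records):
    """Group catalogue records that share the same crude work key."""
    groups = defaultdict(list)
    for rec in records:
        groups[work_key(rec)].append(rec)
    return dict(groups)


# Made-up sample records, for illustration only.
records = [
    {"id": 1, "author": "Austen, Jane", "title": "Pride and Prejudice", "format": "print"},
    {"id": 2, "author": "Austen, Jane, 1775-1817", "title": "Pride and prejudice.", "format": "ebook"},
    {"id": 3, "author": "Austen, Jane", "title": "Emma", "format": "print"},
]
groups = group_by_work(records)
```

A key this simple will merge things that shouldn't be merged (different works sharing a surname and title) and miss things that should (translations, variant titles), so a real system needs a good deal more than this.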

Higher Education libraries in the UK using Primo include:

…and outside the UK:

Ex Libris are also developing Alma – which does for the ‘back end’ of library systems architecture what Primo does for the front end discovery UI – i.e. provides ‘umbrella’, unified management of print, electronic, and digitised/digital resources in the one system. In the UK, the University of York are ‘early adopters’ of Alma. Information about Alma is available at: http://www.exlibrisgroup.com/category/AlmaOverview

Notes on: WorldCat Local

Posted on June 17th, 2011 by Paul Stainthorp

WorldCat Local is a commercial ‘next-generation’ library resource discovery platform, produced by “the world’s largest library co-operative”, OCLC. Its tagline: “Single-search access to 800+ million items from your library and the world’s library collections”.

As of June 2011, it is capable of providing access to more than 1,400 databases through a single search interface, via a mixture of ‘centrally indexed’ content, and remote databases retrieved by Z39.50. There’s a list of content sources on OCLC’s website.

Libraries that purchase WorldCat Local can then mesh their own library collections with WorldCat (adding to the whole), via a mixture of batch upload-then-nightly synchronisation with their traditional library catalogue, OAI-PMH import, and use of OCLC’s own e-resources knowledgebase tool (alone or in synchronisation with an existing knowledgebase).

Records include both bibliographic and ‘evaluative’ (e.g. ToCs, summaries, book cover image) content, links to detailed authority records on named individuals etc., as well as some social features (tagging/commenting). Users can create a WorldCat account and log in to build their own lists of content (with the possibility that these could be used as formal or informal reading lists).

Higher Education libraries in the UK using WorldCat Local include:

…though there are some more well-developed implementations in the USA: [1] [2] [3]

A few links about WorldCat Local:

New features coming soon include the ability to limit searches to ‘available full-text only’, as well as to ‘peer-reviewed articles only’, and a new periodicals A-Z listing tool.

More information on WorldCat Local at: http://www.oclc.org/worldcatlocal/

How commercial next-generation library discovery tools have *nearly* got it right

Posted on May 17th, 2011 by Paul Stainthorp

In Huddersfield (again – I’m barely away from the place!), yesterday, at a CILIP UC&R (University, College and Research Group) Yorkshire & Humberside [catchy name] training event on ‘Discovering Discovery Tools’. Librarians from four different UK universities gave practical, pros-and-cons descriptions of how they implemented and are now running four different commercial next-gen resource-discovery tools:

Five (count ’em!) people from Lincoln were in the audience. I was wearing two hats: one for project Jerome for thinking about design concepts in resource discovery tools; the other for my day job – Lincoln is in the middle of a strategic review of Library ICT systems, which may well end up recommending that we buy one of these products.

It was all good stuff. First off, libraries need to hear the honest, warts and all counterpoint to the glowing terms in which each discovery product is described by its vendor. Secondly, it’s useful to subject all four* resource discovery platforms to the same amount of daylight, and see where the common problems lie, as well as where one tool outperforms another. Thirdly—and even though there’s a lot of resource discovery hyperbole to be heard—this is still a big shift for academic libraries, and I think we should discuss implications that are wider than the costs/benefits for an individual institution.

(*Yes, I know there are a few other tools. But they weren’t in the room yesterday.)

What’s stopping us? (Canal lock gate at the University of Huddersfield.)

Things that jumped out at me:

Commercial resource discovery has reached a level of maturity that was absent a couple of years ago. That’s not to say that all next-gen resource discovery tools are perfect (because they aren’t), or that there aren’t any problems (because there are; see below), but academic libraries do now have a genuine choice between several different, viable commercial products.

Here’s a heresy: the differences between these four products are not that significant. I think that anyone who went away from yesterday’s event thinking that out of the four discovery tools on display there are some ‘good’ and some ‘bad’ …is probably wrong. It’s not really about the product, it’s about the willingness of the vendor to overcome problems, and about their attitude to their customers. Do you buy a slightly-less slick product, but from a company you feel you can have a more productive relationship with?

In fact, most of the real problems with resource discovery seem to be common to all four of the products on show yesterday. De-duping via FRBR reckons to be a bit of an Achilles’ heel. (A shame. FRBRisation is one of those things you either need to get right, or not do at all. A half-arsed attempt is worse than not bothering.)

Also broken: known-item search. This ought to be trivial to fix, and it needs to be sorted now now now.  I find it particularly sinister that some commercial resource-discovery tools rank their search results according to secret, proprietary algorithms that can’t be inspected or challenged by their users, let alone altered/improved. This is a problem. What’s the point of a library that can’t justify how its resource discovery system actually works? Are we just here to sign the cheques?

Libraries still have a tendency to overcomplicate things for their users. Sometimes they do this because they have no choice (perhaps their shiny new discovery tool doesn’t quite work the way it should); but often they seem just too ready to accept a situation where users are inconvenienced sooner than address an underlying problem. Lincoln included in this sweeping generalisation.

There’s no point pretending that a library can make two independent decisions to purchase [a] a next-gen resource discovery platform, and [b] a journals knowledgebase/link resolver. The two things are all tied up together. To pick a random example: you want Summon, you’d better want 360.

Why can’t we just buy access to a search index? If I want to pay to provide my users with the benefits of a lovely big central index of content, why do I have to buy into your discovery algorithm and web front-end as well? (Whither JISC collections?)

Related, and finally – we really shouldn’t have to replace our search and discovery interfaces every time we want/need to use a different content provider, and we shouldn’t be placed in the situation of having to make collection/subscription decisions in order to ‘feed’ our discovery tool. It may be temptingly easy, cost aside, to pick up and put down different next-gen discovery products (“…it’s just a subscription!”) but there’s too much at stake for our users.

An elastic bucket down the data well (#rdtf in Manchester)

Posted on April 20th, 2011 by Paul Stainthorp

I was in Manchester on Monday for Opening Data – Opening Doors, a one-day “advocacy workshop” hosted by JISC and RLUK under their Resource Discovery Taskforce (#rdtf) programme. I delivered a five-minute ‘personal pitch’ about Jerome, open data, and the rapid-development ethos that’s developing at Lincoln.

Ken Chad is writing up a report from the day and Helen Harrop is producing a blog, both of which will be signposted from the website: http://rdtf.mimas.ac.uk/

The big data question

All the presentations can be viewed on slideshare, but there were some particular moments that I think are worth picking out:

The JISC deputy, Prof. David Baker, was first up. His presentation, ‘A Vision for Resource Discovery’, should be compulsory reading for university librarians. See, in particular, slides #6 (guiding principles of the RDTF), #8 (a future state of the art by 2012), and #11 (key themes).

Slides from David Baker's presentation

Following this introduction, there were three ‘perspectives’, short presentations “reflecting on the real world motivations and efforts involved in opening up bibliographic, archival and museums data to the wider world”: from the National Maritime Museum, the National Archives

…and from Ed Chamberlain of (Jerome’s ‘sister project’) COMET (Cambridge Open METadata), the perspective from Cambridge University Library on opening up access to their non-inconsiderable bibliographic data. N.B. slides #4 (what does COMET entail?), #9 (licensing) and, more than anything else, slide #16 (“beyond bibliography”).

Slides from Ed Chamberlain's presentation

The first breakout/discussion session which I sat in on looked at technical and licensing constraints to opening up access to [bib] data. This was the point at which the tortured business metaphors started to pile up. ‘Buckets’ of data. ‘Elastic’ buckets that can expand to include any kind of data. And (my personal contribution, continuing the wet theme): data often exist at the bottom of a ‘well’. Just because a well is open at the top, it doesn’t necessarily make it easy to get the water out! You need another kind of bucket – a service bucket that makes it possible to extract and make use of the water. Sorry, data. What were we talking about again?

Then a series of 5-minute ‘personal pitches’, including mine just after lunch. I didn’t use slides, but I’m typing up my handwritten notes on Google Docs and I’ll post them as a separate blog post when I get a chance.

David Kay (SERO), Paul Miller (Cloud of Data) and Owen Stephens delivered the meat of the afternoon session in their presentation, ‘The Open Bibliographic Data Guide – Preparing to eat the elephant’. The website containing the Open Bib Data Guide (which has not been formally launched until now) can be found at: http://obd.jisc.ac.uk/

The site itself is going to be invaluable in hand-holding and guiding institutions through the possibilities in opening up access to their own bibliographic data (OBD). Slides from the presentation that are particularly worth noting are #8 (which shows the colour-coding used to distinguish the different OBD use-cases) and #14 (examples of existing OBD).

Slides from the OBD presentation

Paul Walk’s presentation, ‘Technical standards & the RDTF Vision: some considerations’, is the source of the slide which I photographed (at the top of this blog post). Paul talked about ‘safe bets’; aspects of the Web that we can rely on playing a part in allowing us to create a distributed environment for resource discovery: including “ROASOADOA” (Resource- / Service- / Data-Oriented Architecture), persistent identifiers, and a RESTful approach. See also this blog post.

In the second breakout/discussion session, we discussed technical approaches. One of the themes which we kept coming back to was that of two approaches (encapsulated by Paul’s slide) which—while not mutually exclusive—may require different business cases or different explanations in order to be taken up by institutions. We characterised the two approaches as:

  • Raw open data vs Data services
  • Triple store vs RESTful APIs
  • Jerome vs COMET (bit of a caricature, this one, but not entirely unjustified!)
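
For anyone who wasn't in the room, a toy illustration of the difference in flavour between the two sides of each pairing above: the same bibliographic record expressed first as raw subject-predicate-object triples, then folded into the kind of JSON document a RESTful data service might hand back. The record URI, the values and the `as_resource` helper are all hypothetical; only the Dublin Core Terms namespace is real. It is not a description of how either Jerome or COMET actually works.

```python
import json

DCT = "http://purl.org/dc/terms/"
# Hypothetical record URI, for illustration only.
BOOK = "http://data.example.ac.uk/bib/12345"

# 'Raw open data' flavour: the record as subject-predicate-object triples.
triples = [
    (BOOK, DCT + "title", "An Example Title"),
    (BOOK, DCT + "creator", "Example, Author"),
    (BOOK, DCT + "issued", "2011"),
]


def as_resource(subject, triples):
    """'Data service' flavour: fold one subject's triples into a JSON-ready dict."""
    doc = {"id": subject}
    for s, p, o in triples:
        if s == subject:
            # Use the last path segment of the predicate URI as the field name.
            doc[p.rsplit("/", 1)[-1]] = o
    return doc


resource = as_resource(BOOK, triples)
body = json.dumps(resource, indent=2)  # what a service might return for this record
```

The triples keep the full predicate URIs, so they can be merged with anyone else's data; the JSON version throws that away in exchange for something a web developer can use in five minutes. Which of those matters more is roughly the business-case question we kept coming back to.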

I was gratified that Lincoln’s approach to rapid development and provision of open services was also referred to in non-ungratifying terms, as a model which could be valuable for the HE sector as a whole.

Finally, we heard what’s next for the #rdtf programme. It’s going to be rebranded as ‘Discovery’ and formally re-launched under the new name at another event: ‘Discovery – building a UK metadata ecology’ on Thursday, 26 May 2011, in London. See you there?

JISC #rdtf meeting, Birmingham (Jerome)

Posted on March 1st, 2011 by Paul Stainthorp

I’m in Birmingham for the JISC Infrastructure for Resource Discovery start-up meeting. We’re here to get to know the other 7 projects that JISC has funded. Here’s what we’ll be talking about:

The objectives for this meeting are:
  • To introduce the bigger picture of the resource discovery taskforce work and all of the projects that are involved
  • To share approaches and knowledge on the key issues for the programme – technical approaches, licensing and aggregation.
For this session each project will need to prepare a 5 minute overview of their project. We would like your overview to address the following questions
  • What content and metadata are you working with?
  • How will this data be made available?
  • What are your use cases for the data?
  • What benefits to your institution and the sector do you anticipate?
12.30 Discussion of technical approaches
  • Each project will be asked to briefly outline the biggest technical challenge they face in their project. We will then look for common issues and opportunities for projects to collaborate.
  • What technical approaches and tools are you using?

And here are my slides for the 5-minute presentation on Jerome: