Posts Tagged ‘Cambridge’

Slides on the CLOCK project for #Mashcat (Cambridge mashed library cataloguing event)

Posted on July 5th, 2012 by Paul Stainthorp

A whole contingent from Lincoln (Andrew Beeken, Trevor Jones, Elif Varol and I) are at the Cambridge University Clinical School at Addenbrooke’s Hospital in Cambridge, for a mashed library event – Mashcat.

Mashcat is “a mashed library event focussing on cataloguing data. For cataloguers, developers and anyone else with an interest in how library catalogue data can be created, manipulated, used and re-used by computers and software”. It’s being sponsored by DevCSI.

We’re presenting about the CLOCK project to a room full of cataloguers. No pressure. The slides are online at: http://lncn.eu/hknp

It’s a model and it’s looking good

Posted on May 23rd, 2012 by Paul Stainthorp

Ever since the CLOCK meeting we had in Peterborough, I’ve been trying to describe how open linked bib data might open up new models of ‘cataloguing’, resource description, and (by extension) presentation of bibliographic information to a user of a discovery system.

I’ve found it quite difficult to articulate these ideas without resorting to vague hand gestures and gibberish. At the recent CLOCK hack days at the CARET offices in Cambridge, we finally managed to capture these models on paper [actually, we used Lucidchart]. Thanks to Ed Chamberlain and Trevor Jones for taking notes as we talked through the various models, and to Ed’s colleague @ppetej for acting as a sounding board and critical friend.

The diagrams describe cataloguing processes real and hypothetical. They use a kind of pseudo-scientific notation which I find helpful; feel free to ignore it if you don’t.

Also, a cop-out disclaimer: these are rough sketches, not polished theses. Please feel free to jump in and criticise, tweak, suggest improvements. If you understand Linked Data, we’re really interested in your comments about how these models could be physically represented. We’re not trying to suggest that any one of these models has all the answers or could be a ‘just-plug-it-in’ replacement for current practice, and we don’t intend to write software as part of the CLOCK project that will make these a reality. But: somewhere in the middle, we think there might be ideas or threads that are worth tinkering with and following up.

1.

The first diagram attempts to describe copy cataloguing as libraries currently understand it, and involves the transfer of MARC records between institutions. When someone catalogues a book or resource, they tend to copy an existing record from another database, alter it to their needs and use it as they see fit. The record of any changes made is lost. Over time, this convention results in many unconnected versions of a record. N.B.:

  • The ‘donor’ institution (X) has a certain reputation, which is why the ‘recipient’ institution X′ chooses to copy its records.
  • Cataloguers at recipient institutions add, delete or change individual data elements according to local practice, preference or prejudice, or to correct errors. R and R′ are now effectively different entities with no described relationship between them. There is no record of the properties of changes made; no concept of an ‘edit history’.
  • This diagram does not go so far as to include the role of the union catalogue (e.g. Copac, Newton) – where R, R′, R″, R‴, etc., are re-combined (munged was the word we used!) to produce a single, new, averaged record (which is itself just another version of R).

Cataloguing workflow diagram 1 of 3

2.

In the second model, which we described variously (and possibly not entirely accurately) as wiki-ish, Github-ish, OpenLibrary-ish, and LibraryThing-ish, there is only one, shared/community version of a bibliographic record for a given work, out on the web somewhere. Various institutions/their discovery systems all agree to use this one record.

  • The record is changed incrementally, one constituent data element at a time. Probably only the most recent version of the record is viewable/queryable by users and applications, although an edit history may exist and so older versions of records may be recoverable.
  • Changes are made by editors who might be cataloguers-at-institutions-with-reputations… or might not be. We’ve assumed that in this model institutional reputation is far less important. (On the Internet no-one knows you’re a cataloguer.)
  • This model doesn’t necessarily have to exist along a single timeline (although that’s how it’s shown here) – code-repository-style branching and merging is conceivable.
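A toy Python model of this wiki-ish arrangement, purely for illustration: one shared record, edited one data element at a time, with an edit history from which older versions can be replayed. The class and method names are hypothetical, and editors are just strings, since in this model it doesn’t much matter who they are.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SharedRecord:
    """Hypothetical: a single, community-maintained record for one work (model 2)."""

    # Current state: only the latest value of each data element is visible.
    elements: dict[str, Any] = field(default_factory=dict, init=False)
    # Edit history: (editor, element, old_value, new_value), oldest first.
    history: list[tuple[str, str, Any, Any]] = field(default_factory=list, init=False)

    def edit(self, editor: str, element: str, value: Any) -> None:
        """Change one data element at a time, logging who changed what."""
        old = self.elements.get(element)
        self.history.append((editor, element, old, value))
        self.elements[element] = value

    def version(self, n: int) -> dict[str, Any]:
        """Recover an older version by replaying the first n edits."""
        state: dict[str, Any] = {}
        for _editor, element, _old, new in self.history[:n]:
            state[element] = new
        return state
```

A code-repository-style version with branching and merging would need a history shaped like a graph rather than this single list.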

Cataloguing workflow diagram 2 of 3

3.

The third, final, and most speculative model is also the most complex and probably the most poorly defined, but I think the most interesting. It’s also very Linked Data.

In any resource description ‘ecosystem’, there will always be multiple versions of a description of an entity out there somewhere (see scenario #1), each providing some unique or particular value to a specific audience. Cataloguers may benefit from a workflow that allows them to view these multiple descriptions and choose the specific assertions from each description that are most relevant to their target audience. In this model:

  • The notion of a series of discrete, changeable ‘records’ largely disappears (Where to? But should it?), to be replaced by a whole mass of overlapping individual data assertions about different aspects of the entity, derived from all manner of different sources. Multiple assertions which are trying to say the same thing about an entity can co-exist.
  • Assertions have additional properties which define and qualify them.
  • No assertion is ever destroyed – though it may be awarded properties which render it superseded or deprecated. Relationships between assertions are maintained.
  • Assertions are assembled on the fly, according to a set of criteria which we’ve called here a filter, into any number of transient Record Representations (RR), which are not permanently stored (though they could be cached). A filter defines a ‘recipe’ for specific data assertions to be included or excluded in the Record Representation, and/or specifies preferences for assertions with particular properties. A discovery tool becomes a device to store filters, and to build Record Representations. Data assertions may be stored elsewhere – and distributed across multiple datastores.
  • Filters could be defined manually by a user, as a set of preferences within a discovery tool. For instance: a second year Chinese medical student at a particular university could choose to see assertions in Mandarin, to prefer MeSH subject headings over Library of Congress, and to include notes, URLs and local physical holdings information relevant to the university they study at [added by cataloguers who work_at the same institution at which they work/study]…
  • …alternatively, filters could be defined more passively: using ‘clues’ from the user’s institutional context, geolocation, or profile on external social networks (“show me records like my friends see” or even “show me records like people with similar research interests as me see”) to build a personalised filter (leading to personalised Record Representations that no-one else sees).
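To make the filter idea a little more concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `Assertion` and `Filter` classes, the qualifier strings such as `scheme:MeSH`, and the simplification that a Record Representation holds one value per property (a real one would need multi-valued notes, URLs and holdings). Assertions are never deleted, only flagged as deprecated, and the Record Representation is a plain dict built on demand rather than stored.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Assertion:
    prop: str                            # which aspect of the entity, e.g. "subject"
    value: str
    source: str                          # who made the assertion
    qualifiers: frozenset = frozenset()  # hypothetical tags, e.g. {"scheme:MeSH"}
    deprecated: bool = False             # never destroyed, only marked


@dataclass
class Filter:
    prefer: list[str] = field(default_factory=list)  # qualifiers, most preferred first
    exclude: set[str] = field(default_factory=set)   # qualifiers to leave out entirely

    def score(self, a: Assertion) -> int:
        """Lower is better; assertions matching no preference rank last."""
        ranks = [self.prefer.index(q) for q in a.qualifiers if q in self.prefer]
        return min(ranks, default=len(self.prefer))


def build_rr(assertions: list[Assertion], flt: Filter) -> dict[str, str]:
    """Assemble a transient Record Representation: the best value per property."""
    best: dict[str, Assertion] = {}
    for a in assertions:
        if a.deprecated or a.qualifiers & flt.exclude:
            continue
        current = best.get(a.prop)
        if current is None or flt.score(a) < flt.score(current):
            best[a.prop] = a
    return {prop: a.value for prop, a in best.items()}
```

Two users with different filters get different Record Representations from the same pool of assertions, and nothing in the pool changes.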

Key questions: What’s the value added by this model over others? Are there any individual ideas from one model that could be applied to another, even if the model as a whole is too complex?

Cataloguing workflow diagram 3 of 3

Hack da Fens: open bib hack day objectives!

Posted on May 17th, 2012 by Paul Stainthorp

Most of the CLOCK project team (AB, EC, CL, TJ, PS) are at CARET in Cambridge today and tomorrow (17-18 May 2012) to generally hack bibliographic data and try and point the way for the remaining 2 months’ technical development for the CLOCK project.

After coffee on day 1 we agreed our objectives for the next two days. They are:

  1. To review what we’ve done so far and what we need to do. To play with the SPARQL and JSON-parsing search tools that Andrew Beeken has started to develop and to incorporate more data (BL, etc.)
  2. To think about the user interface for CLOCK: how do we present open bib data from multiple sources (Lincoln, Cambridge, Harvard, BL, OpenLibrary, other) in a single UI in a way which helps our users (cataloguers, researchers) solve problems?
  3. What’s the high level architecture for CLOCK? How does data flow thru’ the system – can we draw a meaningful diagram?
  4. A comparison of open data / Discovery projects that Ed Chamberlain is involved in! What can we take and re-use from OpenBiblio2 and the OEM-UK project? What might those projects be able to take and re-use from CLOCK?
  5. What are we going to do with all this data? A plan for http://data.lincoln.ac.uk/, http://data.lib.cam.ac.uk/, and http://data.ac.uk/library (or http://library.data.ac.uk/).
  6. To run interviews and live cognitive walkthroughs with cataloguers in Cambridge and Lincoln.
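Objective 1 mentions SPARQL-based search tools. Andrew’s own code isn’t shown here; as a rough, hypothetical illustration, this is how a Python client might flatten results returned in the W3C SPARQL 1.1 Query Results JSON format into simple rows. The function name `sparql_rows` is made up, and the sketch assumes the endpoint has already been queried and its JSON response read into a string.

```python
import json


def sparql_rows(response_text: str) -> list[dict[str, str]]:
    """Flatten a SPARQL 1.1 JSON results document into {variable: value} rows."""
    doc = json.loads(response_text)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # Variables left unbound (e.g. by OPTIONAL) are simply absent from a binding.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows
```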

Return of the Mash

Posted on April 27th, 2012 by Paul Stainthorp

As I write this, there are just 8 tickets left for #Mashcat, the next Mashed Library event taking place on 5 July 2012 in Cambridge (…and the first mashlib since Pancakes and Mash in Lincoln a year ago? – silly me, I forgot about #ChrisMash). Because of the topic and the location, this is a particularly interesting one for the CLOCK project. Mashcat is:

A mashed library event focussing on cataloguing data. For cataloguers, developers and anyone else with an interest in how library catalogue data can be created, manipulated, used and re-used by computers and software. It will be an invaluable opportunity for cataloguers, developers and others to meet and share knowledge, thoughts, and ideas. Possible topics participants could explore on the day include the principles behind the data, tools and code for working with it, and real examples of work on bibliographic data.

Mashcat is a free one day event, which is supported by DevCSI. After refreshments, the first session will start at 10am.

For more information, see http://www.mashcat.info, email us at info@mashcat.info, contact @orangeaurochs and follow the hashtag #mashcat on Twitter.

USTLG meeting on research data management

Posted on November 29th, 2011 by Paul Stainthorp

Yesterday I was at Clare College, University of Cambridge, for a meeting organised by USTLG, the University Science & Technology Librarians Group. The group (open to any librarians involved with engineering, science or technology in UK universities) has meetings once or twice a year. The theme of yesterday’s meeting (free to attend, thanks to sponsorship from the IEEE) was data management, with an implied focus on research data.

The meeting consisted of a series of presentations (plus a fantastic lunchtime diversion, below) with plenty of time for networking – there were about 40 people there, all with an interest in research data management – though interestingly, a show of hands suggested very few people were actively engaged in looking after their own institution’s researchers’ data.

As usual, this blog post has been partially reconstructed from the Twitter stream (hashtag #ustlg).

First up, Laura Molloy, substituting for Joy Davidson of the Digital Curation Centre (DCC), on a project called the Data Management Skills Support Initiative (DaMSSI), looking at the [shades of information literacy] skills needed by different people involved in the research data curation process. “DaMSSI aims to facilitate the use of tools like Vitae’s Researcher Development Framework (RDF) and the Seven Pillars of Information Literacy model” developed by SCONUL. Key question: how do you assess the effectiveness of research data management training?

Useful links:

Second, Yvonne Nobis of Cambridge’s Central Science Library talked about supporting researchers at Cambridge: data sharing and the role of librarians. This included her project, funded through CUL’s Arcadia library staff research scheme, looking at the issues involved in curating not research data per se, but the software code and techniques used to analyse that source data. Key points: [1] there are disincentives (time, and lack of recognition within one’s own field) to researchers’ spending time on code/software for research data manipulation. [2] But without that investment in code, the transparency–openness–replicability of computational-data science is at risk. [3] “Librarians are missing a trick” by not engaging in research data software curation issues. Yvonne also talked about the work of the eScience Centre.

Links and articles…

Before lunch we also got a chance to inspect the USTLG’s brand new website (and smashing new logo), at ustlg.org.

Then the highlight of the day… we were invited, in groups, over to the adjacent University Library, where we were treated to a display and commentary on some of Cambridge University’s rare science manuscripts and early printed books. All laid out in a reading room were Isaac Newton’s notebooks containing his notes on the method of fluxions (i.e. early calculus), Darwin’s field notes from the Beagle, Ernest Rutherford’s lab diaries (still slightly radioactive! – “…not ever so, but Health & Safety made us do a risk-assessment…”), plus Prof. Stephen Hawking’s typed and ring-bound first draft of A brief history of time, along with several early printed herbals and a book containing the first known technical drawings (of machines of warfare). Inspiring stuff, and really quite brilliant of them to lay it out for us to see!

In the afternoon—not directly connected with research data, but certainly of interest to the engineers involved in the Orbital project—we heard from Rachel Berrington of the IEEE, about the work of the organisation and some of the planned developments to the IEEE Xplore platform: new journal titles in 2012, a mobile platform, the inclusion of CrossRef data, and new interactive HTML content.

Handful of interesting links:

Finally, a useful presentation from Anna Collins, Research Data and Digital Curation Officer (good job title) for Cambridge’s DSpace repository. Anna spoke about the Incremental project, a joint exercise between Cambridge and the University of Glasgow, aimed at providing a best practice approach to supporting data management techniques amongst research communities. This is really good practical nuts & bolts stuff (e.g. when’s the right time to broach the subject of data curation with a PhD student? Too early, and they won’t care – too late, and the best you can do is help pick up the pieces!). I’ll be recommending my colleagues at Lincoln take a look at the materials on both institutions’ websites. Top quote: “be the boss of your hard drive”!

Links from Anna’s presentation:

(An aside: after the USTLG meeting had ended, I was lucky enough to get a quick tour of [about 1% of] the Cambridge University Library, along with a cup of tea in the staff room(!), thanks to a “badly-encoded” colleague. I won’t blog about it in any detail now—hopefully I should be back in Cambridge in January for another Orbital-related event—but it’s just a jaw-dropping library.)

The new USTLG website is at ustlg.org, and you can follow them on Twitter at @USTLG.

Jerome/COMET hack day: Fun in the Fens

Posted on August 10th, 2011 by Paul Stainthorp

Here’s a photo of the CARET (Centre for Applied Research in Educational Technologies) offices at the University of Cambridge, where we held our long-awaited joint Jerome/COMET hack day, on Monday 8 August. Actually, in the end, it turned out to be a kind of Jerome/COMET/SALDA/synthesis/OUseful mashup-AH!

Jerome/COMET

In attendance (for the record):

Train mayhem aside (in the end the Lincoln contingent didn’t arrive until nearly midday), it was a really useful day and well worth doing. Particular thanks to Ed Chamberlain and his colleagues for hosting the event and for arranging the food and refreshments. Thanks also to everyone who travelled from afar for no other reason than they love a good mashup.

Typically, the ever-prolific Tony Hirst has already managed to write up not one, but two blog posts about ideas that came out of the day:

  • Getting Library Catalogue Searches Out There…
  • Open Data Processes: the Open Metadata Laundry (N.B. this one relates specifically to Jerome – in particular, our notion of ‘scrubbing’ dodgy MARC records by taking only the identifiers plus the bare citation-only fields, and using that minimal set to grab additional free and Open data from the web, automatically creating new full versions of records that are inherently Open. ‘Metadata laundry’, me like.)
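The ‘scrubbing’ step of the metadata laundry can be sketched in a few lines of Python. This is an illustration only, not Jerome’s code: the record is assumed to be a plain dict keyed by MARC tag, and the whitelist of ‘identifier plus citation’ tags (ISBN, ISSN, main personal-name entry, title, edition, imprint) is a hypothetical choice.

```python
# Hypothetical whitelist: identifiers plus bare citation fields (MARC 21 tags).
KEEP_TAGS = {
    "020",  # ISBN
    "022",  # ISSN
    "100",  # main entry: personal name
    "245",  # title statement
    "250",  # edition statement
    "260",  # publication details (imprint)
}


def launder(record: dict[str, list[str]]) -> dict[str, list[str]]:
    """Strip a record down to identifiers and citation fields; drop everything else."""
    return {tag: values for tag, values in record.items() if tag in KEEP_TAGS}
```

The laundered record is then only a lookup key: the new, full record would be rebuilt from Open sources using those identifiers.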

Here are three more ideas/conversations we had in Cambridge that I thought were going somewhere interesting. Yeah, we might get around to actually doing these, sometime…

1. Using COMET data to enhance Jerome

The idea is similar to the ‘metadata laundry’, above, and to the way Jerome already uses data from the Open Library, JournalTOCs, LibraryThing, etc., to enhance its book records with additional metadata. Jerome constructs a URL in the form http://data.lib.cam.ac.uk/isbn/_______, with the ISBN from the Jerome record dropped in at the end. COMET responds with a link to an open record in RDF and/or JSON, which Jerome gladly sucks in, adding any additional fields to its original source record. Enrichment ensues.
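A minimal sketch of the two ends of that exchange, assuming Python and a record held as a flat dict. It builds the lookup URL in the form given above and merges in extra fields without overwriting anything Jerome already holds. The network hop is deliberately left out: the post says only that COMET answers with a link to an RDF and/or JSON record, so the shape of that response is not assumed here. The function names are hypothetical.

```python
from urllib.parse import quote

# URL pattern taken from the post; the ISBN goes on the end.
COMET_ISBN_BASE = "http://data.lib.cam.ac.uk/isbn/"


def comet_url(isbn: str) -> str:
    """Build the lookup URL, normalising the ISBN by dropping hyphens and spaces."""
    clean = isbn.replace("-", "").replace(" ", "")
    return COMET_ISBN_BASE + quote(clean)


def enrich(record: dict, extra: dict) -> dict:
    """Add fields the source record lacks; never overwrite fields it already has."""
    merged = dict(record)
    for key, value in extra.items():
        merged.setdefault(key, value)
    return merged
```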

2. Using Jerome search to ‘skin’ COMET

I called this one “Jerome Scholar” ;-) …we make use of the search aspects of Jerome (in particular, the speed of Sphinx, the ‘mixing desk’ idea and the neat record presentation) to provide a really smooth way of interacting with the much better-structured (hence “Scholar”) data that resides in COMET.

3. Using the differences between the two datasets to tell us something interesting

I have a notion that there’s something inherently useful about being able to compare two versions of a record for the ‘same’ object. If we could use Jerome+COMET to generate a web application/data feed – one that other discovery services could themselves consume, we’d have ways of ‘sparking off’ whole new avenues of discovery: from misspelled names, variant titles, different subject terms assigned by different cataloguing practices, etc. Like xISBN, but for non-standardised data(?). All right, that’s the fuzziest of the three ideas. And as the eminently sensible Owen Stephens kept asking me, “…what’s the use case?”.
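The comparison itself is the easy part; the use case is the hard part. As a hypothetical sketch, assuming both records have been reduced to flat dicts of field name to value, this lists every field where two versions of the ‘same’ record disagree. A feed of such variant pairs (misspelled names, variant titles, different subject terms) is roughly what the idea above would expose.

```python
def record_differences(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple]:
    """Return {field: (value_in_a, value_in_b)} for every field where the two
    versions disagree; None marks a field that is missing from one side."""
    fields = sorted(a.keys() | b.keys())
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}
```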

And then we went to the pub.

Boutique technique clique: critique

Posted on March 30th, 2011 by Paul Stainthorp

I was in Cambridge last week, for a symposium on ‘Personalised library services in HE’, the brainchild of Andy Priestner and Libby Tilley, both of Cambridge University. They were the authors of a CILIP magazine article last year…

Priestner, A. and Tilley, E. (2010) Boutique libraries at your service. Library & Information Update, 9(6), pp.36–39

…which explored the idea of the ‘boutique’ library service: autonomous, small-scale (probably), human-scale (certainly), highly personalised, user-centric, non-homogeneous, quality-over-quantity. Cf. the ‘boutique’ hotel.

Quite a few of the presentations from the symposium are online:

I was there with my Jerome hat on (“radical personalisation”, remember?), combining my attendance with a meeting with Ed Chamberlain of the COMET project, but it was an event that was maybe aimed more at subject librarians, or library staff from small (campus, specialist) libraries. It wasn’t the sort of event I’d normally think of attending, and I was impressed by the enthusiasm and positivity of people’s ideas: it seemed like there’s a desire to celebrate what’s unique and worth cultivating in academic libraries, and which perhaps has been lost in recent years.

Twitter was getting a good hammering, as usual.

I’m not entirely convinced by the ‘boutique’ idea as a workable model for academic library services… at my grumpiest, I’d characterise it as an unholy mixture of what we’re already doing anyway, what we could never possibly afford to do, and what technology will take care of with or without us… but it’s definitely a fresh way of thinking about libraries and how we might ‘sell’ them to our parent institutions.

In any case, I’m convinced just enough that I’m going to be putting forward a 1,000-word case study on how our own Holbeach Campus Library provides a personalised service to a less-than-usual group of library users, for possible inclusion in a forthcoming book on personalisation in HE libraries (to be published by Ashgate).

The ‘Personalised library services in HE’ blog is at: http://personalisedlibraries.wordpress.com/

See also Emma Cragg (Digitalist)’s blog post on the same event: “my default position has largely been to define [students] by their method of study; full-time, part-time or by distance. Now that we are all becoming more connected, more reliant on the Internet and used to the ease of access to information I think these boundaries are blurring”.

My library ‘footprint’

Posted on December 21st, 2010 by Paul Stainthorp

Very slightly inspired by a recent blog post by Joss Winn:

A couple of things have reminded me recently that it might be useful to describe how I use libraries.

Historical interlude: my first experience of libraries would have been in visiting Cullercoats/North Tyneside Central public libraries in the ’80s. After moving down to Lincolnshire, I borrowed books from Horncastle public library (more on which later), and used my secondary school’s Jobson Library (named after local benefactor George Jobson).

As an undergraduate, I didn’t use APU’s university library all that much. I remember, vaguely, a library induction talk in a large lecture theatre. I used to cycle in to campus early and read their newspapers before my first lecture. Over three years, I might have borrowed a handful of books (not really course-related) and a few music scores. And occasionally used the study carrels to work on maths assignments, when I really needed to concentrate.

Overall, looking back, it was a bit of a missed opportunity. I didn’t understand the value of the campus library: at the time I was much more excited by our course lab and studio facilities, and by the Sinclair computing centre, which gave me my first taste of the Internet, email, IM, Yahoo! and Lycos, web design and HTML, and which stayed open until 9pm (I remember being surprised and impressed by that; just as I was by the first 24-hour garage I found in Cambridge. Such things did not exist in rural Lincolnshire).

After having worked as a librarian at the University of Lincoln for a few years, I made a slightly better stab at using the services of the Robert Gordon University’s Georgina Scott Sutherland Library while I was studying there for my MSc. Because Aberdeen is a long way away, I never actually visited the library in person (I still haven’t), but I made heavy use of both their e-resources and their postal loans service.

Great Central Icehouse

Now, in 2010, I regularly use the services of four libraries:

  1. Horncastle public library, which is ten minutes’ walk from my front door. My children go there every week for storytime and activities. From time to time, I check my LibraryThing wishlist against the Lincolnshire County Council ‘Virtual Library‘, and reserve books to read on the bus. (What would be really nice would be if I could point my LCC library account at an RSS feed of my LibraryThing wishlist, and be alerted when a new title becomes available). And I’ve recently been getting into researching my family history, for which the public library’s online access to Ancestry is invaluable. Horncastle library has also been a great place to work ‘from home’ when the roads have been bad this winter. I’ll be pleased when they upgrade from IE6, though.
  2. I’ve also joined Essex public libraries. I was tipped off about them by a colleague: they don’t require that you be resident in Essex to join, and they have a very good collection of e-books (Lincolnshire public libraries don’t do e-books, yet). I think I might also still be a member of East Riding Libraries, from when I lived in Beverley in the East Riding of Yorkshire.
  3. As I mentioned last week, I often base myself in the British Library when I’m in London: because it’s so close to King’s Cross and St Pancras railway stations; because they offer decent, free wi-fi; because there’s always an exhibition to see; and because there’s plenty of coffee to hand.
  4. Last but not least, the 5 libraries of the University of Lincoln – because that’s where I work.

Libraries I’d like to visit include the Ward Library, Henry Bloom Noble Library, and Castletown Library (all on the Isle of Man), the Lit & Phil in Newcastle, and Cambridge University Library.

Librarian props

Posted on July 29th, 2010 by Paul Stainthorp

</modesty>

"At the very early stages, Paul Stainthorp from Lincoln University did tremendously wide literature searches for me and this work has been invaluable."

Taken from: Chapman, J. (2009) Issues in contemporary documentary (with additional research by Kate Allison). Cambridge: Polity Press [Google book preview]

And:

"Equally, the research on primary and secondary publications undertaken for me at Lincoln University by journalism subject librarian Paul Stainthorp continues to be extensive and far-reaching. I am constantly grateful for Paul's energy, application and thoroughness, [...]"

Taken from: Chapman, J. (2007) Documentary in practice: filmmakers and production choices. Cambridge: Polity Press [Google book preview]