Posts Tagged ‘final’

This is the end (again) – “final” CLOCK project blog post

Posted on August 1st, 2012 by Paul Stainthorp

[A commentable version of this blog post can be found on the CLOCK project blog.]

Here is a summary of the work undertaken as part of the CLOCK project, February-July 2012. As in previous projects, I don’t expect that this will really be the end of this work. We intend to carry on developing tools for publishing, consuming, and playing with Open Bibliographic Data at the University of Lincoln (via LNCD or elsewhere), and I expect us to take CLOCK ideas, data and code further over the coming months.

Thanks to all the people who have contributed to the CLOCK project.

1. Outputs: what has the project produced?

Our major tangible output is a library of code for interrogating and working with multiple, distributed sources of open bib data.

  1. The PHP source code for all of the prototypes produced by the CLOCK project is available from a GitHub repository - https://github.com/lncd/clock. This is an open, public, “live”, working version of the code which may well be developed and improved over time. (A ‘snapshot’ of the code as it existed on 27 July 2012 has been archived here.) The public code is available to re-use under a GNU Affero General Public Licence. All of the software prototypes have been detailed in the following posts:
  2. A series of modified cognitive interviews was conducted with several university library cataloguers at Cambridge University Library and the University of Lincoln. The interviews have been summarised here and here. We captured narrated screen-capture videos of each interview: these are being held in a private repository and could be mined and investigated further for similar projects.
  3. A number of further blog posts discussing:
  4. While it is not yet publicly accessible, we have taken significant steps toward establishing a permanent data.lincoln.ac.uk service at the University of Lincoln. We expect that by late August 2012 this service will be operational and will provide a gateway to the entirety of the bibliographic data and enhancement/manipulation tools produced by CLOCK and its predecessor project (Jerome).

2. Lessons learned

CLOCK reinforced to us that the constraints of a six-month project (effectively 4½ months of development time) are difficult to reconcile with the needs and possibilities for constant experimentation and development around bib data. We also identified the following points:

  1. The CLOCK model (of presenting different versions of the same bibliographic element for recombination) is feasible and can be modelled in software—albeit with a limited number of sources—even in a limited time; it has potential as a real-world tool for use in libraries. A writeup of this approach can be found here: CLOCK – the localized index model
  2. Software developers and librarians need to be aware of, and realistic about, the limitations of particular bib data formats. For example, real-time querying of RDF can be very resource-intensive. Be clear about your needs up front. Not every shiny open data format is right for every shiny open data project!
    • Related: synchronous querying of SPARQL endpoints is not the way forward! (We characterised this blind alley as “federated searching for the 21st century”…)
  3. We are a long way away from consistency of approach in the exchange of non-MARC bibliographic information. Every data source we approached in CLOCK required that we develop a different tactic from first principles to query, retrieve and manipulate/execute functions. Our blog posts on the idea of a ‘universal translator’ explore this problem further.
  4. Available data is often poorly and confusingly documented. We were regularly frustrated merely in understanding what a data source contained and how it was structured. I (PS) have argued throughout that without some form of centralised registry/gateway (“data.ac.uk/library”), whether managed through external curation or through self-submission to a rigorous documentation standard, developers will waste time interpreting the same data again and again. We believe that a national bib data portal would be a great help in centrally indexing data sources and cataloguing their respective schemas. We understand that not everyone agrees with this approach (“who pays?”) and we invite discussion of the alternatives!
  5. And a couple of practical ‘meta’ lessons about running agile software development projects in libraries:
    • Working with a number of developers based in different locations is hard. On a project like this it would be ideal to have a central location that could be used as a development hub. Other projects under the LNCD umbrella have found the same.
    • There is a risk of losing ground already made if the outputs of previous projects (e.g. Jerome) are not correctly maintained and curated. This can lead to significant delays when previous work is not available. More rigorous use of development tools, including GitHub and Orbital, is helping to mitigate this risk in future.
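
To illustrate lessons 1 to 3 above, here is a minimal sketch in Python (not the project's PHP code) of the general idea behind a localized index: each source gets its own small translator into a common shape, and the key fields are indexed locally so that a lookup never has to query a remote endpoint synchronously. The source names, record layouts, field names and ISBN are all invented for this example and are not taken from CLOCK.

```python
from collections import defaultdict

# Hypothetical raw records from two sources with different layouts.
SOURCE_A = [
    {"isbn": "978-0-00-000000-2", "title": "An Example Title", "creator": "Doe, Jane"},
]
SOURCE_B = [
    {"identifiers": {"isbn13": "9780000000002"},
     "dc:title": "An example title",
     "dc:creator": "Jane Doe"},
]


def normalise_isbn(value):
    """Strip hyphens and spaces so the same ISBN always yields the same key."""
    return "".join(ch for ch in value if ch.isdigit() or ch in "xX").upper()


def translate_a(record):
    """Map a source A record onto the common (assumed) field names."""
    return {"isbn": normalise_isbn(record["isbn"]),
            "title": record["title"],
            "author": record["creator"]}


def translate_b(record):
    """Map a source B record onto the same common field names."""
    return {"isbn": normalise_isbn(record["identifiers"]["isbn13"]),
            "title": record["dc:title"],
            "author": record["dc:creator"]}


def build_index(sources):
    """Build a local index keyed on ISBN.

    `sources` maps a source name to a (records, translator) pair. Every
    version of a record is kept, tagged with the source it came from.
    """
    index = defaultdict(list)
    for name, (records, translate) in sources.items():
        for raw in records:
            common = translate(raw)
            index[common["isbn"]].append({"source": name, **common})
    return index


index = build_index({
    "source_a": (SOURCE_A, translate_a),
    "source_b": (SOURCE_B, translate_b),
})

# A lookup is now a local dictionary access rather than a live remote query.
variants = index.get(normalise_isbn("978-0-00-000000-2"), [])
for variant in variants:
    print(variant["source"], "|", variant["title"], "|", variant["author"])
```

The per-source translator functions are where the "different tactic for every data source" problem shows up in practice: each new source needs its own mapping, which is the cost a shared documentation standard would reduce.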

3. Opportunities and possibilities

We have identified possible continuations and extensions of the CLOCK work; these are detailed further below.

  1. Continuation of development of the CLOCK software as a tool for (a) faster querying of distributed bib data sources through local indexing of key fields, (b) translating disparate data formats into a common translation standard, (c) meaningfully presenting distributed bib data to the user, (d) allowing a ‘cataloguer’ to select and recombine bibliographic elements to create a new record, which (e) feeds back into a new original data source, making the process iterative, and (f) incorporates social and reputational components in a user’s selection between alternative data elements for the same resource.
  2. Discussion of the business case to libraries. What value does an alternative resource description model offer, and what would libraries gain through relinquishing some of the control over institutionally-owned catalogue data? Related: could we quantify and qualify the time spent on particular cataloguing / discovery activities using traditional LMSs and demonstrate possible efficiencies or increased quality in a new, distributed model? What arguments for the incorporation of open bib data in cataloguing will convince library managers to replace current practice?
  3. A more thorough examination of all the application functions that a cataloguer relies upon through more extensive cognitive interviews and/or functional mapping processes with cataloguers at a range of institutions.
  4. Further investigation of best practice in documenting and describing our own published bib data in JSON/RDF. [Being picked up as part of the data.lincoln.ac.uk work, above]
  5. We also intend to submit for publication articles on a number of topics arising from CLOCK. Suitable journals and conferences have been identified and articles are being prepared on the following broad topics:
    • The approach of the CLOCK project in developing software for working with multiple, distributed sources of open bib data.
    • Expectations of truth and ‘trust’ in bibliographic data (“How has this assertion been derived about a work?”)
    • The potential of new models of resource description to save time and effort for libraries; techniques for analysing the efficiency of search and cataloguing workflows.
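
As a rough illustration of points (d) and (e) in item 1 above, and of the question of how an assertion about a work has been derived, the following Python sketch shows one way a cataloguer's selections could be recombined into a new record that remembers where each element came from. The data, field names and the `recombine` function are hypothetical and are not part of the CLOCK codebase.

```python
# Hypothetical versions of the same work, as held by two different sources.
variants = [
    {"source": "source_a", "title": "An Example Title", "author": "Doe, Jane"},
    {"source": "source_b", "title": "An example title", "author": "Jane Doe"},
]


def recombine(variants, choices):
    """Build a new record from the cataloguer's choices.

    `choices` maps each field name to the source whose version of that
    field should be used. The result carries a provenance map, so every
    element can be traced back to the source it was taken from.
    """
    by_source = {variant["source"]: variant for variant in variants}
    record = {}
    provenance = {}
    for field, source in choices.items():
        record[field] = by_source[source][field]
        provenance[field] = source
    record["provenance"] = provenance
    return record


# The cataloguer prefers source B's title and source A's author heading.
new_record = recombine(variants, {"title": "source_b", "author": "source_a"})
print(new_record)
```

Because the new record has the same shape as its inputs plus a provenance map, it could be fed back in as a further source, which is what would make the process iterative.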

This is the end: Jerome project

Posted on August 1st, 2011 by Paul Stainthorp

The JISC-funded Jerome project ended on 31 July 2011. Here are the final few project blog posts:

The Jerome search portal itself is [still] at http://jerome.library.lincoln.ac.uk/, and the open data APIs are all being documented on http://data.lincoln.ac.uk/ – we’ll not be switching any of it off any time soon :-)

LIDP: end of project. Using libraries = good.

Posted on July 28th, 2011 by Paul Stainthorp

I was in Huddersfield last week for the final project meeting of the Library Impact Data Project (LIDP).

LIDP was successful in proving that:

There is a statistically significant relationship between both book loans and e-resources use and student attainment. And this is true across all of the universities in the study that provided data in these areas.

“We want to stress here again that we realise THIS IS NOT A CAUSAL RELATIONSHIP!  Other factors make a difference to student achievement, and there are always exceptions to the rule, but we have been able to link use of library resources to academic achievement.”

An initial (outline) report on how the University of Lincoln’s own activity-attainment holds up to this same statistical inspection is available to download from here [PDF]. As much as possible of the library activity data used in the project will be released under an Open Data Commons Attribution License in the near future, and hosted on the project blog.

Thanks are due to Graham Stone, Dave Pattern, Bryony Ramsden, and all the project partners for the opportunity for Lincoln to participate in this project. We had fun getting our data together. The end-of-project blog post for LIDP is here – it suggests some very interesting areas for further investigation.

Personally, I’m very interested in looking for cross-institutional comparisons – perhaps trying to explain particular levels of activity-attainment attached to individual subject areas, irrespective of which university the student is at (i.e. does a Lincoln computing student have more in common with a Lincoln business student, or with a Huddersfield computing student?). I’d also be interested in looking particularly at those students whose library activity behaviour changes through the life of their course, and who then go on to get a better degree than might have been predicted from their library activity in their first year.

“Finally, we have been astonished by how much interest there has been in our project. To date we have two articles ready for publication imminently and have another 2 in the pipeline. In addition by the end of October we will have delivered 11 conference papers on the project. All articles and conference presentations are accessible at: http://library.hud.ac.uk/blogs/projects/lidp/articles-and-conference-papers/”

I can see this project getting cited, and cited again, simply every time anyone wants to argue that academic libraries are A Good Thing.