Posts Tagged ‘CLOCK’

CLOCK and a summary of 2 other Discovery projects

Posted on May 17th, 2012 by Paul Stainthorp

Ed Chamberlain, who is on the CLOCK project team as a researcher, is involved in two other projects under the Discovery strand: OEM-UK and Open Bibliography 2. We’re looking for ways in which CLOCK can re-use data, code, processes and ideas from these projects (and elsewhere) – also what CLOCK could offer in return.

Notes:

  • Open Biblio project over the last few years; aim to aggregate large amounts of bibliographic data for scientific discovery.
  • Data collected from Cambridge University, the BL, PubMed and held as RDF, used to power an open catalogue called “Bibliographica”.
  • Problems around scaling the data/system led to the current JISC-funded Open Biblio 2 project (in the meantime, Cambridge and the BL had started to publish their data openly).
  • Open Biblio 2 started looking at a NoSQL approach (CouchDB, Lucene/Solr) – eventually settling on Elastic Search.
  • The approach of Open Biblio is to build bottom-up, community tools: BibServer and BibSoup (“Like Wikimedia for bib data”). Raises interesting questions about data quality in an open community-driven system.
  • Also looking at JSON as a lightweight way of sharing bib data: emerging BibJSON convention for representing a bibliographic record as a JSON object (Ed wrote a MARC-to-BibJSON parser in Perl). N.B. BibJSON is not a million miles away from the JSON that Jerome spits out! There are three hack days taking place next month in London to look specifically at BibJSON.
  • Open Biblio 2 is also looking at JSON-LD (JSON for Linking Data), a ‘real’ JSON standard which does a lot of the things that RDF does.
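
To make the BibJSON and JSON-LD notes above a little more concrete, here is a minimal sketch in Python. It is an illustration only, not the Open Biblio 2 code: the helper names, the placeholder ISBN and the example.org URI are assumptions, and the keys shown (title, author, year, identifier) follow the BibJSON convention as we understand it rather than any formal schema.

```python
import json


def to_bibjson(title, authors, year, isbn):
    """Build a BibJSON-style record (a plain dict) from a few simple fields."""
    return {
        "type": "book",
        "title": title,
        "author": [{"name": name} for name in authors],
        "year": str(year),
        "identifier": [{"type": "isbn", "id": isbn}],
    }


def with_jsonld_context(record, record_uri):
    """Return a copy of the record with JSON-LD keywords added.

    "@context" maps the short key "title" to a Dublin Core terms URI and
    "@id" names the record; the original dict is left untouched.
    """
    linked = dict(record)
    linked["@context"] = {"title": "http://purl.org/dc/terms/title"}
    linked["@id"] = record_uri
    return linked


# Placeholder values, invented purely for illustration.
record = to_bibjson("Hamlet", ["William Shakespeare"], 1603, "9780000000002")
linked = with_jsonld_context(record, "http://example.org/record/1")
print(json.dumps(linked, indent=2))
```

The point of the second function is that the same JSON object stays usable by ordinary JSON tools, while the extra keys give a Linked Data consumer enough to interpret it.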

tl;dr = use their JSON standards and BibSoup as a data source.

  • The second project, OEM-UK (Open Education Metadata UK), based at the IoE in London, is focusing on cataloguing workflows.
  • Data from the IoE’s SirsiDynix catalogue, plus EPrints is drawn into a Drupal framework; forms to create data (autopopulation of forms); “cataloguing the Drupal way”.
    • Thought from Andrew Beeken: could we replicate this approach, using WordPress custom post types to store and display structured content? Shades of the OPACPress project which Joss Winn and I proposed—but that was not funded—several years ago.
  • Some evidence that this approach is capable of speeding up the cataloguing process considerably: the more data you put in the faster it gets! Ed has some screencapture videos from OEM-UK showing workflow, including grabbing data via Zotero.

tl;dr = OEM-UK are also successfully disrupting cataloguing workflows.

On open data licensing and sustainability

Posted on May 17th, 2012 by Paul Stainthorp
Last week I attended a free ‘licensing clinic’ in Birmingham, organised by the Discovery programme – mainly as a means of kick-starting my brain into considering the copyright/licensing issues around the CLOCK project. Here are my notes.
  1. The Jerome project addressed licensing in April, 2011, and the situation hasn’t really changed for us: we’re still intending to expose as much of our bibliographic data as possible using a properly open licence such as CC0.
    • “The licensing of data is an interesting one, since we run into a whole bunch of questions around who actually owns the information in our catalogue. Since it’s all factual information (and you can’t copyright a fact) then surely it’s a free for all – except that EU law introduces a curve ball in the form of database right. Broadly speaking this provides specific protection for collections of records, but not the records themselves.”
  2. Ed Chamberlain and the COMET project also addressed licensing and the ownership of MARC records: work that we should revisit.
  3. The JISC Open Bibliographic Data Guide (obd.jisc.ac.uk) provides very clear advice and information useful in creating an open data business case. E.g.:
    • “[…]if we presume that the rationale for publication is to ensure the widest possible dissemination then adoption of a generic open data license (such as Open Data Commons or CC0) is the most effective way to make the set of potential uses unambiguous. Restrictive licenses are counter-productive[…]”
  4. There is some very helpful guidance coming out of the Discovery project around building a business case for open discovery. This was summarised at the recent Discovery programme meeting (also in Birmingham) by David Kay.
    • N.B. I’ll revisit this in a future blog post. I’m getting almost surprisingly interested in the problem of ‘selling’ the idea of open bib data to an institution, and I’ve found the Discovery work on business cases increasingly useful.
  5. At Lincoln in March, 2012, we had a very useful visit from Sander van der Waal of OSS Watch where we discussed the University of Lincoln’s approach to openness (Open Source, Open Access, as well as Open Data). Joss Winn is following this work up with the University’s IP manager with a view to writing a University policy on open licensing of our IP.
  6. Related to the ‘business case’ aspect is the work of LNCD (and also discussions I’ve had with Ed Chamberlain recently) about how to ensure sustainability of open services in a technical sense – what sort of systems architecture and processes do we need in place, and how do we work with university ICT support departments to ensure that projects become institutionally-supported services when it’s important for them to do so?
  7. At this Birmingham event, Chris Banks of the University of Aberdeen spoke about the benefits and challenges of sharing from a library director’s perspective. I was particularly interested in the metaphor of “metadata as currency”: how are aggregators creating value based on the mass accumulation of metadata, and how are they selling that value back to libraries? See Chris’s blog for more. Aberdeen are clearly doing a lot around the analysis of e-resources usage and relating it back to their library strategy / information literacy, etc.
  8. Paul Miller (Cloud of Data): one key quote “amateurs tend to do a better job of aggregating content than institutions” (e.g. collections of images on Flickr). This may be in part because individuals don’t have the same risk-averse approach, but whatever the reason
  9. Barrister Frances Davey gave us a quick run-through of IP law as it relates to data. Key quote: “the legal repercussions of publishing data openly are pretty much nil”. Fear and uncertainty poisons initiative! Frances also touched on the business / reputation-management arguments for having an active approach to open data: people might well be getting bad copies of your data already (via screenscraping) – release it yourself and take control of the quality. Example of the British Library choosing a CC0 licence precisely because of the lack of an attribution clause – then any subsequent re-use is “nothing more to do with us”.
  10. Then, after lunch, copyright consultant Naomi Korn ran a workshop on the practical aspects of choosing a licence for your data. Naomi spoke about the need to start by deciding how open you want to be as an institution (noting that institutions with a dedicated © person tend to have a greater appetite for risk) – then consider whether you have the resources in place to get where you want to be. Key quote: “Let’s do some attribution mapping!” Some link from Naomi’s workshop:
  11. At the Birmingham clinic we also discussed the risks (including the risk of doing nothing) and benefits of taking an open approach. My contribution: open bibliographic data enables high-level services to be sold back to universities (c.f. Chris Banks’ notes on metadata aggregation, above). We shouldn’t be scared of this or see it as a reason to not open up our data (we can’t compete with those companies; we want their services and we’re prepared to pay for them!); but we can build lower-level, locally-relevant services as a result of releasing our own open data, and play on the web by web rules – if we don’t make our data open for re-use on the web, we can’t even have the conversation. Lincoln’s approach is entirely around open data as a means to an end: it’s the best and most natural way of sparking off new, innovative services based on unexpected combinations of our own and other people’s data.
    • The best example of this so far is the new data-driven staff profiles at Lincoln: but we’re going to need more and more convincing examples if we’re going to make a convincing business case.
  12. Final overall quote of the day: “Writing your own open licence is an unpleasant form of vanity”.

CLOCK notes – 8 May 2012

Posted on May 8th, 2012 by Paul Stainthorp

This is what the CLOCK project team are currently up to (from meetings over the past couple of weeks and from notes made at the recent Discovery: making sure your resources are discovered, used and reused event in Birmingham):

  • Andrew Beeken has been exploring the Cambridge COMET data via its SPARQL endpoints and has already blogged about the process of using SPARQL to “build kind of a ‘Hello World’ of open data querying”. He’s now looking at the recently-released Harvard open bib data and comparing the speed, the use of matching namespaces, and the use of JSON vs RDF/XML.
  • This work is leading up to unified search and presentation of records from several sources (Cambridge/COMET, Harvard, Lincoln/Jerome, OpenLibrary, etc.). Andrew and Trevor Jones are collaborating on drawing up a high-level architecture for CLOCK, and a strategy for expressing Linked Data, which will be shared with the rest of the project team (and publicly) for discussion.
  • To support this, Alex Bilbie in ICT services at Lincoln is helping to get the original Jerome application up and running on the CLOCK server (jerome.library.lincoln.ac.uk), where it can be used as a stable platform for developing and RDF-ifying Lincoln’s own bib data.
  • Trevor Jones and Ed Chamberlain will work together on developing the work with users (in parallel, at the University of Lincoln and the University of Cambridge) to clarify their requirements for bibliographic data:
    • For cataloguers, based around a rethink of copy cataloguing workflows, we will try to tease out requirements from talking to cataloguers (and associated subject librarians) asking to be ‘positively disrupted’: what do they need to do? What is missing from their data?
    • For researchers, we will build on some initial user walkthrough analysis done by Trevor and Andrew in Lincoln, with performing arts students in LPAC (the Lincoln Performing Arts Centre). What are the research questions that users are trying to answer? How does bib data help them answer those questions? What’s missing? Ed and Trevor will agree on a set of questions and tasks;
    • These requirements will be used to feed the remaining cycles of platform development for CLOCK.
  • Ed Chamberlain will act as the conduit between CLOCK and related projects in the Discovery strand, looking for points of shared interest/technology, and blogging (or asking others to blog) about aspects of one project which can inform the others. The other projects in which Ed is involved are: the Open Education Metadata UK (OEM-UK) project at the Institute of Education (shared interest in new user interfaces for cataloguing – possibly use screencasts to demonstrate alternative workflows?) and the Open Bibliography 2 project (lots of potential technical overlap – BibJSON, JSON-LD, BibSoup.net, expression in RDF container formats).
  • Ed and I (Paul Stainthorp) will work on developing the ‘business case’ / sustainability of CLOCK and data.*.ac.uk, following up on themes discussed in the recent Discovery event, and thinking not only about institutional funding / high-level support for open bib data, but also what it takes to move open bib data publishing from a development environment into an institutionally-supported, ICT-run service.
  • Finally, PS is arranging a couple of internal CLOCK ‘hack days’ (to take place on 17th-18th May, in Cambridge) – more details to follow.
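
As a rough illustration of the ‘Hello World’ of open data querying mentioned in the first note above, the sketch below builds (but deliberately does not send) a SPARQL request in Python. The endpoint URL is a placeholder rather than the real COMET address, and the function name is our own invention; the only things assumed from the SPARQL protocol itself are the `query` parameter on a GET request and the JSON results media type.

```python
from urllib.parse import urlencode, urlparse, parse_qs
from urllib.request import Request

# The simplest useful query: fetch any ten triples from the store.
HELLO_WORLD_QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_sparql_request(endpoint, query):
    """Prepare a GET request for a SPARQL endpoint, asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query.strip()})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# Placeholder endpoint: substitute the address of a real SPARQL service.
request = build_sparql_request("http://example.org/sparql", HELLO_WORLD_QUERY)
print(request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would run the query against a live endpoint; that step is left out here so the example does not depend on any particular service being available.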

Imminent domain

Posted on May 4th, 2012 by Paul Stainthorp

With various new services arising out of the ongoing Library ICT systems review, we’re amassing a nice little collection of library-related 2nd-level subdomains. Here’s a list, which I’ll edit as they become live.

  1. http://library.lincoln.ac.uk/ (i.e. the ‘bare’ library subdomain: this isn’t used at the moment, but we intend that it will become the Library’s ‘root’ web presence)
  2. http://www.library.lincoln.ac.uk/ (currently used for our SirsiDynix Horizon Information Portal OPAC, which we intend to move to catalogue.library… in order to free up www for our web pages hosted on WordPress)
  3. http://catalogue.library.lincoln.ac.uk/ (the future home of the library catalogue)
  4. http://findit.library.lincoln.ac.uk/ (a launch point for our new Discovery system, still to be announced, and with a name yet to be decided!)
  5. http://lists.library.lincoln.ac.uk/ (Talis Aspire reading lists, currently being developed)
  6. http://archives.library.lincoln.ac.uk/ (Axiell Calm archives and special collections software)
  7. http://jerome.library.lincoln.ac.uk/ (Jerome is our innovation platform and a home for experimental search services, being re-developed as part of the CLOCK project)
  8. http://auth.library.lincoln.ac.uk/ (OpenAthens LA v2.1 authentication software)
  9. http://proxy.library.lincoln.ac.uk/ (EZProxy authentication software)

We also have two core systems which aren’t on the library subdomain:

  1. http://eprints.lincoln.ac.uk/ (the Lincoln Repository on EPrints – it’s appropriate that this isn’t on library, as we’ve always managed the Repository as a shared/collaborative project between CERD, ICT services, the Library, and the Research Office)
  2. http://ill.lincoln.ac.uk/ (CLIO inter-library loans software)

The technical approach: a CLOCK dev stack

Posted on May 2nd, 2012 by Paul Stainthorp

A note on technical development:

We’re beginning to make some progress towards a framework for development in the CLOCK project. Project developers Trevor Jones and Andrew Beeken, with the support of the other developers in LNCD, now have the following at their fingertips:

That list should give you an idea of LNCD’s approach to development. [N.B. some links may not be publicly accessible.]

CLOCK implementation: key themes (the Peterborough meeting)

Posted on May 2nd, 2012 by Paul Stainthorp

Screengrab of our notes from the CLOCK Peterborough meeting

This blog post is a comment upon the formal project implementation plan, and gives some more detail about how the CLOCK project intends to meet its project aims.

In February, 2012, the project team (EC, CL, PS, OS) met at Peterborough Regional College (roughly equidistant between Lincoln and Cambridge!) to discuss the implementation plan and our CLOCK ‘first steps’. We made copious notes using an interactive whiteboard. Here’s what we agreed for CLOCK…

Most of the day’s discussion was spent attempting to define more clearly the users/audience for CLOCK, narrowing down the field of study a bit as we went along, and looking for potential ways to engage those audiences in the research. We agreed that our users consist of:

1. Cataloguers and library managers looking to innovate their resource description workflows as well as contribute to the corpus of Open Bib Data, through improving/correcting/augmenting existing records as well as submitting new records, “adding to the story” by allowing libraries to incorporate data elements outside the boundaries of traditional resource description.

We spent a while discussing how the project might approach the problem of proposing new “…minimal workflows for cataloguing around individual, disaggregated RDF elements” (taken from the project plan). We’ve also since discussed this back at Lincoln with staff in the Library and LNCD – I’ll shortly be blogging some diagrams which illustrate several different possible approaches to cataloguing workflow, as part of the ‘Users and use cases’ thread. We’ll also be speaking to cataloguers at Lincoln and at Cambridge to try and get a clearer picture of the ‘pinch points’ in existing cataloguing, where applications using OBD might make a difference to their work.

Key quotes:

“Matching / negotiating of the best available Open bib data through common identifiers; the importance of a social/reputational aspect in identifying authoritative data; [use of] associated social/reputational metadata making explicit the provenance, history, and ‘pagerank’ measurements of each data element. [The phrase 'a narrative verdict on the catalogue record' was used…]”

2. Researchers (qualified as “the ‘serious’ and tech-savvy researcher”), who may be keen to incorporate Open Bib Data in user tools (e.g. citation/reference management software). We agreed to concentrate within the CLOCK project on a specific discipline—that of Drama/Performing Arts—because of the interesting challenges posed by the description of performance resources in existing bibliographic data. (“Almost anything you’d want to know about a play isn’t recorded in the MARC record!”). We identified a number of potentially useful resources and sources of data, including:

  • The play’s the thing
  • TheatreDB
  • Resources in institutional repositories
  • Theatricalia
  • Dutch Culture Link
  • Wikipedia/DBpedia

We agreed that we’ll set up a series of interviews/structured tasks for researchers in performing arts at Cambridge and Lincoln; also for subject librarians in the discipline (as a proxy to the researchers themselves). CLOCK will look at how well existing catalogue data describes performance and related resources (perhaps by sampling MARC records at both institutions), and how external sources of ‘non-library’ data might complement and enhance those records.

3. Developers attached to academic libraries, who are looking to build applications exploiting available Open Bib Data, and techniques for interrogating and exploiting that data. The engagement with this audience is probably more at a strategic level than the first two – what are the technology choices and the decisions around the design of APIs and data endpoints – can we make a case study on developing using OBD?

We also discussed CLOCK’s overlap with other projects (in particular the Open Biblio 2 and the Open Education Metadata UK project). This work will be picked up by Ed Chamberlain, who is a common factor in all three projects!

“The project team believe that an important aspect of this innovation will be serious consideration given to the development of an awesome, national, open scholarly catalogue knowledgebase for the UK (“data.ac.uk/library” or “library.data.ac.uk”).”

Members of the CLOCK project team have since signed up to the new DATA-AC-UK mailing list and we will use the project as an opportunity to propose first steps in publishing national bibliographic data to data.ac.uk. This will be the topic of a future blog post.

“CLOCK will explore options for updating and maintaining the shared platform on data.lincoln.ac.uk as an eventual service”

University of Lincoln developer Alex Bilbie has blogged about the future of 5★ open data publishing at Lincoln: “As part of the Jerome project, we cracked open the university library’s digital catalogues and stored the data in a sane format (i.e. not MARC). Now through the CLOCK project the data will be semantically marked-up and compatible with other institutions bibliographic data”. This will also be the topic of a future blog post.

Return of the Mash

Posted on April 27th, 2012 by Paul Stainthorp

As I write this, there are just 8 tickets left for #Mashcat, the next Mashed Library event taking place on 5 July 2012 in Cambridge (…and the first mashlib since Pancakes and Mash in Lincoln a year ago? – silly me, I forgot about #ChrisMash). Because of the topic and the location, this is a particularly interesting one for the CLOCK project. Mashcat is:

A mashed library event focussing on cataloguing data. For cataloguers, developers and anyone else with an interest in how library catalogue data can be created, manipulated, used and re-used by computers and software. It will be an invaluable opportunity for cataloguers, developers and others to meet and share knowledge, thoughts, and ideas. Possible topics participants could explore on the day include the principles behind the data, tools and code for working with it, and real examples of work on bibliographic data.

Mashcat is a free one day event, which is supported by DevCSI. After refreshments, the first session will start at 10am.

For more information, see http://www.mashcat.info, email us at info@mashcat.info, contact @orangeaurochs and follow the hashtag #mashcat on Twitter.

CLOCK project implementation plan

Posted on April 23rd, 2012 by Paul Stainthorp

The University of Lincoln and Cambridge University Library both delivered successful projects (Jerome and COMET) for the JISC Infrastructure for Resource Discovery Programme in 2011. Both projects produced outputs of interest to researchers, students, librarians, developers, and designers of bibliographic discovery environments.

The CLOCK project is harnessing the success of these two complementary initiatives and investigating new approaches to data creation and discovery in the library domain. In particular, CLOCK will investigate, propose, and develop new, web-based bibliographic tools which will make it easier for different users—cataloguers in academic libraries, and the “serious”, tech-savvy researcher—to find Open Bibliographic Data and incorporate that data into systems and workflows.

This is the CLOCK project implementation plan:

Aims, Objectives and Final Output(s) of the project

The CLOCK project’s overall aim is to challenge assumptions and drive innovation in libraries’ interaction with bibliographic data. The project team believe that an important aspect of this innovation will be serious consideration given to the development of an awesome, national, open scholarly catalogue knowledgebase for the UK (“data.ac.uk/library” or “library.data.ac.uk”).

As a medium-term step towards this goal, CLOCK will explore options for updating and maintaining the shared platform on data.lincoln.ac.uk as an eventual service. Longer term maintenance of the Cambridge open data service will also be investigated.

The investigation will take an experimental approach, building upon the RDF encoded structured metadata released through the COMET project as a readily accessible resource for enrichment of data within the Jerome software environment. At this preliminary stage, four steps in record enrichment have been identified:

  1. Matching / negotiating of the best available Open bib data through common identifiers;
  2. The importance of a social/reputational aspect in identifying authoritative data;
  3. A process of harvesting a returned record (parts of a record) to be re-used;
  4. Enrichment, repair and cleansing of data in the knowledgebase (positive feedback loop).
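
The first three steps above can be sketched in a few lines of Python. This is a toy under stated assumptions, not the CLOCK implementation: records are plain dicts, the common identifier is an ISBN, a hypothetical numeric `trust` score stands in for the social/reputational aspect, and the source names and ISBNs are invented. The feedback loop in step 4 is not modelled.

```python
def normalise_isbn(isbn):
    """Strip hyphens and spaces so equivalent ISBNs compare equal."""
    return isbn.replace("-", "").replace(" ", "").upper()


def enrich(local, candidates, trust):
    """Fill empty fields in `local` from candidate records sharing its ISBN.

    Candidates from higher-trust sources are consulted first, and the source
    of every borrowed field is recorded so its provenance stays explicit.
    """
    key = normalise_isbn(local["isbn"])
    matches = [c for c in candidates if normalise_isbn(c["isbn"]) == key]
    matches.sort(key=lambda c: trust.get(c["source"], 0), reverse=True)

    enriched = dict(local)
    provenance = {}
    for candidate in matches:
        for field, value in candidate.items():
            if field in ("isbn", "source"):
                continue
            if not enriched.get(field):  # only fill gaps, never overwrite
                enriched[field] = value
                provenance[field] = candidate["source"]
    return enriched, provenance


# Invented example data.
local = {"isbn": "978-0-00-000000-2", "title": "Hamlet", "toc": ""}
candidates = [
    {"source": "source_a", "isbn": "9780000000002", "toc": "Act 1; Act 2", "title": "Hamlet (ed.)"},
    {"source": "source_b", "isbn": "9780000000002", "toc": "Acts 1-5", "subject": "Tragedy"},
    {"source": "source_b", "isbn": "9781111111111", "subject": "Unrelated"},
]
trust = {"source_a": 1, "source_b": 2}

enriched, provenance = enrich(local, candidates, trust)
print(enriched)
print(provenance)
```

Here the local title is kept, while the missing table of contents and subject are harvested from the more trusted of the two matching sources. A real system would need a far richer notion of reputation than a single number per source.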

By exploring aggregation, ‘data cleansing’ and enrichment through readily available open sources, the project hopes to highlight new distributed approaches to metadata production–cataloguing, storage and delivery, including minimal workflows for cataloguing around individual, disaggregated RDF elements. The project will explore ways to do this using automated techniques built around open reusable metadata.

Whilst the two million records published under the COMET project will act as a starting point for this process, the participants aim to utilize other sources, including images, table of contents data and related supplementary resources (geo-data, author biographies, etc). Through this, there will be an additional social aspect of the project, to identify and document other authoritative open data sources to consume and to report back on successes and failures.

Alongside a focus on enrichment through open data, the project will recognise that a ‘pure open’ information environment is still far from the norm. It will also investigate methods in which open data can be consumed by semi-open and commercial resource discovery services and how such services may themselves benefit from open approaches to data publishing.

Final Project Products

Primary outputs:

  1. An enhanced Open Bibliographic Dataset containing records sourced from Jerome, COMET, and other open data sources, permissively licensed, delivered over fast API in a range of formats (e.g. MARC, RDF, JSON) as both whole records and disaggregated, Linked data: along with associated social/reputational metadata making explicit the provenance, history, and ‘pagerank’ measurements of each data element. All data and APIs produced will be published on data.lincoln.ac.uk and access will be maintained by the University of Lincoln on that platform for at least the next 3 years;
  2. A repository of Open Source software for gathering, manipulating and publishing such data plus public documentation for the APIs, clarifying in particular the utility of the ‘data cleansing’ and the social/reputational metadata in distributed cataloguing environments;
  3. A proposal for the continuation of the work of CLOCK toward the specific aim of establishing a distributed open scholarly catalogue knowledgebase for the UK (“data.ac.uk/library” or “library.data.ac.uk”).

Secondary outputs:

  1. User documentation: a formal clearly-documented user requirements analysis and evidence of user engagement (e.g. user ‘stories’);
  2. Contextual documentation: a published literature review;
  3. Technical documentation: an examination of relevant standards and processes for manipulating Open Bib Data, particularly via API, and a comparison-cum-synthesis of the parallel approaches to open data publishing taken by COMET and Jerome;
  4. Contributions toward the JISC Open Bibliographic Data guide, initially in the form of commentable public plans for implementation of the shared Lincoln-Cambridge datastore. These will be reviewed at regular intervals and will eventually build into guidance for other academic libraries on releasing data openly. An experimental focus will allow mistakes and development ‘wrong turns’ to be shared with a wider community;
  5. We will disseminate our work during and beyond the duration of the project. Progress will be communicated by regular blogging throughout the life of the project. Project members are active within the UK HE development and library communities. Through blogs, social networks and talks at events they will continue to act as champions for open data publishing, furthering the aims and objectives of the Discovery programme. In particular, we will showcase our work at relevant JISC workshops, and will produce public project documents according to the JISC branding guidelines, targeting specific and relevant audiences.

Wider Benefits to Sector & Achievements for Host Institution

Jerome provided a modern API-centric approach to open data services and discovery using NoSQL database technologies and Open Source search. COMET published over two million records under a Public Domain Data License, many of them available for query via an RDF-store/SPARQL endpoint. Tools and techniques to achieve this have also been released under a permissive software license.

The CLOCK project aims to scope and develop powerful and usable API-based web services which will make it easy to locate available Open Bibliographic Data for a given bibliographic work. These services will be aimed predominantly at meeting the needs of:

  1. developers attached to academic libraries looking to build applications exploiting available Open Bibliographic Data, and techniques for interrogating and exploiting that data;
  2. cataloguers and library managers looking to innovate their resource description workflows as well as contribute to the corpus of Open Bib Data;
  3. the ‘serious’ and tech-savvy researcher, who may be keen to incorporate Open Bib Data in tools aimed at the user (discovery, citation/reference management software, repositories).

In addition:

  1. Students and staff at the University of Lincoln will benefit from a substantial increase in the size and ‘weight’ of Jerome/data.lincoln.ac.uk (“Quantity has its own quality”); from a refinement of the discovery interface; and from engagement with RDF/linked data;
  2. Cambridge University Library will benefit from aspects of the Jerome architecture (e.g. the use of schemaless databases and aggregated search indices), the practical re-use of its own data (N.B. through this consumption of its own output, important lessons on RDF utility can be learned and shared. This methodology has already worked for CUL with its public facing API project), and from the ‘proving’ of existing approaches through agile distributed development sprints;
  3. To wider HE, the project will demonstrate the value of such data (and the development method) to universities and the wider community, enabling future developments. CLOCK is an opportunity to demonstrate ‘real world’ open web services for libraries, including (i) APIs to enhance existing free or commercial Discovery environments, (ii) the making-accessible of emerging sources of open metadata (the BL table of contents; the outputs from the Open Bibliography 2 project), (iii) a distributed ‘data cleansing’ model (articulated at the COMET-Jerome hack day in August 2011), a new more open approach to cataloguing–resource description, (iv) time and money savings for academic libraries through exploitation of the bibliographic commons and tools for engaging with it;
  4. Both institutions share a firm strategic commitment to open data publishing. It is the ambition of the project participants that any future major developments in national level resource discovery learn and benefit from the experiences gained in this project. JISC, Discovery, and the community at large will benefit from demonstrations of the ‘real world’ discovery-enhancement tools described above, and from a robust public discussion of the parallel technologies for storing and manipulating bib data – RDF store vs. schemaless approaches.
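
To illustrate the ‘RDF store vs. schemaless’ contrast in the last point, the following Python sketch shows the same record held as a single schemaless document and as a set of subject-predicate-object triples. It is a simplification under stated assumptions: plain strings stand in for proper RDF predicates, the function names are hypothetical, and the example.org URI is a placeholder.

```python
def to_triples(subject, document):
    """Flatten a schemaless document into (subject, predicate, object) triples."""
    triples = []
    for predicate, value in document.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, predicate, item))
    return triples


def to_document(triples):
    """Regroup triples about one subject into a document of lists."""
    document = {}
    for _, predicate, value in triples:
        document.setdefault(predicate, []).append(value)
    return document


doc = {
    "title": "Hamlet",
    "creator": ["William Shakespeare"],
    "subject": ["Tragedy", "Drama"],
}
triples = to_triples("http://example.org/record/1", doc)
for triple in triples:
    print(triple)
```

The document form keeps a whole record together, which suits fast retrieval and display. The triple form exposes each data element individually, which is closer to the idea of cataloguing around disaggregated elements.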

Risk Analysis and Success Plan

  1. The principal risk to the success of the project would be an inability to appoint a suitable person to the position of developer in time for the start of the project. The recruitment process for this post began in February and was completed in March 2012.
  2. A related risk is that the other members of the project team (Paul Stainthorp; Ed Chamberlain) are involved in other JISC-funded projects. The project manager (PS) will take care to ensure that—while the work of the various projects may complement CLOCK—there is clear distinction between the goals and outputs of the various projects. Weekly–fortnightly iteration meetings for the CLOCK project will help to ensure this, and Lincoln has established the LNCD group to co-ordinate the work of its overlapping commitments.
  3. As always, there is a risk that key staff may be absent through illness. We will mitigate this through close collaboration via the web-based development tools, weekly–fortnightly iteration meetings, and periodic reviews of the project.

Risk Analysis (*overall risk = likelihood × severity):

Risk #   Likelihood 1-10   Severity 1-10   Overall risk 1-50*
1.       2                 9               18
2.       3                 4               12
3.       2                 3               6
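
The overall-risk figures in the table above follow directly from the stated formula (likelihood × severity). A short Python check, using only the numbers given in the plan (the variable and function names are our own):

```python
# Risk number -> (likelihood, severity), as given in the risk analysis.
risks = {1: (2, 9), 2: (3, 4), 3: (2, 3)}


def overall_risk(likelihood, severity):
    """Overall risk is the product of likelihood and severity."""
    return likelihood * severity


scores = {number: overall_risk(*pair) for number, pair in risks.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ranked)
```

Ranking by score puts the developer recruitment risk first, matching the plan’s description of it as the principal risk.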

If the CLOCK project is a success, we anticipate it will have the following long-term effects (ETA up to one year after the end of the project):

  • Developers unconnected with Lincoln or Cambridge will exploit the APIs to build or enhance new open (and semi-open) bibliographic discovery services.
  • Academic libraries will incorporate Open Bib Data elements from CLOCK in their cataloguing–resource description workflows.
  • Serious researchers will use Open Bib Data elements from CLOCK in personal citation/reference management software.
  • A new social/reputational model for distributed cataloguing will have gained some traction in academic libraries.
  • Significant steps will have been taken toward a national, distributed open scholarly catalogue knowledgebase for the UK (“data.ac.uk/library” or “library.data.ac.uk”).

IPR

We have no objection to JISC making any part of this proposal available should the contents be requested under the Freedom of Information Act, or, if we are successful in our bid for funding, to our project proposal being made available on JISC’s website.

  1. Any additional bibliographic data or metadata created as a result of this project will be released under an open licence that permits unrestricted re-use. Wherever possible, the Open Data Commons PDDL or CC0 will be used.
  2. All software outputs will be released under an appropriate Open Source licence (we will seek further advice from OSSWatch on the most appropriate licence).
  3. All documentation and blog posts will be released under the Creative Commons attribution share-alike licence, CC-BY-SA.

Project Team Relationships and End User Engagement

We intend to use the CLOCK blog (http://clock.blogs.lincoln.ac.uk/) to provide regular updates on the status of the project, and to provide links to working services and data. In addition to the JISC Discovery events, the developer community will be engaged through the growing data.ac.uk community and mailing list, and library staff through events such as a Mashed Library unconference planned for early July, with a cataloguing / open data theme.

The CLOCK project team will consist of:

Role Name Institution FTE/hours
Project manager Paul Stainthorp University of Lincoln 0.2FTE
Lead researcher Ed Chamberlain University of Cambridge 0.2FTE
Researcher Chris Leach University of Lincoln 0.1FTE
External consultant Owen Stephens Owen Stephens Consulting (set number of days)
Web developer Andrew Beeken University of Lincoln 0.2FTE
Web developer Trevor Jones University of Lincoln 0.2FTE
Project director Ian Snowley University of Lincoln (Uncosted)

Paul Stainthorp is Electronic Resources Librarian at the University of Lincoln. He will act as project manager and (jointly with Ed Chamberlain) researcher. Paul has several years’ experience of working with open metadata systems (repositories, journal article knowledgebases); he successfully project-managed the Jerome project. In CLOCK he will manage the project overall, produce reports and documentation for JISC, and lead the lit. review and user engagement workpackages.

Ed Chamberlain is Systems Development Librarian at Cambridge University Library. Ed will act as lead researcher / internal technical consultant and provide additional project guidance. He brings extensive experience of project management, library systems implementation, metadata publishing and open licensing. In addition to managing the COMET project, he was responsible for releasing and documenting Cambridge’s existing APIs to library services. As lead researcher for CLOCK, he will be primarily responsible for the technical standards & methods workpackages, and for guiding the work of the developer.

Chris Leach is Systems Librarian at the University of Lincoln. With more than 30 years’ experience in a range of technical library roles, Chris’s focus in CLOCK will be to support the analysis of existing and emerging library data standards, and to support the work of the developers.

Owen Stephens is a library consultant, and for this project will provide consultancy and advice to put the work into a national context, relating CLOCK to the wider movement toward open data and the work of the JISC Discovery initiative. Owen has a technical background in libraries with experience of service delivery and strategic planning. He has been responsible for a number of innovative projects at both institutional and national levels.

Andrew Beeken and Trevor Jones have been appointed (March 2012) as developers on CLOCK. They will act as lead programmers on the project, making use of iterative development tools as described above. They will also participate in the user requirements analysis and the review of existing data standards.

Ian Snowley, the University Librarian at the University of Lincoln, will act as project sponsor and director.

Projected Timeline, Workplan & Overall Project Methodology

The University of Lincoln has an established and rapidly maturing agile, iterative, distributed approach to web development, supported by tools including Codeigniter, Github, Google Groups, Pivotal Tracker and WordPress – this methodology has served previous JISC-funded projects well and will again be employed here. Tools used will be exclusively web-based, allowing staff from Lincoln, Cambridge, JISC and elsewhere to participate.

The project will end on 31 July 2012. Because of the iterative approach to development, there will be continual gathering, analysis and documentation of user/technical requirements throughout the project. Results will be disseminated via a project blog, community events, the Discovery newsletter, etc., and via more formal channels (e.g. journal articles in scholarly and trade publications for libraries) where appropriate.

High-level plan of workpackages:

Workpackage/Month Feb 2012 Mar 2012 Apr 2012 May 2012 Jun 2012 Jul 2012
1. Project initiation X
2. Community engagement X X X X X X
3. Literature review X X X
4. Gather user requirements X X X X X
5. Assess and describe existing sources of open data for harvesting X X X
6. Evaluation of technical standards & methods X X X
7. Technical development, testing and verification X X X X X
8. Documentation X X X X X X
9. Project evaluation X X
10. Dissemination X X X X X X
11. Project close X

Budget

We propose that the follow-on funding sought will be used to cover the time of the team members at Lincoln and Cambridge, and to fund part of the new post of web developer. In total, 45% of the funding sought will go on staff (incurred appointments and institutional staff allocations): this is appropriate for a project where a high level of expertise must be applied. Apart from on-costs and travel/expenses, the other significant expense is the consultancy work, which is necessary to ensure a wider application and scope for the CLOCK project than was the case in Jerome or COMET, both being more rooted in their respective institutions.

Breakdown of the budget:

Projects Web Developer, 0.6 FTE 18.63%
Recruitment 0.68%
Equipment 3.02%
Travel 1.51%
Consultancy 4.98%
Directly Incurred Total 28.81%

Directly Allocated Staff* 22.99%
Estates (Lincoln) 6.04%
Estates (Cambridge) 1.03%
Directly Allocated Total 30.06%

Indirect Costs (Lincoln) 33.1%
Indirect Costs (Cambridge) 8.03%
Indirect Costs Total 41.13%

Total Project Cost £ 66,329.20 (100.0%)
Amount Requested from JISC £ 49,879.56 (75.2%)
Institutional Contributions £ 16,449.64 (24.8%)
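
For anyone wanting to sanity-check the arithmetic, the following is a small Python sketch (not part of the original proposal) that re-adds the percentage line items and splits the total cost at the stated 75.2% JISC share. All figures are copied from the breakdown; the function and constant names are invented for the example. Note that the directly incurred line items, as printed to two decimal places, add up to 28.82 rather than the printed 28.81, which is presumably a rounding effect in the source figures.

```python
from decimal import Decimal

# Percentages copied from the budget breakdown.
DIRECTLY_INCURRED = ["18.63", "0.68", "3.02", "1.51", "4.98"]
DIRECTLY_ALLOCATED = ["22.99", "6.04", "1.03"]
INDIRECT = ["33.1", "8.03"]

TOTAL_COST = Decimal("66329.20")  # total project cost in pounds
JISC_SHARE = Decimal("0.752")     # 75.2% requested from JISC


def subtotal(items):
    """Sum a list of percentage strings exactly, avoiding float rounding."""
    return sum((Decimal(item) for item in items), Decimal("0"))


def split_funding(total, share):
    """Return (amount requested, institutional contribution), rounded to pence."""
    requested = (total * share).quantize(Decimal("0.01"))
    return requested, total - requested


if __name__ == "__main__":
    print("Directly incurred:", subtotal(DIRECTLY_INCURRED))
    print("Directly allocated:", subtotal(DIRECTLY_ALLOCATED))
    print("Indirect:", subtotal(INDIRECT))
    requested, institutional = split_funding(TOTAL_COST, JISC_SHARE)
    print("Requested from JISC:", requested)
    print("Institutional contributions:", institutional)
```

`Decimal` is used rather than floats so that the sums of two-decimal-place figures come out exactly.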

Setting the time (some CLOCK project admin)

Posted on February 2nd, 2012 by Paul Stainthorp

Some notes from a phone chat with Andy McGregor (JISC Discovery programme manager) about CLOCK:

  1. Just as we did for Jerome, we’ll be using the CLOCK project blog for all reporting to JISC (as well as for blog posts about the work of the project itself):
    • List of required blog post headings here
  2. We also need to produce a Project Plan, based broadly on our original proposal:
    • Required headings for the Project Plan here
  3. (As project manager) I’ll also be emailing Andy once a month with a quick update on progress;
  4. There are nine other projects in the Discovery phase two programme, plus CLOCK:
    • List of projects here
    • There’s also a mailing list for the programme
  5. The next programme meeting will take place w/c 16 April 2012, in Birmingham:
    • List of programme meetings here
  6. As in phase one, consultants will be preparing case studies on the various projects (CLOCK included) for the benefit of the wider Discovery programme.

Related: we’re planning to hold our first project team meeting on 14 February 2012. To spread the burden of travel equally, we’re going to hold it in a location convenient for Lincoln, Cambridge and the West Midlands…

Peterborough

Discovery phase two: programme launch (slides)

Posted on February 2nd, 2012 by Paul Stainthorp

JISC formally launched phase two of the Information and library infrastructure: Resource discovery programme on 11 January 2012 in Birmingham. CLOCK weren’t able to attend in person, but we sent these slides in our absence. They’re good for a quick overview of the aims of the CLOCK project.