Posts Tagged ‘JISC’

CLOCK and a summary of 2 other Discovery projects

Posted on May 17th, 2012 by Paul Stainthorp

Ed Chamberlain, who is on the CLOCK project team as a researcher, is involved in two other projects under the Discovery strand: OEM-UK and Open Bibliography 2. We’re looking for ways in which CLOCK can re-use data, code, processes and ideas from these projects (and elsewhere) – also what CLOCK could offer in return.

Notes:

  • Open Biblio project over the last few years; aim to aggregate large amounts of bibliographic data for scientific discovery.
  • Data collected from Cambridge University, the BL, PubMed and held as RDF, used to power an open catalogue called “Bibliographica”.
  • Problems around scaling the data/system led to the current JISC-funded Open Biblio 2 project (in the meantime, Cambridge and the BL had started to publish their data openly).
  • Open Biblio 2 started looking at a NoSQL approach (CouchDB, Lucene/Solr) – eventually settling on Elastic Search.
  • The approach of Open Biblio is to build bottom-up, community tools: BibServer and BibSoup (“Like Wikimedia for bib data”). Raises interesting questions about data quality in an open community-driven system.
  • Also looking at JSON as a lightweight way of sharing bib data: emerging BibJSON convention for representing a bibliographic record as a JSON object (Ed wrote a MARC-to-BibJSON parser in Perl). N.B. BibJSON is not a million miles away from the JSON that Jerome spits out! There are three hack days taking place next month in London to look specifically at BibJSON.
  • Open Biblio 2 is also looking at JSON-LD (JSON for Linking Data), a ‘real’ JSON standard which does a lot of the things that RDF does.

tl;dr = use their JSON standards and BibSoup as a data source.
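
To make the BibJSON idea a bit more concrete, here is a minimal sketch in Python of turning a handful of MARC-like fields into a BibJSON-style object. This is not Ed's Perl parser: the input dictionary, the function name marc_like_to_bibjson and the sample record are all made up for illustration, and the output keys (title, author, year, identifier) only follow the general shape of the emerging convention, so check them against the current BibJSON documentation before relying on them.

```python
import json


def marc_like_to_bibjson(fields):
    """Map a small dict of MARC-like tags to a BibJSON-style dict.

    `fields` is a hypothetical, simplified input: MARC tag -> string value.
    """
    record = {"type": "book"}
    if "245" in fields:  # 245 = title statement
        record["title"] = fields["245"].rstrip(" /")
    if "100" in fields:  # 100 = main entry, personal name
        record["author"] = [{"name": fields["100"].rstrip(",. ")}]
    if "260c" in fields:  # 260 $c = date of publication
        digits = "".join(ch for ch in fields["260c"] if ch.isdigit())
        if digits:
            record["year"] = digits[:4]
    if "020" in fields:  # 020 = ISBN
        record["identifier"] = [{"type": "isbn", "id": fields["020"]}]
    return record


# Placeholder record, not real catalogue data.
sample = {
    "245": "An example title /",
    "100": "Surname, Forename.",
    "260c": "c2012.",
    "020": "0000000000",
}
bibjson = marc_like_to_bibjson(sample)
print(json.dumps(bibjson, indent=2))
```

The point of the sketch is only that a record becomes a plain JSON object with predictable keys, which is what makes it easy to pass between systems like BibSoup and Jerome.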

  • The second project, OEM-UK (Open Education Metadata UK), based at the IoE in London, is focusing on cataloguing workflows.
  • Data from the IoE’s SirsiDynix catalogue, plus EPrints, is drawn into a Drupal framework; forms to create data (autopopulation of forms); “cataloguing the Drupal way”.
    • Thought from Andrew Beeken: could we replicate this approach, using WordPress custom post types to store and display structured content? Shades of the OPACPress project which Joss Winn and I proposed—but that was not funded—several years ago.
  • Some evidence that this approach is capable of speeding up the cataloguing process considerably: the more data you put in the faster it gets! Ed has some screencapture videos from OEM-UK showing workflow, including grabbing data via Zotero.

tl;dr = OEM-UK are also successfully disrupting cataloguing workflows.

CLOCK project implementation plan

Posted on April 23rd, 2012 by Paul Stainthorp

The University of Lincoln and Cambridge University Library both delivered successful projects (Jerome and COMET) for the JISC Infrastructure for Resource Discovery Programme in 2011. Both projects produced outputs of interest to researchers, students, librarians, developers, and designers of bibliographic discovery environments.

The CLOCK project is harnessing the success of these two complementary initiatives and investigating new approaches to data creation and discovery in the library domain. In particular, CLOCK will investigate, propose, and develop new, web-based bibliographic tools which will make it easier for different users—cataloguers in academic libraries, and the “serious”, tech-savvy researcher—to find Open Bibliographic Data and incorporate that data into systems and workflows.

This is the CLOCK project implementation plan:

Aims, Objectives and Final Output(s) of the project

The CLOCK project’s overall aim is to challenge assumptions and drive innovation in libraries’ interaction with bibliographic data. The project team believe that an important aspect of this innovation will be serious consideration given to the development of an awesome, national, open scholarly catalogue knowledgebase for the UK (“data.ac.uk/library” or “library.data.ac.uk”).

As a medium-term step towards this goal, CLOCK will explore options for updating and maintaining the shared platform on data.lincoln.ac.uk as an eventual service. Longer term maintenance of the Cambridge open data service will also be investigated.

The investigation will take an experimental approach, building upon the RDF encoded structured metadata released through the COMET project as a readily accessible resource for enrichment of data within the Jerome software environment. At this preliminary stage, four steps in record enrichment have been identified:

  1. Matching / negotiating of the best available Open bib data through common identifiers;
  2. The importance of a social/reputational aspect in identifying authoritative data;
  3. A process of harvesting a returned record (parts of a record) to be re-used;
  4. Enrichment, repair and cleansing of data in the knowledgebase (positive feedback loop).

By exploring aggregation, ‘data cleansing’ and enrichment through readily available open sources, the project hopes to highlight new distributed approaches to metadata production–cataloguing, storage and delivery, including minimal workflows for cataloguing around individual, disaggregated RDF elements. The project will explore ways to do this using automated techniques built around open reusable metadata.
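
As a rough illustration of the four enrichment steps listed above, here is a minimal Python sketch. Everything in it is an assumption made for the example: the enrich function, the record layout (an "id", a "source" and some fields), the numeric reputation scores and the sample data are hypothetical, and CLOCK's actual matching and scoring may look quite different.

```python
def enrich(local, candidates, reputation):
    """Fill gaps in `local` from the best-reputed matching candidate records.

    local       -- dict with an "id" (e.g. an ISBN) and some fields
    candidates  -- list of dicts, each with "id", "source" and fields
    reputation  -- dict mapping source name -> numeric trust score
    """
    # Step 1: match on a common identifier.
    matches = [c for c in candidates if c.get("id") == local.get("id")]
    # Step 2: prefer sources with the highest reputation score.
    matches.sort(key=lambda c: reputation.get(c["source"], 0), reverse=True)

    enriched = dict(local)
    provenance = {}
    # Step 3: harvest only the parts of each record that we are missing.
    for match in matches:
        for field, value in match.items():
            if field in ("id", "source"):
                continue
            if not enriched.get(field):
                enriched[field] = value
                provenance[field] = match["source"]
    # Step 4: return the repaired record plus where each new field came from,
    # so the result can be fed back into the knowledgebase.
    return enriched, provenance


# Placeholder records and scores, purely for illustration.
local = {"id": "isbn:0000000000", "title": "An example title", "author": ""}
candidates = [
    {"id": "isbn:0000000000", "source": "source-a", "author": "A. Author"},
    {"id": "isbn:0000000000", "source": "source-b", "author": "Author, A.",
     "toc": "1. Introduction"},
    {"id": "isbn:1111111111", "source": "source-b", "author": "Someone Else"},
]
reputation = {"source-a": 1, "source-b": 5}

enriched, provenance = enrich(local, candidates, reputation)
print(enriched)
print(provenance)
```

Keeping the provenance alongside the enriched record is what would let a later pass re-score or undo a field if its source turns out to be less authoritative than first thought.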

Whilst the two million records published under the COMET project will act as a starting point for this process, the participants aim to utilize other sources, including images, table of contents data and related supplementary resources (geo-data, author biographies, etc). Through this, there will be an additional social aspect of the project, to identify and document other authoritative open data sources to consume and to report back on successes and failures.

Alongside a focus on enrichment through open data, the project will recognise that a ‘pure open’ information environment is still far from the norm. It will also investigate methods in which open data can be consumed by semi-open and commercial resource discovery services and how such services may themselves benefit from open approaches to data publishing.

Final Project Products

Primary outputs:

  1. An enhanced Open Bibliographic Dataset containing records sourced from Jerome, COMET, and other open data sources, permissively licensed, delivered over fast API in a range of formats (e.g. MARC, RDF, JSON) as both whole records and disaggregated, Linked data: along with associated social/reputational metadata making explicit the provenance, history, and ‘pagerank’ measurements of each data element. All data and APIs produced will be published on data.lincoln.ac.uk and access will be maintained by the University of Lincoln on that platform for at least the next 3 years;
  2. A repository of Open Source software for gathering, manipulating and publishing such data plus public documentation for the APIs, clarifying in particular the utility of the ‘data cleansing’ and the social/reputational metadata in distributed cataloguing environments;
  3. A proposal for the continuation of the work of CLOCK toward the specific aim of establishing a distributed open scholarly catalogue knowledgebase for the UK (“data.ac.uk/library” or “library.data.ac.uk”).

Secondary outputs:

  1. User documentation: a formal clearly-documented user requirements analysis and evidence of user engagement (e.g. user ‘stories’);
  2. Contextual documentation: a published literature review;
  3. Technical documentation: an examination of relevant standards and processes for manipulating Open Bib Data, particularly via API, and a comparison-cum-synthesis of the parallel approaches to open data publishing taken by COMET and Jerome;
  4. Contributions toward the JISC Open Bibliographic Data guide, initially in the form of commentable public plans for implementation of the shared Lincoln-Cambridge datastore. These will be reviewed at regular intervals and will eventually build into guidance for other academic libraries on releasing data openly. An experimental focus will allow mistakes and development ‘wrong turns’ to be shared with a wider community;
  5. We will disseminate our work during and beyond the duration of the project. Progress will be communicated by regular blogging throughout the life of the project. Project members are active within the UK HE development and library communities. Through blogs, social networks and talks at events they will continue to act as champions for open data publishing, furthering the aims and objectives of the Discovery programme. In particular, we will showcase our work at relevant JISC workshops, and will produce public project documents according to the JISC branding guidelines, targeting specific and relevant audiences.

Wider Benefits to Sector & Achievements for Host Institution

Jerome provided a modern API-centric approach to open data services and discovery using NoSQL database technologies and Open Source search. COMET published over two million records under a Public Domain Data License, many of them available for query via an RDF-store/SPARQL endpoint. Tools and techniques to achieve this have also been released under a permissive software license.

The CLOCK project aims to scope and develop powerful and usable API-based web services which will make it easy to locate available Open Bibliographic Data for a given bibliographic work. These services will be aimed predominantly at meeting the needs of:

  1. developers attached to academic libraries looking to build applications exploiting available Open Bibliographic Data, and techniques for interrogating and exploiting that data;
  2. cataloguers and library managers looking to innovate their resource description workflows as well as contribute to the corpus of Open Bib Data;
  3. the ‘serious’ and tech-savvy researcher, who may be keen to incorporate Open Bib Data in tools aimed at the user (discovery, citation/reference management software, repositories).

In addition:

  1. Students and staff at the University of Lincoln will benefit from a substantial increase in the size and ‘weight’ of Jerome/data.lincoln.ac.uk (“Quantity has its own quality”); from a refinement of the discovery interface; and from engagement with RDF/linked data;
  2. Cambridge University Library will benefit from aspects of the Jerome architecture (e.g. the use of schemaless databases and aggregated search indices), the practical re-use of its own data (N.B. through this consumption of its own output, important lessons on RDF utility can be learned and shared. This methodology has already worked for CUL with its public-facing API project), and from the ‘proving’ of existing approaches through agile distributed development sprints;
  3. To wider HE, the project will demonstrate the value of such data (and the development method) to universities and the wider community, enabling future developments. CLOCK is an opportunity to demonstrate ‘real world’ open web services for libraries, including (i) APIs to enhance existing free or commercial Discovery environments, (ii) the making-accessible of emerging sources of open metadata (the BL table of contents; the outputs from the Open Bibliography 2 project), (iii) a distributed ‘data cleansing’ model (articulated at the COMET-Jerome hack day in August 2011), a new more open approach to cataloguing–resource description, (iv) time and money savings for academic libraries through exploitation of the bibliographic commons and tools for engaging with it.
  4. Both institutions share a firm strategic commitment to open data publishing. It is the ambition of the project participants that any future major developments in national level resource discovery learn and benefit from the experiences gained in this project. JISC, Discovery, and the community at large will benefit from demonstrations of the ‘real world’ discovery-enhancement tools described above, and from a robust public discussion of the parallel technologies for storing and manipulating bib data – RDF store vs. schemaless approaches.

Risk Analysis and Success Plan

  1. The principal risk to the success of the project would be an inability to appoint a suitable person to the position of developer in time for the start of the project. The recruitment process for this post began in February and was completed in March 2012.
  2. A related risk is that the other members of the project team (Paul Stainthorp; Ed Chamberlain) are involved in other JISC-funded projects. The project manager (PS) will take care to ensure that—while the work of the various projects may complement CLOCK—there is clear distinction between the goals and outputs of the various projects. Weekly–fortnightly iteration meetings for the CLOCK project will help to ensure this, and Lincoln has established the LNCD group to co-ordinate the work of its overlapping commitments.
  3. As always, there is a risk that key staff may be absent through illness. We will mitigate against this through close collaboration via the web-based development tools, weekly–fortnightly iteration meetings, and periodic reviews of the project.

Risk Analysis (*overall risk = likelihood × severity):

Risk #   Likelihood 1-10   Severity 1-10   Overall risk 1-50*
1.       2                 9               18
2.       3                 4               12
3.       2                 3               6
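
The scoring in the table is simple enough to reproduce. A minimal Python sketch, using the likelihood and severity values from the table and the stated rule (overall risk = likelihood × severity); the variable and function names are just illustrative:

```python
# (risk number, likelihood, severity) as given in the table above
risks = [
    (1, 2, 9),
    (2, 3, 4),
    (3, 2, 3),
]


def overall_risk(likelihood, severity):
    """Overall risk is likelihood multiplied by severity."""
    return likelihood * severity


scored = [(number, overall_risk(l, s)) for number, l, s in risks]
# Rank the risks from most to least serious.
ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)

for number, score in ranked:
    print(f"Risk {number}: overall {score}")
```

This gives 18, 12 and 6, matching the table, with the developer appointment (risk 1) ranked highest.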

If the CLOCK project is a success, we anticipate it will have the following long-term effects (ETA up to one year after the end of the project):

  • Developers unconnected with Lincoln or Cambridge will exploit the APIs to build or enhance new open (and semi-open) bibliographic discovery services.
  • Academic libraries will incorporate Open Bib Data elements from CLOCK in their cataloguing–resource description workflows.
  • Serious researchers will use Open Bib Data elements from CLOCK in personal citation/reference management software.
  • A new social/reputational model for distributed cataloguing will have gained some traction in academic libraries.
  • Significant steps will have been taken toward a national, distributed open scholarly catalogue knowledgebase for the UK (“data.ac.uk/library” or “library.data.ac.uk”).

IPR

We have no objection to JISC making any part of this proposal available should the contents be requested under the Freedom of Information Act, or, if we are successful in our bid for funding, to our project proposal being made available on JISC’s website.

  1. Any additional bibliographic data or metadata created as a result of this project will be released under an open license that permits unrestricted re-use. Wherever possible, the Open Data Commons PDDL or CC0 will be used.
  2. All software outputs will be released under an appropriate Open Source licence (we will seek further advice from OSSWatch on the most appropriate licence).
  3. All documentation and blog posts will be released under the Creative Commons attribution share-alike licence, CC-BY-SA.

Project Team Relationships and End User Engagement

We intend to use the CLOCK blog (http://clock.blogs.lincoln.ac.uk/) to provide regular updates on the status of the project, and to provide links to working services and data. In addition to the JISC Discovery events, the developer community will be engaged through the growing data.ac.uk community and mailing list, and library staff through events such as a Mashed Library unconference planned for early July, with a cataloguing / open data theme.

The CLOCK project team will consist of:

Role                  Name              Institution                FTE/hours
Project manager       Paul Stainthorp   University of Lincoln      0.2 FTE
Lead researcher       Ed Chamberlain    University of Cambridge    0.2 FTE
Researcher            Chris Leach       University of Lincoln      0.1 FTE
External consultant   Owen Stephens     Owen Stephens Consulting   (set number of days)
Web developer         Andrew Beeken     University of Lincoln      0.2 FTE
Web developer         Trevor Jones      University of Lincoln      0.2 FTE
Project director      Ian Snowley       University of Lincoln      (Uncosted)

Paul Stainthorp is Electronic Resources Librarian at the University of Lincoln. Here he will act as project manager and (jointly with Ed Chamberlain) researcher. Paul has several years’ experience of working with open metadata systems (repositories, journal article knowledgebases); he successfully project-managed the Jerome project. Here he will manage the project overall, produce reports and documentation for JISC, as well as leading the lit. review and user engagement workpackages.

Ed Chamberlain is Systems Development Librarian at Cambridge University Library. Ed will act as lead researcher / internal technical consultant and provide additional project guidance. He brings extensive experience of project management, library systems implementation, metadata publishing and open licensing. In addition to managing the COMET project, he was responsible for releasing and documenting Cambridge’s existing APIs to library services. As lead researcher for CLOCK, he will be primarily responsible for the technical standards & methods workpackages, and for guiding the work of the developer.

Chris Leach is Systems Librarian at the University of Lincoln. With more than 30 years’ experience in a range of technical library roles, Chris’s focus in CLOCK will be to support the analysis of existing and emerging library data standards, and to support the work of the developer.

Owen Stephens is a library consultant, and for this project will provide consultancy and advice to put the work into a national context, relating CLOCK to the wider movement toward open data and the work of the JISC Discovery initiative. Owen has a technical background in libraries with experience of service delivery and strategic planning. He has been responsible for a number of innovative projects at both institutional and national levels.

Andrew Beeken and Trevor Jones have been appointed (March 2012) as developers on CLOCK. They will act as lead programmers on the project, making use of iterative development tools as described above. They will also participate in the user requirements analysis and the review of existing data standards.

Ian Snowley, the University Librarian at the University of Lincoln, will act as project sponsor and director.

Projected Timeline, Workplan & Overall Project Methodology

The University of Lincoln has an established and rapidly-maturing agile, iterative, distributed approach to web development, supported by tools including CodeIgniter, GitHub, Google Groups, Pivotal Tracker and WordPress – this methodology has served previous JISC-funded projects well and will again be employed here. Tools used will be exclusively web-based, allowing staff from Lincoln, Cambridge, JISC and elsewhere to participate.

The project will end 31 July 2012. Because of the iterative approach to development, there will be continual gathering, analysis and documentation of user/technical requirements throughout the project. Results will be disseminated via a project blog, community events, the Discovery newsletter, etc., and via more formal channels (e.g. journal articles in scholarly and trade publications for libraries) where appropriate.

High-level plan of workpackages:

Workpackage/Month Feb 2012 Mar 2012 Apr 2012 May 2012 Jun 2012 Jul 2012
1. Project initiation X
2. Community engagement X X X X X X
3. Literature review X X X
4. Gather user requirements X X X X X
5. Assess and describe existing sources of open data for harvesting X X X
6. Evaluation of technical standards & methods X X X
7. Technical development, testing and verification X X X X X
8. Documentation X X X X X X
9. Project evaluation X X
10. Dissemination X X X X X X
11. Project close X

Budget

We propose that the follow-on funding sought will be used to cover the time of the team members at Lincoln and Cambridge, and to fund part of the new post of web developer. In total, 45% of the funding sought will go on staff (incurred appointments and institutional staff allocations): this is appropriate for a project where a high level of expertise must be applied. Apart from on-costs and travel/expenses, the other significant expense is that of the consultancy work which is necessary to ensure a wider application and scope for the CLOCK project than was the case in Jerome or COMET, both being more rooted to their respective institutions.

Breakdown of the budget:

Projects Web Developer, 0.6 FTE 18.63%
Recruitment 0.68%
Equipment 3.02%
Travel 1.51%
Consultancy 4.98%
Directly Incurred Total 28.81%

Directly Allocated Staff* 22.99%
Estates (Lincoln) 6.04%
Estates (Cambridge) 1.03%
Directly Allocated Total 30.06%

Indirect Costs (Lincoln) 33.1%
Indirect Costs (Cambridge) 8.03%
Indirect Costs Total 41.13%

Total Project Cost £ 66,329.20 (100.0%)
Amount Requested from JISC £ 49,879.56 (75.2%)
Institutional Contributions £ 16,449.64 (24.8%)
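
For anyone wanting to sanity-check the figures, the breakdown can be verified in a few lines of Python. This is only an arithmetic check on the numbers quoted above; the variable names and the share helper are illustrative.

```python
from decimal import Decimal

# Figures copied from the breakdown above.
total_cost = Decimal("66329.20")
requested = Decimal("49879.56")
institutional = Decimal("16449.64")
category_totals = {
    "Directly Incurred": Decimal("28.81"),
    "Directly Allocated": Decimal("30.06"),
    "Indirect Costs": Decimal("41.13"),
}


def share(amount, whole):
    """Return amount as a percentage of whole, rounded to one decimal place."""
    return (amount / whole * 100).quantize(Decimal("0.1"))


funding_adds_up = requested + institutional == total_cost
percent_sum = sum(category_totals.values())

print("Requested + institutional equals total:", funding_adds_up)
print("JISC share:", share(requested, total_cost), "%")
print("Institutional share:", share(institutional, total_cost), "%")
print("Category percentages sum to:", percent_sum)
```

The two funding amounts add up to the total project cost, the shares come out at 75.2% and 24.8%, and the three category totals sum to 100%. The individual Directly Incurred lines add up to 28.82% rather than the stated 28.81%, which looks like nothing more than rounding.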

Latest steps in defining the business case for resource discovery

Posted on April 12th, 2012 by Paul Stainthorp

A quote from me on p.1 of the latest Discovery newsletter (April 2012), after the recent ‘Better Resource Discovery – Is there a business case?’ workshop.

A similar user-centred approach is the driver at University of Lincoln, as explained by Paul Stainthorp, Electronic Resources Librarian: “We see the role of the student as collaborator in the production of knowledge and focus on improving the student experience through their active engagement.” Lincoln has an open development group which is committed to exposing library and other institutional data through APIs as the basis for agile and innovative development. “We are starting to see the results in terms of new cataloguing workflows, knowledge share, staff development, and new partnerships.”

Clunky business: making an institutional case for library discovery

Posted on March 23rd, 2012 by Paul Stainthorp

I’m in London today for a workshop bringing together people from the JISC/SCONUL Discovery initiative and the JISC/SCONUL shared services programme, on this topic:

‘Better Resource Discovery – Is there a business case?’

An exploratory workshop to identify business cases for new modes of resource discovery based on real service drivers

David Kay from SERO asked me to talk for 10 minutes on Lincoln’s approach to open bibliographic data and open discovery; what our ‘business case’ might be; what we’re doing to put it into practice; and how the institution might become aware of and judge our success.

‘Business case’ isn’t a phrase that falls from my lips naturally… however: I am uneasy about our getting too comfortable under a protective ‘shield’ of (mainly JISC-funded) project-based development: it may protect us from a large amount of flak, and provides us with an enviable amount of freedom, but by definition it’s only there temporarily.

We need to build ourselves some new shields – perhaps ones less impregnable, but ones that are more persistent and less easily dissipated once projects end. And to do that, we need to create more useful, disruptive institutional services like this one. Our business case is ‘being a university’; how do we create convincing applications from open tools and data that further that business?

My slides are on Google Docs.

UKCoRR members’ meeting, University of Portsmouth, 27 Jan 2012

Posted on February 2nd, 2012 by Paul Stainthorp

Here are some notes on the first event held for UKCoRR members this year:

As you probably know, UKCoRR is an entirely unfunded organisation which relies heavily on the time and energy of its members, and on the generosity of universities to host our meetings – on this occasion our heartfelt thanks to the University of Portsmouth Library, and particularly to Andy Barrow and (associate university librarian) Ken Dick, for very kindly putting us up and keeping us fed and coffee-ed, and for Ken’s warm welcome at the start of the meeting.

This was a very well-attended event: nearly 50 UKCoRR members and invited guests, from as far afield as Edinburgh (350+ miles away)… and a packed schedule. So packed, in fact, that we probably didn’t leave enough breathing space. We’ll build in more rest breaks and time for gossip (sorry, professional networking) at the next meeting!

  1. Slides from all the presentations below will shortly be made available on UKCoRR’s slideshare account, at: slideshare.net/ukcorr
  2. Some of the speakers kindly agreed to be filmed, and videos will be made available at: youtube.com/user/ukcorr

After Ken had welcomed us to Portsmouth, UKCoRR chair Gaz Johnson gave the first presentation of the day, with a science fiction gloss and a look at the possible future directions of UKCoRR. Gaz has already blogged about his talk. A few key points and questions:

  • The committee needs to consult with members, and these members’ meetings are a good way of doing that!
  • Our priorities (validated by the user survey, 2011) should be best practice exchange, lobbying, and advocacy;
  • Is our lack of a membership fee our USP? It means we’re beholden to no-one, we don’t have to serve anyone’s agenda (other than our members’), and it makes it easier to avoid conflicts of interest…
  • …but it’s worth considering what we could do differently if we were funded;
  • Should membership of UKCoRR bring with it certain responsibilities?
  • Aren’t repositories generally understaffed in the UK?

Next up, Andrew Dorward of EDINA on the UK RepositoryNet+ project to build “a socio-technical infrastructure to support repositories”. Andrew gave an overview of the original RepositoryNet project, and the ongoing aim to build shared services for repositories. Recently, the new project interviewed a range of UKCoRR members, Open Access publishers, members of ARMA, and active researchers about the repository landscape. Broadly, those interviews validated the current approach to services, but Andrew noted that in the repository “ecology” there is some room for drawing together the range of services (search, deposit statistics, etc.) into fewer but more comprehensive tools. He also talked about the growth in OA publishing since the launch of PLoS in 2003: see doi:10.1371/journal.pbio.1001235.t001

Last up before lunch, Marie-Therese Gramstadt from the University of the Creative Arts gave us an update on the Kultivate project, the advocacy and decision-making toolkits, and the associated Kultur II group, sharing best practice in repository design for creative and visual arts research. Asked to show hands, about half the UKCoRR delegates had arts researchers ‘at home’ – about the same number of people also expressed an interest in continuing the work of Kultur II.

After lunch – the lightning talks!

  • Talking about a new strategic marketing project for WRAP (the University of Warwick’s repository) – Yvonne Budden explained the need to revamp the repo’s image, and how WRAP piggybacked on a wider redesign project at Warwick and used an interesting methodology from Kay Grieves at the University of Sunderland, summarised as: (1) Match services to users (2) Transform services into benefits (3) Translate benefits into messages! Freebie materials (highlighter pens, etc.) are being used as bribes to encourage depositors to take the message of the repo back to their colleagues. A really striking new black-and-yellow colour scheme!
  • Matthew Smith from the University of So’ton, on the EPrints Shelves project. Building a tool to give users more control over how results from their repository are displayed on author profile pages, etc., by allowing people to log in and add/remove items from a ‘shelf’. Those ‘shelves’ can then be exported using normal EPrints export tools. Shelves should be released to the EPrints Bazaar soon. Lots of interest in the room about this plugin!
  • Tracey Kent on the use of a “request a copy” option for e-theses at the University of Birmingham. Birmingham offer four options for access to e-theses: from [1] “full OA” through to [2] “request a copy” (with theses available through EThOS), [3] a more limited request (excerpts only; not on EThOS), and finally [4] fully-embargoed theses. They went from around 2,500 thesis requests per year to more than 250,000 requests/yr., with ~88% on some kind of Open Access (options [1] or [2]).
  • Margaret Feetham of Southampton Solent University talked about running their mixed-economy repository (research, student work, university publications) …with (very familiar to UKCoRR members!) little budget and few staff. SSU practice unmediated deposit, with academics given training on copyright and licensing issues. Margaret explained how they’ve still managed to get an impressive deposit rate by engaging keen users and advocates, and by working with the university’s research services – with REF2014 as an attention-focuser!
  • From the STFC (Science and Technology Facilities Council), Catherine Jones explained how they are using CrossRef to create large numbers of (metadata-only) records in epubs.stfc.ac.uk – scientific authors like the ability to use that repository’s quick & easy DOI import tool to deposit records, but are now pressing to be able to speed the process up even further. Challenges of recording articles with hundreds or even thousands of collaborators – not uncommon in some areas of physics!

A quick breather, then straight on to the first of two invited speakers to wind the day up:

Sarah Gould of the British Library on some of the changes in the pipeline for the EThOS service. There’s general recognition that some of the features of EThOS (e.g. the “checkout” process for supplying PDF copies of theses) are a bit old hat, and too rooted in old document supply processes. The limited metadata applied to many items in EThOS is also a barrier. EThOS are engaging a new development to drag the service kicking and screaming into the 21st century, and are also engaging on a big programme (working with the BL’s library systems vendors as well as with panels of librarians) to improve the quality and range of metadata. There was an interesting discussion at this point about the possibility of EThOS linking to copies of theses in institutional repositories, rather than/as well as holding digitised copies – what might that mean for the responsibilities of the BL and institutions to ensure preservation of access?

Bravely accepting the final slot of the day, Phil Barker of JISC CETIS on the world of Open Educational Resources (OERs). Another show of hands: fewer than 25% of UKCoRR members in the room have involvement with OERs (either through projects, or through working institutional OER repos). That’s not too much of a surprise: the issues involved in storing and managing repositories of OERs can be much more complex (multiple complex objects, quality control, metadata requirements, copyright and licensed re-use, the sheer number of people involved!) and many institutions have shied away.

Phil talked about some of the motivators for universities to engage with OER, including the moral obligation of the university (“…charter to widen knowledge”), the role of OERs in marketing universities / acting as a shop window / leading to student recruitment, and the hope that the rigorous approach needed in creating OERs will provide a beneficial ‘trickle down’ effect into the design and management of all educational materials.

As always, there was a breathtaking amount of ‘stuff’ for us to get stuck into (useful advice, supportive discussions, and news of exciting work going on), along with the recognised benefit of UKCoRR members’ meetings: a refreshingly practical, non-threatening and safe place for repository staff to talk to people faced with the same problems every day. Keep your eyes peeled for the next couple of UKCoRR events planned for this year: looks like 2012’s going to be one of our busiest yet.

Discovery phase two: programme launch (slides)

Posted on February 2nd, 2012 by Paul Stainthorp

JISC formally launched phase two of the Information and library infrastructure: Resource discovery programme on 11 January 2012 in Birmingham. CLOCK weren’t able to attend in person, but we sent these slides in our absence. They’re good for a quick overview of the aims of the CLOCK project.

KB+ project Technical Advisory Group (TAG)

Posted on January 31st, 2012 by Paul Stainthorp

……aaand just as an adjunct to my last blog post, it’s worth mentioning that I’m currently serving [time] on the TAG (Technical Advisory Group) for the JISC Knowledge Base+ (KB+) project. We had our first meeting on 19 December 2011 at HEFCE’s offices in central London.

Over the course of 2011-2012 HEFCE will be investing £600,000 in the creation of a shared service knowledge base for UK academic libraries to support the management of e-resources by the UK academic community.

This is my idea of a worthy cause—e-journal knowledgebase problems being a particular favourite of mine—and I’m pleased HEFCE and JISC Collections have decided it’s worth investing in a serious and robust attempt to share information between universities and to build better systems for managing e-resources. I’m happy to be involved.

Worth reading = KB+: What’s in it for libraries?

  • Improved Data and Tools
  • Enhanced JISC Services
  • Improving ERM systems
  • Shared Community Activity

For those untainted by ERM jargon, Wikipedia explains as well as anywhere what a knowledgebase actually is and what some of the challenges are. The University of Lincoln’s e-journals knowledgebase is the EBSCO A-to-Z. Also related is the work of the UKSG/NISO Knowledge Bases And Related Tools (KBART) working group.

A pain in the midlands: JISC/SCONUL future of library systems workshop

Posted on January 31st, 2012 by Paul Stainthorp

London Midland 153, very smart

In January I made the long train journey over to the University of Warwick, to attend and speak at the first day of a two-day JISC/SCONUL workshop exploring the future of library systems, under the banner of the “Squeezed Middle” – that is the LMS & other library systems, the bits of library infrastructure often overshadowed/squeezed out of the limelight by the twin heavyweights of Discovery & ERM.

Carrying on from the work done as part of the JISC/SCONUL Shared Services ‘LMS horizon scan‘ in 2008, this workshop points the way toward a new JISC call for ‘path finder’ projects addressing the future of LMSes, under the Information and Library Infrastructure: Emerging Opportunities programme: “you can’t do nothing any more”.

Thank you to Ben Showers of JISC for the invitation to speak at this event!

First, we were treated to a bit of virtual Lorcan Dempsey. In a video talk, he spoke about the trends facing academic libraries (a background of budget constraints, networked decentralisation of content vs. our tradition of vertically integrating services into the one building), and how libraries are re-examining their priorities under pressure, building more flexible spaces, making their expertise more visible, engaging with the network, etc. Lorcan’s video will be made available via OCLC’s YouTube channel shortly.

Then to the bit of the workshop in which I was involved: a series of ‘provocations‘: radical, challenging visions for the future of library systems (by, say, the year 2020), designed to get the attendees thinking. David Kay of SERO, Ken Chad, and Paul Walk provided the other three visions.

I found it a struggle knowing quite where to ‘pitch’ my vision: it can be difficult to be provocative/radical enough without sounding like you don’t know what you’re talking about. For possibly only the second time in my career I was careful to prefix my statement with “…this isn’t my employer’s opinion!”. I took quite a broad, scattergun approach (figuring if I was broad enough, I’d be bound to hit something…); for that reason I was pleased that some of my themes were echoed in Paul Walk’s Marshall Smith-esque sf/dystopian view of libraries in 2020, which he delivered through the “medium of fiction and the genre of bonkers”.

You can read my own provocation statement, “A vision for library systems in 2020“, on Google Docs.

Links to other blog posts about this event are here, here and here.


Tick tock we don’t stop. Introducing CLOCK, a new JISC-funded resource discovery project at the universities of Lincoln and Cambridge

Posted on December 10th, 2011 by Paul Stainthorp

The title says it all, really. The University of Lincoln, working in consortium with Cambridge University Library and Owen Stephens Consulting, has been awarded £49,877 by JISC to investigate ways of driving innovation in libraries’ interactions with Open Bibliographic Data, through a project we’re calling CLOCK (Cambridge-Lincoln Open Catalogue Knowledgebase).

CLOCK is a continuation of and elaboration upon the work of two recent JISC Discovery projects—Jerome at the University of Lincoln and COMET at the University of Cambridge—via a programme of development work shared between the two institutions, and with library consultant Owen Stephens. JISC were impressed enough with the work of both projects, and sufficiently interested in the potential for collaboration, that they encouraged our joint bid for follow-up funding.

Between now and the end of July, 2012, the CLOCK project will provide us with a framework to:

…[1] exploit through real-world applications the significant amount of data released openly by Cambridge University Library; [2] apply the Jerome database architecture, iterative development methodology, and API framework to a bibliographic dataset an order of magnitude greater than the University of Lincoln’s; and [3] to build and enable a new set of tools and demonstrator services which will enable the future development of public Open Bib Data web applications of practical utility to libraries and end-users.

You can read the full bid document, here.

I’m very much looking forward to working with Ed Chamberlain, Systems Librarian in the University Library at the University of Cambridge, along with Owen Stephens, veteran of a number of campaigns to open up access to library data, and Chris Leach (Systems Librarian) and Ian Snowley (University Librarian) from the University of Lincoln. Thanks are due to all of them for their help in writing the successful bid; to the Research & Enterprise Development office at Lincoln for their invaluable assistance in putting together the project budget; and to the LNCD group at the University of Lincoln for providing the kind of supportive development platform that makes these kinds of projects possible.

Finally, a big thank you to Andy McGregor and the JISC Digital Infrastructure: Information and library infrastructure: Resource discovery programme, for this opportunity to further explore the blossoming environment of open bibliographic data/open discovery in libraries. If you haven’t done so already, you might like to take a look at the following websites:

As with all our projects, we’ll be blogging it comprehensively (so stand by for a steady stream of awful clock-related puns used as blog post titles). Although there’s little to see there yet, the CLOCK project blog is at: http://clock.blogs.lincoln.ac.uk/ – along with its own RSS feed. Watch that space!

#jiscmrd programme launch; day 1 – DCC tools workshop

Posted on December 1st, 2011 by Paul Stainthorp

This week sees the formal two-day launch event for the JISC Managing Research Data programme 2011–2013 (the programme which is funding Orbital). It’s being held in the National College for School Leadership, next to the University of Nottingham’s Jubilee Campus.

Unfortunately, after schlepping it from the furthest fringes of Lincolnshire (and then having to go back home for the evening), I was only able to attend a couple of hours of day 1. But it was worth it.

I arrived just in time for a workshop about a number of research data management tools developed/provided by the Digital Curation Centre (DCC). Dr Mansur Darlington, who’s acting as external assessor/consultant to the Orbital project, was also in this workshop and contributed greatly to the discussions. (My Orbital colleagues Joss Winn and Nick Jackson attended the [parallel] workshop on various JANET, Eduserv and UMF SaaS/cloud storage services.)

Slides from this workshop will be posted online. When they’re available I’ll link to them here.

The tools being discussed were:

1. DAF – the Data Asset Framework (www.data-audit.eu)

  • A methodology for identifying gaps in an institution’s data management practices; designed to help institutions ‘clarify their thinking’ around how they manage research data.
    • N.B. We are already planning to use this methodology within the user requirements analysis workpackage of the Orbital project.
  • DAF arose out of recommendations made in the JISC/UKOLN Dealing with Data report (2007): initially the Data Audit Framework, the name was changed because ‘Audit’ was felt to be off-putting, and not an accurate reflection of what DAF is for – now DAF = Data Asset Framework.
  • “It’s worth looking at the four DAF pilot implementation projects” (carried out in 2008), because there’s likely to be one that has subject-relevance to your #jiscmrd project. The pilot projects found that most HEIs were at a very early stage (lack of RDM infrastructure; an emphasis on needs-scoping).
    • (N.B. the ERIM project at the University of Bath [engineering] used DAF but found it rather daunting and “stopped halfway down the page”(!): since then it has been condensed from a 60-page handbook into a shorter implementation guide. However, the Dublin Core-based metadata requirements for datasets in DAF are still rather complex. One suggestion is to “ask fewer questions about more things”: the University of Northampton did something like this, running their own tailored ‘mini-DAF’ that broadly followed the DAF methodology but tweaked it to meet their own ends and the available resources.)
  • Key points:
    • Speak to lots of people in as many different roles as possible.
    • Use a variety of data-gathering techniques (desk research, questionnaires, shadowing researchers, etc.)
    • Ask the DCC for tips!

2. CARDIO (cardio.dcc.ac.uk)

  • A freely-available benchmarking tool, designed to help institutions assess strengths and weaknesses in their RDM infrastructure. Developed out of the IDMP: Integrated Data Management Planning toolkit and support project.
  • Based on a ‘three-legged stool’ model; i.e. a successful RDM infrastructure will be based on three stable ‘legs’: technical infrastructure, appropriate resources (e.g. staff & skills), and commitment from the institution. An imbalance in any of these ‘legs’ leads to unstable RDM. The tool helps institutions to identify short ‘legs’ and plan to improve them. Identifying these imbalances can also be helpful in providing evidence to your institution that further investment needs to be made in a particular area.
  • CARDIO is still effectively in beta, with some tweaks still to make (and perhaps a lack of documentation?) – however some institutions have already found it useful.
  • How it works… a co-ordinator registers with the system and initiates the CARDIO assessments. (“If the scale and nature of your research data holdings isn’t known, run a DAF assessment first.”) CARDIO emails participants and asks them to rate a series of statements relating to their institution’s RDM infrastructure. Only once someone has entered their own ratings are they able to view what other people have put. Takes from 30-60 minutes for a full assessment, though it is possible to target shorter sets of questions at particular groups. CARDIO then automatically generates a [customisable] PDF report complete with charts/visualisations of the data.
  • A shorter, nine-question ‘mini-CARDIO’ is also available: see the latest issue of JISC Inform.
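
For the code-minded, here’s a very rough Python sketch of the ‘three-legged stool’ idea described above: collect each participant’s ratings, average them per ‘leg’, and flag the shortest one. To be clear, this is not how CARDIO itself is implemented: the 1–5 rating scale, the participant names, and the functions `leg_averages`, `shortest_leg` and `visible_ratings` are all made up for illustration.

```python
from statistics import mean

# Hypothetical ratings: participant -> leg -> score on an assumed 1-5 scale.
# CARDIO's real statements and scoring scheme are not reproduced here.
ratings = {
    "librarian": {"technical": 2, "resources": 3, "commitment": 4},
    "researcher": {"technical": 1, "resources": 3, "commitment": 5},
    "it_manager": {"technical": 3, "resources": 2, "commitment": 4},
}


def leg_averages(all_ratings):
    """Average each leg's score across every participant."""
    by_leg = {}
    for scores in all_ratings.values():
        for leg, score in scores.items():
            by_leg.setdefault(leg, []).append(score)
    return {leg: mean(values) for leg, values in by_leg.items()}


def shortest_leg(averages):
    """Return the leg with the lowest average rating."""
    return min(averages, key=averages.get)


def visible_ratings(all_ratings, viewer):
    """Only show other people's ratings once the viewer has entered their own."""
    if viewer not in all_ratings:
        return {}
    return {who: r for who, r in all_ratings.items() if who != viewer}


if __name__ == "__main__":
    averages = leg_averages(ratings)
    print("Average per leg:", averages)
    print("Shortest leg:", shortest_leg(averages))
```

The `visible_ratings` helper mimics the rule that participants can only view other people’s ratings after submitting their own. A real assessment would obviously involve far more statements than three numbers per person.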

3. DMP Online (dmponline.dcc.ac.uk)

  • A practical, browser-based tool which allows researchers to create and store Data Management Plans (DMPs) for research projects – increasingly, research funders explicitly require a DMP (e.g. the Wellcome Trust’s policy on data management).
  • Funder- and institution-specific guidance is provided through the website, along with help (“pointers”) on filling in a DMP. Completed plans can be exported in a number of formats.
  • Researchers may also be interested in the JISC guidance document, How to develop a Data Management and Sharing Plan – complementary to DMP Online.
  • The impression I get is that DMP Online is a tool which will be of practical, day-to-day utility to researchers/groups engaged in funded projects (and to the research offices that support them), whereas the other two tools (DAF/CARDIO) are perhaps aimed more at institutions starting out on the road to developing institutional RDM policies & systems, and/or looking to improve on current practice.
  • Some interesting discussions in the workshop:
    • Can DMP Online be ‘scaled up’ to work at the level of the institution, rather than the individual researcher? (A couple of projects—at UCL and Oxford—are already looking at extending the toolkit to form a more institutional service.)
    • If DMP Online (or other similar tools) makes it easier for academics to routinely create DMPs by copying/pasting boilerplate text, is there a danger that writing a DMP becomes a box-ticking exercise (less meaningful/less useful for funders if less consideration is given by the researcher)?
    • “Who is qualified to peer-review DMPs!?”

More information and help on using all three of these tools can be got by emailing: info@dcc.ac.uk

Then: a cup of tea, a quick catch-up with some colleagues, and to the road/rails again. I’ll be back tomorrow for day 2.