Posts Tagged ‘projects’

This is the end (again) – “final” CLOCK project blog post

Posted on August 1st, 2012 by Paul Stainthorp

[A commentable version of this blog post can be found on the CLOCK project blog.]

Here is a summary of the work undertaken as part of the CLOCK project, February-July 2012. As in previous projects, I don’t expect that this will really be the end of this work. We intend to carry on developing tools for publishing, consuming, and playing with Open Bibliographic Data at the University of Lincoln (via LNCD or elsewhere), and I expect us to take CLOCK ideas, data and code further over the coming months.

Thanks to all the people who have contributed to the CLOCK project.

1. Outputs: what has the project produced?

Our major tangible output is a library of code for interrogating and working with multiple, distributed sources of open bib data.

  1. The PHP source code for all of the prototypes produced by the CLOCK project is available from a GitHub repository - https://github.com/lncd/clock. This is an open, public, “live”, working version of the code which may well be developed and improved over time. (A ‘snapshot’ of the code as it existed on 27 July 2012 has been archived here.) The public code is available to re-use under a GNU Affero General Public Licence. All of the software prototypes have been detailed in the following posts:
  2. A series of modified cognitive interviews were conducted with several university library cataloguers at Cambridge University Library and the University of Lincoln. The interviews have been summarised here and here. We captured narrated screencapture videos for each of the interviews: these are being held in a private repository and could be further mined by similar projects.
  3. A number of further blog posts discussing:
  4. While it is not yet publicly accessible, we have taken significant steps towards establishing a permanent data.lincoln.ac.uk service at the University of Lincoln. We expect that by late August 2012 this service will be operational and will provide a gateway to the entirety of the bibliographic data and enhancement/manipulation tools produced by CLOCK and its predecessor project (Jerome).

2. Lessons learned

CLOCK reinforced to us that the constraints of a six-month project (effectively 4½ months of development time) are difficult to reconcile with the needs and possibilities for constant experimentation and development around bib data. We also identified the following points:

  1. The CLOCK model (of presenting different versions of the same bibliographic element for recombination) is feasible and can be modelled in software—albeit with a limited number of sources—even in a limited time; it has potential as a real-world tool for use in libraries. A writeup of this approach can be found here: CLOCK – the localized index model
  2. Software developers and librarians need to be aware of, and realistic about, the limitations of particular bib data formats. For example, real-time querying of RDF can be very resource-intensive. Be clear about your needs up front. Not every shiny open data format is right for every shiny open data project!
    • Related: synchronous querying of SPARQL endpoints is not the way forward! (We characterised this blind alley as “federated searching for the 21st century”…)
  3. We are a long way from a consistent approach to the exchange of non-MARC bibliographic information. Every data source we approached in CLOCK required us to develop a different tactic, from first principles, to query it, retrieve its data and manipulate that data. Our blog posts on the idea of a ‘universal translator’ explore this problem further.
  4. Available data is often poorly and confusingly documented. We were regularly frustrated merely in trying to understand what a data source contained and how it was structured. I (PS) have argued throughout that without some form of centralised registry/gateway (“data.ac.uk/library”), whether managed through external curation or through self-submission to a rigorous documentation standard, developers will waste time interpreting the same data again and again. We believe that a national bib data portal would be a great help in centrally indexing data sources and cataloguing their respective schemas. We understand that not everyone agrees with this approach (“who pays?”) and we invite discussion of the alternatives!
  5. And a couple of practical ‘meta’ lessons about running agile software development projects in libraries:
    • Working with a number of developers based in different locations is hard. On a project like this it would be ideal to have a central location that could be used as a development hub. Other projects under the LNCD umbrella have found the same.
    • There is a risk of losing ground already gained if the outputs of previous projects (e.g. Jerome) are not properly maintained and curated. This can lead to significant delays when previous work is not available. More rigorous use of development tools, including GitHub and Orbital, is helping to mitigate this risk in future.
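To make the ‘universal translator’ problem a little more concrete, here is a minimal Python sketch of one common way to tackle it: a small adapter function per source, each mapping that source's native shape into one shared record shape. The two source formats, their field names and the common record shape are all hypothetical, invented for illustration; this is not the CLOCK PHP code, nor any real data source's schema.

```python
from typing import Callable, Dict

# Hypothetical common record shape: title, list of author names, ISBN, plus provenance.
CommonRecord = Dict[str, object]

def from_source_a(raw: dict) -> CommonRecord:
    """Hypothetical source A: flat keys and a single author string."""
    return {
        "title": raw["Title"].strip(),
        "authors": [raw["Author"]] if raw.get("Author") else [],
        "isbn": raw.get("ISBN", "").replace("-", ""),
    }

def from_source_b(raw: dict) -> CommonRecord:
    """Hypothetical source B: nested work object with a list of creators."""
    work = raw["work"]
    return {
        "title": work["title"].strip(),
        "authors": [c["name"] for c in work.get("creators", [])],
        "isbn": raw.get("identifiers", {}).get("isbn13", ""),
    }

# One adapter per source: adding a new source means writing one more function.
TRANSLATORS: Dict[str, Callable[[dict], CommonRecord]] = {
    "source_a": from_source_a,
    "source_b": from_source_b,
}

def translate(source: str, raw: dict) -> CommonRecord:
    """Normalise a raw record from a known source and tag it with its provenance."""
    record = TRANSLATORS[source](raw)
    record["source"] = source
    return record
```

The point of the design is that all source-specific knowledge lives in one function per source; everything downstream only ever sees the common shape. The pain described above is that, for each real source, writing that one function meant working out its format from scratch.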

3. Opportunities and possibilities

We have identified possible continuations and extensions of the CLOCK work; these are detailed further below.

  1. Continuation of development of the CLOCK software as a tool for (a) faster querying of distributed bib data sources through local indexing of key fields, (b) translating disparate data formats into a common translation standard, (c) meaningfully presenting distributed bib data to the user, (d) allowing a ‘cataloguer’ to select and recombine bibliographic elements to create a new record, which (e) feeds back into a new original data source, making the process iterative, and (f) incorporates social and reputational components in a user’s selection between alternative data elements for the same resource.
  2. Discussion of the business case to libraries. What value does an alternative resource description model offer, and what would libraries gain through relinquishing some of the control over institutionally-owned catalogue data? Related: could we quantify and qualify the time spent on particular cataloguing / discovery activities using traditional LMSs and demonstrate possible efficiencies or increased quality in a new, distributed model? What arguments for the incorporation of open bib data in cataloguing will convince library managers to replace current practice?
  3. A more thorough examination of all the application functions that a cataloguer relies upon through more extensive cognitive interviews and/or functional mapping processes with cataloguers at a range of institutions.
  4. Further investigation of best practice in documenting and describing our own published bib data in JSON/RDF. [Being picked up as part of the data.lincoln.ac.uk work, above]
  5. We also intend to submit for publication articles on a number of topics arising from CLOCK. Suitable journals and conferences have been identified and articles are being prepared on the following broad topics:
    • The approach of the CLOCK project in developing software for working with multiple, distributed sources of open bib data.
    • Expectations of truth and ‘trust’ in bibliographic data (“How has this assertion about a work been derived?”)
    • The potential of new models of resource description to save time and effort for libraries; techniques for analysing the efficiency of search and cataloguing workflows.
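Points (a), (d) and (f) of item 1 above could fit together roughly as follows. This is a minimal, hypothetical Python sketch, not the CLOCK PHP code: candidate values for a few key fields are indexed locally (here keyed by ISBN, an assumed match key), a cataloguer picks a source for each field, and the new record keeps a note of where each value came from, so that the question “how has this assertion been derived?” can still be answered later. The source names, field names and ISBN are made up.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, List, Tuple

# index[isbn][field] -> list of (source, value) candidates for that field
Index = DefaultDict[str, DefaultDict[str, List[Tuple[str, str]]]]

def build_index(records: Iterable[dict],
                key_fields: Tuple[str, ...] = ("title", "publisher", "date")) -> Index:
    """Index candidate values for a few key fields, keyed by ISBN."""
    index: Index = defaultdict(lambda: defaultdict(list))
    for rec in records:
        for field in key_fields:
            value = rec.get(field)
            candidate = (rec["source"], value)
            # Skip missing values and exact duplicates from the same source.
            if value and candidate not in index[rec["isbn"]][field]:
                index[rec["isbn"]][field].append(candidate)
    return index

def recombine(index: Index, isbn: str, choices: Dict[str, str]) -> dict:
    """Build a new record taking each field from the source the cataloguer chose,
    and record that choice as provenance."""
    new_record: dict = {"isbn": isbn, "provenance": {}}
    for field, chosen in choices.items():
        for source, value in index[isbn][field]:
            if source == chosen:
                new_record[field] = value
                new_record["provenance"][field] = chosen
                break
        else:
            raise KeyError(f"{chosen!r} has no {field!r} value for {isbn}")
    return new_record
```

Feeding `recombine`'s output back in as a new source record (with its own source name) would give the iterative loop described in (e).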

Record of the A-to-Z as it looked in June 2012

Posted on June 28th, 2012 by Paul Stainthorp

We’re about to embark on a project to substantially re-design and completely re-populate, from scratch, our electronic journals A-to-Z website along with our link resolver. This is being done to bring it up to date and make it ready for ‘Find it at Lincoln’.

For posterity, here are some screenshots of what the site looked like before we began our clean sweep.

A-to-Z home page

A-to-Z titles page

A-to-Z subjects page

A-to-Z packages page

A-to-Z advanced search page

A-to-Z article finder page

A-to-Z help page

Example of a page in our link resolver

Notes on Orbital v0.2.1

Posted on June 19th, 2012 by Paul Stainthorp

A few notes on some of the new features in the latest version of Orbital: these were presented to Dr Bingo Wing-Kuen Ling on 15 June 2012.

  1. ‘Your Projects’ now includes an Activity Timeline of comments and file changes aggregated across all projects in Orbital; each project page also displays a timeline for that project.
    Screenshot of the Orbital timeline
  2. Files from the File Archives can be organised using Collections (which are ‘tag-like’ rather than ‘folder-like’: i.e. a file can belong to more than one Collection).
    Screenshot of Orbital project
  3. You can now edit project information and add new members to a Project. To do this, go to the Project within Orbital, click on the ‘edit’ button, and scroll down to Project Members.
    Screenshot of the Orbital project page
    Screenshot of the Orbital add members section
  4. Finally, a bug which was preventing the upload of files using Internet Explorer has now been fixed.
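The difference between tag-like Collections and folders comes down to a many-to-many relationship: a folder holds each file exactly once, whereas a file can carry several collection labels. The Python below is an illustration using assumed names (`FileArchive`, `add_to_collection`, `collections_for`, `files_in`); it is not Orbital's actual data model.

```python
from collections import defaultdict
from typing import DefaultDict, Set

class FileArchive:
    """Hypothetical model: files and collections in a many-to-many relationship."""

    def __init__(self) -> None:
        # collection name -> set of filenames in that collection
        self.collections: DefaultDict[str, Set[str]] = defaultdict(set)

    def add_to_collection(self, filename: str, collection: str) -> None:
        # Unlike moving a file into a folder, this does not remove it from other collections.
        self.collections[collection].add(filename)

    def collections_for(self, filename: str) -> Set[str]:
        """All collections a file belongs to (possibly more than one)."""
        return {name for name, files in self.collections.items() if filename in files}

    def files_in(self, collection: str) -> Set[str]:
        """Files in a collection, e.g. for a bundled download; empty if it doesn't exist."""
        return set(self.collections.get(collection, set()))
```

For example, `results.csv` can sit in both a “raw-data” and a “paper-1” collection at once, and `files_in("paper-1")` gives the set of files to bundle for download.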

Orbital notes, 24 May 2012

Posted on May 24th, 2012 by Paul Stainthorp

The Orbital project team met today (24 May 2012) and agreed the following:

  • Documentation
    • User documentation will focus on the “why”s of Research Data Management, rather than being a point-and-click guide to the Orbital UI (which should not require detailed explanations).
    • JW will create a changelog (human-readable text file) for each major release of Orbital, so that the documentation for each feature can be reviewed whenever that feature is updated.
    • PS will lead on writing documentation (as HTML pages, stored in the GitHub repository), with documentation for release v0.N completed and available by the launch of v0.N+1.
    • PS will email colleagues from the Library and Research/Enterprise for assistance on writing documentation.
  • Training
    • JW will invite Melanie Bullock and David Sheppard on to the Orbital working group. He is meeting Annalisa Jones to discuss RDM training for staff.
  • Releases/development
    • Orbital v0.1.1 (including bug fixes) met all of the initial ‘minimum viable product’ requirements specified by Dr Tom Duckett, and also includes the basics of project administration.
    • v0.2 will include improvements to the file upload/management, project management, and license management interfaces, as well as clearer distinction between language files and operating code.
    • NJ demoed the current version of Orbital to Siemens staff. He now has access to Siemens machine data for testing within Orbital.
    • The group discussed the LNCD plans for internal servers/private cloud, and the associated disk space requirements and costs.
  • Integration
    • The current version of the DMPOnline tool has been installed on a test server. The group discussed our approach to integration between external tools/software (such as DMPOnline, R, Gephi) and Orbital.
    • NJ is going to email Adrian Richardson at the DCC to ask when the DMPOnline APIs will become available.
  • RDM policy
    • JW presented the draft policy to the University RIEC committee. The committee have been asked to send comments to Joss. (One comment at the committee meeting was that a policy too closely geared to the requirements of the Research Councils may not be appropriate for Lincoln, which generates a lot of non-RC income. However, it was noted that the good practice specified by the RCs is good practice for the management of all research data, whatever the funding source.)
  • Conferences and meetings
  • Data Asset Framework survey
    • The group discussed the recent DAF survey which we conducted at the University of Lincoln.
    • JW will convene a sub-group to consider the responses in detail, and plan follow-up interviews.
  • Business case
    • JW is currently gathering costs for long-term data storage. This will form the first strand of the Orbital business case, which will be presented to University SMT (along with the agreed RDM policy) in September 2012.

CLOCK and a summary of 2 other Discovery projects

Posted on May 17th, 2012 by Paul Stainthorp

Ed Chamberlain, who is on the CLOCK project team as a researcher, is involved in two other projects under the Discovery strand: OEM-UK and Open Bibliography 2. We’re looking for ways in which CLOCK can re-use data, code, processes and ideas from these projects (and elsewhere) – also what CLOCK could offer in return.

Notes:

  • Open Biblio project over the last few years; aim to aggregate large amounts of bibliographic data for scientific discovery.
  • Data collected from Cambridge University, the BL, PubMed and held as RDF, used to power an open catalogue called “Bibliographica”.
  • Problems around scaling the data/system led to the current JISC-funded Open Biblio 2 project (in the meantime, Cambridge and the BL had started to publish their data openly).
  • Open Biblio 2 started looking at a NoSQL approach (CouchDB, Lucene/Solr), eventually settling on Elasticsearch.
  • The approach of Open Biblio is to build bottom-up, community tools: BibServer and BibSoup (“Like Wikimedia for bib data”). Raises interesting questions about data quality in an open community-driven system.
  • Also looking at JSON as a lightweight way of sharing bib data: the emerging BibJSON convention for representing a bibliographic record as a JSON object (Ed wrote a MARC-to-BibJSON parser in Perl). N.B. BibJSON is not a million miles away from the JSON that Jerome spits out! There are three hack days taking place next month in London to look specifically at BibJSON.
  • Open Biblio 2 is also looking at JSON-LD (JSON for Linking Data), a ‘real’ JSON standard which does a lot of the things that RDF does.
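As a rough idea of what a MARC-to-BibJSON conversion involves, here is a minimal Python sketch. It assumes a simplified, hypothetical in-memory MARC record (a dict mapping each tag to a list of subfield dicts) rather than a real MARC parser, reads only a handful of MARC 21 fields (020 ISBN, 100/700 personal names, 245 title, 260 publisher and date), and emits BibJSON-style keys (`title`, `author`, `publisher`, `year`, `identifier`). The key names are my assumption of the BibJSON convention and should be checked against its documentation; this is not Ed's Perl parser.

```python
from typing import Dict, List, Optional

# Simplified, hypothetical MARC record: tag -> list of fields, each a dict of subfield code -> value.
MarcRecord = Dict[str, List[Dict[str, str]]]

def _first(marc: MarcRecord, tag: str, code: str) -> Optional[str]:
    """Return the first occurrence of subfield `code` in field `tag`, if any."""
    for field in marc.get(tag, []):
        if code in field:
            return field[code]
    return None

def marc_to_bibjson(marc: MarcRecord) -> dict:
    record: dict = {}
    title = _first(marc, "245", "a")
    if title:
        # Strip the trailing ISBD punctuation that MARC leaves at the end of subfields.
        record["title"] = title.rstrip(" /:;")
    names = [f["a"] for f in marc.get("100", []) + marc.get("700", []) if "a" in f]
    if names:
        record["author"] = [{"name": n.rstrip(" ,")} for n in names]
    publisher = _first(marc, "260", "b")
    if publisher:
        record["publisher"] = publisher.rstrip(" ,:;")
    date = _first(marc, "260", "c")
    if date:
        digits = "".join(ch for ch in date if ch.isdigit())
        if len(digits) >= 4:
            record["year"] = digits[:4]  # crude: take the first four digits
    # 020 $a may carry a qualifier after the ISBN, e.g. "9780000000002 (pbk.)".
    isbns = [f["a"].split()[0] for f in marc.get("020", []) if f.get("a", "").strip()]
    if isbns:
        record["identifier"] = [{"type": "isbn", "id": i} for i in isbns]
    return record
```

The result is a plain dict, so `json.dumps(marc_to_bibjson(record))` gives the JSON object; most of the real work in a production parser is in the many fields and edge cases this sketch ignores.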

tl;dr = use their JSON standards and BibSoup as a data source.

  • The second project, OEM-UK (Open Education Metadata UK), based at the IoE in London, is focusing on cataloguing workflows.
  • Data from the IoE’s SirsiDynix catalogue, plus EPrints, is drawn into a Drupal framework; forms are used to create data (with autopopulation of forms); “cataloguing the Drupal way”.
    • Thought from Andrew Beeken: could we replicate this approach, using WordPress custom post types to store and display structured content? Shades of the OPACPress project, which Joss Winn and I proposed several years ago but which was not funded.
  • Some evidence that this approach is capable of speeding up the cataloguing process considerably: the more data you put in the faster it gets! Ed has some screencapture videos from OEM-UK showing workflow, including grabbing data via Zotero.

tl;dr = OEM-UK are also successfully disrupting cataloguing workflows.

User testing with Orbital v0.1

Posted on May 16th, 2012 by Paul Stainthorp

Orbital v0.1 was released on 16 May 2012. Every two weeks, staff working on Orbital meet with Dr Bingo Wing-Kuen Ling and Dr Chunmei Qing to discuss their research and RDM practice. Until now these meetings have been all about requirements-gathering – today was the first opportunity for some real, hands-on user testing with the alpha release of Orbital.

The notes below have been turned into tasks on the Orbital project Pivotal Tracker site.

BL = Bingo Wing-Kuen Ling.

  1. BL successfully viewed Orbital v0.1 in Internet Explorer 7 on the UoL corporate desktop and was able to sign in and grant access to the application using his UoL credentials. BL was able to create and describe a new project.
  2. BL tried to upload a file from his desktop to Orbital using IE7 and received an error (this is a known bug with Orbital in Internet Explorer). He was then unable to delete this file.
  3. Switching to Firefox, BL uploaded multiple files from his desktop to his project in Orbital (it wasn’t clear from the page that this was possible). This completed successfully, but because the file sizes were small, he did not receive any feedback on his upload.
  4. Returning to the original file upload screen, BL had to refresh the page manually to see the changes (the uploaded files). Files scheduled for processing are marked as ‘queued’, but this status does not update until the page is refreshed.
  5. Joss Winn demonstrated the file and project metadata pages, citable URLs for files, and Google Analytics on projects. The display of file metadata needs to be more complete, and G.A. needs a better explanation and links to sources of help.
  6. The group discussed BL’s requirements around project calendars/timelines. BL wants to be able to view project events (meetings, deadlines, etc.) for each project (but not aggregated) and is not particularly concerned about notifications on activity/changes to files. The group discussed this and will explore ways of presenting timelines made up of three sorts of events (project events, activity stream, and comments) with each type of event suppressible in the timeline. A timeline overview will be displayed on the Orbital ‘front page’ once a user has logged in.
  7. BL also would like to be able to organise project and data files in all Orbital workspaces using folders/tags, and to allow bundled file download by organising files into collections.

You can read about Orbital v0.1 in this blog post, and about the roadmap for development and release of future versions, here.

Notes on last week’s KB+ meeting

Posted on May 8th, 2012 by Paul Stainthorp

As I start to understand the aims of JISC Collections’ KB+ (KnowledgeBase+) project a bit better, it’s starting to seem more and more relevant to the real-life problems of e-resources management. At last week’s meeting of the Technical Advisory Group, here were the things I found particularly interesting:

  • The proposed database model for journal package data, which does a neat job of distinguishing between the various ‘layers’ of ERM (in allowing data to be recorded separately for the issue, title, package and platform involved in a particular subscription deal);
  • The proposed links with the GOKb project in the USA, including the possibility (and it’s only a possibility at present) for sharing/co-designing data import processes; and the aims of the GOKb project itself in building and publishing collaboratively-maintained journal package data openly for the Kuali Open Library Environment;
  • The plans for live user testing of the first KB+ data release later in May, which will include e-resources librarians from 10 institutions getting their hands on the data and initial UI. This seems like a really useful and rare opportunity to do some near-real-world testing with groups of experts in the field of ERM. (N.B. this first group of 10 users doesn’t include the University of Lincoln – but I’ve asked Liam Earney if we could have ‘observer status’!);
  • The interesting questions (raised by Owen Stephens) around the complexities involved in representing overlapping journal package deals to e-resources managers – how will the librarians react to having their assumptions (and their mental model of what a journal ‘deal’ is) … challenged? My gut instinct is that we ought to want to know the underlying detail of multiple access rights in a single journal package – to dispel the ‘myths’ that might have grown up about our holdings over several years, even if it makes things look more complicated than we thought they were. (Naturally we need a way of re-presenting / simplifying this complexity to our users.)
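The idea of recording data separately for each layer, and the overlapping-deal question, can be illustrated with a small hypothetical model in Python. This is not the KB+ schema: the class and field names are assumptions, and issue-level coverage is simplified to an inclusive range of years. Asking which package/platform combinations cover a title in a given year surfaces exactly the kind of overlap described above.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Title:
    name: str
    issn: str

@dataclass(frozen=True)
class Platform:
    name: str

@dataclass(frozen=True)
class PackageEntry:
    """One title, in one package, delivered on one platform, with its own coverage.
    Issue-level detail is simplified here to an inclusive range of years."""
    package: str
    title: Title
    platform: Platform
    first_year: int
    last_year: int

def access_routes(entries: List[PackageEntry], issn: str, year: int) -> List[PackageEntry]:
    """Every package/platform combination that covers this title in this year."""
    return [e for e in entries
            if e.title.issn == issn and e.first_year <= year <= e.last_year]
```

With a current “big deal” covering 2000 to 2012 and a backfile covering 1990 to 2005 for the same journal, a query for 2003 returns both routes: the underlying detail that a single “we have this journal” holdings statement hides.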

I’ll continue to make notes about KB+ on this blog.

List of cross-repository search tools

Posted on March 9th, 2012 by Paul Stainthorp

I’ve been wondering for a while why [national] aggregated/cross-repository search services haven’t really taken off – why aren’t they as well-known as union library catalogue services (e.g. Copac, which is part of the standard librarian’s armoury)?

Is it because aggregated search of repository-only content wouldn’t be particularly useful to researchers; perhaps because Google [Scholar] provides them with what they already need? Is it because no subset of all the repositories in the world would really meet researchers’ needs; i.e., they aren’t interested in finding articles just from one ‘showcase’, country-specific repo search tool? Because it’s too difficult? (Can’t believe that; not compared to the aggregation of catalogue data.) Or because OA is too far off 100% to make it a worthwhile exercise?

It’s certainly not for the want of initiatives and projects to build ‘em. A presentation at the recent UKCoRR members’ meeting made me realise just how many there are.

Here’s a list of eleven websites, tools and projects which relate to inter-repository search:

  1. Google Scholar (scholar.google.com), “a simple way to broadly search for scholarly literature” – the de facto cross-repository search tool. Google’s inclusion guidelines for webmasters (inc. of repositories). A journal article about finding repository content via Google (doi:10.1177/0961000606070587).
  2. Institutional Repository Search (IRS) demonstrator from Mimas (irs.mimas.ac.uk/demonstrator), retrieves content “across 130 UK academic repositories”, from a project completed in 2009.
  3. KMi CORE (COnnecting REpositories) Portal (core.kmi.open.ac.uk/search), a newer project with its own project website and blog. “The CORE project aims to make it easier to navigate between relevant scientific papers stored in Open Access repositories.” Recently extended by the ServiceCORE project.
  4. OAIster (oaister.worldcat.org), developed by the library at the University of Michigan and adopted by OCLC in 2009. “More than 23 million records representing digital resources from more than 1,100 contributors.”
  5. OpenDOAR search (www.opendoar.org/search.php) – using Google’s Custom Search Engine (CSE) to search the full-text of material held in open access repositories listed in the OpenDOAR directory of repositories. At the time of writing, the service has been temporarily withdrawn since 25 January 2012.
  6. RepUK (repuk.ukoln.ac.uk), a project to build a central cache of metadata from institutional repositories in the UK (currently harvesting from 159 repositories).
  7. RIAN (rian.ie), a national portal to the contents of the institutional repositories of the seven university libraries in Ireland; “your route to Open Access Irish research publications” – this is the kind of thing I had in mind: why isn’t there one for the UK?
  8. ROAR (roar.eprints.org/content.html) – also uses Google’s Custom Search Engine across all 2000-odd repositories registered in ROAR.
  9. Subject and discipline-specific repositories including such venerable initiatives as arXiv (arxiv.org) and PubMed Central (www.ncbi.nlm.nih.gov/pmc): offering different approaches to aggregating content that—for the most part—ignore the role of the institution and work directly with authors and publishers, respectively.
  10. Mendeley (www.mendeley.com)… not searching repositories, but achieving much the same result, and, sez Les Carr, spanning the public/institutionalised (OA) and private/social (peer-to-peer) methods of providing access to papers.
  11. BASE (base.ub.uni-bielefeld.de/en/); “BASE is one of the world’s most voluminous search engines especially for academic open access web resources. BASE is operated by Bielefeld University Library.” (Added at the suggestion of John Murtagh, 12 April 2012)

Any others I’ve missed?

Now let us “thank” OAI-PMH (and quite possibly SWORD, too), for making all of this possible… other shared repository tools and projects include: AEIOU, JULIET, Names, OA-RJ, ORCID, Open Depot, OpenDOAR, ORI, PIRUS2, RoMEO, and about 9,997½ more.
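For anyone who hasn’t looked under the bonnet: aggregators like these typically harvest metadata using the OAI-PMH `ListRecords` verb, asking for simple Dublin Core (`metadataPrefix=oai_dc`) and paging through the results with a `resumptionToken`, which the protocol treats as an exclusive argument (sent alone with the verb). Below is a minimal standard-library Python sketch that builds the request URLs and parses one response page for titles; fetching over HTTP is left out, and the function names and any base URL you pass in are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple, Union
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"

def list_records_url(base_url: str, resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request; a resumptionToken replaces all other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return base_url + "?" + urlencode(params)

def parse_page(xml_text: Union[str, bytes]) -> Tuple[List[str], Optional[str]]:
    """Return the Dublin Core titles on one response page, and the token for the
    next page (None when the token is absent or empty, i.e. the last page)."""
    root = ET.fromstring(xml_text)
    titles = [el.text.strip() for el in root.iter(f"{{{DC_NS}}}title") if el.text]
    token_el = root.find(f".//{{{OAI_NS}}}resumptionToken")
    token = ((token_el.text or "").strip() or None) if token_el is not None else None
    return titles, token
```

A harvester then simply loops: fetch `list_records_url(base)`, parse it, and keep requesting `list_records_url(base, token)` until `parse_page` returns no token. Doing that against hundreds of repositories, and cleaning up the results, is where the real effort goes.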

KB+ project Technical Advisory Group (TAG)

Posted on January 31st, 2012 by Paul Stainthorp

……aaand just as an adjunct to my last blog post, it’s worth mentioning that I’m currently serving [time] on the TAG (Technical Advisory Group) for the JISC Knowledge Base+ (KB+) project. We had our first meeting on 19 December 2011 at HEFCE’s offices in central London.

Over the course of 2011-2012 HEFCE will be investing £600,000 in the creation of a shared service knowledge base for UK academic libraries to support the management of e-resources by the UK academic community.

This is my idea of a worthy cause—e-journal knowledgebase problems being a particular favourite of mine—and I’m pleased HEFCE and JISC Collections have decided it’s worth investing in a serious and robust attempt to share information between universities and to build better systems for managing e-resources. I’m happy to be involved.

Worth reading = KB+: What’s in it for libraries?

  • Improved Data and Tools
  • Enhanced JISC Services
  • Improving ERM systems
  • Shared Community Activity

For those untainted by ERM jargon, Wikipedia explains as well as anywhere what a knowledgebase actually is and what some of the challenges are. The University of Lincoln’s e-journals knowledgebase is the EBSCO A-to-Z. Also related is the work of the UKSG/NISO Knowledge Bases And Related Tools (KBART) working group.

Tick tock we don’t stop. Introducing CLOCK, a new JISC-funded resource discovery project at the universities of Lincoln and Cambridge

Posted on December 10th, 2011 by Paul Stainthorp

The title says it all, really. The University of Lincoln, working in consortium with Cambridge University Library and Owen Stephens Consulting, has been awarded £49,877 by JISC to investigate ways of driving innovation in libraries’ interactions with Open Bibliographic Data, through a project we’re calling CLOCK (Cambridge-Lincoln Open Catalogue Knowledgebase).

CLOCK is a continuation of and elaboration upon the work of two recent JISC Discovery projects—Jerome at the University of Lincoln and COMET at the University of Cambridge—via a programme of development work shared between the two institutions, and with library consultant Owen Stephens. JISC were impressed enough with the work of both projects, and sufficiently interested in the potential for collaboration, that they encouraged our joint bid for follow-up funding.

Between now and the end of July, 2012, the CLOCK project will provide us with a framework to:

…[1] exploit through real-world applications the significant amount of data released openly by Cambridge University Library; [2] apply the Jerome database architecture, iterative development methodology, and API framework to a bibliographic dataset an order of magnitude greater than the University of Lincoln’s; and [3] to build and enable a new set of tools and demonstrator services which will enable the future development of public Open Bib Data web applications of practical utility to libraries and end-users.

You can read the full bid document here.

I’m very much looking forward to working with Ed Chamberlain, Systems Librarian in the University Library at the University of Cambridge, along with Owen Stephens, veteran of a number of campaigns to open up access to library data, and Chris Leach (Systems Librarian) and Ian Snowley (University Librarian) from the University of Lincoln. Thanks are due to all of them for their help in writing the successful bid; to the Research & Enterprise Development office at Lincoln for their invaluable assistance in putting together the project budget; and to the LNCD group at the University of Lincoln for providing the kind of supportive development platform that makes this kind of project possible.

Finally, a big thank you to Andy McGregor and the JISC Digital Infrastructure: Information and library infrastructure: Resource discovery programme, for this opportunity to further explore the blossoming environment of open bibliographic data/open discovery in libraries. If you haven’t done so already, you might like to take a look at the following websites:

As with all our projects, we’ll be blogging it comprehensively (so stand by for a steady stream of awful clock-related puns used as blog post titles). Although there’s little to see there yet, the CLOCK project blog is at: http://clock.blogs.lincoln.ac.uk/ – along with its own RSS feed. Watch that space!