Ed Chamberlain, who is on the CLOCK project team as a researcher, is involved in two other projects under the Discovery strand: OEM-UK and Open Bibliography 2. We’re looking for ways in which CLOCK can re-use data, code, processes and ideas from these projects (and elsewhere) – also what CLOCK could offer in return.
Notes:
- Open Biblio project over the last few years; aim to aggregate large amounts of bibliographic data for scientific discovery.
- Data collected from Cambridge University, the BL, PubMed and held as RDF, used to power an open catalogue called “Bibliographica”.
- Problems around scaling the data/system led to the current JISC-funded Open Biblio 2 project (in the meantime, Cambridge and the BL had started to publish their data openly).
- Open Biblio 2 started looking at a NoSQL approach (CouchDB, Lucene/Solr) – eventually settling on Elastic Search.
- The approach of Open Biblio is to build bottom-up, community tools: BibServer and BibSoup (“Like Wikimedia for bib data”). Raises interesting questions about data quality in an open community-driven system.
- Also looking at JSON as a lightweight way of sharing bib data: the emerging BibJSON convention represents a bibliographic record as a JSON object (Ed wrote a MARC-to-BibJSON parser in Perl). N.B. BibJSON is not a million miles away from the JSON that Jerome spits out! There are three hack days taking place next month in London to look specifically at BibJSON.
- Open Biblio 2 is also looking at JSON-LD (JSON for Linking Data), a ‘real’ JSON standard which does a lot of the things that RDF does.
tl;dr = use their JSON standards and BibSoup as a data source.
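To make the BibJSON idea a bit more concrete, here is a minimal sketch in Python of mapping a handful of MARC fields onto a BibJSON-style object. This is not Ed's Perl parser, and it is not a full implementation of the convention: the input dictionary is a hypothetical, heavily simplified stand-in for a parsed MARC record, the function name `marc_to_bibjson` is made up for illustration, and the output keys (`title`, `author`, `identifier` and so on) are my assumption of how the emerging convention names things.

```python
import json

# Hypothetical, simplified stand-in for a parsed MARC record:
# MARC tag -> subfield code -> value. Real MARC has indicators,
# repeated fields and repeated subfields, all ignored here.
marc_record = {
    "020": {"a": "9780000000002"},
    "100": {"a": "Example, Author"},
    "245": {"a": "An example title /"},
    "260": {"b": "Example Press,", "c": "2012"},
}


def marc_to_bibjson(marc):
    """Map a few common MARC fields onto BibJSON-style keys.

    The key names are an assumption about the BibJSON convention,
    not a definitive rendering of it.
    """
    record = {"type": "book"}  # assumption: everything is treated as a book

    title = marc.get("245", {}).get("a")
    if title:
        # Strip the trailing ISBD punctuation that MARC titles often carry.
        record["title"] = title.rstrip(" /:")

    author = marc.get("100", {}).get("a")
    if author:
        record["author"] = [{"name": author.rstrip(" ,")}]

    publisher = marc.get("260", {}).get("b")
    if publisher:
        record["publisher"] = publisher.rstrip(" ,")

    year = marc.get("260", {}).get("c")
    if year:
        record["year"] = year.strip(" .")

    isbn = marc.get("020", {}).get("a")
    if isbn:
        record["identifier"] = [{"type": "isbn", "id": isbn}]

    return record


bibjson = marc_to_bibjson(marc_record)
print(json.dumps(bibjson, indent=2))
```

The point of the sketch is simply that the target is a flat, readable JSON object, which is why it sits so close to what Jerome already produces.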
- The second project, OEM-UK (Open Education Metadata UK), based at the IoE in London, is focusing on cataloguing workflows.
- Data from the IoE’s SirsiDynix catalogue, plus EPrints, is drawn into a Drupal framework; forms are used to create data (with autopopulation of forms); “cataloguing the Drupal way”.
- Thought from Andrew Beeken: could we replicate this approach, using WordPress custom post types to store and display structured content? Shades of the OPACPress project, which Joss Winn and I proposed several years ago but which was not funded.
- Some evidence that this approach is capable of speeding up the cataloguing process considerably: the more data you put in the faster it gets! Ed has some screencapture videos from OEM-UK showing workflow, including grabbing data via Zotero.
tl;dr = OEM-UK are also successfully disrupting cataloguing workflows.

