Posts Tagged ‘Orbital’

Throw down the SWORD

Posted on May 7th, 2013 by Paul Stainthorp

With the Orbital project at its end, and plans for a University research information / research data service afoot, I’m reviewing the excellent work carried out by our (now-departed) developers Harry Newton and Nick Jackson – work which linked up CKAN, the Orbital ’bridge’ application, and the Lincoln Repository (EPrints) using SWORD – described in earlier blog posts here and here.

“One important piece of work that we’re undertaking at the moment in Orbital is the facility to deposit the existence of a dataset, from CKAN and the University’s new Awards Management System (AMS), into our (EPrints) Repository via SWORD – at the same time requesting a DOI for the dataset via theDataCite API. The software at the centre of this operation is what we refer to as Orbital Bridge.”

This deposit workflow is now broadly working as it should – I think only a few tweaks would be necessary now to turn this into a working tool for the University of Lincoln.

Most urgent is the need for the University to sign up with the DataCite DOI service, which would secure a DOI for each dataset record deposited from CKAN and hence formally published by the University. This subscription should form part of the new research information service.

The underlying code could be used for other SWORD-enabled deposit from sources of metadata (e.g. the Library’s discovery system, Find it at Lincoln), to the Lincoln Repository as the University’s bibliographic ‘system of record’.

Warning: this is an extremely screenshot-heavy blog post! Click on any one of the screenshots below to view a larger image.

Here’s a step-by-step walkthrough of the entire process of adding a dataset to CKAN, and depositing it as a record in the Lincoln Repository.

  1. Go to the Researcher Dashboard at: https://orbital.lincoln.ac.uk/ and click on “Sign In”.
    Screenshot from the Researcher Dashboard
  2. Enter your staff accountID and password to sign in to the Researcher Dashboard.
    Screenshot from the Researcher Dashboard
  3. Once you have been signed in and returned to the Researcher Dashboard, click on your name (in the top right-hand corner) and then click on “My Projects”.
    Screenshot from the Researcher Dashboard
  4. You will see an overview of your research projects – both funded projects (derived from the AMS), and unfunded projects you have added locally. Click on the name of the project you want to add data to.
    Screenshot from the Researcher Dashboard
  5. You will be taken to a page for that research project. On the right-hand side of this page, under the heading “Options”, click on “Create Research Data Environment”.
    Screenshot from the Researcher DashboardImage7
  6. You will be taken to the University’s CKAN research data platform, where a page/group will have been created which corresponds to your project in the Researcher Dashboard. Sign in to CKAN using your staff accountID (there is currently no single sign-on between the Researcher Dashboard and CKAN) and password and you should be returned to the same page. However you will probably be sent instead to the CKAN home page, in which case you will have to look again for your project under the “Groups” menu.
    Screenshot from CKAN
  7. Toward the top of the project screen in CKAN, click on “Add Dataset” > “New Dataset…”.
    Screenshot from CKAN
  8. Fill in the form with information about the overall dataset, including the following fields:
    • Title
    • URL
    • License (N.B. US spelling!)
    • Description
      Screenshot from CKAN
  9. Then click on “Add Dataset”
    Screenshot from CKAN
  10. If you now click on “Further information” tab on the left-hand menu, you can add the following additional information about the dataset (this is not obvious from the initial dataset form):
    • Author
    • Author email
    • Maintainer
    • Maintainer email
    • Version
    • Summary [of changes]
      Screenshot from CKAN
  11. To attach individual data document(s)—which CKAN refers to as “resources”—to the dataset, scroll down the page and click on “Upload a file” (there are other options) > “Choose file” > “Upload”.
    Screenshot from CKAN
  12. Then fill in the form with the following basic information about the “resource”:
    • Name
    • Description
    • Format
    • Resource Type
    • Datastore enabled (ticked by default)
    • Mimetype
    • Mimetype (Inner)
    • “Extra Fields” (user-defined, or used by Orbital)
      Screenshot from CKAN
  13. To deposit a record for this dataset in the Lincoln Repository, go back to the Orbital Researcher Dashboard at: https://orbital.lincoln.ac.uk/ and navigate to your project. Toward the bottom left of the page you should now see a table containing the dataset(s) you have created in CKAN for this project. Choose which dataset you want to deposit, and hit the “Publish to Lincoln Repository” button.
    Screenshot from the Researcher Dashboard
  14. The Researcher Dashboard will then display a deposit form containing the following fields (some of which should be being autopopulated from CKAN fields but which do not appear to be):
    • Title
    • Description
    • Type of Data
    • Keywords
    • Subjects
    • Divisions
    • Metadata visibility [Show|Hide]
    • People
      Screenshot from the Researcher Dashboard
      “Publishing will publicly announce the existence of your dataset on the Lincoln Repository, as well as start the process of long-term preservation of your data.“Usually you should only publish a dataset either at the end of a research project, or if the data is being cited in a paper. Publishing a dataset will place some restrictions on the changes you can make to the dataset in the future, such as removing your ability to delete the data. It will also generate a DOI, which allows your dataset to be uniquely identified and located using a simple identifier.“Please check the information in this form and make any necessary changes, as this is the information which will be entered into the published record of the dataset.“If you have any questions about this process please contact a member of the research services team for advice or assistance.”
  15. When you hit the “Publish Dataset” button, the dataset record from CKAN will be used to create a record in the Lincoln Repository. The record will be submitted for review by the Repository team, who will then make it live. N.B. for the time being, you will see an error “Validation errors: [doi] is a required string“ – this happens because the University does not currently have access to the live DataCite DOI service, which would secure a DOI for each dataset record deposited from CKAN. This should form part of the new research information service.
    Screenshot from the Researcher Dashboard
  16. Here’s an example of a record in the Lincoln Repository, created from a CKAN dataset and made live by the Repository team.
    Screenshot from the Lincoln Repository

Problems with the deposit process as it currently stands:

  1. Permissions are not correctly cascaded from a project the Researcher Dashboard to a group in CKAN.
  2. There is currently no single sign-on between the Researcher Dashboard and CKAN.
  3. When CKAN challenges a user to log in to a group, they should be redirected back to the group page after logging in – instead they get sent back to the CKAN home page, in which case they will have to look again for their project under the “Groups” menu.
  4. A minor one – in CKAN ”License” (noun) appears in US spelling (should be “Licence”).
  5. In order to add all the information needed to deposit a dataset from CKAN, user has to click  ”Further information” tab on the left-hand menu (this is not obvious from the initial dataset form).
  6. Some of the field labels in CKAN are a bit opaque or use technical terms (“Mimetype”) which could do with explanation.
  7. When depositing to EPrints, some of the deposit fields should be being autopopulated from CKAN fields – this does not appear to be happening. The fields affected are:
    • “Description” (could be derived from CKAN dataset/resource Description fields)
    • “Type of Data” (could be derived from CKAN resource Format field)
  8. Repository records created from CKAN have the data “Creator” attached, but not the “Maintainer”.
  9. Repository records created from CKAN don’t have a link back to the CKAN dataset (should go in the EPrints “Official URL” field) – this will be required to provide access to the data.
  10. After deposit, users see the error message “Validation errors: [doi] is a required string” – the University does not currently have access to the live DataCite DOI service, which would secure a DOI for each dataset record deposited from CKAN.

Research data documentation and training materials

Posted on April 26th, 2013 by Paul Stainthorp

The final within-project version of the Orbital Research Data Management training materials are now live on the Orbital Researcher Dashboard website. They have been written collaboratively by the Orbital project team, and draw on a lot of existing RDM training and guidance material from across the web (in particular, from the DCC).

We intend that these materials will continue to be maintained and developed as part of the new University-wide research information service mentioned in a previous blog post.

The training materials can be accessed at https://orbital.lincoln.ac.uk/ and cover the following areas:

  1. Screenshot of the Researcher DashboardWhat is research data?
  2. The research data lifecycle
  3. Policies affecting your research data
  4. Data Management Planning (DMP)
  5. Data search and discovery tools
  6. Data storage and security
  7. Legal and ethical issues
  8. Tools for working with your data
  9. Data publishing and citation
  10. Licences for sharing your data
  11. Data curation and preservation
  12. Workshops and training events
  13. Help and support

The source text for each page is stored in an open Github repository (at http://github.com/unilincoln/rdm) in Markdown format. The page admin tools in the Researcher Dashboard can then be used to link to the source document, which is then formatted in the University’s Common Web Design.

These web pages will be used to support the ongoing RDM training for postgraduate students, which will shortly be rolled out to University staff.

“Managing Your Research Data” – training for postgrad students

Posted on January 9th, 2013 by Paul Stainthorp

As part of the JISC-funded Orbital project, we are starting to offer introductory training to (initially) postgraduate students, on how to look after their research data.

The first workshop is on 23 January 2013 at 10.00 in the Graduate School classroom, and there are further workshops every couple of weeks throughout 2013.

I’ll be arranging further workshops aimed more at staff (including Library staff) in due course.

MANAGING YOUR RESEARCH DATA

The Graduate School – University of Lincoln Multiple dates throughout 2013

Research data management is an important part of the research process, and a vital part of academic practice. This one-hour workshop will include a presentation and discussion of what you should consider when creating, looking after, and sharing/publishing your research data.

The workshop will cover:

  • What do we mean by research data?
  • Policies affecting your data
  • Data Management Planning (DMP)
  • The research data lifecycle
  • Practical tools for looking after your data
  • Data publishing and citation
  • Where to go for help

Postgraduate students can book a place on a workshop, online at: http://uolresearchdata.eventbrite.co.uk/

Orbital deposit of dataset records to the Lincoln Repository: workflow

Posted on December 6th, 2012 by Paul Stainthorp

Further to yesterday’s blog post about linking our CKAN datastore with our EPrints Repository (to allow researchers to deposit permanent, public, citable records of their datasets), here’s a fleshed-out diagram of the proposed dataset deposit workflow process.

At the moment, this assumes a one-time “fire and forget” deposit. At some point, we’re going to have to tackle versioning.

The original diagram is available on Lucidchart. See the table in my previous blog post for details of which data fields are involved in the process (i.e. passed between CKAN, Orbital Bridge, the DataCite API, and EPrints).

This is a proposal and still has to be road-tested. Comments welcome.

Diagram of the dataset deposit process

Stages in the proposed deposit process:

  1. User enters project metadata in AMS
  2. AMS creates project container in CKAN
  3. User creates dataset record in CKAN
  4. Nucleus adds user metadata to CKAN
  5. User deposits data in CKAN
  6. User presses “DEPOSIT DATASET” button in CKAN
  7. Orbital Bridge requests DOI
  8. DataCite API returns DOI
  9. Orbital Bridge adds DOI to dataset record in CKAN
  10. User reviews and approves dataset metadata (making changes if necessary)
  11. Orbital Bridge writes changes back to dataset record in CKAN
  12. Orbital Bridge creates a new EPrints record via SWORD
  13. EPrints confirms existence of new record
  14. Orbital Bridge writes EPrints record URL back to CKAN dataset record

Research data training at the University of Lincoln

Posted on December 5th, 2012 by Paul Stainthorp

As part of the Orbital project to build a pilot Research Data Management (RDM) infrastructure at the University of Lincoln, I’m looking particularly at support, training and documentation.

We aim to start offering—early in 2013—an introductory 1-hour workshop on managing your research data, aimed at early-career researchers and postgraduate research students. In particular, we want to promote this training through three avenues:

  1. As part of the Lincoln Graduate School‘s standard timetable of postgrad training;
  2. Directly, to PhD students in the School of Engineering (our pilot group);
  3. To researchers who completed our Data Asset Framework questionnaire.

The training will be supported by documentation (written and maintained through WordPress and a dedicated RDM reading list), presented through the main Orbital “bridge” site, which we’re starting to treat as a VRE.

Here’s an outline of the initial workshop. I’m meeting the Graduate School this afternoon to agree this.

“Managing your research data”

  1. Definitions, terminology and scope (what do we mean by research data?)
  2. Policies and laws affecting your data
  3. The “research data lifecycle
  4. Data Management Planning (DMP)
  5. Practical tools for looking after your data
  6. Data publishing and citation
  7. Where to go for further help and support

Comments welcome!

Orbital training and documentation

Posted on October 3rd, 2012 by Paul Stainthorp

I’ve been quiet—too quiet—about the Orbital project recently. While I’ve not been blogging, Joss, Nick and Harry have overseen several fairly important developments:

As Orbital-the-product (coherent set of products, really) develops, my own focus between now and the end of the project (March 2013) will be on Orbital-the-servicetraining, support, documentation, and implementation of RDM policy at the University of Lincoln. I’ll work closely with the Research & Enterprise department on these aspects.

Four level hierarchy of documentationAs part of this strand of the project (which cuts across workpackages 7, 11, and 12), I want to consider the following:

  1. The current usability of ownCloud, CKAN, EPrints, etc. – what ‘sticking plaster’ help materials do we need to provide right now (if any?).
  2. How the production of documentation fits in to the software development release cycle (“change management“?) – particularly so in an agile/iterative environment, and how we ensure we meet our responsibility to ‘leave no feature undocumented’ as well as provide adequate contextual information on RDM. Related: I’m thinking about a four-level hierarchy of documentation (see right): how do the different levels relate to each other (how do we ensure internal consistency?), and how do we ensure all four levels are covered?
  3. [How] should we contribute to an (OKFN-co-ordinated) open research [data] handbook initiative (c.f. the Open Data Handbook; Data Journalism Handbook) instead of—or as well as—writing our own operational help guides? Contributing to and re-consuming community-written RDM materials will be more efficient than writing our own guidebook from scratch, but we need to make sure our local documentation is relevant to Lincoln.
  4. I’ve already started collated a list of other peoples’ RDM help materials (Joss has collected many more) – I’ll publish the list to this blog soon. I’ll be looking to see what we can re-use. There are some very good, openly-licensed training materials available, but I don’t want us to use them uncritically.
  5. How do we use our (still not-yet-accepted) RDM policy as a jumping-off point for training events?
  6. What did we learn from our recent(ish) Data Asset Framework exercise? How can we use researchers’ priorities as identified in the DAF to inform training? Should we re-run the exercise and/or follow it up with more detailed discussions?
  7. It possible/likely that we will shortly have a new member of staff to work with the Lincoln Repository and the University’s REF submission. What responsibility might that person have for RDM training and support?

Next I need to organise a meeting with the Research & Enterprise department to plan our ‘version 0.1′ training programme, possibly consisting of (i) a discussion of the issues raised in our DAF survey and people’s current RDM practice, (ii) a discussion of the RDM policy, and (iii) presentation of the various VRE tools available (CKAN, ownCloud, EPrints, DataCite, DMPOnline). We’ll probably pilot this on a group of willing PhD students in the School of Engineering.

Notes on Orbital v0.2.1

Posted on June 19th, 2012 by Paul Stainthorp

A few notes on some of the new features in the latest version of Orbital: these were presented to Dr Bingo Wing-Kuen Ling on 15 June 2012.

  1. ‘Your Projects’ now includes an Activity Timeline of comments and file changes aggregated across all projects in Orbital; each project page also displays a timeline for that project.
    Screenshot of the Orbital timeline
  2. Files from the File Archives can be organised using Collections (which are ‘tag-like’ rather than ‘folder-like’: i.e. a file can belong to more than one Collection).
    Screenshot of Orbital project
  3. You can now edit project information and add new members to a Project. To do this, go to the Project within Orbital, click on the ‘edit’ button, and scroll down to Project Members.
    Screenshot of the Orbital project page
    Screenshot of the Orbital add members section
  4. Finally, a bug which was preventing the upload of files using Internet Explorer has now been fixed.

Orbital notes, 24 May 2012

Posted on May 24th, 2012 by Paul Stainthorp

The Orbital project team met today (24 May 2012) and agreed the following:

  • Documentation
    • User documentation will focus on the “why”s of Research Data Management, rather than being a point-and-click guide to the Orbital UI (which should not require detailed explanations).
    • JW will create a changelog (human readable text file) for each major release of Orbital, so that documentation for each feature is review if that feature is updated.
    • PS will lead on writing documentation (as HTML pages, stored in the GitHub repository), with documentation for release v0.N completed and available by the launch of v0.N+1
    • PS will email colleagues from the Library and Research/Enterprise for assistance on writing documentation.
  • Training
    • JW will invite Melanie Bullock and David Sheppard on to the Orbital working group. He is meeting Annalisa Jones to discuss RDM training for staff.
  • Releases/development
    • Orbital v0.1.1 (including bug fixes) met all of the initial ‘minimum viable product‘ requirements specified by Dr Tom Duckett, and also includes the basics of project administration.
    • v0.2 will include improvements to the file upload/management, project management, and license management interfaces, as well as clearer distinction between language files and operating code.
    • NJ demoed the current version of Orbital to Siemens staff. He now has access to Siemens machine data for testing within Orbital.
    • The group discussed the LNCD plans for internal servers/private cloud, and about the disk space requirements and costs.
  • Integration
    • The current version of the DMPOnline tool has been installed on a test server. The group discussed our approach to integration between external tools/software (such as DMPOnline, R, Gephi) and Orbital.
    • NJ is going to email Adrian Richardson at the DCC to ask when the DMPOnline APIs will become available.
  • RDM policy
    • JW presented the draft policy to the University RIEC committee. The committee have been asked to send comments to Joss. (One comment at the committee meeting was that our having a policy too geared around the requirements of the Research Councils may not be appropriate for Lincoln, which generates a lot of non-RC income. However it was noted that the good practice specified by the RCs is good practice for management of all research data, whatever the funding source.)
  • Conferences and meetings
  • Data Asset Framework survey
    • The group discussed the recent DAF survey which we conducted at the University of Lincoln.
    • JW will convene a sub-group to consider the responses in detail, and plan follow-up interviews.
  • Business case
    • JW is currently gathering costs for long-term data storage. This will form the first strand of the Orbital business case, which will be presented to University SMT (along with the agreed RDM policy) in September 2012.

1.8 million library loans from the University of Lincoln under CC0 – Copac Activity Data/SALT2 project

Posted on May 16th, 2012 by Paul Stainthorp

Today we published data on approximately 1.8 million items loaned from the University of Lincoln’s libraries since 2001. The data is available to re-use under a CC0 licence, and can be downloaded from:

We’ve done this as part of our involvement in the Copac Activity Data Project, a.k.a. SALT2. Along with data from the universities of Manchester, Sussex, Cambridge and Huddersfield, our circulation data will be used to power a ‘recommender API‘, which libraries will be able to use to build “People who borrowed X also borrowed Y“-type services. The API will benefit from the power of aggregated data from multiple institutions of different types, containing tens of millions of circulation events.

You’ll notice as well that we’ve chosen to host the data on our brand-new Orbital (v0.1) research data management application. Each dataset has a persistent citable URI. We’ll be keeping the data up-to-date, and generating a new activity data file from our library circulation logs shortly after the end of each academic year.

The data consists of a number of CSV files (one for each academic year since 2000-01, plus a huge file of all the data), containing the following fields:

Field index Field name Description
0 CREATE_DATE The date and time of the loan event, in the format: dd/mm/yyyy hh:mm
1 BORROWER_ID A cryptographic hash of the internal system ID associated with the borrower of the item, as used in the University of Lincoln’s library system.
2 WORK_ID A cryptographic hash of the internal system ID associated with the bibliographic work borrowed, as used in the University of Lincoln’s library system.
3 CONTROL_NUMBER The ISBN of the work borrowed (10 or 13 digits).
4 AUTHOR_DISPLAY The main author of the work borrowed.
5 TITLE_DISPLAY The title of the work.
6 PUB_DATE The publication year of the work in the form: yyyy

I’ll blog in detail another time about exactly how we created the data extracts. In short:

  1. There is a table in the SirsiDynix Horizon library management system called circ_tran which records every instance of item number X borrowed by user number Y at time Z. [#1]
  2. There is another table which provides a lookup between item numbers and the numbers of the bibliographic works of which they are a copy. [#2]
  3. Dave Pattern at the University of Huddersfield wrote a Perl script which scrapes all the bibliographic data (title, author, ISBN) for each work from our OPAC (Horizon Information Portal) and writes it to a text file. [#3]
  4. Developer, Jamie Mahoney of CERD/LNCD then stepped in, using some pretty heavy SQL on the original 3 data extracts, to:
    • Hash the internal Horizon user and work ID numbers to provide anonymity;
    • Convert the internal Horizon date and time stamps in extract [#1] from a version of Unix time into a readable datestamp (formula hint: cko_date*86400 + cko_time*60);
    • Used the item/work lookup table [#2] to pull in the bibliographic details for each loan in [#1] from the bibliographic table [#3] (an epic SQL JOIN query), removing items which are no longer represented in our library system;
    • Removed any items without an ISBN, which are of no use to the SALT recommender API;
    • Tweaked the punctuation and formatting;
    • Split the data into separate files for each year.

Once again, the data is at:

Thanks are due to Chris Leach and Dave Pattern for Horizon-fu, and to Jamie Mahoney for his patient wrangling of several millions of lines of data!

You can find out more about the Copac Activity Data Project/SALT2, at: http://copac.ac.uk/innovations/activity-data/

Over the moon (again): Orbital v0.1 released

Posted on May 16th, 2012 by Paul Stainthorp

Screenshot of OrbitalJoss Winn has blogged this morning about a signficant milestone in the Orbital project. Today we released Orbital v0.1, and, from today, invited researchers at the University of Lincoln have access to an alpha ‘minimum viable product‘ environment for managing their research data.

For the time being, sign-in access to Orbital (http://orbital.lincoln.ac.uk/) is restricted to invited individuals only.

Orbital v0.1 allows a researcher to:

  • sign in using their University of Lincoln credentials
  • create and describe a project
  • upload their data to the project
  • choose a license for the data
  • add a Google Analytics code to measure project analytics
  • published data at a persistent URI (id.lincoln.ac.uk)
  • leave feedback on the Orbital application

You can read more about this first release on the Orbital blog. We’ve also written a basic development roadmap for Orbital which gives an idea of the kind of features you should see becoming available between now and Christmas 2012.