
Notes from RDMF7 workshop

Posted on November 3rd, 2011 by Paul Stainthorp

I’ve been at the University of Warwick today for a workshop organised by the Digital Curation Centre (DCC), entitled RDMF7: Incentivising Data Management & Sharing. There appeared to be a wide range of attendees: data curators and data scientists, ICT/database folk, actual researchers and academics, as well as at least one fellow library/repository rat.

Unfortunately I was only able to attend part of the event (which ran over two days). The following notes have been reconstructed from the Twitter stream (hashtag #RDMF7)!

The first speaker I heard was Ben Ryan of the funding council, the EPSRC. He talked about the “long-established” principles of responsible data management. This may be my own interpretation of Ben’s presentation, but I don’t think I was imagining undertones of “…so there’s really no excuse!”. He also covered individual and institutional motivations for taking care of data [much more about which later]; policy and the enforcement of policy; dataset discoverability and metadata; funding (including the EPSRC’s expectation that institutions will make room in existing budgets to meet the costs of RDM); and embargo periods, including researchers’ entitlement to a period of “privileged use of the data they have collected, to enable them to publish” first. (Is it important to stress this in order to allay fears and get researchers on board?)

Next up was Miggie Pickton, ‘queen bee’ of the University of Northampton’s repository (and self-described RDM “novice”, indeed!), talking about their participation in the multi-institution, JISC-funded KeepIt project, which aimed to design “not one repository but many that, viewed as a whole, represent all the content types that an institutional repository might present (research papers, science data, arts, teaching materials and theses).” This work led, almost by chance, to Northampton undertaking a university-wide audit of its research data management processes using the DCC’s Data Asset Framework (DAF) methodology. The audit helped them to make the case for an institutional research data management working group and [eventually, and not without resistance] to establish a mandatory, central policy for RDM. (Show of hands at this point: how many other institutions have completed a DAF? I counted perhaps only three, Lincoln certainly not being amongst them. Q. Should the University of Lincoln complete a Data Asset Framework exercise as part of the Orbital project?)

After coffee, we heard a third presentation from Neil Beagrie of (management consultancy partnership) Charles Beagrie Ltd. Neil delivered a very comprehensive explanation of the KRDS (“Keeping Research Data Safe”) project, which has developed both an activity model and a benefits analysis toolkit for the management and preservation-of-access to ‘long-lived data’. I have to come clean here and admit that I was a little bewildered by the detail: much of it went through both ears without sticking to the brain on the way through. I need to go back over the tweets more carefully and have a look at the KRDS toolkit and reports at: beagrie.com/krds.php

The morning’s presentations over, we split into three groups for breakout discussion.

I attached myself to the second of the three groups, led by Simon Hodson (JISC programme manager for Orbital); our job was to consider the question: “What really are the sticks and carrots that will make a long-term difference to the pursuit of structured data management processes?”. After spending some time picking apart the terminology, and what each of the various ‘processes’ might include, we had a wide-ranging (and allocated-time-overrunning) discussion about the things that genuinely motivate scientists, universities, and funding councils(!) to care about RDM; about some of the problems caused by the complexity and inconsistency of metadata for datasets; and about the issue of citations and digital object identifiers for data, how those citations might be treated by publishers and citation data services, and how that relates to any notion of ‘peer review’ for experimental data.

As requested, our group came up with three actions which we believe will help address the question of motivation:

  1. Data citation – publishers should consistently include e.g. DOIs for datasets in final published articles, so that citations of the data can be measured.
  2. Measurement of RDM “maturity” – departments and whole institutions should adopt a standardised quality mark for research data management, to give [potential] researchers, funding bodies, and the public confidence in their ability to handle data appropriately.
  3. Discovery – the research councils (probably) should push for common metadata standards for describing datasets and underlying data-generating research/experimental processes.

Lunch followed, and I had time to hear two more presentations in the afternoon before I had to run for a bus:

Catherine Moyes of the Malaria Atlas Project: in effect, demonstrating what really clear and consistent management of large-scale (geo)data looks like. This seems to consist of an extremely rigorous approach to requesting, tracking, and licensing data from the project’s data contributors… and an equally strict (but in a good way) expectation of clarity when dealing with requests from third parties to use the data. If that all comes across as restrictive, I’d point to Catherine’s slide on the ‘legalities’ of the data that the Malaria Atlas Project has released openly: it’s about as open as it gets, with no registration needed, no terms & conditions placed on re-use of the published data, and all software/artefacts released under very permissive, free licences (Creative Commons or GNU). N.B. the Orbital project should look at the Malaria Atlas Project’s “data explorer”, available via map.ox.ac.uk, as an example of a really nifty set of applications built on top of openly accessible and re-usable data.

Finally (and I’m sorry I only got to hear part of his presentation), Jeremy Frey, professor of chemistry at the University of Southampton, spoke about their IDMB (Institutional Data Management Blueprint) project (southamptondata.org), and shared some rather funny anecdotes about the underlying knowledge, expectations, and problems of researchers managing their own data, which emerged when researchers were surveyed as part of the project.

Lots to take in (lots). But there were some useful suggestions for Orbital, which I’ll be bringing to the next project meeting, and plenty more reading material, which I’ll add to the project reading list asap.

Paul Stainthorp, lead researcher on the Orbital project.

Repository team news & report on RSP Winter School #rspws11

Posted on February 24th, 2011 by Paul Stainthorp

The latest news from the Repository team at the University of Lincoln:

RSP Winter School 2011

I was lucky enough to attend the three-day Repositories Support Project Winter School (#rspws11), which this year was held in the impressive surroundings of Armathwaite Hall near Bassenthwaite in the Lake District. As you can see from my photos, it was a real hardship.


The programme included a keynote address by the immensely switched-on Professor Martin Hall, Vice-Chancellor of the University of Salford (and the first UK V-C on Twitter!), which touched on archaeology, museums, data preservation, open access, mobile learning, and the meaning of the modern university. The remaining speakers and discussions over the three days seemed to relate to two main topics:

  1. Data preservation and OA to datasets: Max Wilkinson on the work of the British Library and the BL datasets programme (bl.uk/datasets); Miggie Pickton from the University of Northampton on their ‘KeepIt’ project to preserve university research data.

The consensus about research data seems to be this: don’t rely on the existing processes of your ‘publications’ repository. Keep a clear wall between a publications repository and a data archive. The requirements for describing/cataloguing, preserving, and providing access (sensitive data, etc.) are simply too different for datasets and publications. There also seems to be general agreement that a more national, shared approach suits datasets better than the strongly institutional focus of publication repositories.


  2. The options for CRISes and Repositories when gathering data for the REF: presentations from Keith Jeffery and Mark Cox.

It slowly emerged that there seem to be at least two different approaches to REF data-preparation that universities are taking: some [generally large, research-intensive universities] are investing heavily in a CRIS (which is impacting on the role of the Repository); others [generally the smaller HEIs, though with notable exceptions] are developing and enhancing their existing Repository systems, and relying on EPrints/DSpace to do more heavy lifting.


Interestingly, there was relatively little talk of e-theses in all this. We did, however, manage to slip in an advert for the UKCoRR members’ meeting (tomorrow!).

Slides and notes from the various presentations and workshops are available to download from the RSP’s website.

Tweets bearing the Winter School’s hashtag #rspws11 are preserved in a Twapper Keeper archive.


Meanwhile, back home in Lincoln…

And so to our regular Repository team meeting on Friday, 18 February. It seems to be a particularly busy time, Repository-wise, at the moment. Welcome to David Young, who came to his first Friday team meeting.

Present: Bev Jones (BJ), Paul Stainthorp (PS), Rosaline Smith (RS), David Young (DY).

  1. We’ve hit 2,800 items on the Repository, which is a credit to Lincoln’s academic staff, as well as to the tireless efforts of RS and BJ! We’re aiming for 3,000 items by the end of April, 2011. If we hit that target, I’ll be doing some more baking.
  2. There are a number of useful training events on at the moment: some organised by the RSP (e.g. this one), as well as this extremely valuable-looking non-RSP event in Glasgow. Many of the events relate in some way to getting data in/out of repositories for REF purposes (cf. the discussions at the Winter School, above). Unfortunately, Lincoln people aren’t able to attend many of these events, so PS and DY are going to meet to discuss the possibility of running or arranging a similar event in the East Midlands.
  3. The group discussed some EPrints tweaks: publisher search, the ability to ‘bounce’ a Repository record from one owner to another, and the perennial question of unique author IDs, all of which are possible and already in place in at least one other EPrints repository. We also touched upon our succession/emergency planning (i.e. how would the Library cope if and when the volume of Repository traffic outstrips our resources to deal with it: our “Plan X”).
  4. RS updated us on the Kultivate project: there’s another workshop in London on Monday, 28 February; RS is still planning a meeting with the Faculty of Art, Architecture & Design. RS has sent her final reminder by mass email to academic staff, asking them to attend a Repository workshop and/or to get in touch to discuss depositing their items.
  5. BJ reported that all Repository records from the calendar years 2010/2011 (so far) are now identifiable to a quarter. (We need this level of specificity to produce our Quarterly Research Output Reports.) However, there’s still some confusion over exactly how we can construct date-limited queries in EPrints – BJ is going to ask on the eprints_tech and UKCoRR mailing lists to see if we can get a definitive answer.
  6. Now-quite-finally, I (PS) ran through a number of things I’m going to bring to the next Repository steering group: including technical developments and where we might need to take EPrints in the run-up to the REF, as well as improving the Repository’s presence on our corporate website. I’m also going to speak to the chair of the steering group (University Librarian, Ian Snowley) about the date of the next meeting.
  7. Did I mention it’s the UKCoRR meeting tomorrow?
