Posts Tagged ‘repositories’

High-level overview of University Research Information Systems (VRE:RIM)

Posted on May 24th, 2012 by Paul Stainthorp

As part of Orbital, we’ve started to seriously consider how RDM (Research Data Management) fits into the whole range of University systems to support research information. In some contexts (usually research support/administration), we might use the term RIM (“Research Information Management”); in other, academic, contexts, we might talk about a VRE (“Virtual Research Environment”) – but I prefer to think of both as functions of the same set of systems.

Below is a first attempt to model the various University research systems and the information that might be shared or passed between them. This is a first draft only and will be discussed/developed over time – it’s here for comments!

Screenshot of the VRE RIM diagram

It’s available to view on Lucidchart.

Repository feeds on university staff profile webpages: some examples

Posted on March 28th, 2012 by Paul Stainthorp

There is a project going on at the University of Lincoln at the moment to rebuild our directory of academic staff profiles on the web, in line with our new corporate website.

As I mentioned in my presentation to library managers last week, it’s turning out to be a nice example of how new web applications can be spun up quickly at Lincoln using our existing [open and non-open] data sources (in this case, HR staff data, BuddyPress social profile data, Repository feeds, Gravatar images, and our OAuth authentication framework/Common Web Design), plus a bit of developer magic.

Screenshot of the new staff directory

You can search our staff profile directory (still in development) at: http://phone.online.lincoln.ac.uk/

There is a growing tendency for universities in all groupings—certainly for the research-intensive universities—to publish the entirety of an author’s publications to their web profile as embedded content from their repository and/or Current Research Information System (CRIS). Here are a few examples of staff profiles on other UK universities’ sites which incorporate publication lists derived from their repositories or CRISes:

We’re pulling the publication details from the Lincoln Repository for each author into their web profile (example), using a search on their University of Lincoln staff ID (which forms part of their standard HR data profile) – e.g. http://lncn.eu/ep/000157. We can then get at the Repository data in almost any format we want (BibTeX, JSON, XML, RSS, etc.). I’m also keeping a close eye on the development of the EPrints Shelves plugin, which might be an interesting tool for giving authors more flexibility and control over how their Repository publication list(s) are displayed on their web profile.
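To make the idea of turning a repository feed into a profile publication list a little more concrete, here is a minimal Python sketch. It is not the code behind the Lincoln staff directory: the sample data, the example.org URIs and the field names (`title`, `date`, `creators`, `uri`) are assumptions loosely modelled on the sort of JSON a repository export might return, and the function name is made up for illustration.

```python
import json

# Hypothetical sample data, shaped like a repository JSON export.
# The field names are an assumption, not a documented schema.
SAMPLE_FEED = """
[
  {"title": "An example article", "date": "2011-06",
   "creators": [{"name": {"family": "Smith", "given": "A."}}],
   "uri": "http://example.org/id/eprint/1"},
  {"title": "An earlier example", "date": "2009",
   "creators": [{"name": {"family": "Smith", "given": "A."}},
                {"name": {"family": "Jones", "given": "B."}}],
   "uri": "http://example.org/id/eprint/2"}
]
"""


def format_publications(feed_json):
    """Return one plain-text citation line per record, newest first."""
    records = json.loads(feed_json)
    # Dates are ISO-like strings, so a string sort puts the newest first.
    records.sort(key=lambda r: r.get("date", ""), reverse=True)
    lines = []
    for record in records:
        authors = "; ".join(
            f"{c['name']['family']}, {c['name']['given']}"
            for c in record.get("creators", [])
        )
        year = record.get("date", "n.d.")[:4]
        lines.append(f"{authors} ({year}) {record['title']}. {record['uri']}")
    return lines


if __name__ == "__main__":
    for line in format_publications(SAMPLE_FEED):
        print(line)
```

In a real profile page the sample string would be replaced by whatever the repository returns for a search on the author’s staff ID.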

List of cross-repository search tools

Posted on March 9th, 2012 by Paul Stainthorp

I’ve been wondering for a while why [national] aggregated/cross-repository search services haven’t really taken off – why aren’t they as well-known as union library catalogue services (e.g. Copac, which is part of the standard librarian’s armoury)?

Is it because aggregated search of repository-only content wouldn’t be particularly useful to researchers; perhaps because Google [Scholar] provides them with what they already need? Is it because no subset of all the repositories in the world would really meet researchers’ needs; i.e., they aren’t interested in finding articles just from one ‘showcase’, country-specific repo search tool? Because it’s too difficult? (Can’t believe that; not compared to the aggregation of catalogue data.) Or because OA is too far off 100% to make it a worthwhile exercise?

It’s certainly not for the want of initiatives and projects to build ‘em. A presentation at the recent UKCoRR members’ meeting made me realise just how many there are.

Here’s a list of eleven websites, tools and projects which relate to inter-repository search:

  1. Google Scholar (scholar.google.com), “a simple way to broadly search for scholarly literature” – the de facto cross-repository search tool. Google’s inclusion guidelines for webmasters (inc. of repositories). A journal article about finding repository content via Google (doi:10.1177/0961000606070587).
  2. Institutional Repository Search (IRS) demonstrator from Mimas (irs.mimas.ac.uk/demonstrator), retrieves content “across 130 UK academic repositories”, from a project completed in 2009.
  3. KMi CORE (COnnecting REpositories) Portal (core.kmi.open.ac.uk/search), a newer project with its own project website and blog. “The CORE project aims to make it easier to navigate between relevant scientific papers stored in Open Access repositories.” Recently extended by the ServiceCORE project.
  4. OAIster (oaister.worldcat.org), developed by the library at the University of Michigan and adopted by OCLC in 2009. “More than 23 million records representing digital resources from more than 1,100 contributors.”
  5. OpenDOAR search (www.opendoar.org/search.php) – using Google’s Custom Search Engine (CSE) to search the full-text of material held in open access repositories listed in the OpenDOAR directory of repositories. At the time of writing this blog post, the service had been temporarily withdrawn since 25 January 2012.
  6. RepUK (repuk.ukoln.ac.uk), a project to build a central cache of metadata from institutional repositories in the UK (currently harvesting from 159 repositories).
  7. RIAN (rian.ie), a national portal to the contents of the institutional repositories of the seven university libraries in Ireland; “your route to Open Access Irish research publications” – this is the kind of thing I had in mind: why isn’t there one for the UK?
  8. ROAR (roar.eprints.org/content.html) – also uses Google’s Custom Search Engine across all 2000-odd repositories registered in ROAR.
  9. Subject and discipline-specific repositories including such venerable initiatives as arXiv (arxiv.org) and PubMed Central (www.ncbi.nlm.nih.gov/pmc): offering different approaches to aggregating content that—for the most part—ignore the role of the institution and work directly with authors and publishers, respectively.
  10. Mendeley (www.mendeley.com)… not searching repositories, but achieving much the same result, and, sez Les Carr, spanning the public/institutionalised (OA) and private/social (peer-to-peer) methods of providing access to papers.
  11. BASE (base.ub.uni-bielefeld.de/en/); “BASE is one of the world’s most voluminous search engines especially for academic open access web resources. BASE is operated by Bielefeld University Library.” (Added at the suggestion of John Murtagh, 12 April 2012)

Any others I’ve missed?

Now let us “thank” OAI-PMH (and quite possibly SWORD, too), for making all of this possible… other shared repository tools and projects include: AEIOU, JULIET, Names, OA-RJ, ORCID, Open Depot, OpenDOAR, ORI, PIRUS2, RoMEO, and about 9,997½ more.
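For anyone wondering what OAI-PMH harvesting looks like in practice, here is a rough Python sketch of the two basic steps an aggregator performs: building a `ListRecords` request and pulling Dublin Core titles out of the response. The verb, the `oai_dc` metadata prefix and the XML namespaces come from the OAI-PMH 2.0 specification; the base URL, the sample response and the function names are made up for illustration, and a real harvester would also need to handle resumption tokens and errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical response, trimmed to the elements used below.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example record</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a repository's OAI-PMH endpoint."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(response_xml):
    """Return (identifier, title) pairs from a ListRecords response."""
    root = ET.fromstring(response_xml)
    results = []
    for record in root.iterfind(".//oai:record", NS):
        identifier = record.findtext(
            "oai:header/oai:identifier", default="", namespaces=NS
        )
        title = record.findtext(".//dc:title", default="", namespaces=NS)
        results.append((identifier, title))
    return results


if __name__ == "__main__":
    # example.org stands in for a real repository endpoint.
    print(list_records_url("http://example.org/oai2"))
    print(extract_titles(SAMPLE_RESPONSE))
```

Run the same two steps against every repository in a registry, cache the results, and you have the bones of a cross-repository search service.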

Notes from RDMF7 workshop

Posted on November 3rd, 2011 by Paul Stainthorp

Long day on the train

I’ve been at the University of Warwick today, for a workshop organised by the Digital Curation Centre (DCC), entitled RDMF7: Incentivising Data Management & Sharing. There appeared to be a wide range of attendees: data curators & data scientists, ICT/database folk, actual researchers and academics, as well as at least one fellow library/repository rat.

Unfortunately I was only able to attend part of the event (which ran over two days). The following notes have been reconstructed from the Twitter stream (hashtag #RDMF7)!

The first speaker I heard was Ben Ryan of the funding council, the EPSRC. He talked about the “long-established” principles of responsible data management [links below]… this may be my own interpretation of Ben’s presentation, but I don’t think I was imagining undertones of “…so there’s really no excuse!”. He also covered individual and institutional motivations for taking care of data [much more about which later], policy and the enforcement of policy, dataset discoverability/metadata, funding (including the EPSRC’s expectation that institutions will make room in existing budgets to meet the costs of RDM), and embargo periods (inc. researchers’ entitlement to a period of “privileged use of the data they have collected, to enable them to publish” first – important to stress this in order to allay fears/get researchers on board?).

Some links:

Next up was Miggie Pickton, ‘queen bee’ of the University of Northampton’s repository (and self-described RDM “novice”, indeed!), talking about their participation in the multi-institution, JISC-funded KeepIt project, which aimed to design “not one repository but many that, viewed as a whole, represent all the content types that an institutional repository might present (research papers, science data, arts, teaching materials and theses).” This work led almost by chance to Northampton’s undertaking of a university-wide audit of its research data management processes using the DCC’s Data Asset Framework (DAF) methodology. This helped them to make the case for an institutional research data management working group and [eventually, and not without resistance] to establish a mandatory, central policy for RDM. (Show of hands at this point: how many other institutions have completed a DAF? I counted perhaps only three, Lincoln certainly not being amongst them. Q. Should the University of Lincoln complete a Data Asset Framework exercise as part of the Orbital project?)

After coffee, we heard a third presentation from Neil Beagrie of (management consultancy partnership) Charles Beagrie Ltd. Neil delivered a very comprehensive explanation of the KRDS (“Keeping Research Data Safe”) project, which has developed both an activity model and a benefits analysis toolkit for the management and preservation-of-access to ‘long-lived data’. I have to come clean here and admit that I was a little bewildered by the detail: much of it went through both ears without sticking to the brain on the way through. I need to go back over the tweets more carefully and have a look at the KRDS toolkit and reports at: beagrie.com/krds.php

The morning’s presentations over, we split into three groups for breakout discussion.

I attached myself to the second of the three groups, led by (JISC programme manager for Orbital) Simon Hodson; our job to consider the question: “What really are the sticks and carrots that will make a long-term difference to the pursuit of structured data management processes?“. After spending some time picking apart the terminology, and what each of the various ‘processes’ might include, we had a wide-ranging (and allocated-time-overrunning) discussion about the things that genuinely motivate scientists, universities, and funding councils(!) to care about RDM; about some of the problems caused by the complexity and inconsistency of metadata for datasets; also about the issue of citations/digital object identifiers for data—how those citations might be treated by publishers and citation data services—and how that relates to any notions of ‘peer review’ in experimental data.

As requested, our group came up with three actions which we believe will help address the question of motivation:

  1. Data citation – publishers should consistently include e.g. DOIs for datasets in final published articles, so that citations of the data can be measured.
  2. Measurement of RDM “maturity” – departments and whole institutions should adopt a standardised quality mark for research data management, to give [potential] researchers, funding bodies, and the public confidence in their ability to handle data appropriately.
  3. Discovery – the research councils (probably) should push for common metadata standards for describing datasets and underlying data-generating research/experimental processes.

Lunch followed, and I had time to hear two more presentations in the afternoon before I had to run for a bus:

Catherine Moyes of the Malaria Atlas Project: in effect, demonstrating what really clear and consistent management of large-scale (geo)data looks like. This seems to consist of an extremely rigorous approach to requesting, tracking, and licensing data from the contributors of the project’s data… and an equally strict (but in a good way) expectation of clarity when dealing with requests from third parties to use the data. If that all comes across as restrictive, I’d point to Catherine’s slide on ‘legalities’ of the data that the Malaria Atlas Project has released openly – it’s about as open as it gets, with no registration needed, no terms & conditions placed on re-use of the published data, and all software/artefacts released under very permissive and free licences (Creative Commons or GNU). N.B. the Orbital project should look at the Malaria Atlas Project’s “data explorer”, available via map.ox.ac.uk, as an example of a really nifty set of applications built on top of openly accessible and re-usable data.

Finally (and I’m sorry I only got to hear part of his presentation), University of So’ton chemistry professor Jeremy Frey on their IDMB (Institutional Data Management Blueprint) Project—southamptondata.org—and some rather funny anecdotes about the underlying knowledge, expectations, and problems faced by researchers managing their own data, which emerged when they were surveyed as part of the above project.

Lots to take in (lots). But some useful suggestions for Orbital, which I’ll be bringing to the next project meeting: and plenty more reading material which I’ll add to the project reading list asap.

Paul Stainthorp, lead researcher on the Orbital project.

OA week buddies: putting the Repository at the centre of an institutional research information system

Posted on October 28th, 2011 by Paul Stainthorp

As promised, here’s the first of three blog posts about this week’s trip to the University of Glasgow, sponsored by the RSP for Open Access week 2011, on the theme of Repositories and REF preparation.

Old university library

Three of us made the trip north: myself, the Library’s Repository Officer, Bev Jones, and the University of Lincoln’s new REF Co-ordinator, Melanie Bullock. We were received and looked after very generously by the team at Glasgow, including Susan Ashworth, Marie Cairney, Morag Greig, Valerie McCutcheon, Robbie Ireland and William Nixon. Thanks to them all for making our visit pleasant as well as useful.

Over the course of a morning, we discussed many aspects of research information management, the REF, and developments to our own institutional repositories (and repositories in general).

I made copious notes, and reading them back I thought it might be useful to identify and list some of the factors that seem to be necessary (or at least desirable) in successfully placing the repository at the heart of an institutional research information/administration system – what makes it possible for the repo to play its part?

Before I get started: this is my own interpretation, filtered through my brain, notes, and prejudices. It doesn’t necessarily bear any relationship to the real situation at the University of Glasgow. Nor, for that matter, at the University of Lincoln…

Here’s a checklist of what’s making repo-REF integration work for Glasgow:

  1. Good data. Because it’s not about your component systems, it’s about your data. Decide what data you have/what data you need and what you need to do with that data – then any tool that matches those requirements is the right system for you. You can always change your systems, but your data are here to stay. Design your system around your data, not the other way around. It’s necessary also to decide early what information to store, then to defend that decision vigorously – best to store the complete record of a publication or a project, NOT the filtered, controlled, ‘for-public-consumption’ version of it.
  2. Good relationships: between the library, research/enterprise, ICT services, schools/faculties, etc.: but not only at an operational/service/development level; it’s essential to have joined-up thinking about research data and systems at a management–strategic level. Glasgow seem to have this in spades.
  3. An idea of where you’re headed. Glasgow have received JISC funding to do interesting development work across a number of projects (the most notable in 2009/10 being the Enrich project), but haven’t let the funding distort their overall plan – they haven’t lost sight of the overall aim. While the outside world sees separate projects [until our visit I was personally bewildered about how it all fit together…], Glasgow have the bigger picture in mind! It’s the Research and Enterprise Operations Manager’s job to make sure it all hangs together, working closely with the repository manager and the head of ICT services (see 2).
  4. A good development culture. The way Glasgow manage their development depends on the bit of the system in question. They have developers in each bit of the university (and centrally as part of ICT services). It’s important to recognise the [occasionally attractive] danger of rushing off and building something to meet a local need, while at the same time jeopardising the bigger picture for research administration.
  5. Taking your users’ needs seriously. Glasgow have a rigorous approach to stakeholder analysis and ‘workload modelling’. Quite often, people working in universities aren’t used to being asked what they actually need a system to do. Genuine user engagement has paid dividends.
  6. Mandatory data processes – not just mandated deposit of the final publication. Achieved through diktat of the research strategy committee; the attitude of senior management is “…if it’s not in Enlighten, it doesn’t exist!”. High-level advocacy win! The repository/research information system plays a part in the staff appraisal process and for SMT planning. “Advocacy is beating people with a big carrot.”
  7. Internal miniREF-type exercises. Glasgow had a big internal drive, and more than 1,200 staff responded. Suddenly, people became much more interested in the quality of their own data(!) and in the completeness of their publication record. Having information about all known publications in one place has increased interest in metrics from the repository. Publication “healthcheck” exercises – informing a university-wide publication policy.
  8. Useful reporting tools – make it as easy as possible for your users to get data out of the system, via intuitive, meaningful export tools, and in useful formats (Excel output is always good!). Basically, reduce the temptation for people to build their own local silos of data by making it more attractive for people to invest in the repository/institutional research system, safe in the knowledge they can always get the data displayed and/or exported the way they want it.
  9. A secret agent in every faculty. Offer training and additional administrative rights to research administrators in academic departments – encourage a culture of devolved/outsourced deposit, advocacy and administration. Allow administrators to ‘impersonate’ academic authors for deposit/editing. Use bibliographic services (e.g. the Web of Knowledge) to send alerts to schools as a trigger to initiate deposit; allow schools to use these alerts/feeds to create records en masse through filtering. Learn to talk in a language appropriate to different subject areas. Let the schools/faculties add value to the repository!
  10. Time and a head start. Glasgow’s overall research information infrastructure is well-established. Probably 90% of the system was in place 2 years ago. While we don’t have a time machine(!) we should at least recognise that proactive, consistent, ongoing development is far better than a reactive approach (“Quick! Build me something to deal with the REF!”). Invest in the repository/research information system now, and you’ll reap the benefits when an information need does arise in future.

That’s it for now – except for some links:

OA week banner

I belong to GlasgOA

Posted on October 21st, 2011 by Paul Stainthorp

Next week (24-30 October) is Open Access week.

Open Access week banner

To ‘celebrate’ OA week 2011, I and two colleagues are taking part in a Repositories Support Project (RSP)-sponsored visit to another group of University repository staff: in fact we’re travelling the 300-ish miles up to the University of Glasgow to spend a day with the team responsible for Glasgow’s repository, Enlighten (William Nixon and his colleagues), discussing Repositories and REF preparation.

I’ll be blogging our visit and the themes of our discussion, both here and on the RSP blog at: http://rspproject.wordpress.com/

I’ve never been to Glasgow—university or city—and I’m personally very much looking forward to visiting! But I do know from previous RSP events and from Will’s Enlighten blog posts that Glasgow are involved in a load of very interesting work around building ‘CRIS-like’ functionality by extending and developing EPrints software.

Thanks to Will and his fellow repo rats for playing host for the day!

Kelvingrove & Glasgow University

Developing the UKCoRR website

Posted on October 19th, 2011 by Paul Stainthorp

I was at the University of Nottingham, yesterday, for the annual face-to-face meeting of the UKCoRR committee. (Unfunded as UKCoRR is, all other committee meetings—we have one every couple of months—are teleconferences using Powwownow. But it’s immensely valuable to get together in person at least once a year.) Amongst other things, we discussed the recent survey of UKCoRR members, and the next members’ meeting, planned for January 2012.

My #1 priority as UKCoRR ‘Web & Publicity Officer’ is to upgrade the group’s website (www.ukcorr.org).

Screenshot of the old UKCoRR website

The old website, graciously funded and hosted by the CRC at Nottingham for the past n years, is beginning to show its age. I’m copying over all the content to a WordPress site hosted at the University of Lincoln; as soon as it’s the equal of the ‘old’, current site, we’ll transfer the *.ukcorr.org domain over, and take it forward from there.

You can see the (extremely very much still in-development) new UKCoRR website, for the time being, at: http://ukcorr.blogs.lincoln.ac.uk/

Screenshot of the new UKCoRR website - in development

Rough notes from a JISC emerging bibliographic tools workshop, 5th October 2011

Posted on October 12th, 2011 by Paul Stainthorp

I was at Goodenough College in London last Wednesday, 5th October 2011, for a workshop organised under the JISC Discovery programme (discovery.ac.uk), to discuss approaches to publishing, managing, and using Open Bibliographic data (OBD) on the web. Here are some of the notes that I made on the day. I’ve left them rather rough because I don’t have time to bully them into proper paragraphs.

The workshop started with a general overview and discussion of the current picture of OBD.

  • We’re dealing with a growing number of technologies for open library discovery: Linked Data, BibJSON, OPDS (based on Atom), Lincoln’s NoSQL/API-centric approach, even SuperMARC(!?).
  • Few if any people have a good handle on all of these approaches, but we ought to be at least conversant with them.
  • We’re a room full of experimenters! But how can we communicate Discovery/OBD to others? How can JISC funding be used to support the work? We need to surface not only tools and data but also skills.
  • Possibility of looking to e.g. DevCSI/Netskills to help with addressing the skills gap. Are CompSci graduates being encouraged to exercise their skills in open/community development?

We then split into two groups to brainstorm “what’s interesting in bibliographic data at the moment?”: the two groups managed to fill around 8 flipchart sheets :-)

Photo of a flipchart covered in writing

A few quotes and themes I picked up on:

  • What will be the value of OA repositories in hindsight? Will it be open data (some are skeptical) or rather will it be their effect on the publishing industry?
  • A really useful application would be a fits-all API to identify possible identifiers within a record/page – “I think this is an identifier, please tell me what sort it is” – which then leads into a web service to aggregate information about the thing itself (rights information, etc.) – jokingly called “Rate my Regex”! – some interest in this as a project.
  • Paul Walk: “Please can we have a day off from Linked Data!?”
  • Idea of the role of “data doctor/data wrangler” gaining some currency in institutions.
  • There are plenty of code libs for dealing with bibliographic data: pymarc, MARC4J, MARC::Record (Perl), solrmarc.
  • Owen Stephens: “MARCXML is the worst of MARC combined with the worst of XML. It’s rubbish.”
  • A colleague of Peter Murray-Rust (sorry, I didn’t catch your name!): citable data is not copyrightable. Java library containing ~20,000,000 open article records???
  • Mark MacGillivray[?]: “To most people, this [taps laptop] is just a plastic box full of magic.”
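The “Rate my Regex” idea from the list above is easy enough to sketch. What follows is only a toy in Python, not anything that was shown or built at the workshop: the function name, the handful of identifier types covered and the order in which they are tried are all my own assumptions. The check-digit rules for ISSN, ISBN-10 and ISBN-13 are the standard ones; the DOI pattern is a commonly used approximation rather than a complete definition.

```python
import re


def _weighted_mod11_ok(chars, top_weight):
    """Check digit rule shared by ISSN (weights 8..1) and ISBN-10 (weights 10..1)."""
    total = sum(
        (top_weight - i) * (10 if ch == "X" else int(ch))
        for i, ch in enumerate(chars)
    )
    return total % 11 == 0


def _isbn13_ok(digits):
    """ISBN-13 rule: alternating weights 1 and 3, total divisible by 10."""
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(digits))
    return total % 10 == 0


def classify_identifier(text):
    """'I think this is an identifier, please tell me what sort it is.'"""
    s = text.strip()
    # Approximate DOI shape, with an optional "doi:" prefix.
    if re.fullmatch(r"(doi:\s*)?10\.\d{4,9}/\S+", s, flags=re.IGNORECASE):
        return "DOI"
    if re.fullmatch(r"https?://\S+", s, flags=re.IGNORECASE):
        return "URL"
    # Drop hyphens and spaces before testing the numeric identifiers.
    compact = re.sub(r"[\s-]", "", s).upper()
    if re.fullmatch(r"\d{7}[\dX]", compact) and _weighted_mod11_ok(compact, 8):
        return "ISSN"
    if re.fullmatch(r"\d{9}[\dX]", compact) and _weighted_mod11_ok(compact, 10):
        return "ISBN-10"
    if re.fullmatch(r"97[89]\d{10}", compact) and _isbn13_ok(compact):
        return "ISBN-13"
    return "unknown"


if __name__ == "__main__":
    for candidate in ["doi:10.1177/0961000606070587", "0378-5955",
                      "978-0-306-40615-7", "0-306-40615-2", "hello"]:
        print(candidate, "->", classify_identifier(candidate))
```

The second half of the idea, a web service that aggregates rights and other information about the thing identified, would sit on top of something like this.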

After lunch we split again, this time into three groups, each to consider a different aspect of managing Open Bibliographic Data; each to consider opportunities, costs, pitfalls, etc. relating to the technologies themselves as well as to the skills needed in exploiting those technologies:

  1. Transforming data
  2. Munging data (both groups 1. and 2. agreed that the two steps are really the same thing – just “more transformation” – also that ‘munging’ is an awful word…)
  3. Exploitation of data

I was part of the ‘Munging data’ group.

Challenges

  • Problems in the move from a unitary system to distributed data services – loss of control (quality of 3rd-party data can be a problem for the librarian mindset!), worries over sustainability of mashup-style approaches (cf. dbpedia, BBC RDF, the now-defunct Talis Silkworm project). However, openness itself provides some guarantee against things becoming defunct (i.e. Open Source Software).
  • Need to think about the capacity (and the uneven geographic distribution) of local skills
  • “Any data is better than no data”. Use of third-party open data is not really a challenge for management any more (only cataloguers care!)? But still important are notions of provenance, attribution, putting power back in the hands of the end user.
  • We need to think at the citation level – is there a big difference between personal and institutional data?
  • Character encoding!

Gaps

  • Skills. Not enough developers. Unevenly distributed geographically. (Can we construct a course/curriculum for open community development skills?).
  • #ukdiscovery is somewhat distant from the mundane concerns of libraries. Ed Chamberlain is speaking to a group of cataloguers in Oxford about OBD – that’s the sort of thing we want!
  • Thinking about the role of CILIP and ‘professionalism’ – keeping [technical] skills up to date. Portfolios/competency framework approaches. Can we get a push from the top of the library profession?
  • Technology gaps, on the other hand, have mostly gone away. There are enough interesting and easy things to keep us busy without having to worry too much about the things that still don’t work. JISC can help to convince (smaller?) institutions that open development should be trusted.

Opportunities

  • Still attempting to overcome legacy licensing issues. Instead of concentrating on dealing with old data, why don’t we just take a “line in the sand” approach and make sure we’re being 100% open from now on? Do the OBD principles need to be extended?
  • Make use of feedback loops. Learn something about your data by feeding how it’s been used back into the system. Use this usage to inform your transformations.

</end>

Building an e-library in a new university in Ghana

Posted on September 9th, 2011 by Paul Stainthorp

Kumasi Railtracks

Dr Kofi Appiah, a postdoctoral researcher from the University of Lincoln School of Computer Science, is spending a year in his native Ghana to help establish a school of technology in a new HE institution there: the Christ Apostolic University College (www.cauc.edu.gh) in the city of Kumasi.

Before he left, Kofi asked me for advice on how he could help the new School of Technology build an e-library infrastructure and/or access to e-library resources for their CompSci students and staff.

I suggested he look at a few things:

What else would you suggest? I’ll forward on any suggestions to Kofi, or you can email him yourself if you prefer.

Thanks!

RSP CRIS event – Tuesday 22 July

Posted on August 3rd, 2011 by Paul Stainthorp

We apologise for the late arrival of this blog post.

On the 22nd of July I was at the University of Nottingham for an RSP (Repositories Support Project) event, Repositories and CRIS: working smartly together. A few of us from the UKCoRR committee were there, giving UKCoRR’s new Twitter account some hammer. My colleagues, David Young from the University Research Office and Elif Varol from the Library, also went.

Here are some very brief notes on the various presentations and activities – all of the slides are on the RSP’s website.

  • Simon Kerridge of ARMA (on research administration, the CERIF standard, and the EXRI project). This has already led to some movement on the idea of a JISCMail ‘super list’ to allow information to be shared easily between members of ARMA and UKCoRR. All the talk of CERIF and REF requirements has also prompted us (Lincoln people) into action – a separate blog post about this will follow.
  • RePOSIT presentations and breakout discussion – this was great fun. Like being back at the RSP Winter School again. Repository work and advocacy make far more sense, and the panic is most easily quelled, when I talk to other repository managers around a table.
  • After lunch: more on euroCRIS from Mark Cox of King’s College London. Loads to look at, including the R4R (Readiness 4 REF) plugin for EPrints, and MICE (Measuring Impact under CERIF).
  • The University of Glasgow’s “alternative approach”, involving some hardcore use of EPrints. This is the model Lincoln is following and it’s great to see it working so successfully for Glasgow. See their Research Outcomes work and Will Nixon & colleagues’ Enlighten blog. Also related: EPrints: A Hybrid CRIS/Repository.
  • Finally, a whistlestop tour of EPrints version 3.3 and some of its new features, including one-click installation of plugins from the EPrints “Bazaar”. Looks very cool.

At this point: run for bus.