Tuesday, 9 December 2008

Research Data Management and Curation Services Framework

Over the last few months we have been consulting service units in Oxford to validate the requirements gathered through the researcher interviews, to establish which data management services are on offer, and to identify the gaps in service provision.


Researchers' top requirements for services were:

  • Advice on practical issues related to managing data across their life cycle. 
  • A secure and user-friendly solution that allows large volumes of data to be stored and shared in a controlled way, with fine-grained access control mechanisms.
  • A sustainable infrastructure that allows publication and long-term preservation of research data for those disciplines not currently served by domain-specific services such as the UK Data Archive, NERC Data Centres, European Bioinformatics Institute and others.
Those requirements helped to produce the following framework of research data management services:



Data Management and Sharing Plans

Support and advice to help researchers prepare their data management and sharing plans.

Legal and Ethical

This service includes support to assist researchers with the legal and ethical implications of creating, sharing and using data.

Best Formats and Best Practice

Support for researchers to decide which are the best formats and practice for producing and documenting specific data. This service may also include provision of support for database design.

Secure Storage

Infrastructure for storing research data that provides backup and version control capabilities, amongst other things.

Metadata

Tools and support that allow researchers to describe their data from the moment of creation.

Access and Discovery

A support service as well as tools to help researchers locate and access research data. This service could also include tools to help research groups find out about their data resources using the Data Audit Framework methodology.

Computation, Analysis & Visualization

Software and computing resources that allow analysis and visualization of research data as well as the training needed to equip researchers with the appropriate skills.

Restricted Sharing

Technical infrastructure to share research data with selected individuals or groups.

Data Cleaning

Support to clean and prepare data to the standard required for publication. This service should include help with anonymizing data.

Publication

Infrastructure that permits researchers to publish documented data and link them to research articles and other materials located in other repositories. In some cases researchers may want to exploit their data commercially. DRAMBORA could serve as a tool here to assess repositories that publish the data.

Assess Value

One of the main challenges with research data is deciding what data needs to be kept and for how long.

Preservation

This service would be responsible for looking after the data in the long term, applying the required measures so that the data remain accessible over time.

Add Value

Once the data is stored with the metadata associated with it, value can be added by organizing similar data in groups, promoting it, linking it to other materials or allowing annotations.


In order to validate this framework, we mapped it to the DCC Curation Lifecycle Model, as shown below:

Mapping between DCC Curation Lifecycle and Research Data Management Services

DCC Lifecycle Model Sequential Action | Research Data Management Services | Description

Conceptualise | Data Management and Sharing Plans; Best Formats and Best Practice; Legal and Ethical | This stage relates to services that support researchers in producing data management and data sharing plans. It also relates to the advisory services on best formats and best practice for data creation, and to legal and ethical services for data creation (for instance, to define clearly the ownership of the data to be created or how they can be used) and sharing.

Create or receive | Best Formats and Best Practice; Metadata | At this point researchers need support to work out best formats and best practice, as well as how best to document their data with appropriate metadata.

Appraise and select/dispose | Assess Value | This phase relates to services that assess the value of the data.

Ingest | Data Cleaning; Add Value | Before data are ingested, they will need to be prepared and cleaned. During ingestion other information can be added to enhance them.

Preservation Action | Preservation | This stage relates directly to the preservation service.

Store | Secure Storage | This stage relates directly to the secure storage service.

Access, Use and Reuse | Publication; Legal and Ethical; Computation, Analysis and Visualization | This phase relates to several of the research data management services. Publication of data as well as access and discovery belong to this stage. When publishing data there is a legal aspect that needs to be addressed, hence the relation here to legal services. In addition, the use and reuse of data is tightly coupled to analysis and computational services.

Transform | Computation, Analysis and Visualization; Preservation | Transforming the data means producing new derived versions of them, whether through analysis, through visualization or for preservation purposes.
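
The mapping can also be written down as a small data structure, which makes it easy to check how the framework services are spread across the lifecycle. The following is only a minimal sketch in Python, not something produced by the project: the function names are hypothetical, and the service names have been normalised to a single spelling (an assumption made here so that the same service compares equal wherever it appears).

```python
# Services in the framework, in the order they are listed in this post.
# Names are normalised to one spelling (an assumption for this sketch).
FRAMEWORK_SERVICES = [
    "Data Management and Sharing Plans",
    "Legal and Ethical",
    "Best Formats and Best Practice",
    "Secure Storage",
    "Metadata",
    "Access and Discovery",
    "Computation, Analysis and Visualization",
    "Restricted Sharing",
    "Data Cleaning",
    "Publication",
    "Assess Value",
    "Preservation",
    "Add Value",
]

# DCC Curation Lifecycle sequential actions and the services mapped to each,
# taken from the services column of the mapping.
LIFECYCLE_TO_SERVICES = {
    "Conceptualise": [
        "Data Management and Sharing Plans",
        "Best Formats and Best Practice",
        "Legal and Ethical",
    ],
    "Create or receive": ["Best Formats and Best Practice", "Metadata"],
    "Appraise and select/dispose": ["Assess Value"],
    "Ingest": ["Data Cleaning", "Add Value"],
    "Preservation Action": ["Preservation"],
    "Store": ["Secure Storage"],
    "Access, Use and Reuse": [
        "Publication",
        "Legal and Ethical",
        "Computation, Analysis and Visualization",
    ],
    "Transform": ["Computation, Analysis and Visualization", "Preservation"],
}


def stages_for(service, mapping):
    """Return the lifecycle stages (in table order) that list the given service."""
    return [stage for stage, services in mapping.items() if service in services]


def unmapped_services(services, mapping):
    """Return the framework services that no lifecycle stage lists."""
    mapped = {s for stage_services in mapping.values() for s in stage_services}
    return [s for s in services if s not in mapped]


if __name__ == "__main__":
    for service in FRAMEWORK_SERVICES:
        print(service, "->", stages_for(service, LIFECYCLE_TO_SERVICES))
    print("Not in the services column:",
          unmapped_services(FRAMEWORK_SERVICES, LIFECYCLE_TO_SERVICES))
```

Run as written, the last line reports "Access and Discovery" and "Restricted Sharing" as absent from the services column, although the description of the Access, Use and Reuse stage does mention access and discovery.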


We are now using this framework to establish the level of provision for each service in it, in order to identify those that need to be developed further.

Thursday, 20 November 2008

A bright future for research libraries


The Research Information Network (RIN) has just published "Ensuring a bright future for research libraries", a guide aimed at vice-chancellors and senior institutional managers to ensure that library and information services evolve in tune with researchers' needs. I participated as a member of the working group and it turned out to be a rather interesting and useful exercise.


The guide provides a framework of issues that need to be considered when developing library and information services. One of the framework headings, "curation, preservation and disposal", includes a section on research data:


"cooperate with research funders, others institutions and specialist agencies in developing a coherent and comprehensive framework of services to ensure that valuable research data are managed effectively from the point of creation, preserved and made accessible to others."
A selection of good practice examples is also offered, and the LSE Data Library features among them as a research support service that establishes close connections between the library and researchers.

Friday, 14 November 2008

Datasets Seminar in Madrid

On Monday I will be participating in a seminar organized to discuss the issues around the inclusion of data in digital repositories. The event has been organized by the Consorcio Madroño, a consortium of universities in Madrid that aims to foster inter-library collaboration. The seminar will also feature Alicia Medina from UNED, Stuart Macdonald from the University of Edinburgh and Dr. Celia Russell from Manchester University.


I have been told that the presentations will be filmed and made available on the web. As soon as I know where, I will post the link here.

 

Monday, 27 October 2008

Institutional and National Research Data Management Services


Last week the second event organized as part of the Scoping Digital Repository Services for Research Data Management project took place at the Said Business School. The aim was to hear about examples of services to support researchers with their data management duties and encourage discussion amongst service units in Oxford. We had a great group of speakers coming from: 
  • San Diego Supercomputer Center Data Central, 
  • the UK Research Data Service, 
  • the Digital Curation Centre, 
  • the UK Data Archive, 
  • the NERC Environmental Bioinformatics Centre, 
  • the Archaeology Data Service and 
  • Oxford Legal Services. 
Throughout the day a wide range of services was described, including infrastructure and tools for storage, access, discovery, use and preservation, as well as support, advice and training services that start as early as possible in the research lifecycle. The final panel discussion between some of the service units in Oxford highlighted the need for coordination and funding to provide the range of support services that Oxford researchers need.

Saturday, 11 October 2008

iPres'08 and DCC Digital Curation 101 Course

In the last couple of weeks I have attended the iPres 2008 conference at the British Library and the DCC Digital Curation 101 course at NeSC. Both events were a great way to learn more about the latest developments, tools and concepts in digital preservation and data curation.

iPres 2008 has been widely commented on in other blogs such as the DCC's and Neil Beagrie's. I particularly liked the keynotes from both days. Lynne Brindley, CEO of the British Library, highlighted the increasing maturity of the digital preservation field, and called for the pieces of the big jigsaw to be joined up and for a different label such as digital life or preservation for long-term access. The second day's keynote brought Dr Horst Forster, from the European Commission, who argued that “less than 20% of UK organizations have preservation as per legal requirements and 50% still print the documents to preserve them” and gave assurances that the European Commission will continue to fund activities in this area.

Some of the tools that caught my attention included JHOVE2 and PRONOM. The former helps to identify, validate and characterize digital objects, whilst the latter is a format registry. The Dutch data archive DANS presented their Data Seal of Approval, a set of guidelines for making data future-proof. It was also of interest to see some presentations touching on issues around collaboration between advanced computational science and digital preservation services (see: Data Center for Library of Congress Digital Holdings and The Chronopolis Digital Preservation Archive and Demonstration).


The DCC Digital Curation 101 course combined an excellent set of lectures and hands-on exercises provided by leading experts. The sessions were structured using the stages of the DCC Curation Lifecycle Model:

  • conceptualize,
  • create and receive,
  • appraise and select,
  • ingest and store,
  • preserve,
  • access and reuse.

During the four days of the course, the lectures covered these stages, pointing out the issues that arise at each one and highlighting the importance of early (in the lifecycle) collaboration between researchers, information professionals and IT specialists. The hands-on sessions allowed us to:
  • use some sample data sets to identify their significant properties,
  • practise using the Data Audit Framework,
  • look at data quality issues and assess file format robustness,
  • find and use data from the British Atmospheric Data Centre,
  • create a detailed data management plan, putting into practice everything learnt during the course.
The course was brilliantly organized and delivered. There was the perfect mix of researchers, information professionals and IT specialists. Perhaps in the future the course could focus on specific disciplines, but I really hope there will be more.


Thursday, 2 October 2008

QUT's Practical Data Management Guide

Queensland University of Technology has just published "Practical Data Management: A Legal and Policy Guide". This is a very useful resource for researchers and repository managers with an interest in managing legal rights in data.


The guide provides a detailed introduction to copyright, covering what it protects, who owns it and how it is relevant to data available in digital repositories. It then considers other legal aspects of data management such as moral rights, confidential information and privacy. After looking at the data management policies that all research projects should have in place, it sets out what data management plans (DMPs) should cover:

  • ownership of the data to be generated
  • who is responsible for managing the data
  • the legal controls applying to collection or use of the data
  • security to protect the data collected
  • the long-term preservation of the data
The guide highlights that "[a] data management plan will need to be consistent with the research institution's IP policies and data management policies".
 

Friday, 26 September 2008

Spatial Data Visualization Tools Briefing Paper - DataShare

The JISC-funded DataShare project has just published the second part of the briefing paper "Data Visualization Tools: Spatial Data in a Web 2.0 Environment and Beyond" by Stuart Macdonald. The paper describes an incredible collection of spatial mash-up and geo-browser tools and shows examples of academic research use of some of these technologies.


The vision proposed is that "by opening up their code to repository developers (e.g. through APIs) numeric and spatial data tools could be utilised further within this environment to engage potential depositors, to enhance output, and to provide analysis and visualisations as part of ‘value-added’ functionality."

This is a must read for anyone interested in Geo Web or Geo 2.0!



 


Friday, 19 September 2008

New JISC report on role and skills of data scientists

I wrote a summary of this report for DISC-UK DataShare:

"In my opinion, this is another very useful report that, amongst other things, calls for urgent action from research funders, universities and the library community to not only train future professionals in this field but to develop ways to recognise their work. I am very pleased to see the strong emphasis on libraries and data librarians and I am looking forward to seeing how the recommendations can be addressed."

Wednesday, 17 September 2008

Upcoming Project Workshop - 20th October

The Institutional and National Services for Research Data Management Workshop is the second event organized as part of the Scoping Digital Repository Services for Research Data Management project. 


Speakers from San Diego Supercomputer Centre, the UK Data Archive, the Archaeology Data Service and the NERC Environmental Bioinformatics Centre will show through examples and case studies the types of research data services that could potentially be delivered at Oxford. After this, a panel formed by representatives of Oxford service providers will discuss their role in supporting researchers with their data management.

To find more information and register please visit:

Saturday, 13 September 2008

UKRDS Interim Report

SERCO has published the UKRDS Interim Report where "The Committee is asked to consider the role of UKRDS as a vehicle for achieving coherence in data management strategy and service provision across the UK."

This report analyses the current situation in the UK with a detailed look at the relevant literature and funders' policies. It then describes the emerging trends of local data repositories and national facilities, and finally looks at Australia, the US and the EU as competitors and partners of the UK. It suggests three options for UKRDS models:

  • No change
  • Massively centralised
  • Hybrid/Umbrella

The Hybrid/Umbrella model is presented as a lower-risk and more manageable option and promotes the UKRDS "representing the interests of many UK data repositories, both those based around single institutions and those based on storage for a single discipline." Amongst the functions of this model of "grid computing and cloud-base data storage", UKRDS would play the role of mediator, standards-setting body, source of information similar to the Digital Curation Centre (DCC) and, in time, a data repository in its own right.

I highly recommend that everyone with an interest in research data management have a look at this report, as it not only captures the current state of affairs in the UK and elsewhere but also offers possible ways forward. The Hybrid/Umbrella model seems like a sound option to me, especially if UKRDS is able to address what Liz Lyon termed the big gap during the Oxford Workshop: the joined-up approach from funding bodies and councils needed for a national data strategy. As far as I am aware, UKRDS will report to HEFCE at the end of the year, and hopefully the UK will then have a new body to support and guide those institutions that, like Oxford, have an interest in developing services for research data management.

Thursday, 21 August 2008

Oxford to use the Data Audit Framework as part of DataShare

In the coming months the Universities of Southampton and Oxford will conduct data audits using the JISC Data Audit Framework Methodology as part of the JISC DISC-UK DataShare project. 

JISC funded the Data Audit Framework (DAF) Development Project to develop a methodology to enable universities to carry out exercises to map research data resources and data management practices across departments and research centres. This should help departments and institutions to identify what data they have, where they are located and how they are being looked after as well as to raise awareness of the importance of actively managing research data. Four projects were funded to pilot the methodology at University College London, Imperial College London, University of Edinburgh, and King's College London.


At the University of Oxford, Luis Martinez-Uribe will conduct the data audit as part of the Scoping Digital Repository Services for Research Data Management project. Having captured requirements for services to assist researchers with their data management, Luis will use DAF to gain a deeper understanding of the research data held in some departments and of the data published on departmental websites.


Harry Gibbs and Teresa McGowan will be leading the DAF work at the University of Southampton aiming to build a picture of the research data used and generated in the School of Social Sciences and to gain understanding of the School’s current data management processes.


Sunday, 3 August 2008

Repository Fringe Presentation

The Repository Fringe took place in Edinburgh on 31 July and 1 August. It was a very interesting and successful event that I hope will continue in future years.

Stuart Macdonald blogged Dorothea Salo's keynote, some post-conference notes and, very kindly, a report on my presentation.

Friday, 25 July 2008

Findings Report-Oxford Data Management

The findings from the scoping study interviews with Oxford researchers and the complementary workshop are now available at: http://www.ict.ox.ac.uk/odit/projects/digitalrepository/findings.xml


The findings report documents current data management practices amongst Oxford researchers and it also captures the top requirements for services to help researchers manage their data more effectively.


Thanks to everyone who participated in the interviews and the workshop for their time and support. If you would like to comment on the report please send your feedback to me.


Tuesday, 22 July 2008

JISC Innovation Forum: Research data-Whose problem is it?


On the 15th and 16th of July the JISC Innovation Forum took place at Keele University. One of the four themes of the forum was research data and it included three sessions:

 - legal and policy 
 - capacity and skills
 - technical and infrastructure     

These sessions generated a lot of interesting discussion and are blogged at: http://jif08.jiscinvolve.org/theme-2-the-challenges-of-research-data/ 

Particularly useful to this project is Matthew Dovey's summary of the brainstorm in the technical and infrastructure session, with the four priority areas for JISC:

   1. within a research environment – can we facilitate data curation using the carrot of sharing systems? (IT systems in the lab)
   2. additional context beyond the metadata
   3. how do we help institutions understand their infrastructural needs?
   4. what has to happen with the various dataset systems (fedora etc) to help them link with the library and institutional systems?

Friday, 20 June 2008

Workshop presentations available

The Research Data Management Workshop was successfully held  on Friday 13th of June at the Said Business School. There were 46 attendees throughout the day from 24 departments, research centres and colleges. There were also representatives from JISC, UKRDS, RIN, DCC, EBI and the NERC Centre for Ecology and Hydrology.


Feedback from the workshop suggested that participants particularly liked the discussion at the end, the range of examples shown and hearing about data management issues from researchers. They missed the perspective of Humanities researchers and funding bodies. For future workshops, participants asked for more information on IP-related issues, more practical examples tailored to the different divisions and guidance on who can help them with their data management problems.

Presentation slides and audio recordings can be found at: http://www.ict.ox.ac.uk/odit/projects/digitalrepository/Workshops.xml

Friday, 6 June 2008

Data Librarianship

My colleague Stuart Macdonald, Edinburgh University Data Library, and I were interviewed last year by CILIP's editor Elspeth Hyams. The interview focused on our roles as Data Librarians and the emerging need of Librarians to participate in the active curation of research data.


The interview is available at:

Tuesday, 20 May 2008

Research Data Management Workshop

I am glad to announce the first event organized under the umbrella of the Scoping Digital Repository Services for Research Data Management project. This first workshop is aimed at Oxford researchers, who will hear about good examples and interesting practice in data management across disciplines. So far we have secured an outstanding group of speakers and I am sure it will be an extremely useful event for researchers in Oxford.


To find out more and register, go to: www.ict.ox.ac.uk/odit/projects/digitalrepository/Workshops.xml

Wednesday, 23 April 2008

Scoping Study Interviews - Call for volunteers

As I mentioned in the previous post, I am conducting interviews from the end of April until mid June and welcome candidates who wish to participate in the study to share the views and issues of their research community.


This is an excellent opportunity for researchers at all levels to specify their requirements for data management services (including creation, storage, sharing, publishing, preservation, etc), so that these can be addressed in future developments.  

If you would like to take part in the scoping study (interviews should not take longer than one hour), please get in touch with me at Luis.Martinez-Uribe@oerc.ox.ac.uk

If you don't have the time to participate in the interviews but still want to comment on your data management needs, you can do so in the previous post.

Tuesday, 22 April 2008

Scoping Study - Space for Comments

I am currently conducting interviews until mid June for the Scoping Digital Repository Services for Research Data Management project. These interviews aim to document how researchers at Oxford manage their data and to capture their requirements for services that would help them do this more effectively.


We are aware of the busy schedule of our researchers and the impossibility for many of them to participate in these interviews. This is the reason why we invite researchers from Oxford who cannot attend the interviews to use this space to comment on their data management needs.

Please leave your comments below!

Friday, 21 March 2008

A UK Research Data Management Forum

The UK Research Data Management Forum inaugural workshop was organized by the Digital Curation Centre (DCC) and the Research Information Network (RIN). A mixed audience  formed by staff from data centres, data managers, some research councils, JISC, librarians, repository managers and IT specialists came together to hear about, discuss and share ideas around data management.


Michael Jubb, Director of RIN, gave the keynote address, delving into RIN's Research Data Principles. A number of data categorizations were mentioned to show how varied and inconsistent they are, and the need to reward researchers for making their data available was highlighted.

Martin Lewis, Director of Library Services and Librarian at the University of Sheffield, talked about the division of responsibilities between data centres, institutions, libraries and others, pointing out the need for a national data strategy. After this he introduced the UK Research Data Service (UKRDS), a feasibility study to assess the costs of a national shared service to deal with research data generated in Higher Education Institutions (HEIs). It is worth mentioning here that Oxford is participating in the UKRDS as one of the four case studies.

The next presentation brought Mark Thorley, NERC's Data Management Coordinator, to discuss how to provide appropriate skills in data curation. Mark pointed out that understanding why data needs to be managed helps to identify the skills needed, and proposed three key players: researchers, data managers and informaticians. The shortage of data managers and the need to professionalize their career were also issues raised in this presentation.

Finally, the breakout sessions had the mission of discussing the remit of the forum and suggesting what the DCC could do to help provide appropriate skills for data curation. Most attendees agreed on the usefulness of having this forum to discuss practical issues but also recognized that a lobbying group is required. It was noted throughout the workshop that dealing with the data deluge is a huge task and that there is a need to provide evidence of the benefits of curating research data and of nurturing a workforce of data managers.


 
 
 

