eResearch projects, quick update

[update 2013-04-09 – a couple of minor corrections]

This week the eResearch steering committee at the University of Western Sydney is meeting for the first time. We will be bringing the new committee up to speed with all the existing projects, and diving into detail on some key projects.

This is a very quick high-level overview of the status of our major projects, all of which have been reported on here on the blog before, apart from the very newest.

Finished

The Seeding the Commons project was recently completed successfully. This project was funded by the Australian National Data Service (ANDS) to establish infrastructure for a Research Data Catalogue (RDC). ANDS call these kinds of catalogues ‘Metadata Stores’. This was not just about software; it was about taking the first steps towards creating an organisation-wide culture of data management, along with the DC21 data capture project described below, which is still going.


The UWS library team led this project, and they will be providing a project summary, including lessons learned and benefits accrued, for publication here soon. Thanks team! (Their report will name all the names that need to be named.)

Ongoing

HIEv (née DC21)

Another ANDS-funded project, DC21 (Data Capture for Climate Change and Energy Research), is nearing completion.

There has been some solid progress with this one since Peter Bugeia from Intersect took over the project management late last year:

  • The software application has a new name: HIEv. The name is not an acronym. It’s pronounced ‘hive’. The HIE bit is a reference to the Hawkesbury Institute for the Environment.

  • It’s in production, gathering data from four major research facilities for use within the institute.

  • Version 1.8 was rolled out this week. Training for early adopters in the institute will follow soon, delivered by Peter B and new institute data manager Gerard Devine.

The next steps are to do detailed real-life trials of two major workflows:

  1. Making sure facility data can be presented to researchers in a usable, cleaned-up form, in a way that minimises redundant effort and ensures that everyone is working with the same citable data sets.

  2. Working out how to enable researchers to publish data and code that are as complete as possible, in support of research data publications.

Over the next few months Gerry Devine will work to get as much (appropriate) data as possible from the institute into the system, and gather requirements to feed into a business-case for a further phase of the project.

The code for HIEv/DC21 is available on github.

Enterprise Research Data Catalogue (‘Metadata Stores’ – MS23)

The Metadata Stores project is nearly complete. We see this as an extension of the Seeding the Commons project, which recently concluded. Like that project, this is as much about working with the research community to create new ways of working in an increasingly data-driven research landscape as it is about installing software. But install software we have – the library has implemented the open ReDBOX research data management software, funded by ANDS and now used by more than a dozen Australian universities.

The work on the catalogue has always been seen as part of a larger effort at UWS: the Research Data Repository Project.

Research data repository (RDR)

The RDR is a key part of the eResearch strategy at UWS (we don’t have a formally endorsed strategy, mind, that’s what the new committee is there for). There are lots of ways to carve-up ‘eResearch’ but we are working with a simple model underpinned by three ‘pillars’:

  1. Research Data Management.

  2. Research Computing (including all kinds of devices from puny smart phones and tablets to cloud servers and High Performance Computing (HPC)).

  3. eResearch Collaboration tools and services.

The raw infrastructure is only part of the picture but it is the foundation. At UWS the Research Data Repository Project is the current focus for building this infrastructure.


Figure 1 The eResearch model for UWS – by Peter Sefton & Sarah Chaloner

Project manager Toby O’Hara has driven the rollout of the RDR – including project managing the Research Data Catalogue and the first basic Research Data Storage services for working data. On the working data front we now have some dedicated research data storage that can be accessed in various ways:

  1. As ‘R Drive’ shares.

  2. Mounted directly to research applications as database storage.

  3. Linked to a replicated file-management service, such as Dropbox.com. A group of early adopters is testing a process for sharing their files with a UWS Research Data account that links Dropbox (and soon other services) with backed-up university-provided services.

Buying storage is simple enough, but in an organisation with several thousand users, making sure that the help desk knows how to turn on that storage for the right people, and can help them use it, is far from trivial and definitely not quick. We’re on the way, though.

Next up, the draft plan calls for:

  • Providing services for our researchers who use code version-control systems. Git and Mercurial are the current favourites – the researchers who live by these are the poster children for reproducible research.

  • Developing formal research data management plans across all parts of the university.

  • A campaign to put in place data capture projects for as much strategically important research data as possible.

  • Establishing a link between working and archival storage via a project with the working title Crate It – Cr8it! – see the new projects below.

Provided, that is, that we can get the resources to keep going.

 

New projects

Human Communications Science Virtual Lab

The major new eResearch project at UWS is the Human Communications Science Virtual Laboratory. This is a NeCTAR-funded project with a total budget in the region of three million dollars, 1.4 million of which came from the Australian Government and the rest from a number of Australian institutions, led by UWS. The HCS vLab has its own website with:

  • A statement of the problem we’re attacking.

    THE PROBLEM OF

    a lack of awareness, access and proficiency in the use of the full range of corpora, tools and techniques available to researchers of the diverse disciplines that constitute the human communication science research field

  • A description of the project.

    The HCS virtual Laboratory (HCS vLab) will connect HCS researchers, their desks, computers, labs, and universities and so accelerate HCS research and produce emergent knowledge that comes from novel application of previously unshared tools to analyse previously difficult to access data sets. The HCS vLab infrastructure will overcome resource limitations of individual desktops; allow easy access to shared tools and data; and provide the guided use of workflow tools and options to allow researchers to cross disciplinary boundaries.

RDR / Research Data Catalogue Spin-off: Cr8it!

The Research Data Repository we’re building at UWS encompasses two kinds of data in the Research Data Storage (RDS) component – there’s working data which is fluid, and archival data which needs to be managed for the long-term (or however long is required by the data management plan for a particular project).

Cr8it is designed to tackle a problem that many organisations are reporting: ‘We bought a petabyte of storage, let people use it, and now that it’s full, we’re wondering what’s in all those files! What to keep?’

Cr8it will provide a web-view of research data files in a way that:

  • Makes it easy to see what there is in the working part of the Research Data Store.

  • Allows researchers to identify, describe and package data at various points in the research lifecycle to deposit end-of-project data sets or create published data for papers.
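
To make those two goals a bit more concrete, here is a minimal sketch in Python of the general idea: summarise what is sitting in a working directory, then package a chosen set of files together with a description and checksums. This is not Cr8it code and says nothing about how Cr8it will actually be built. The function names (summarise, package), the manifest.json file, the data/ folder inside the archive and the use of a zip file are all assumptions made up for illustration.

```python
import hashlib
import json
import zipfile
from pathlib import Path


def summarise(root: Path) -> dict:
    """Count files and total bytes per file extension under root."""
    summary = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            ext = path.suffix.lower() or "(none)"
            entry = summary.setdefault(ext, {"files": 0, "bytes": 0})
            entry["files"] += 1
            entry["bytes"] += path.stat().st_size
    return summary


def package(root: Path, relative_paths: list, description: str, target: Path) -> dict:
    """Zip the selected files plus a manifest.json (hypothetical layout)."""
    manifest = {"description": description, "files": []}
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as archive:
        for rel in relative_paths:
            data = (root / rel).read_bytes()
            manifest["files"].append({
                "path": rel,
                "bytes": len(data),
                # Checksum lets a later reader verify the file is unchanged.
                "sha256": hashlib.sha256(data).hexdigest(),
            })
            archive.writestr(f"data/{rel}", data)
        archive.writestr("manifest.json", json.dumps(manifest, indent=2))
    return manifest
```

A researcher-facing tool would put a web view on top of something like summarise, and let people tick the files to hand to something like package at the end of a project or when a paper is published.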

eResearch projects, quick update by Peter Sefton is licensed under a Creative Commons Attribution 3.0 Unported License.