UWS eResearch in Research Cloud Land

[Parts of this post appeared on Peter Sefton’s blog last week – go there for some more detail about the demonstration discussed here]

UWS eResearch at the NeCTAR Research Cloud day

Last week NeCTAR and the Australian National Data Service hosted a Research Cloud day in Sydney, designed to show researchers and research-support units like the eResearch team at UWS how to use the free virtual servers provided by NeCTAR. Every Australian researcher with a login to the Australian Access Federation (AAF) should have access to the service and is entitled to a couple of months of server up-time on two basic machines with no application process. Various flavours of Linux are available, but not Windows. If the service suits, you can apply via an online form for more (free) time. These machines are good for running analysis or modelling processes, can be configured to run desktop software via a remote desktop, and are useful for development work. (NOTE: when I say you should have access, you may not. If you can’t log in using your institutional credentials, contact your local IT helpdesk and, if you like, send me an email; I will let the NeCTAR people know so they can track these problems.)

First, what is NeCTAR (National eResearch Collaboration Tools and Resources)? This is what it says on their ‘about’ page:

NeCTAR is an Australian Government project to build new infrastructure specifically for the needs of Australian researchers.

In this era of digital connectivity, NeCTAR is using existing and new information and communications technologies to create new digital efficiencies specifically for the needs of Australian researchers.

Australian researchers and their technical partners will drive the design of what NeCTAR will look like. NeCTAR is building:

NeCTAR is collaborating with a broad mix of international and national technology partners and research disciplines. From scientists to historians, archaeologists, software engineers and arts disciplines.

NeCTAR is a $47 million dollar, Australian Government project, conducted as part of the Super Science initiative and financed by the Education Investment Fund. The University of Melbourne is the lead agent, chosen by the Commonwealth Government.

There were a number of UWS people in attendance at the day apart from us, including researchers, a PhD student and one of the developers from the library systems team – we’ll be in contact with each of you to get your impressions of the day and to work out how to help UWS exploit the service.

Andrew Leahy and I (Peter Sefton) ran a workshop at the Research Cloud day, to help people get familiar with the process of starting up and connecting to virtual machines, and to explore some of the possibilities of the cloud. We put together a demonstration that showed how multiple cloud servers could be used in a chain to pull data from the Research Data Australia (RDA) portal, index it in another catalogue application, ReDBox, extract and re-index the geo-spatial data that’s in RDA and hook that up to Google Earth. The result is a demonstration app that lets you browse around Earth and see which areas have data.

Here’s a picture of all the senseis, i.e. workshop leaders, on the day – we’re the two furthest to the right. At far left is David Flanders from ANDS, who has put a huge amount of energy into building strong communities for technical people working in eResearch, both in the UK and now here in Australia.

Figure 1 The ‘senseis’ explaining what they’re going to be working on in their ‘dojos’. Photo by Steven Manos of Melbourne uni.

The UWS demo data browser

In our session we covered some of the basics of how to run servers in the Research Cloud, then focused on getting a particular application installed and running. To finish, we gave a brief walk-through of our demo.

Figure 2 The dojo demo. Photo by Steven Manos of Melbourne uni.

We have now installed the geo-index at UWS so you can try this out if you have Google Earth by downloading a KML file. This is a demo service only – let us know how you go.

We will do some more work on this small, very part-time project, but here’s a snapshot of where we are at the moment with our mission of getting RDA data into Google Earth.

Figure 3 Screenshot of data sets from Research Data Australia mapped onto the earth (visualized via their bounding boxes from the data collection metadata) from a long way up (the sets with the (claimed) broadest geographical coverage show up first, more appear as you zoom in)

Right now the experience is basic but engaging. Just by using the geometry provided in RDA collections, and without changing any of the default ANDS styles, you can see a ‘heat map’ of where data have been collected.
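For readers curious how a bounding box from collection metadata might end up as something Google Earth can draw, here is a minimal Python sketch. It is not the code behind our demo: the function names and the sample coordinates are made up for illustration, and it only assumes the standard KML 2.2 Placemark and Polygon elements.

```python
from xml.sax.saxutils import escape


def bbox_to_kml_placemark(name, west, south, east, north):
    """Return a KML Placemark that draws a bounding box as a closed polygon."""
    # KML coordinates are "lon,lat" pairs, and a ring must end where it starts.
    corners = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    coords = " ".join(f"{lon},{lat}" for lon, lat in corners)
    return (
        "<Placemark>"
        f"<name>{escape(name)}</name>"
        "<Polygon><outerBoundaryIs><LinearRing>"
        f"<coordinates>{coords}</coordinates>"
        "</LinearRing></outerBoundaryIs></Polygon>"
        "</Placemark>"
    )


def kml_document(placemarks):
    """Wrap a list of Placemark strings in a KML document."""
    body = "".join(placemarks)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        f"{body}</Document></kml>"
    )


if __name__ == "__main__":
    # Hypothetical collection with a made-up bounding box.
    print(kml_document([bbox_to_kml_placemark("Example reef survey", 142.0, -24.5, 154.0, -10.5)]))
```

Saving the printed output as a .kml file and opening it in Google Earth should show a single outlined rectangle; a real service would add styles and descriptions on top of this.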

Figure 4 Zooming in on research hotspots like The Reef shows how crowded it is there – we’re limiting this to 40 boxes or points per screen, so keep drilling to find more

Figure 5 You can click on a marker to pull up a description
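The ‘broadest first, 40 per screen’ behaviour described in the captions above can be sketched roughly as follows. This is a guess at the general approach rather than our actual index code: the Box class and visible_boxes function are hypothetical names, and the sketch ignores complications such as boxes that cross the antimeridian.

```python
from dataclasses import dataclass

MAX_FEATURES = 40  # the per-screen limit mentioned in the post


@dataclass(frozen=True)
class Box:
    title: str
    west: float
    south: float
    east: float
    north: float

    def area(self) -> float:
        # Area in square degrees; good enough for ranking, not for geodesy.
        return (self.east - self.west) * (self.north - self.south)

    def intersects(self, other: "Box") -> bool:
        return not (
            self.east < other.west
            or self.west > other.east
            or self.north < other.south
            or self.south > other.north
        )


def visible_boxes(boxes, viewport, limit=MAX_FEATURES):
    """Return the boxes overlapping the viewport, largest first, capped at limit."""
    hits = [b for b in boxes if b.intersects(viewport)]
    hits.sort(key=lambda b: b.area(), reverse=True)
    return hits[:limit]
```

Because the cap applies after filtering by viewport, zooming in shrinks the candidate set, so smaller boxes that were previously crowded out get a chance to appear.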

Why?

Why are we doing this?

  • To test out the NeCTAR Research Cloud, so we can advise researchers at UWS on how and when to use it, and integrate it into our UWS Research Computing strategy – we’ll cover this in future posts and publish advice on the UWS eResearch site. Initial observations include:

    • It’s good for testing out software ideas quickly – you can fire up a tool, pull in data from a networked source and do something to it, but some planning is required to package the tools and get them to operate as turnkey appliances. At the moment you need command-line skills and patience.

    • This infrastructure is a game-changer – it can take months for university IT departments to deliver a virtual machine to a research or dev group and policy prevents many of us from buying commercial cloud services on corporate credit cards.

    • The lack of easy-to-access persistent cloud storage is a bit of a problem: it has not yet come online from NeCTAR’s sister project, Research Data Storage Infrastructure (RDSI), so there is no simple way to mount a device with all your data on it.

    • A specific lesson from our demo: being able to fire up servers on demand to handle peak loads, such as building a big geo-spatial index (we only built a tiny one), seems promising, but we’d like to see this made as easy as possible for people who want to get research done rather than mess with computers.

  • To see how Andrew Leahy’s considerable expertise in practical, easy-to-deploy, low-end visualization, combined with my experience in repositories, harvesting metadata and so on, might be applied to resource discovery. Being able to fly around the landscape means you can follow geographical features and maybe discover useful data sets along the way.

  • To explore how the ReDBOX Research Data Catalogue application might be able to geo-index collections (see notes on this below).

It turns out there is a competition running for the best demonstration of data re-use using the cloud. The work that Andrew and I have started looks like the nucleus of an entry. We’d welcome others from UWS or beyond to join us, particularly those with geo-located data we could add to the ANDS data set, or people with good ideas about information discovery and/or coding skills. Drop us a line if you’d like to join us in extending this demo.

For a longer discussion of the demo see Peter Sefton’s post.

Copyright Peter Sefton 2012. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>

Local Data Global Impact – The Seeding the Commons Data Project

University of Western Sydney Library

June 2012

Context

Many funding bodies now hold the position that the public has a right to freely access publications and data resulting from publicly funded research. Several major overseas funding bodies have mandated the deposit of such publications and related data in an institutional repository. The National Health and Medical Research Council (NHMRC) has led the charge in Australia, mandating from July 2012 the deposit of publications arising from NHMRC-funded research into an institutional repository within 12 months of publication. Professor Warwick Anderson, CEO of the NHMRC, states at The Conversation that improving access to publicly funded data is next on their agenda.

Big Picture

Publications containing analysis of data have until recent years been seen as the most legitimate output of research. The related data has stayed in the background, stored in hardcopy in filing cabinets or on laptops/hard drives with no backup. Sharing of the raw data was not encouraged, either to verify research claims or to use for new purposes. But culture around data sharing is changing and participation in this national project will provide an additional portal to showcase UWS researchers and their data to the international scholarly community.

Along with most major Australian research institutions, UWS has embarked on an Australian National Data Service (ANDS)-funded research data catalogue project. Seeding the Commons aims to register, describe and where possible provide open access to Australia’s research data – in essence, building a national data library for Australia. Research Data Australia (RDA) is the portal through which this information is made available to the international scholarly community. RDA aims to link all information surrounding the data, such as researchers, their affiliations, publications analysing the data, grants funding the data, Field of Research (FOR) codes, keywords and related datasets – a complete picture of a research project. The actual data remains with the researcher/institution.

The UWS Journey To Date

At UWS the project is a collaboration between the Library, eResearch team, Office of Research Services, UWS research community and ITS.

Core stakeholders collaborated to identify UWS researchers or units having data which could be shared, or at least described. Identified parties are systematically being invited to participate and offered the following reasons to do so:

  • Data can be stored (and a citable profile created) in the new UWS Research Data Repository to ensure its long term preservation and accessibility.

  • Studies show a citable profile for research data can increase citations to that data and related publications.

  • May increase collaborative opportunities as metadata describing research is included in Research Data Australia (RDA) along with the data. RDA is harvested by Trove (National Library of Australia’s repository of Australian material), Google, Google Scholar and other search engines ensuring research may be discovered by a wide international audience.

  • Opportunity to draw data from other fields to assist with multidisciplinary research questions

  • Avoid duplication of research effort

  • Verify research claims

  • Data deposit in RDA will ensure compliance with some journal publication requirements, funding body mandates and the Australian Code for the Responsible Conduct of Research.

  • A number of reputable scientific journals now require data related to publications be provided along with the article.

To date three data interviews have been completed and are currently undergoing quality assessment by ANDS before being tweaked and uploaded to RDA. Roger Dean from MARCS has made available (as an open access link) some data measuring the acoustical intensity of several musical pieces. These measurements have been stored and secured on the UWS Research Data Repository. Dr Gang Zheng prefers interested researchers contact him to obtain his diffusion coefficient data so he is aware of who is interacting with it. Five films depicting various aspects of the Hawkesbury Agricultural College are currently available on the UWS Archives site. A link to these films, and metadata about them will also be included in RDA.

In the Pipeline

A PhD candidate from the MARCS Institute was required to provide an open access link to his raw data with a paper he wanted to submit to an open access journal. This data is now in the RDR, the link provided and the paper submitted.

The Nanoscale Organisation and Dynamics Group is regularly asked to supply pulse sequences to external researchers. They plan to deposit sets of sequences into the RDR and provide an unmediated link in RDA so interested parties can ‘help themselves’ to the data, thus saving time for all concerned.

The Challenges

“Sharing my data will erode my competitive advantage”

Researchers with this concern could still participate by describing their data and requesting interested parties contact them to discuss the possibility of sharing. This could create new opportunities for collaboration.  

“My data is covered by ethics approval so can’t be shared”

Not necessarily. An ethics variation may be possible under some circumstances to allow the sharing of de-identified data. Other universities have gone down this path and we are investigating it at UWS. Alternatively the data could still be described and may still provide collaborative opportunities.

“I don’t have time to participate”

We understand time is precious, so the data interview questionnaire is pre-populated by Library staff with information already available, to minimise the amount of time away from research activities.

Moving Forward

If you would like to know more about the project, or have data to describe, secure, and/or share, we would love to hear from you. To participate in the national data registry project please contact Susan Robbins s.robbins@uws.edu.au or 9852 5458. To request data storage space, please contact Peter Sefton p.sefton@uws.edu.au.

Copyright Susan Robbins 2012. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia.

Research Data Repository June Update

By Toby O’Hara and Peter Sefton

This post is an update on the Research Data Repository (RDR) project. The short version is that initial efforts will be focussed on two areas where we can make an immediate difference:

  1. Work is proceeding as quickly as possible to get dedicated research storage (which we already have set up) to the point where it can be quickly and easily provisioned to researchers as a simple share (“the R drive”).

  2. We are also keen to make immediate progress on the externally-funded portion of the RDR project, the Research Data Catalogue, and to get the basic deliverables on that ticked off, to set the scene for adding real value to the university’s research.

At UWS, eResearch projects relating to research data are overseen by a very effective committee chaired by a representative of the Pro Vice Chancellor, Research, who represents the interests of researchers. Also represented are IT, the Library and the Office of Research Services, and, of course, eResearch. The chair is Professor Deborah Sweeney, Associate Pro Vice-Chancellor (Research) Health & Science. Last Friday, our brand-spanking new project/implementation manager for the Research Data Repository, Toby O’Hara, faced the Data Related Projects Steering Committee (DRPSC) for the first time. He was able to report progress and get agreement from the committee on the priorities mentioned above.

Progress to date has been to take the initial work on defining and describing the RDR and flesh it out a little more through discussion with other stakeholders. For example, eResearch is working collaboratively with the Office of Research Services, firstly to understand researchers’ needs, and secondly to meet as many of those needs as possible. This solution includes:

  • Advice and guidance on Data Management policies, practices, and approaches

  • Some technical assistance which will provide connected storage and data sharing options

The technical solution was discussed with Information Technology Services, and it was agreed with them that a good starting point would be a disk drive with storage space which researchers can use as data storage. In the office we are calling it the “R:\” for ‘research’. We refer to it as a starting point because we view it as a very basic solution; there are many other storage and computing possibilities out there which need to be tamed, groomed, and made available.

This “R:\” storage option was discussed with the DRPSC and approved by them as a reasonable and doable first step.

The Research Data Catalogue (AKA ANDS project MS23), which is a portion of the RDR, was also officially kicked off, and a milestone schedule established. This milestone schedule was presented to the steering committee, and accepted with no concerns raised.

There is still some work to be done to further scope and define specifically what the RDR will look like, technically, and operationally.

Data management is a problem that some researchers may not fully understand, so eResearch may very well need to provide a strong source of information about what it is and who is requiring it. Once that is established, we might get a few more people excited about what UWS is doing to make their data management easier.

We also hope to keep exploring local, remote, and cloud-y options “behind the scenes” that introduce technological efficiency and cost savings, while being simple enough in presentation that it is easy for an interested researcher to take advantage of the services.

Comments and suggestions are welcome.

Copyright Toby O’Hara and Peter Sefton, 2012. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>

University of Western Sydney Enterprise Research Data Catalogue Project

[This document is a lightly-edited version of an approved project proposal written by staff at the University of Western Sydney for the Australian National Data Service (ANDS) metadata stores funding stream – we are publishing it here to assist in collaborating with other universities on their Metadata Stores projects. Some ANDS boilerplate text and financial information have been removed, and links added to materials that add context.]

ANDS Project Description

for

Enterprise Research Data Catalogue

ANDS Project Code: MS23

Document Version 1.0

Prepared by Peter Sefton and Peter Bugeia

University of Western Sydney

6/12/2011


Project Description

Organisation responsible for the project (Subcontractor)

University of Western Sydney

Organisation that will undertake the work (Sub-Subcontractor)

ABN or ACN

530 140 698 81

Name of  Contact Person

Peter Sefton

Complete address and contact details of Contact Person 

eResearch Capability Team

Office of the Pro Vice Chancellor (Research)

Academic and Research Division

University of Western Sydney

Campus : Penrith (Werrington North)

Building : AD

Room : AD.G.15

Locked Bag 1797

South Penrith NSW  2751

T: 61 2 4736 0072

F: 61 2 4736 0905

p.sefton@uws.edu.au

ANDS Program

Metadata Stores

Project Summary

This project adheres to NCRIS funding requirements.

Funded activities are limited to: installation, configuration and testing of software; manual creation of metadata (beyond that required for software specification and testing); scoping exercises or studies in the amount of research data available at an institution.

The project does not use NCRIS funds for the following activities:

  • purchasing of IT hardware for storage or any other purpose;

  • ongoing staffing;

  • “proof of concept” software development;

  • funding of work by parties based outside Australia.

Any software development will be made available as open source.

Funding Sought

<removed>

Proposed project timeframe

10 months

Name of the person responsible for contract administration

<removed>

Names and affiliations of all collaborators if any

University of Newcastle – Vicki Picasso.

Other collaborators will be identified during the course of the project.

Background

The University of Western Sydney is undertaking the early stages of an internally funded project to establish a Research Data Repository [link added] (RDR) and associated infrastructure to support it. This project is being led by the eResearch Unit with the participation of IT, the Library and the Office of Research Services. The repository will consist of:

  • scalable, managed file storage for both working and archived data; 

  • access to virtualized computing infrastructure so that researchers can run data analysis tasks;

  • a research data catalogue containing metadata about data at a collection level for code-compliance, strategic research management and discovery purposes.

The storage component of the RDR was established in 2010. The next steps are to design the architecture that links the storage to computing infrastructure and cataloguing applications. This architectural work will be undertaken by the eResearch Unit, IT, and the University Library.

UWS has a nascent research data catalogue which is being established under ANDS project SC20.

Throughout this document the ‘metadata store’ for research data will be referred to as the ‘Research Data Catalogue’ to emphasise its role in the institution using a term that should be understandable to all stakeholders.

2.  Aims and Objectives

Alignment with ANDS Objective | already | to be | no

To manage metadata about data collections held at the institution | (some progress on SC20) | X |

To enable discovery and reuse of data collections held at the institution | | X |

To support strategic planning for research in the institution | | X |

To ensure high quality metadata | | X |

Overview of project

The proposed metadata stores work outlined in this document will contribute to the RDR project by implementing the research data catalogue (metadata store) in the institutional context, establishing data sources for parties and activities from research and library systems, and providing an expanded platform for describing collections.

This will be built into an integrated system for recording catalogue descriptions of research data collections, with a view to it becoming the institutional research data catalogue for the university. There is an opportunity for it to be built collaboratively to fulfil a broader set of institutional requirements than just those of the University of Western Sydney.

The University has chosen the ReDBox application as the research data catalogue to fulfil functional requirements under SC20. This Metadata Stores project will explore how it can be expanded to be the basis of the University’s institutional research data catalogue, and seek alternative and additional software solutions if necessary. It is proposed to conduct this analysis in concert with other institutions using the same software and/or with similar requirements, so that any software developed or purchased has a broad user base.

Scope and boundaries

The project will focus on the following:

  • implementation of the core deliverables (D1-D6) suggested by ANDS, as none of these are fully established at UWS,

  • the establishment of workflows for identifying collections, and

  • the integration of data management planning into the broader research lifecycle.

The primary driver for this work is to establish a picture at UWS of where research data resides and to establish infrastructure for researchers to be able to store and describe their data for later re-use by themselves, their research teams and students, and more globally. This work will aim to meet UWS requirements for research management and practice as well as the ANDS goal of sharing collection descriptions.

The full scope of the final project will be refined and specified in Deliverable D15, Project Management Plan.

Dependencies

This project depends on the SC20 project to establish the basic application. This is considered low risk as the same application is now in production at both the University of Newcastle and at Flinders University.

Overall Approach

Strategy and methodology

This project will use an agile project methodology for software development tasks and for other tasks such as evaluation of data sources. The exact nature of the project will be developed with the project manager and team and documented in deliverable D15, Project Management Plan.

UWS is aiming to collaborate with other institutions that are using similar software and with similar approaches to research data in general. This will provide an opportunity to work together to specify and deliver new software features which meet a common need. We have identified one partner, the University of Newcastle and will work with them to recruit more.

Technical issues

Some technical issues which have presented themselves in the formative stages of this project include:

  • The relationship between storage infrastructure and the metadata catalogue and how these should be linked. Some attention will be given to specifying this interface in DC21 and SC20.

  • The relationship between NLA party IDs, local IDs and the forthcoming ORCID system, and the interfaces to all of these systems. This issue will need to be investigated with ANDS and the ANDS community.

Internal Resources

The exact breakdown of the resources needed for this project is not yet known, but it will be led by the eResearch Unit and will involve library staff in sourcing data collections.

External resources

It is not known at this stage if external resources will be engaged but it is highly likely that if software development is required, expressions of interest will be sought from QCIF (where ReDBox is currently maintained) and Intersect, the NSW eResearch service provider, and possibly via the internal teams of universities partnering in this work.

Stakeholders

The project steering committee will consist of representatives from:

  • The eResearch unit.

  • Research Services.

  • IT

  • The Library.

  • Researchers from various disciplines, by invitation, as needed.

4. Project Deliverables

D1

A working feed of records describing Collections and associated Activities, Parties and Services to Research Data Australia, in the current version of RIF-CS (1.3), demonstrated to meet the quality requirements for RIF-CS records as set by ANDS. This feed will contain additional descriptive metadata for newly identified collections, over and above the feed established in SC20 and will be available for use by researchers in an expanded range of discipline areas as per D2. RIF-CS 1.3 support will require an upgrade to ReDBox. The new Research Data Catalogue is expected to import the contents of the SC20 metadata store.

D2

A feed of collections from at least three distinct Faculties (or equivalent organisational units) within the institution to Research Data Australia.

UWS is in the process of establishing 5 new flagship research institutes in addition to 10 existing Schools. Priority will be given to collections sourced from the institutes, which represent a broad range of disciplines, under criteria based on those used in SC20. The most established of these include:

  • Hawkesbury Institute for the Environment (Climate Science)

  • Institute for Culture and Society.

  • MARCS Institute for Brain and Behaviour.*

  • Civionics* (Civionics is a discipline concerned with the use of electronic devices to monitor civil engineering infrastructure)

*These are currently research centres in the process of becoming fully-fledged institutes.

D3

Demonstrated alignment of metadata records about Parties with an institutional name authority (HR or Library), with the authoritative form of the name sourced external to the metadata store, and with new researcher descriptions added to the metadata through regular updates from the name authority.

Party information will be sourced from the software system used by Research Services for administering UWS research, grants and projects. This will be integrated with the Research Data Catalogue via a name authority system with an automatic update. Party IDs will be minted using the local UWS Handle server.
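As a rough illustration of what a regular update from a name authority could look like, the sketch below treats the authority as the source of truth for names and updates the catalogue’s party records to match. It is an assumption-laden toy, not the planned integration: the data shapes, the sync_parties function and the party IDs are all hypothetical, and real IDs would come from the Handle server.

```python
def sync_parties(catalogue, authority_feed):
    """Align catalogue party records with the name authority.

    catalogue: dict mapping party ID -> {"name": str, "collections": list}
    authority_feed: dict mapping party ID -> authoritative name
    Returns the list of party IDs that were added or renamed.
    """
    changed = []
    for party_id, authoritative_name in authority_feed.items():
        record = catalogue.get(party_id)
        if record is None:
            # New researcher: create a party record with no collections yet.
            catalogue[party_id] = {"name": authoritative_name, "collections": []}
            changed.append(party_id)
        elif record["name"] != authoritative_name:
            # The authority wins: local edits to the name are overwritten.
            record["name"] = authoritative_name
            changed.append(party_id)
    return changed


if __name__ == "__main__":
    # Hypothetical IDs and names, for illustration only.
    catalogue = {"party-1": {"name": "J Smith", "collections": ["coll-9"]}}
    feed = {"party-1": "Smith, Jane", "party-2": "Nguyen, An"}
    print(sync_parties(catalogue, feed))
    print(catalogue)
```

Note that the sketch never deletes parties missing from the feed; whether departed researchers should be retained is a policy decision the project would need to make.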

D4

Demonstrated alignment of metadata records about Parties with the ARDC Party Infrastructure Project, with researcher descriptions contributed to the NLA, and with People Australia identifiers for researchers recorded against researchers.

The project will evaluate the different options for feeding data to the NLA, choosing between a feed to ANDS in RIF-CS format or a feed directly to the NLA and, if the latter, choosing which metadata format to use, either RIF-CS or EAC-CPF. The project will also investigate a solution for importing or aligning local IDs with NLA IDs and how to interoperate with the global ORCID system when it comes online.

D5

Demonstrated alignment of metadata records about Activities with institutional and external sources of truth (Research Office, ARC and NHMRC grant registries), with the authoritative description of the Activity sourced external to the metadata store, and with new research projects added to the metadata through regular updates from the sources of truth.

This deliverable will use the same data sources and processes as D3, with the addition of processes to import globally defined IDs for activities, such as ARC grants, with a process for aligning these with local views of the same data.

D6

Demonstrated workflow for registering new Collections in the university; this can include automated update, or semi-automated (notification-based).

This project will explore the following workflows for data collection registration, with the community of ReDBox user-organisations:

  • The existing library-mediated registration process established in SC20 with data-interviews informing curated descriptions.

  • Automated feeds from data capture systems, feeding into template records which have been curated as in the point above by the library. This will be piloted in the DC21 project.

  • A new system that will integrate the process of applying for data storage, and creating a data management plan into a single form, to integrate the process of describing and capturing data into institutional processes.

  • A system that allows researchers to capture and view data in the RDR-managed storage system or on local storage, and to curate it into collections, both by manually selecting items and by rule (such as a metadata query or by location). This will have a plugin architecture to allow it to be adapted for different disciplines and file types, and will build on the integration work between DC21 and SC20.
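To make the last workflow more concrete, curating a collection ‘by rule’ as well as by manual selection might be sketched like this. Everything here is hypothetical (the Item class, the rule helpers and the example paths); it only illustrates the idea that a collection is the union of hand-picked items and items matching a location or metadata rule.

```python
from dataclasses import dataclass, field
from pathlib import PurePosixPath


@dataclass
class Item:
    path: str
    metadata: dict = field(default_factory=dict)


def by_location(prefix):
    """Rule: match items stored anywhere under the given directory."""
    root = PurePosixPath(prefix)
    return lambda item: root in PurePosixPath(item.path).parents


def by_metadata(key, value):
    """Rule: match items whose metadata has key == value."""
    return lambda item: item.metadata.get(key) == value


def curate(items, manual_paths=(), rules=()):
    """Return items that were hand-picked or that satisfy at least one rule."""
    manual = set(manual_paths)
    return [i for i in items if i.path in manual or any(rule(i) for rule in rules)]


if __name__ == "__main__":
    items = [
        Item("/r/marcs/run1/a.wav", {"instrument": "mic"}),
        Item("/r/hie/t.csv", {"instrument": "sensor"}),
        Item("/r/other/x.txt"),
    ]
    picked = curate(items, manual_paths=["/r/other/x.txt"], rules=[by_location("/r/marcs")])
    print([i.path for i in picked])
```

New rule types (for a particular discipline or file format) would just be more functions that take an item and return True or False, which is one simple way to get the plugin behaviour described above.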

D7

A software system to realise deliverables D1–D6 (and D8, D13–D14 if applicable), with robust storage and management of metadata.

The starting point for a software system used will be the one used for implementing SC20, which is the ANDS-funded ReDBox application. We will aim to undertake this work in concert with other institutions and evaluate the most appropriate way to create the new functionality, either by extending ReDBox or by using other systems.


Optional Deliverables

If your institution has already implemented some of the foregoing deliverables at an institutional level, ANDS expects that you will also include some of the following optional deliverables:

D8

Demonstrated ability to manage the following aspects of the collection lifecycle through recording and exposing relevant metadata related to:

  • D8.1 embargo dates for collections, where applicable

  • D8.2 current online location of collection (on internal store or external store)

  • D8.3 current offline location of collection

  • D8.4 intellectual property rights (licensing, restrictions on reuse)

  • D8.5 retention policy (disposal date, deposit date)

  • D8.6 policy framework (data management plan relevant, ethics clearance forms relevant)

Many of these functions are delivered by the ReDBox application out of the box; the implementation will make sure that they are adopted at UWS.
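As an illustration only, the lifecycle metadata listed under D8 could be modelled along the following lines. The field names are our own invention for this sketch and are not ReDBox’s actual schema; the point is simply that once embargo and retention dates are recorded, questions like ‘can this be exposed yet?’ become simple checks.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CollectionLifecycle:
    title: str
    embargo_until: Optional[date] = None        # D8.1
    online_location: Optional[str] = None       # D8.2
    offline_location: Optional[str] = None      # D8.3
    licence: Optional[str] = None               # D8.4
    deposit_date: Optional[date] = None         # D8.5
    disposal_date: Optional[date] = None        # D8.5
    data_management_plan: Optional[str] = None  # D8.6
    ethics_clearance: Optional[str] = None      # D8.6

    def is_embargoed(self, today: date) -> bool:
        return self.embargo_until is not None and today < self.embargo_until

    def due_for_disposal(self, today: date) -> bool:
        return self.disposal_date is not None and today >= self.disposal_date

    def public_view(self, today: date) -> dict:
        """Metadata safe to expose: the location is hidden while embargoed."""
        view = {"title": self.title, "licence": self.licence}
        if self.is_embargoed(today):
            view["embargo_until"] = self.embargo_until.isoformat()
        else:
            view["online_location"] = self.online_location
        return view
```

In this sketch an embargoed collection is still described publicly, which matches the approach of describing data even when it cannot yet be shared.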

D9

A public researcher or research profile portal, exposing publishable metadata about the research data being held at the institution.

Not a priority.

D10

Demonstrated ability to feed a selected subset of the collection records relating to a particular discipline to a discipline registry, following the metadata schema and conventions of that registry

Not a priority.

D11

Demonstrated ability to manage the following aspects of the collection lifecycle through recording and exposing relevant metadata:

  • citation requirements (authoritative identifiers, including DOI, preferred citation format)
  • citation tracking of collections
  • audit information (refer to publications audit)
  • proprietary tools and formats used in collecting the collection

    Not a priority.

    D12

    Strategic reporting on contents and coverage of metadata store for internal use

    This is a key area for informing the establishment of a Research Data Repository and the organisational cultural environment in which it will exist. This project will aim to produce reports that can be used to track the growth of the RDR, via the Research Data Catalogue.

    D13

    Storage and exposure for discovery of object level metadata, and alignment of object level metadata with collection metadata (i.e. ability to navigate from object metadata to collection metadata; update of object metadata aligned with update of collection metadata)

    Not a priority.

    D14

    Storage and management of technical metadata for object and collection reuse, including software and equipment descriptions, methodology, and data interpretation

    Not a priority.

    Procedural Deliverables

    D15

    Project Management Plan, using the ANDS template, specifying the details of the planned activity, with risks, schedules, etc

    D16

    Progress Reports, using ANDS templates

    D17

    Final Report, using ANDS templates

    D18

    Deposit of any software (including stylesheets and schemata) developed in the project for achieving other deliverables, and that can be (usefully) used outside the institution, in either Google Code or SourceForge, including:

  • a Google Code comment and tag or SourceForge summary and tag containing the text “ANDS-funded”
  • Developer manuals where applicable, to facilitate reuse
  • Deployment manuals to facilitate external deployment
  • User manuals to facilitate use

    D19

    A source code report, if any software is developed and publicly deposited under D18

    D20

    A User Acceptance Test online survey

    5. Assumptions, Constraints, Dependencies and Risks

    Assumptions

    Constraints

    Dependencies

    Risks*

    Staffing

  • Assumption: UWS will be able to provide staff to inform the project and recruit a project manager.

  • Constraint: The usual constraints of working in a university.

  • Dependency: This project depends on the RDR project, which is not yet established, but does have a budget.

  • Risk: Project management and data librarian staff cannot be sourced.

    Organisational

  • Assumption: The RDR project will continue to develop, and storage will be available to researchers via some kind of easy-to-use application process.

  • Constraint: UWS project management and governance processes must be followed.

  • Dependency: This depends on the ITS budget.

  • Risk: RDR storage does not come online.

    Technical

    The scope of the technical work is yet to be established – there are no indications that insurmountable challenges will arise.

    External Suppliers

    Software development can be sourced from QCIF or Intersect

    Legal/Ethical

    Other

    Researchers have limited time to participate.

    Early work on SC20 is finding that sourcing data collections is difficult

    Collections will be hard to source. (Mitigation: try to provide services that are of high value to researchers, and collect metadata as a gateway to their provision, e.g. through the process of filling out applications for storage.)


    * – Where Risks have been identified, briefly outline your mitigation strategy.

    6. Stakeholder Analysis

    Stakeholder | Interest / stake | Importance

    eResearch Unit | Lead agency | High

    Library | Business owner for the Research Data Catalogue, with operational responsibility for data curation | High

    Research Services | Custodians of the ancillary data about parties and activities which support the RDC | High

    Information Technology Services | Implementer / supplier of storage infrastructure and environment for the RDR | High

    7. Project Management

    Project Team, Roles and Responsibilities

    Role

    % EFT

    Responsibilities

    Recruitment required? (yes/no)

    In-kind contribution or ANDS funded?

    Project Manager

    50

  • Deliver the project to ANDS expectations.

  • Assume responsibility and accountability for each Deliverable.

  • Monitor and report to ANDS on project progress.

  • Advise ANDS if project appears to be in danger of non-delivery.


    yes

    ANDS funded

    Project steering committee

    ?

    Exact composition TBA.

    [Steering committee now established – chaired by a representative of the office of the Pro Vice Chancellor Research, has representatives from ITS, Library, Office of Research Services and eResearch.]

    In Kind

    Data librarians

    50%

  • Source data collections

  • Curate data descriptions

    In Kind

    eResearch team

    10%

  • Write policy and procedures for data management in the context of the RDR and RDC

  • Report to ANDS on project governance [fixed typo] issues

    In Kind

    8. Budget

    <removed>

    9. Exit and Sustainability Plans

    <This section was not filled in>

    10. Milestones for Payment

    Amount

    Indicative Timing

    Milestone

    25%
    <removed>

    Day One (1)

  • Contract execution

    25%
    <removed>

    Agreed project start date + eight (8) weeks

  • D15
    Project Management Plan, using the ANDS template, specifying the details of the planned activity, with risks, schedules, etc

  • D16
    Progress Report, using ANDS templates

    25%
    <removed>

    Agreed project start date + 30 weeks

  • D16
    Progress Report, using ANDS templates

  • D1
    A working feed of records describing Collections and associated Activities, Parties and Services to Research Data Australia, in the current version of RIF-CS (1.3), demonstrated to meet the quality requirements for RIF-CS records as set by ANDS

    25%
    <removed>

    52 weeks
    (Completion)

  • [D2–D7 mandatory deliverables]

  • [any optional deliverables, including D8–D14 where applicable]

    • D17
      Final Report, using ANDS templates

    • D18
      Deposit of any software (including stylesheets and schemata) developed in the project for achieving other deliverables, and that can be (usefully) used outside the institution, in an open source repository such as Google Code, SourceForge or GitHub, including:

  • a comment, summary or tag containing the text “ANDS-funded”

  • developer manuals where applicable, to facilitate reuse

  • deployment manuals to facilitate external deployment

  • user manuals to facilitate use.

    • D19
      A source code report, if any software is developed and publicly deposited under D18

    • D20
      A User Acceptance Test online survey

    11. Glossary of Terms

    Term

    Definition

    Collection

    A collection describes a grouping of physical or digital items of interest to the research community, particularly research data sets or physical collections of research materials.

    Activity

    An activity is an undertaking or process related to the creation, update, or maintenance of a collection.

    Party

    A party is a person or group related to an activity, to the creation, update, or maintenance of a collection, or to the provision of a service.

    Parties add to the discoverability of collections and add valuable contextual information, including assisting with determination of value for a collection. A party could be either a

  • group: one or more persons acting as a family, group, association, partnership, corporation, institution or agency.
  • person: a human being, or an identity (or role) assumed by one or more human beings.

    Appendix A. Check list of metadata store functionality

    The purpose of this background check is to determine the scope of the project by structuring an analysis of your institution’s data management readiness, and to provide a check list that reflects the functionality of an effective data collection infrastructure. Completion of the checklist is not mandatory, but may well be useful to your institution.

    Yes

    No

    Developing

    Does your institution have a Data Management Policy?

    X

    Is your institution able to automatically aggregate metadata about data collections from various areas/units within your institution?

    X

    Is any of this metadata exposed for discovery through a discipline portal?

    X

    Is any of this metadata exposed for discovery through an institutional portal?

    X

    Is any of this metadata exposed for discovery through Research Data Australia?

    X

    Are you able to expose and manage metadata about data collections at an object level? (Individual data objects; data collection methods; sample information; etc.)

    X

    Do you manipulate metadata descriptions aggregated from various areas of the institution, in order to align them with an institutional metadata standard?

    X

    Does your institution’s metadata conform or map to RIF-CS?

    X

    Does your institution’s metadata use controlled vocabularies?

    X

    Is your institution’s metadata integrated with institutional sources of truth (e.g. HR for researchers, Research Office for grants)?

    X

    Is your institution’s metadata integrated with national sources of truth (e.g. NLA Party, ARC/NHMRC grants registry)?

    X

    Do you have a process for registering new data collections as they are created?

    X

    When it comes to the core attributes of data collections required for effective data management, are you able to manage the following:

    Yes

    No

    Developing

    embargo dates for collections, where applicable?

    X

    current online location of collection (whether internal store or external store)?

    X

    current offline location of collection?

    X

    intellectual property rights – licensing, restrictions on reuse?

    X

    retention policy e.g. disposal date, deposit date?

    X

    policy framework e.g. data management plan, ethics clearance forms?

    X