[This document is a lightly-edited version of an approved project proposal written by staff at the University of Western Sydney for the Australian National Data Service (ANDS) metadata stores funding stream – we are publishing it here to assist in collaborating with other universities on their Metadata Stores projects. Some ANDS boilerplate text and financial information have been removed, and links added to materials that add context.]
ANDS Project Description
for
Enterprise Research Data Catalogue
ANDS Project Code: MS23
Document Version 1.0
Prepared by Peter Sefton and Peter Bugeia
University of Western Sydney
6/12/2011
Project Description
Organisation responsible for the project (Subcontractor) |
University of Western Sydney |
Organisation that will undertake the work (Sub-Subcontractor) |
|
ABN or ACN |
530 140 698 81 |
Name of Contact Person |
Peter Sefton |
Complete address and contact details of Contact Person |
eResearch Capability Team Office of the Pro Vice Chancellor (Research) Academic and Research Division University of Western Sydney Campus : Penrith (Werrington North) Building : AD Room : AD.G.15 Locked Bag 1797 South Penrith NSW 2751 T: 61 2 4736 0072 F: 61 2 4736 0905 |
ANDS Program |
Metadata Stores |
Project Summary |
This project adheres to NCRIS funding requirements. Funded activities are limited to: installation, configuration and testing of software; manual creation of metadata (beyond that required for software specification and testing); scoping exercises or studies in the amount of research data available at an institution. The project does not use NCRIS funds for the following activities:
Any software development will be made available as open source. |
Funding Sought |
<removed> |
Proposed project timeframe |
10 months |
Name of the person responsible for contract administration |
<removed> |
Names and affiliations of all collaborators if any |
University of Newcastle – Vicki Picasso. Other collaborators will be identified during the course of the project. |
1. Background
The University of Western Sydney is undertaking the early stages of an internally funded project to establish a Research Data Repository [link added] (RDR) and associated infrastructure to support it. This project is being led by the eResearch Unit with the participation of IT, the Library and the Office of Research Services. The repository will consist of:
scalable, managed file storage for both working and archived data;
access to virtualized computing infrastructure so that researchers can run data analysis tasks;
a research data catalogue containing metadata about data at a collection level for code-compliance, strategic research management and discovery purposes.
The storage component of the RDR was established in 2010. The next steps are to design the architecture that links the storage to computing infrastructure and cataloguing applications. This architectural work will be undertaken by the eResearch Unit, IT, and the University Library.
UWS has a nascent research data catalogue which is being established under ANDS project SC20.
Throughout this document the ‘metadata store’ for research data will be referred to as the ‘Research Data Catalogue’ to emphasise its role in the institution using a term that should be understandable to all stakeholders.
2. Aims and Objectives
Alignment with ANDS Objective |
already |
to be |
no |
To manage metadata about data collections held at the institution |
(some progress on SC20) |
X |
|
To enable discovery and reuse of data collections held at the institution |
X |
||
To support strategic planning for research in the institution |
X |
||
To ensure high quality metadata |
X |
Overview of project
The proposed metadata stores work outlined in this document will contribute to the RDR project by implementing the research data catalogue (metadata store) in the institutional context, establishing data sources for parties and activities from research and library systems, and providing an expanded platform for describing collections.
This will be built into an integrated system for recording catalogue descriptions of research data collections, with a view to it becoming the institutional research data catalogue for the university. There is an opportunity to build it collaboratively so that it fulfils a broader set of institutional requirements than those of the University of Western Sydney alone.
The University has chosen the ReDBox application as the research data catalogue to fulfil functional requirements under SC20. This Metadata Stores project will explore how it can be expanded to be the basis of the University’s institutional research data catalogue, and seek alternative and additional software solutions if necessary. It is proposed to conduct this analysis in concert with other institutions using the same software and/or with similar requirements, so that any software developed or purchased has a broad user base.
Scope and boundaries
The project will focus on the following:
implementation of the core deliverables (D1-D6) suggested by ANDS, as none of these are fully established at UWS,
the establishment of workflows for identifying collections, and
the integration of data management planning into the broader research lifecycle.
The primary driver for this work is to establish a picture at UWS of where research data resides and to establish infrastructure for researchers to be able to store and describe their data for later re-use by themselves, their research teams and students, and more globally. This work will aim to meet UWS requirements for research management and practice as well as the ANDS goal of sharing collection descriptions.
The full scope of the final project will be refined and specified in Deliverable D15, Project Management Plan.
Dependencies
This project depends on the SC20 project to establish the basic application. This is considered low risk as the same application is now in production at both the University of Newcastle and at Flinders University.
3. Overall Approach
Strategy and methodology
This project will use an agile project methodology for software development tasks and for other tasks such as evaluation of data sources. The exact nature of the project will be developed with the project manager and team and documented in deliverable D15, Project Management Plan.
UWS is aiming to collaborate with other institutions that are using similar software and taking similar approaches to research data in general. This will provide an opportunity to work together to specify and deliver new software features which meet a common need. We have identified one partner, the University of Newcastle, and will work with them to recruit more.
Technical issues
Some technical issues which have presented themselves in the formative stages of this project include:
The relationship between storage infrastructure and the metadata catalogue and how these should be linked. Some attention will be given to specifying this interface in DC21 and SC20.
The relationship between NLA party IDs, local IDs and the forthcoming ORCID system, and the interfaces to all of these systems. This issue will need to be investigated with ANDS and the ANDS community.
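One way to picture the identifier problem described above is as a crosswalk that ties a locally minted party ID to whatever external identifiers become known later. The following is a minimal sketch only, not a design for the Research Data Catalogue: the class names, field names and sample identifiers are all hypothetical, and the real interfaces to the NLA and ORCID systems remain to be investigated.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class PartyIdentifiers:
    """Identifiers known for one party (all field names are illustrative)."""
    local_id: str                        # ID minted by the institution
    nla_party_id: Optional[str] = None   # recorded once the NLA assigns one
    orcid: Optional[str] = None          # recorded if and when one is known


class PartyCrosswalk:
    """In-memory lookup from any known identifier back to the same party."""

    def __init__(self) -> None:
        self._by_local: Dict[str, PartyIdentifiers] = {}

    def upsert(self, local_id: str, nla_party_id: Optional[str] = None,
               orcid: Optional[str] = None) -> PartyIdentifiers:
        # Create the party on first sight, then fill in only the IDs supplied.
        party = self._by_local.setdefault(local_id, PartyIdentifiers(local_id))
        if nla_party_id is not None:
            party.nla_party_id = nla_party_id
        if orcid is not None:
            party.orcid = orcid
        return party

    def find(self, identifier: str) -> Optional[PartyIdentifiers]:
        # Linear scan is adequate for a sketch; a real store would index each ID.
        for party in self._by_local.values():
            if identifier in (party.local_id, party.nla_party_id, party.orcid):
                return party
        return None


crosswalk = PartyCrosswalk()
crosswalk.upsert("local-001")                                # hypothetical local ID
crosswalk.upsert("local-001", nla_party_id="nla-example-1")  # hypothetical NLA ID
```

Keeping the local ID as the primary key means a party record can exist before any external identifier has been assigned, which matches the order in which the IDs are expected to arrive.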
Internal Resources
The exact breakdown of the resources needed for this project is not yet known, but it will be led by the eResearch Unit and will involve library staff in sourcing data collections.
External resources
It is not known at this stage if external resources will be engaged but it is highly likely that if software development is required, expressions of interest will be sought from QCIF (where ReDBox is currently maintained) and Intersect, the NSW eResearch service provider, and possibly via the internal teams of universities partnering in this work.
Stakeholders
The project steering committee will consist of representatives from:
The eResearch unit.
Research Services.
IT.
The Library.
Researchers from various disciplines, by invitation, as needed.
4. Project Deliverables
D1 |
A working feed of records describing Collections and associated Activities, Parties and Services to Research Data Australia, in the current version of RIF-CS (1.3), demonstrated to meet the quality requirements for RIF-CS records as set by ANDS. This feed will contain additional descriptive metadata for newly identified collections, over and above the feed established in SC20 and will be available for use by researchers in an expanded range of discipline areas as per D2. RIF-CS 1.3 support will require an upgrade to ReDBox. The new Research Data Catalogue is expected to import the contents of the SC20 metadata store. |
D2 |
A feed of collections from at least three distinct Faculties (or equivalent organisational units) within the institution to Research Data Australia. UWS is in the process of establishing 5 new flagship research institutes in addition to 10 existing Schools. Priority will be given to collections sourced from the institutes, which represent a broad range of disciplines, under criteria based on those used in SC20. The most established of these include:
*These are currently research centres in the process of becoming fully-fledged institutes. |
D3 |
Demonstrated alignment of metadata records about Parties with an institutional name authority (HR or Library), with the authoritative form of the name sourced external to the metadata store, and with new researcher descriptions added to the metadata through regular updates from the name authority. Party information will be sourced from the software system used by Research Services for administering UWS research, grants and projects; this will be integrated with the Research Data Catalogue via a name authority system with an automatic update. Party IDs will be minted using the local UWS Handle server. |
D4 |
Demonstrated alignment of metadata records about Parties with the ARDC Party Infrastructure Project, with researcher descriptions contributed to the NLA, and with People Australia identifiers for researchers recorded against researchers. The project will evaluate the different options for feeding data to the NLA, choosing between a feed to ANDS in RIF-CS format or a feed directly to the NLA, and if the latter, choosing which metadata format to use, either RIF-CS or EAC-CPF. The project will also investigate a solution for importing or aligning local IDs with NLA IDs and how to interoperate with the global ORCID system when it comes online. |
D5 |
Demonstrated alignment of metadata records about Activities with institutional and external sources of truth (Research Office, ARC and NHMRC grant registries), with the authoritative description of the Activity sourced external to the metadata store, and with new research projects added to the metadata through regular updates from the sources of truth. This deliverable will use the same data sources and processes as D3, with the addition of processes to import globally defined IDs for activities, such as ARC grants, with a process for aligning these with local views of the same data. |
D6 |
Demonstrated workflow for registering new Collections in the university; this can include automated update, or semi-automated (notification-based). This project will explore the following workflows for data collection registration, with the community of ReDBox user-organisations:
|
D7 |
A software system to realise deliverables D1–D6 (and D8, D13–D14 if applicable), with robust storage and management of metadata. The starting point for a software system used will be the one used for implementing SC20, which is the ANDS-funded ReDBox application. We will aim to undertake this work in concert with other institutions and evaluate the most appropriate way to create the new functionality, either by extending ReDBox or by using other systems. |
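To make the D1 feed more concrete, the following is a rough sketch of how a single collection description might be serialised as a RIF-CS registry object. It is illustrative only: the function name and all sample values are hypothetical, the record is deliberately minimal, and it has not been validated against the RIF-CS 1.3 schema or the ANDS quality requirements. In practice the feed is expected to be produced by the catalogue software rather than hand-built.

```python
import xml.etree.ElementTree as ET

# RIF-CS registryObjects namespace (check against the schema version in use).
RIFCS_NS = "http://ands.org.au/standards/rif-cs/registryObjects"


def build_collection_record(key, group, originating_source, title, description):
    """Return a minimal RIF-CS document describing one collection, as a string."""
    ET.register_namespace("", RIFCS_NS)
    root = ET.Element(f"{{{RIFCS_NS}}}registryObjects")
    obj = ET.SubElement(root, f"{{{RIFCS_NS}}}registryObject", {"group": group})
    ET.SubElement(obj, f"{{{RIFCS_NS}}}key").text = key
    ET.SubElement(obj, f"{{{RIFCS_NS}}}originatingSource").text = originating_source

    # The collection element carries the descriptive metadata.
    coll = ET.SubElement(obj, f"{{{RIFCS_NS}}}collection", {"type": "dataset"})
    name = ET.SubElement(coll, f"{{{RIFCS_NS}}}name", {"type": "primary"})
    ET.SubElement(name, f"{{{RIFCS_NS}}}namePart").text = title
    ET.SubElement(coll, f"{{{RIFCS_NS}}}description", {"type": "brief"}).text = description
    return ET.tostring(root, encoding="unicode")


# All values below are placeholders, not real UWS records.
record_xml = build_collection_record(
    key="example-key-1",
    group="Example University",
    originating_source="http://example.org/catalogue",
    title="Example collection",
    description="A placeholder description.",
)
```

A real record would also carry related Parties, Activities and Services, identifiers, subjects and rights information, which are omitted here for brevity.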
Optional Deliverables
If your institution has already implemented some of the foregoing deliverables at an institutional level, ANDS expects that you will also include some of the following optional deliverables:
D8
|
Demonstrated ability to manage the following aspects of the collection lifecycle through recording and exposing relevant metadata related to:
D8.6 policy framework (data management plan relevant, ethics clearance forms relevant) Many of these functions are delivered by the ReDBox application out of the box; the implementation will make sure that they are adopted at UWS. |
D9 |
A public researcher or research profile portal, exposing publishable metadata about the research data being held at the institution. Not a priority. |
D10 |
Demonstrated ability to feed a selected subset of the collection records relating to a particular discipline to a discipline registry, following the metadata schema and conventions of that registry Not a priority. |
D11 |
Demonstrated ability to manage the following aspects of the collection lifecycle through recording and exposing relevant metadata: Not a priority. |
D12 |
Strategic reporting on contents and coverage of metadata store for internal use This is a key area for informing the establishment of a Research Data Repository and the organisational cultural environment in which it will exist. This project will aim to produce reports that can be used to track the growth of the RDR, via the Research Data Catalogue. |
D13 |
Storage and exposure for discovery of object level metadata, and alignment of object level metadata with collection metadata (i.e. ability to navigate from object metadata to collection metadata; update of object metadata aligned with update of collection metadata) Not a priority. |
D14 |
Storage and management of technical metadata for object and collection reuse, including software and equipment descriptions, methodology, and data interpretation Not a priority. |
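As an illustration of the kind of strategic reporting described under D12, the sketch below counts catalogue records cumulatively by month and by organisational unit. It assumes, hypothetically, that each record can be exported as a creation date and a unit name; the function names and sample data are invented for the example and do not reflect any existing catalogue export.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def cumulative_growth(records):
    """Return [(YYYY-MM, cumulative record count)] in chronological order.

    `records` is an iterable of (created: date, unit: str) pairs.
    """
    per_month = Counter(created.strftime("%Y-%m") for created, _unit in records)
    months = sorted(per_month)
    totals = accumulate(per_month[m] for m in months)
    return list(zip(months, totals))


def records_per_unit(records):
    """Return a Counter mapping each organisational unit to its record count."""
    return Counter(unit for _created, unit in records)


# Placeholder data only; units and dates are invented.
sample = [
    (date(2012, 1, 10), "Institute A"),
    (date(2012, 1, 20), "Institute B"),
    (date(2012, 3, 5), "Institute A"),
]
growth = cumulative_growth(sample)   # [("2012-01", 2), ("2012-03", 3)]
by_unit = records_per_unit(sample)
```

Months with no new records are simply absent from the output; a report that needs a continuous time axis would have to fill those gaps.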
Procedural Deliverables
D15 |
Project Management Plan, using the ANDS template, specifying the details of the planned activity, with risks, schedules, etc |
D16 |
Progress Reports, using ANDS templates |
D17 |
Final Report, using ANDS templates |
D18 |
Deposit of any software (including stylesheets and schemata) developed in the project for achieving other deliverables, and that can be (usefully) used outside the institution, in either Google Code or SourceForge, including: |
D19 |
A source code report, if any software is developed and publicly deposited under D18 |
D20 |
A User Acceptance Test online survey |
5. Assumptions, Constraints, Dependencies and Risks
Assumptions |
Constraints |
Dependencies |
Risks* |
|
Staffing |
UWS will be able to provide staff to inform the project and recruit a project manager. |
The usual constraints of working in a university. |
This project depends on the RDR project, which is not yet established, but does have a budget. |
Project management and data librarian staff cannot be sourced. |
Organisational |
The RDR project will continue to develop, and storage will be available to researchers via some kind of easy-to-use application process. |
UWS project management and governance processes must be followed. |
This depends on the ITS budget. |
RDR storage does not come online. |
Technical |
The scope of the technical work is yet to be established – there are no indications that insurmountable challenges will arise. |
|||
External Suppliers |
Software development can be sourced from QCIF or Intersect |
|||
Legal/Ethical |
||||
Other |
Researchers have limited time to participate. |
Early work on SC20 is finding that sourcing data collections is difficult |
Collections will be hard to source. (Mitigation: try to provide services that are of high value to researchers and collect metadata as a gateway to their provision, e.g. through the process of filling out applications for storage.) |
* – Where Risks have been identified, briefly outline your mitigation strategy.
6. Stakeholder Analysis
Stakeholder |
Interest / stake |
Importance |
eResearch Unit |
Lead agency |
High |
Library |
Business owner for the Research Data Catalogue – operational responsibility for data curation. |
High |
Research Services |
Custodians of the ancillary data about parties and activities which support the RDC. |
High |
Information Technology Services |
Implementer / supplier of storage infrastructure and environment for the RDR |
High |
7. Project Management
Project Team, Roles and Responsibilities
Role |
% EFT |
Responsibilities |
Recruitment required? (yes/no) |
In-kind contribution or ANDS funded? |
Project Manager |
50 |
Deliver the project to ANDS expectations. Assume responsibility and accountability for each Deliverable. Monitor and report to ANDS on project progress. Advise ANDS if project appears to be in danger of non-delivery. |
yes |
ANDS funded |
Project steering committee |
? |
Exact composition TBA – [Steering committee now established – chaired by a representative of the office of the Pro Vice Chancellor Research, has representatives from ITS, Library, Office of Research Services and eResearch.] |
In Kind |
|
Data librarians |
50% |
Source data collections Curate data descriptions |
In Kind |
|
eResearch team |
10% |
Write policy and procedures for data management in the context of the RDR and RDC Report to ANDS on project governance [fixed typo] issues |
In Kind |
8. Budget
<removed>
9. Exit and Sustainability Plans
<This section was not filled in>
10. Milestones for Payment
Amount |
Indicative Timing |
Milestone |
25% |
Day One (1) |
Contract execution |
25% |
Agreed project start date + eight (8) weeks |
D15 D16 |
25% |
Agreed project start date + 30 weeks |
D16 D1 |
25% |
52 weeks |
[D2–D7 mandatory deliverables] [any optional deliverables, including D8–D14 where applicable]
a comment, summary or tag containing the text “ANDS-funded”; developer manuals where applicable, to facilitate reuse; deployment manuals to facilitate external deployment; user manuals to facilitate use.
|
11. Glossary of Terms
Term |
Definition |
Collection |
A collection describes a grouping of physical or digital items of interest to the research community, particularly research data sets or physical collections of research materials. |
Activity |
An activity is an undertaking or process related to the creation, update, or maintenance of a collection. |
Party |
A party is a person or group related to an activity, to the creation, update, or maintenance of a collection, or to the provision of a service. Parties add to the discoverability of collections and add valuable contextual information, including assisting with determination of value for a collection. A party could be either a
|
Appendix A. Check list of metadata store functionality
The purpose of this background check is to determine the scope of the project by structuring an analysis of your institution’s data management readiness, and to provide a check list that reflects the functionality of an effective data collection infrastructure. Completion of the checklist is not mandatory, but may well be useful to your institution.
Yes |
No |
Developing |
|
Does your institution have a Data Management Policy? |
X |
||
Is your institution able to automatically aggregate metadata about data collections from various areas/units within your institution? |
X |
||
Is any of this metadata exposed for discovery through a discipline portal? |
X |
||
Is any of this metadata exposed for discovery through an institutional portal? |
X |
||
Is any of this metadata exposed for discovery through Research Data Australia? |
X |
||
Are you able to expose and manage metadata about data collections at an object level? (Individual data objects; data collection methods; sample information; etc.) |
X |
||
Do you manipulate metadata descriptions aggregated from various areas of the institution, in order to align them with an institutional metadata standard? |
X |
||
Does your institution’s metadata conform or map to RIF-CS? |
X |
||
Does your institution’s metadata use controlled vocabularies? |
X |
||
Is your institution’s metadata integrated with institutional sources of truth (e.g. HR for researchers, Research Office for grants)? |
X |
||
Is your institution’s metadata integrated with national sources of truth (e.g. NLA Party, ARC/NHMRC grants registry)? |
X |
||
Do you have a process for registering new data collections as they are created? |
X |
||
When it comes to the core attributes of data collections required for effective data management, are you able to manage the following: |
Yes |
No |
Developing |
embargo dates for collections, where applicable? |
X |
||
current online location of collection (whether internal store or external store)? |
X |
||
current offline location of collection? |
X |
||
intellectual property rights – licensing, restrictions on reuse? |
X |
||
retention policy e.g. disposal date, deposit date? |
X |
||
policy framework e.g. data management plan, ethics clearance forms? |
X |