UWS Attends ANDS Data Showcase

Posted on 2015-08-25 by Kim Heckenberg

Katrina Trewin and Kim Heckenberg from the UWS Library and eResearch attended the Australian Open Research Data Showcase in Canberra on the 19^th June 2015. The showcase emphasised the many benefits from sharing and reusing research data and was hosted by the Australian National Data Service, and National Collaborative Research Infrastructure Strategy (NCRIS) facility. We presented our poster about our ANDS MODC project and heard from many interesting speakers.

To download the poster (PDF) follow this link.

Is Omeka aDORAble?

Posted on 2014-11-04 by ptsefton

So, we have been asking looking at a few different software packages, and putting them through their paces at a series of Tuesday ‘tools days’ hosted by UWS eResearch, asking “Is this software going to be one of our supported Working Data Repositories for researcher cohorts?” That is, how does it rate as a DORA, a Digital Object Repository for Academe?

Last month we had our biggest ever tools-day event, with external people joining the usual eResearch suspects. Thanks to Jacqueline Spedding from the Dictionary of Sydney, Michael Lynch & Sharyn Wise from UTS and Cindy Wong and Jake Farrell from Intersect for coming along.

Omeka is a lightweight digital repository / website building solution, originally targeting the Galleries, Archives & Museums space.

TL;DR

So what were we wanting to know about Omeka? The external folks came along for a variety of reasons but at UWS we wanted to know the following (with short answers, so you don’t have to read on).

Is this something we can recommend for researchers with the kinds of research collections Omeka is known for?

Answer: almost certainly yes, unless we turn up any major problems in further testing, this is a good, solid, basic repository for Digital Humanities projects. So, for image and document based collections with limited budgets this looks like an obvious choice.
Can Omeka be used to build a semantically-rich website in a research/publishing project like the Dictionary of Sydney?

(The reason we’re asking this, is that UWS has a couple of projects with some similarities to the Dictionary, and we at UWS are interested in exploring what options there are for building and maintaining a big database like this. The Dictionary uses an open source code-based called Heurist. Anyway, we have some data from Hart Cohen’s Journey to Horseshoe Bend project which was exported from an unfinished attempt to build a website using Heurist).

The verdict? Still working on it, but reasonably promising so far.

Beyond its obvious purpose, is this a potential generic Digital Object Repository for Academe (DORA)?

Maybe. Of all the repository software we’ve tried at tools-days and looked at behind the scenes, this seems to be the most flexible and easily approachable.

Good

Omeka has a lot to recommend it:

It’s easy to get up and running.
It’s easy to hack, and easy to hack well, since it has plugins and themes that let you customise it without touching the core code. These are easy enough to work with that we had people getting (small) results on the day. More on that below.
It uses the Digital Object Pattern (DOP) – ie at the heart of Omeka are digital objects called Items with metadata, and attached files.
It has an API which just works, and can add items etc, although there are some complexities, more on which below.
It has lots of built-in ways to ingest data, including (buggy) CSV import and OAI-PMH harvesting.

Bad

There are some annoyances:

The documentation, which at first glance seems fairly comprehensive is actually quite lacking. Examples of the plugin API are incorrect, and the description of the external API are pretty terse and very short on examples (eg they don’t actually give an example of how to use your API key, or the pagination).
The API while complete is quite painful to use if you want to add anything – to add an item with metadata it’s not as simple as saying {“title”: “My title”} or even {“dc:title”: “My Title”} – you have to do an API call to find elements called Title, from the different element sets, then pick one and use that. And copy-pasting someone else’s example is hard: their metadata element 50 may not be the same as yours. That’s nothing a decent API library wouldn’t take care of, the eResearch team is looking for a student who’d like to take the Python API on as a project (and we’ve started improving the Python library).
Very limited access control with no way of restricting who can see what by group.
By default the MYSQL search is set up to only search for 4 letter words or greater, so you can’t search for CO2 or PTA (Parramatta) both of which are in our test data; totally fixable with some tweaking.
Measured against our principles, there’s one clear gap. We want to encourage the use of metadata to embrace linked-data principles and use URIs to identify things, in preference to strings. So while Omeka scores points for shipping with Dublin Core metadata, it loses out for not supporting linked data. If only it let you have a URI as well as a string value for any metadata field!

But maybe it can do Linked Data?

Since the hack day we have some more news on Omeka’s coming linked data support. Patrick from the Omeka Team says on their mailing list:

Hi Peter,

Glad you asked!

The API will use JSON-LD.

The Item Add interface as we’re currently imagining it has three options for each property: text input (like what exists now), internal reference (sorta bringing Item Relations into core, just with a better design), and external URI. The additional details, like using a local label for an external URI sound interesting, and we’ll be thinking about if/how that kind of thing might work.

Properties, too, will be much more LoD-friendly. In addition to Dublin Core, the FOAF, BIBO, and other vocabularies will be available both for expressing properties, and the classes available (analogous to the Item Types currently available).

Changes like this (and more!) are at the heart of the changes to design and infrastructure I mentioned in an earlier response. We hope that the additional time will be worth it to be able to address needs like these!

You can watch the progress at the Omeka S repo: https://github.com/omeka/omeka-s

Thanks,

Patrick

This new version of Omeka (Omeka-S) is due in “The Fall Semester of 2015”, which is North American for late next year, in Spring. Hard to tell from this short post by Patrick, but this looks promising. There are a few different ways that the current version of Omeka may support Linked Data. The best way forward is probably to use the ItemRelations plugin.

But what can we do in the meantime?

The Item Relations plugin desperately needs a new UI element to do lookups as at the moment you need to know the integer ID of the item you want to link to. Michael Lynch and Lloyd Harischandra both looked at various aspects of this problem on the day.
Item Relations don’t show up in the API. But the API is extensible, so that should be doable, should be simple enough to add a resource for item_realations and allow thevocab lookups etc needed to relate things to each other as (essentially) Subject Predicate Object. PT’s been working on this as a spare-time project.
Item Relations doesn’t allow for a text label on the relation or the endpoint, so while you might want to say someone is the dc:creator of a resource, you only see the “Creator” label and the title of the item you link to. What if you wanted to say “Dr Sefton” or “Petiepie” rather than “Peter Sefton” but still link to the same item?

What we did

Slightly doctored photo, either that or Cindy attended twice!

Gerry Devine showed off his “PageMaker” Semantic CMS: Gerry says:

The SemanticPageMaker (temporary name) is an application that allows for the creation of ‘Linked Data’-populated web pages to describe any chosen entity. Web forms are constructed from a pre-defined set of re-usable semantic tags which, when completed, automatically produce RDFa-enabled HTML and a corresponding JSON-LD document. The application thus allows semantically-rich information to be collected and exposed by users with little or no knowledge of semantic web terms.

I have attached some screenshots from my local dev instance as well as an RDFa/html page and a JSON-LD doc that describes the FACE facility (just dummy info at this stage) – note the JSON-LD doesnâ€™t expose all fields (due to duplicated keys)

A test instance is deployed on Heroku (feel free to register and start creating stuff â€“ might need some pointers though in how to do that until I create some help pages):

https://desolate-falls-4138.herokuapp.com/

Github:

https://github.com/gdevine/SemanticPageMaker

This might be the long-lost missing link: a simple semantic CMS which doesn’t try to be a complete semantic stack with ontologies etc, it just allows you to define entities realtions and give each type of entity a URI, and let them relate to each other and to be a good Linked Data citizen providing RDF and JSON data. Perfect for describing research context.

And during the afternoon, Gerry worked on making his CMS able to be used for lookups, so for example if we wanted to link an Omeka item to a facility at HIE we’d be able to do that via a lookup. We’re looking at building on work, the Fill My List (FML) project started by a team from Open Repositories 2014 on a universal URI lookup service with a consitent API for different sources of truth. Since the tools-day Lloyd has installed a UWS copy of FML so we can start experimenting with it with our family of repositories and research contexts.

Lloyd and Michael both worked on metadata lookups. Michael got a proof-of-concept UI going so that a user can use auto-complete to find Items rather than having to copy IDs. Lloyd got some autocomplete happening via a lookup to Orcid via FML.

PT and Jacqueline chatted about rich semantically-linked data-sets like the Dictionary of Sydney. In preparation for the workshop, PT tried taking the data from the Journey to Horseshoe Bend project, which is in a similar format to the Dictionary, putting it in a spreadsheet with multiple worksheets and importing it via a very dodgy Python Script.

Peter Bugeia investigated how environmental-science data would look in Omeka, by playing with the API to pump in data from the HIEv repository.

Sharyn and Andrew tried to hack together a simple plugin. Challenge: see if we can write a plugin which will detect YouTube links in metadata and embed a YouTube player (as a test case for a more general type of plugin that can show web previews of lots of different kinds of data). They got their hack to the “Hello World, I managed to get something on the screen” stage in 45 minutes, which is encouraging.

Jake looked at map-embedding: we had some sample data from UWS of KMZ (compressed Google-map-layers for UWS campuses), we wondered if it would be possible to show map data inline in an item page. Jake made some progress on this – the blocker isn’t Omeka it was finding a good way to do the map embedding.

Cindy continued the work she’s been doing with Jake on the Intersect press-button Omeka deployment. They’re using something called Snap Deploy and Ansible.

Jake says:

Through our Snapdeploy service Intersect are planning to offer researchers the ability to deploy their own instance of OMEKA with just a click of a button, with no IT knowledge required. All you need is an AAF log in and Snapdeploy will handle the creation of your NeCTAR Cloud VM and the deployment of OMEKA to that VM for you. We are currently in the beginning stages of adapting the Snapdeploy service to facilitate an Omeka setup and hope to offer it soon. We would also like feedback from you as researchers to let us know if there are any Omeka plug-ins that you think we could include as part of our standard deployment process that would be universally useful to the research community, so that we can ensure our Omeka product offers the functionality that researchers actually need.

David explored the API using an obscure long forgotten programming language, “Java” we think he called it and reported on the difficulty of grasping it.

What would an Omeka service look like?

If we wanted to offer this at UWS or beyond as well as use it for projects beyond the DH sphere, what would a supported service look like?

To make a sustainable service, we’d want to:

Work out how to provide robust hosting with an optimal number of small Omeka servers per host (is it one? is it ten?).
Come up with a generic data management plan: “We’ll host this for you for 12 months. After which if we don’t come to a new arrangement your site will be archived and given a DOI and the web site turned off”. Or something.

Is Omeka aDORAble by Peter Sefton, Andrew Leahy, Gerry Devine, Jake Farrell is licensed under a Creative Commons Attribution 4.0 International License.

Is HIEv aDORAble?

Posted on 2014-09-04 by ptsefton

[Update 2014-09-04: added a definition of DORA]

This week we held another of our tool/hack days at UWS eResearch. This time it was at the Hawkesbury Campus, with Gerry Devine, the data manager for the Hawkesbury Institute for the Environment. This week the tool in question is the DIVER product (AKA DC21 and HIEv).

Where did Intersect DIVER come from?

Intersect DIVER was originally developed by Intersect in 2012 for the University of Western Sydney’s Hawkesbury Institute for the Environment as a means to automatically capture and secure time series and other data from the Institute’s extensive field-based facilities and experiments. Called “the HIEv”, HIE has adopted Intersect DIVER as the Institute’s primary data capture application for Institute data. For more information, see here. http://intersect.org.au/content/intersect-diver

We wanted to evaluate DIVER against our Principles for eResearch software with a view to using it as a generic DORA working data repository.

Hang on! A DORA? What’s that?

DORA is a term coined by UWS eResearch Analyst David Clarke for a generic Digital Object Repository for Academe (yes, Fedora‘s an example of the species). We expressed it thusly in our principles:

At the core of eResearch practice is keeping data safe (remember: No Data Without Metadata). Different classes of data are safest in different homes, but ideally each data set or item should live in a repository

It can be given a URI

It can be retrieved/accessed via a URI by those who should be allowed to see it, and not by those who should not

There are plans in place to make sure the URI resolves to something useful as long is it is likely to be needed (which may be "as long as possible").

DORA Diagram

The DIVER software is running at HIE, with more than 50 "happy scientists" (as Gerry puts it) using it to manage the research data files, including those automatically deposited from the major research facility equipment.

HIEv Shot

So, what’s the verdict?

Is DIVER a good generic DORA?

The DIVER data model is based entirely on files, which is quite a different approach from CKAN, which we looked at a few weeks ago, or Omeka, which we’re going to look at in a fortnight’s time which both use a ‘digital object’ model where an object has metadata with zero or more files.

DIVER does many things right:

It has metadata, so there’s No Data without Metadata (but with some limitations, see below)
It has API access to for all the main functionality , so researchers doing reproducible research can build recipes to fetch and put data, run models and so on from their language of choice.
The API works well out of the box with hardly any fuss.
It makes some use of URIs as names for things in the data packages it produces, so that published data packages do use URIs to describe the research context.
It can extract metadata from some files and make it searchable.

But there are some issues that would need to be looked at for deploying DIVER into new places:

The metadata model in DIVER is complicated – it has several different, non-standard, ways to represent metadata, most of which are not configurable or extensible, and a lot of the metadata is not currently searchable.
DIVER has two configurable ‘levels’ of metadata that automatically group files together. At HIE they are "Facility" and "Experiment". There’s no extensible metadata per-installation; like CKAN’s simple generic name/value user-addable metadata. This is the only major configuration change you can make to customise an installation. This is a very common issue with this kind of software, no matter how many levels of hierarchy there are a case will come along that breaks the built-in model.

In my opinion the solution is not to put this kind of contextual stuff into repository software at all. Gerry Devine and I have been trying to address this by working out ways to separate out descriptions of research context from the repository, so the repository can worry only about keeping well-described content and the research context is described by a human-and-machine-readable website, ontology or database as appropriate; with whatever structure the researchers need to describe what they’re doing. Actually Gerry is doing all the work, building a new semantic CMS app that can describe research context independently of other eResearch apps.
There are a couple of hard-wired file preview functions (for images) and derived files (OCR and speech recognition) but no plugin system for adding new ones, so any new deployment that needed new derived file types would need a customisation budget.
The only data format from which DIVER can extract metadata is the proprietary TOA5 format owned by the company that produces the institute’s data-loggers. NETCDF would be more useful.
There are some user interface issues to address, such as making the default page for a data-file more compact.

Conclusion

There is a small community for the open source DIVER product, with two deployments, using it for very different kinds of research data. To date the DIVER community doesn’t have an agreed roadmap for where it might be heading and how the issues above might be addressed.

So at this stage I think it is suitable for re-deployment only into research environments which closely resemble HIE, probably including the same kinds of data-logger (I haven’t seen the other installation so can’t comment on that). It might be possible to develop DIVER into a more generic product, but there is no obvious business case for that at the moment over adapting a more widely adopted, more generic application. I think the way forward is for the current user-communities (of which I consider myself a member) to consider the benefits of incremental change, towards a more generic solution as they maintain and enhance the existing deployments, balancing local feature development over the potential benefits of attracting a broader community of users.

And another thing …

We discovered some holes in our end-to-end workflow for publishing data from HIEv to our Institutional Data Repository, and some gaps in the systems documentation, which we’re addressing as a matter of urgency.

Is HIEv aDORAble? by Peter Sefton is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

eResearch manager’s report 2014-07-28

Posted on 2014-07-28 by ptsefton

Introduction

Since the last meeting of the UWS eResearch Committee on May 22nd we have updated the eResearch roadmap to reflect where we are in relation to the plan as it was set out at the beginning of 2014.

In June I attended the Open Repositories conference and a couple of other events to do with open access to publications and data, including organising an open-data publications text-mining hackfest in Edinburgh.

Looking to the future, the eResearch team has been involved in two internal funding bids in the last week:

Research Portal 2 (P2): to develop a joined up research presence for the university, like the Research hub projects at Griffith and JCU.
More end-to end data management via more support for the AAAA data management program we’re already running.

UWS Events – Research Bazaar

Now that UWS has all our staff positions filled, we’re making a big push to do more outreach to researchers via a number of channels, including visiting departmental meetings and research forums, along with attempting to run as many eResearch-relevant training events as we can get takers for. This is all done with the help of the eResearch Communications Working Group chaired by Susan Robbins from the UWS library.

To build eResearch capability we’re trying out Research Bazaar approach, which started in Melbourne with Steve Manos and David Flanders.

What exactly, might you ask, is the ‘Research Bazaar’ aka “ResBaz”? #ResBaz is, first and foremost, a campaign to empower researchers in the use of the University’s core IT services:

Empowering researchers to collaborate with one another through the use of research apps on our cloud services.

Empowering researchers to share the data with trusted partners via our data services.

Empowering researchers to establish their reputation through our parallel computing and supercomputing services.

Empowering researchers to invent new ways of experimenting through our emerging technology services.

Our eResearch partners Intersect are helping with this; they offer a number of Learning and Development courses, and we’re talking to them about developing and importing more.

Speaking of importing eResearch training expertise, we ran the first of a series of Research Bazaar events: Mapping for the Digital Humanities powered Melbourne eResearcharians Steve Bennet and Fiona Tweedie.

Right at the beginning of July Alveo, the virtual laboratory for Communications Science was launched by the NSW chief scientist Mary O’Kane and UWS vice-Chancellor Barney Glover with a two-day event, starting with a hackfest day to generate ideas and interest, promote use of the lab and provide some hands on training. While we didn’t brand this as a Research Bazaar activity is certainly in the #resbaz spirit.

Projects

DC21/HIEv Wraps up

The HIEv project, née DC21 is now completed and HIEv has about 50 regular users at HIE. Thanks to Peter Bugeia at Intersect for project managing the final stages of the rollout and Gerry Devine, HIE data manager for promoting the software, and putting it to good use to build dashboards etc.

New features include:

Log in using your account at any.university.edu.au using the Australian Access Federation.
Share data securely with a research cohort until you’re ready to publish it to the world for re-use and citation.

New: Major Open Data Collection for the humanities

Our latest project, the Major Open Data Collections project funded by the Australian National Data Service is in the establishment phase:

Carmi Cronje is working with the ITS Project Management Office to establish the project and its various steering committees, boards etc.
The key staff member for the project, the data librarian has been appointed. Katrina Trewin, currently working in the UWS Library joins us on August 4th.

Adelta is nearly finished

The Adelta project is nearing completion, with users now testing the service:

User Interface work by Intersect is nearly done, pending some discussions with the Library about accessibility requirements.
Final bug fixes and tweaks are being applied, as per this milestone.
We are working with Sydney development company hol.ly to integrate the service with the Design And Art Online database, so that we have a true linked-data approach, with Adelta authors being identified using DAAO URIs. This builds upon one of the Developer Competition entries from Open Repositories 2014 – the Fill My List URI lookup service.

Wonderama

Andrew Leahy consulted for the Google Atmosphere event (Tue July 22) at the Australian Technology Park, Eveleigh. This was a Wonderama demonstration in collaboration with NGIS www.ngis.com.au, showcasing some of the NSW state government data hosted with Google’s geo platform.

Cr8it project rolls on

Cr8it is a collaboration between Newcastle, Intersect and UWS to build an application which live in a dropbox-like file Share Sync See service, so that people can move their research data from being sets of files, to well-described data collections in a repository.

User testing has started on parts of the software to do with selecting, and managing files.
Recent development work has been focussing on re-factoring the application to make it more testable, and easier to build on, once this is done we’re on the home straight to hook it up to the Research Data Repositories at UWS and Newcastle and start publishing data.

We are now seeing a lot of uptake of Cloudstor+, the AARNeT researcher-ready version of ownCloud, on which we are planning to put Cr8it from UWS users, for example Andrew Leahy reports that a few users a week are adopting it at his suggestion.

AAAA data management

Project to establish data management practices and infrastructure in the BENS group and the Structures Lab at IIE are continuing and we are developing new AAAA projects to start soon.

Meet DORA

New eResearch Analyst David Clarke has coined the term DORA: Digital Object Repository for Academe, a name for a generic service-oriented component for storing research data, which adheres to a set of eResearch principles David and the rest of the team are working on. We are currently evaluating software against the ideal DORA model. David’s happy to talk to you about this, as he has an Open-DORA policy ☺.

eResearch manager’s report 2014-7-28 by Peter Sefton is licensed under a Creative Commons Attribution 4.0 International License.

Internal update: UWS eResearch roadmap 2014 Q3 & 4

Posted on 2014-07-28 by ptsefton

About this document

This is the mid-year revision of University of Western Sydney eResearch team roadmap for 2014. This document will be consulted at the eResearch committee and working-group meetings to track progress throughout the year.

Summary

The timelines below have traffic-light colours to show progress. Green means things are going according to plan. Yellow means there have been delays or setbacks but these are being managed and monitored. Red means targets were not met. The main ‘red’ area is the Open Access policy – a draft has been developed and has received support from the eResearch committee and DVCR&D, is undergoing review in the office of the DVCR&D.

Assumptions

This plan assumes the current level of staffing and resources for the eResearch team and does not make any assumptions about further project funding, apart from the ANDS Major Collections project, which is in its initiation phase.

Vision

The eResearch team vision statement:

Support the objectives of the UWS research plan by creating an eResearch Ready UWS, where information and communications technologies support the collaborative conduct of high-impact, high-integrity research with minimal geographical and organisational constraints. eResearch will assist in the transition to a research culture where IT and communications technologies are integral to all research, from the fundamental underpinnings of data acquisition and creation, management and archiving, to analytical and methodological processes. Our aim is to work with stakeholders within and beyond the university to ensure UWS researchers have the information and communications technology resources, infrastructure, support and skills required, wherever they are on the path to an eResearch ready UWS.

How does this fit with the UWS research plan?

The eResearch plan is aligned with and supports the UWS Research plan. (Note this plan is now obsolete, a new one is coming with a greater emphasis on impact and community engagement and broadening research income beyond competitive grant income).

Objectives 1-3

Objective 1 – Increase external research income to the University
Objective 2 – Increase the number of fields of research at UWS operating above or well above world standard
Objective 3 – Increase the number and concentration of funded research partnerships

These objectives depend on UWS having a high-integrity research environment in which the institution will be able to support researchers in meeting their obligations under the Australian Code for the Responsible conduct of Research and funder expectations about data management, which is attractive to researchers, funders and collaborators. Building eResearch infrastructure, via the projects discussed below, and the forthcoming ITS research infrastructure roadmap will help create an environment conducive to successful income generation, and improve support for researchers aiming for high research performance.
During 2014 eResearch will begin replicating the successful roll out of end-to-end data management at HIE by creating small, tightly focused projects with clear success criteria which are aligned to the research goals of the university (Via the AAAA data management project methodology currently in development).
eResearch will continue to work closely with eResearch-intensive groups, for example by supporting phase two of Alveo (formerly HCS vLab) a NeCTAR grant ($1.3M, with a total project budget of ~ $3M) to set-up and implement a Virtual Laboratory for the multiple partners in the project: Above and Beyond Speech, Language and Music: A Virtual Lab for Human Communication Science.

Objective 4 – Ensure UWS attracts and graduates high quality Higher Degree Research (HDR) students to its areas of research strength.

During 2014 eResearch will be implementing programs to support HDR students, along with early-career researchers and the rest of the research community. This includes the establishment of self-supporting eResearch communities via a trial of the University of Melbourne ‘Research Bazaar’ model.

eResearch will work with our eResearch partner, Intersect to start delivering a broad range of eResearch training, building on previous training that has been delivered for High Performance Computing, see Communications and Organisational Development. HDR students will be key to this, as both one of the main audiences for training, and also serving as trainers, promulgating eResearch techniques and mind-set throughout the university.

Resources

Assumed Core Resources

eResearch Manager – Peter Sefton
eResearch Technical Advisor (~0.8 FTE) – Andrew Leahy
eResearch Support Officer / eResearch Analyst – TBA
eResearch Project Implementation Officer / Communications – Cornelia (Carmi) Cronje.
Intersect eResearch Analyst – Peter Bugeia

Other Resources

The resources are from other areas of the university and are financed by that cost centre. They are currently on loan to the eResearch team until October 2014.

Application Developer, ITS
Web Application Developer (provided by ITS – until ITS restructure unfolds)

Associates

The eResearch Associates are employed in key UWS research institutes or schools and work closely with the eResearch team and provide technical expertise to assist researchers.

Gerard Devine – HIE Data Manager –
Jason Ensor – Research Development Officer (Digital Humanities)
Nathan Mckinlay – ICT Professional Officer – IIE
James Wright – Technical Officer in Bioelectronics & Neuroscience – BENS

Funding

The eResearch team has no formal budget separately from the office of the DVCR&D. Recommendation: consolidate remaining project funds into an eResearch projects account to support projects in the eResearch portfolio.

Money that’s in the MS23 financial account ~ $22,244.24
RDR budget remaining ~ 100K (subject to confirmation from ITS)

Focus areas

Policy Working Group

The policy working group is chaired by Kerrin Patterson, Associate Director Performance and Quality (Acting), Office of Engagement, Strategy & Quality. The group has identified two priorities:

Establishing an Open Access (OA) policy for both research publications and research data.
Creating a Research Data Management (RDM) policy.

The working group has made substantial progress on the Open Access (OA) policy, and has asked the Manager, eResearch to review the policy framework at UWS, particularly the Research Code before starting on the Research Data Management (RDM) policy. Recent changes to Australian Research Council (ARC) funding rules for Discovery grants, mean this is now a pressing issue for both the OA and RDM policies at UWS:

A11.5.2 Researchers and institutions have an obligation to care for and maintain
research data in accordance with the Australian Code for the Responsible
Conduct of Research (2007). The ARC considers data management
planning an important part of the responsible conduct of research and
strongly encourages the depositing of data arising from a Project in an
appropriate publicly accessible subject and/or institutional repository.

Q1

Q2

Q3

Q4

Open Access Policy

Draft presented to DVCR

Policy adopted

Support DVCR&D in progressing policy thru the UWS process
Revise materials to support the policy, new Powerpoint slide show possible statements from Scott Holmes

See communications working group plan

Research Data Management policy

Review of UWS policy, particularly the Research Code

Review of UWS policy complete

Policy WG finish gap-analysis/comparison of UWS policies
Policy WG recommend whether we need an RDM policy and what its scope should be

Policy working group produce draft of RDM policy and/or updates to related policies

Communications and Organisational Development

The Communications working group is chaired by Susan Robbins, Research Services Coordinator for the UWS library. The following table sets out the broad goals for this area.

During 2014 the eResearch will be working with Intersect to establish an organisational development approach to eResearch under the “Research Bazaar” banner.

	Q1	Q2	Q3	Q4
Communications plans	Generic matrix to be used for eResearch messaging	Implement for eResearch website	Communications WG publish updated plan eResearch publish an events calendar	As directed by comms WG
Awareness campaign for OA policy	Launch of some sort?	Web pages published Webinars and face to face briefings	Publish web pages about the policy on main site Set up calendar for webinars and other outreach	Library to run OA promotion campaign to get more deposits ORS to include comms about OA in research lifecycle touchpoints
Capability- Building in research groups*	Planning	Produce training resources and communicate they exist?	Run 1 #ResBaz* workshop from Melb.* Book in 2 Intersect courses	1 event run at each of HIE, DHRC, MARCS Trial 1 Software Carpentry
Alignment of eResearch with research lifecycle	Planning / development	Two diagrams HDR and Researchers	Produce draft of lifecycle Get feedback on draft from stakeholders (lib, ORS, eResearch, researchers) Physical posters for use by key stakeholders	Publish lifecycle on eResearch website Integrate lifecycle into stakeholder websites
Dissemination: Conference presentations, journal articles, The Conversation etc	Identify potential topics and co-authors	Contact collaborators and commence writing online opinion pieces, blog posts etc. Submit conference abstracts	Open Repositories	eResearch Australasia Facilitate BOF session
eResearch included in Research Training agenda and materials	Planning	Plan established with ORS	Plan with ORS (Mary Krone, Luc Small)	As per plan
Work with Intersect on establishing Research Bazaar	Planning	Run as many existing Intersect courses as possible/relevant. Initial pilot of Melbourne Uni courses	Run existing Intersect courses. Expanded pilot of Melbourne Uni courses Software carpentry	Research Bazaar established, program to be maintained jointly by Intersect and eResearch team
Wonderama internal & organisational dev	Developing Wonderama as platform for the Digital Humanities and the Project For Western Sydney outreach and consulting. Developing a consulting/business model ($$)
		Google Summer of Code
	PX students UWS Solar Racer
		CompSci Advanced projects?
Wonderama external and outreach activities ($$) = paid gig	UWS HiTech Fest (Careers market)	iFly Downunder launch at Panthers (indoor skydiving) ($$) CeBIT conference (SCEM to sponsor?)	Google Atmosphere ($$)	TBD

** #resbaz = Research Bazaar

Measures of success

*Capability building: Count number of figures/tables/citations/programs in publications/theses produced using workshop tools and/or programming languages.

eResearch Projects

The following table lists projects which report to the eResearch Projects Working Group committee. This table shows the broad project stage for each project over the year, a separate schedule/dashboard which will be presented to the eResearch Projects committee will show detailed targets for each.

	Q1	Q2	Q3	Q4
Adelta	Phase 1 finished	Discuss library hosting of Adelta	Possible Integration into Library search box for greater discoverability HI Sandra,	Google analytics to measure use
Cr8it core app	Negotiate sustainable support offer from Intersect/AARNET	Start of trials	Implementation	Realisation
ANDS Major collection	Scoping complete	Project running	Project running	Project running
AAAA Data Management Projects
HIEv	Realisation	Realisation (Set up reporting of research-focused metrics)	Realisation	Realisation
IIE Structures Lab	Planning, initiation	Implementation	Realisation	Realisation
MARCS BENS	Planning, initiation	Implementation	Realisation	Realisation
To Be Advised (Digital Humanities)		Planning	Initiation	Implementation
To Be Advised (something sciency)		Planning	Initiation	Implementation
Establish “AA” data management for facilities (Acquire & Archive)
AMCF (SEM+)
SIMS	Planning	Implementation	Realisation
NGS (Sequencing)	Planning	Implementation	Realisation
BMRF (NMR)			Planning	Implementation
MSF (MassSpec)			Planning	Implementation
CBIF (Confocal)			Planning	Implementation

AAAA projects: measures of success

Each AAAA data management project will be measured with a variety of metrics. Targets will be agreed with the project stakeholders both at project initiation and in the realisation phase and maintained in a separate AAAA dashboard. These metrics are designed to show not just raw use of the AAAA methodology in terms of users or data sets (both of which are gameable metrics) but to focus on the effect of the AAAA program on research performance and ‘eResearch readiness’.

R#	Number of researchers who have been inducted/trained and have access to AAAA infrastructure
DAR	Datasets Archived in RDR
ACD	Total # of articles in UWS publications repository citing datasets in RDR (including via repository metadata)
IDMP	Institute or research-cohort Data Management Plan(s) in place
GRDMP	Number and value of current grants which reference formal data management plans

Infrastructure Working Group

Infrastructure planning is in discussion with ITS Strategy. A technology roadmap is being produced with the ITS Roadmap Builder Tool. This will be published as a separate plan.

Intersect Engagement

The relationship between Intersect and UWS is covered by a member engagement plan (in development for 2014).

eResearch Team organisational Development

Capability	Q1	Q2	Q3	Q4
eResearch tool awareness	Team familiarity with data capture applications (eg): CKAN MyTardis	“Notebook programming” Rstudio Python Notebooks ShaderToy	Academic authoring tools: LaTex, Markdown, Pandoc, EPUB etc	TBA
Communications		Visual comms/ whiteboard training	TBA	TBA
Software development		eResearch tech people to attend workshop in one language*	eResearch tech people to attend workshop in one language	Team familiarity with modern programming principles and environments**
Conferences	Australasian Digital Humanities (Perth)	Open Repositories (Helsinki) Google I/O (SF)		eResearch Australasia (Melb) Google Open Source Summit (SF) OzViz workshop (Bris?)

Metrics

*Certificate in Software Carpentry (Python/R)
**Team members to complete one MOOC or otherwise demonstrate professional development

First Research Bazaar event at UWS, Mapping for humanities

Posted on 2014-07-28 by ptsefton

The eResearch team, just finished running a two day session on mapping tools for the humanities, delivered by visiting trainers from the University of Melbourne eResearch team, under the Research Bazaar #resbaz umbrella. Resbaz is about enabling communities of practice for eResearch, rather than building expensive centralized support. We had lots of positive feedback from participants, and a good vibe; you know it’s working when people sit at the computers and keep playing well after the lunch has arrived.

The session served-up two main packages:

CartoDB – a nice online tool for map building – putting (fancy) dots on online maps. See the slides. CartoDB is available as a paid service, but stay tuned for a version that’s free for researchers.edu.au.
Tilemill, a more comprehensive tool for making publication quality print and online maps (available as a desktop app).

More workshops coming soon – see these offerings from Intersect, our eResearch partner. The Open Refine course in particular is really useful for anyone who deals with spreadsheet or table data.

5 August 2014 Cleaning & exploring your data with Open Refine at UWS.
5 August 2014: Data Visualisation with Google Fusion Tables at UWS.

We don’t have all the results in from the official feedback survey yet, but the verbal feedback was positive from the participants. One thing we’d like to look at for future #resbaz training is making sure we add a little dash of data management and consideration of the end-to-end research process to each workshop.

Depending on the course, take the time at the start to set people up with Cloudstor+ storage, a git repository or another appropriate management system for working data and a place to publish results, maybe github, maybe figshare, or a discipline specific or institutional repository.
Keep online notes, maybe using one of the online lab/research notebook platforms – (we’re watching Egon Willighagen’s ongoing review of these systems attentively – please keep it up Egon!).
At the end of the workshop, publish something – in the case of the maps it would be good to actually work though the process of getting a good print or web version of the map, and making sure all the data and code used to create it are saved and published.
Oh, and I’d love to be able to offer a prize for the first published map in an article or submitted thesis to come out of the workshop.

First Research Bazaar event at UWS, Mapping for humanities by Peter Sefton is licensed under a Creative Commons Attribution 4.0 International License.

Trip report: Peter Sefton @ Open Repositories 2014, Helsinki, Finland

Posted on 2014-07-04 by ptsefton

Trip report: Peter Sefton @ Open Repositories 2014, Helsinki, Finland by Peter Sefton is licensed under a Creative Commons Attribution 4.0 International License.

From June 9th-13 ^th I attended the Open Repositories conference way up North in Helsinki. This year I was not only on the main committee for the conference, but was part of a new extension to the Program Committee, overseeing the Developer Challenge event, which has been part of the conference since OR2008 in Southampton . I think the dev challenge went reasonably well, but probably requires a re-think for future conferences, more on that below.

In this too-long-you-probably-won’t read post I’ll run through a few highlights around the conference theme, the keynote and the dev event.

Summary: For me the take-away was that now we have a repository ecosystem developing, and the OR catchment extends further and further beyond the library, sustainability is the big issue , and conversations around sustainability of research data repositories in particular are going to be key to the next few iterations of this conference. Sustainability might make a good theme or sub-theme. Related to sustainability is risk; how do we reduce the risk of the data equivalent of the serials crisis if there is such a crisis it won’t look the same, so how we will stop it?

View from the conference dinner

Keynote

The keynote this time was excellent. Neuroscientist Erin McKiernan from Mexico gave an impassioned and informed view of the importance of Open Access: Culture change in academia: Making sharing the new norm (McKiernan, 2014). Working in Latin America McKiernan could talk first-hand about how the scholarly communications system we have now disadvantages all but the wealthiest countries.

There was a brief flurry of controversy on Twitter over a question I asked about the risks associated with commercially owned parts of the scholarly infrastructure and how we can manage those risks. I did state that I thought that Figshare was owned by McMillan’s Digital Science, but was corrected by Mark Hahnel; Digital Science is an investor, so I guess “it is one of the owners” rather than “owns”. Anyway, my question was misheard as something along the lines of “How can you love Figshare so much when you hate Nature and they’re owned by the same company”. That’s not what I meant to say, but before I try to make my point again in a more considered way, some context.

McKiernan had shown a slide like this:

My pledge to be open

I will not edit, review, or work for closed access journals.

I will blog my work and post preprints, when possible.

I will publish only in open access journals.

I will not publish in Cell, Nature, or Science.

I will pull my name off a paper if coauthors refuse to be open.

If I am going to ‘make it’ in science, it has to be on terms I can live with.

Good stuff! If everyone did this, the Scholarly Communications process would be forced to rationalize itself much more quickly than is currently happening and we could skip the endless debates about the “Green Road” and the “Gold Road” and the “Fools Gold Road”. It’s tragic we’re still debating in this using this weird colour-coded-speak twenty years in to the O A movement .

Anyway, note the mention of Nature .

What I was trying to ask was: How can we make sure that McKiernan doesn’t find herself, in twenty years time, with a slide that says:

“I will not put my data in Figshare”.

That is, how do we make sure we don’t make the same mistake we made with scholarly publishing? You know, where academics write and review articles, often give up copyright in the publishing process, and collectively we end up paying way over the odds for a toxic mixture of rental subscriptions and author-pays open-access, with some risk the publisher will ‘forget’ to make stuff open.

I don’t have any particular problem with Figshare as it is now, in fact I’m promoting its use at my University, and working with the team here on being able to post data to it from our Cr8it data publishing app . All I’m saying is that we must remain vigilant. The publishing industry has managed to transform itself under our noses from: much needed distribution service of tangible goods ; to rental service where we get access to The Literature pretty-much only if we keep paying ; to its new position as The custodian of The Literature for All Time , usurping libraries as the place we keep our stuff.

We need to make sure that the appealing free puppy offered by the friendly people at Figshare doesn’t grow into a vicious dog that mauls our children or eats up the research budget.

So, remember, Figshare is not just for Christmas.

Disclosure: After the keynote, I was invited to an excellent Thai dinner by the Figshare team, along with Erin and a couple of other conference-goers. Thanks for the Salmon and the wine, Mark and the Figshare investors. I also snaffled a few T-Shirts from a later event ( Disruption In The Publishing Industry: Digital, Analytics & The Future ) to give to people back home.

Figshare founder and CEO Mark Hahnel (right) and product manager Chris George hanging out at the conference dinner

Conference Theme, leading to discussions about sustainability

The conference theme was Towards Repository Ecosystems .

Repository systems are but one part of the ecosystem in 21st century research, and it is increasingly clear that no single repository will serve as the sole resource for its community. How can repositories best be positioned to offer complementary services in a network that includes research data management systems, institutional and discipline repositories, publishers, and the open Web? When should service providers build to fill identified niches, and where should they connect with related services? How might these networks offer services to support organizations that lack the resources to build their own, or researchers seeking to optimize their domain workflows?

Even if I say so myself, the presentation I delivered for the Alveo project (co-authored with others on the team) was highly theme-appropriate; it was all about researcher-needs driving the creation of a repository service as the hub of a Virtual Research Environment, where the repository part is important but it’s not the whole point .

I had trouble getting to see many papers, given the dev-wrangling, but there was definitely a lot of eco-system-ish work going on, as reported by Jon Dunn :

Many sessions addressed how digital repositories can fit into a larger ecosystem of research and digital information. A panel on ORCID implementation experiences showed how this technology could be used to tie publications and data in repositories to institutional identity and access management systems, researcher profiles, current research information systems, and dissertation submission workflows; similar discussions took place around DOIs and other identifiers. Other sessions addressed the role of institutional repositories beyond traditional research outputs to address needs in teaching and learning and administrative settings and issues of interoperability and aggregation among content in multiple repositories and other systems .

One session I did catch (and not just ‘cos I was chairing it) had a presentation by Adam Field and Patrick McSweeney on Micro data repositories: increasing the value of research on the web (Field and McSweeney, 2014). This has direct application to what we need to do in eResearch, Adam reported on their experience setting up bespoke repository systems for individual research projects, with a key ingredient missing in a lot of such systems; maintenance and support from central IT. We’re trying to do something similar at the University of Western Sydney, replicating the success of a working-data repository at one of our institutes ( reported at OR2013 ) across the rest of the university, I’ll talk more to Adam and Patrick about this.

For me the most important conversation at the conference was around sustainability. We are seeing more research-oriented repositories and Virtual Research Environments like Alveo, and it’s not always clear how these are to be maintained and sustained.

Way back, when OR was mainly about Institutional Publications Repositories (simply called Institutional Repositories, or IRs) we didn’t worry so much about this; the IR typically lived in The Library, the IR was full of documents and The Library already had a mission to keep documents. Therefore the Library can look after the IR. Simple.

But as we move into a world of data repository services there are new challenges:

Data collections are usually bigger than PDF files, many orders of magnitude bigger in fact making it much more of an issue to say “we’ll commit to maintaining this ever-growing pile of data”:
“There’s no I in data repostory (sic)” – i.e. many data repositories are cross-institutional which means that there is no single institution to sustain a repository and collaboration agreements are needed. This is much, much more complicated that a single library saying “We’ll look after that”.

And as noted above, there are commercial entities like Figshare and Digital Science realizing that they can place themselves right in the centre of this new data-economy. I assume they’re thinking about how to make their paid services an indispensible part of doing research, in the way that journal subscriptions and citation metrics services are, never mind the conflict of interest inherent in the same organization running both.

Some libraries are stepping up and offering data services, for example, work between large US libraries.

The dinner venue

The developer challenge

This year we had a decent range of entries for the dev challenge, after a fair bit of tweeting and some friendly matchmaking by yours truly. This is the third time we’ve run the thing a clearly articulated set of values about what we’re trying to achieve .

All the entrants are listed here, with the winners noted in-line. I won’t repeat them all here, but wanted to comment on a couple.

The people’s choice winner was a collaboration between a person with an idea, Kara Van Malssen from AV Preserve in NY, and a developer from the University of Queensland, Cameron Green, to build a tool to check up on the (surprisingly) varied results given by video characterization software . This team personified the goals of the challenge, creating a new network, while scratching an itch, and impressing the conference-goers who gathered with beer and cider to watch the spectacle of ten five-minute pitches.

My personal favorite came from an idea that I pitched (see the ideas page ) was the Fill My List framework, which is a start on the idea of a ‘ Universal Linked Data metadata lookup/autocomplete ’. We’re actually picking up this code and using it at UWS. So while the goal of the challenge is not to get free software development for the organizers that happened in this case (yes, this conflict of interest was declared at the judging table). Again this was a cross-institutional team (some of whom had worked together and some of whom had not). It was nice that two of the participants, Claire Knowles of Edinburgh and Kim Shepard of Auckland Uni were able to attend a later event on my trip at a hackfest in Edinburgh . There’s a github page with links to demos .

But, there’s a problem. The challenge seems to be increasingly hard work to run, with fewer entries arising spontaneously at recent events. I talked this over with members of the committee and others. There seem to be a range of factors:

The conference may just be more interesting to a developer audience than it used to be. Earlier iterations had a lot more content in the main sessions about ‘what is a(n) (institutional) repository’ and ‘how do I promote my repository and recruit content’ whereas now we see quite detailed technical stuff more often.
Developers are often heavily involved in the pre-conference workshops leaving no time to attend a hack day to kick of the conference.
Travel budgets are tighter so if developers do end up being the ones sent they’re expected to pay attention and take notes.

I’m going to be a lot less involved in the OR committee etc next year, as I will be focusing on helping out with Digital Humanities 2015 at UWS. I’m looking forward to seeing what happens next in the evolution of the developer stream at the OR conference. At least it’s not a clash.

The Open Repositories Conference (OR2015) will take place in Indianapolis, Indiana, USA at the Hyatt Regency from June 8-11, 2015. The conference is being jointly hosted by Indiana University Libraries , University of Illinois Urbana-Champaign Library , and Virginia Tech University Libraries .

This pic got a few retweets

References

Field, A., and McSweeney, P. (2014). Micro data repositories: increasing the value of research on the web. http://eprints.soton.ac.uk/364266/.

McKiernan, E. (2014). Culture change in academia: Making sharing the new norm. http://figshare.com/articles/Culture_change_in_academia_Making_sharing_the_new_norm_/1053008.

[This document was originally posted on Peter Sefton’s blog.]

By Peter Sefton and Peter Bugeia, with input from the UWS eResearch community and beyond

About this post

During 2012 The University of Western Sydney (UWS) will be rolling out a Research Data Repository (RDR). Peter Bugeia from Intersect put together the project proposal which secured the budget to move ahead with this project and Andrew (Alf) Leahy has been organising storage for research groups on request, ahead of the RDR becoming properly operationalised.

Part of this RDR project will be a project funded by the Australian National Data Service (ANDS) under the Metadata Stores programme. There will be a number of meetings over the next couple of months where Australian Universities discuss how they are going to spend their Metadata Stores money. There is a great opportunity for organisations running similar software architectures and solutions to pool their development resources; we’ve already spoken with Newcastle about this. We hope this preliminary material will help others in the community work out how much we have in common or tell us where we might be able to improve things. More on that at the end of the post.

Before we go on, about that term ‘Metadata Stores’

In ANDS speak, a Metadata Store is something that has all the metadata about research data and associated entities like parties (people, organisations) and activities (grant funded projects for example), in order that collection descriptions (of research data) can be published to their search service Research Data Australia (RDA).

But in many organisations we’re lacking the most important thing. We don’t have the descriptions of research data to suck up into a ‘metadata store’. Yes, there are data capture projects, lots of them which will attempt to automate to some extent the packaging process, but there’s a lot of stuff out there that won’t get captured this way. And even with automated data capture, someone still has to write the human readable description as a framework for the automated part. Many of us are getting librarians involved in this enterprise Why? They’re good at it. From a library point of view, this is a growth area, in a time when many other functions of the library are changing and/or disappearing.

Describing collections is a key part of this whole ANDS agenda, which strangely sometimes gets a bit forgotten. For example, in this diagram from the ANDS website, “collection descriptions” appear only outside of the institution. When working on these projects it’s important to keep thinking about where these descriptions are going to come from.

ANDS diagram showing Collection Descriptions in the Research Data Commons. But think: who’s going to provide the descriptions in the institution – they’re not going to manifest spontaneously as they appear to here!

The RDR

The repository will consist of two main components:

A scalable storage service linked to a combination of local and cloud-based high performance computing. Some data may also reside in other, trusted storage systems such as national infrastructure or discipline repositories with suitable governance in place.
A catalogue of research data for internal use in management, and external use in dissemination and collaboration.

Unlike a typical monolithic Institutional Repository (IR) the storage and catalogue services are disaggregated, because the data involved can be large and is much more varied in nature than the typical contents of an IR. Also, some data will reside in trusted data stores outside of the central storage supplied by IT. Not to mention that some of it is on paper and some on obsolete digital media.
But the project is about much more than supplying storage and computing. It is about creating an organisational capability and culture of managing research data throughout the research lifecycle. We aim:
To enable research in all disciplines at UWS to take place efficiently and effectively on existing and new data sets.
To enable the validation of research through appropriate management of data inputs and outputs.
For re-use in new research which will cite the creators of data sets at UWS.
For compliance with funder requirements and codes of practice.

The RDR will be a locally governed service linked to national infrastructure for dissemination of research data, data storage and computing

The repository software stack will consist of:

Scalable, managed file storage for both working and archived data, including any discipline-specific metadata describing that data
Access to virtualized and non-virtualized computing infrastructure at UWS, national and commercial facilities so that researchers can run data analysis tasks.
A research data catalogue containing metadata about data at a collection level for code-compliance, strategic research management and discovery purposes.
Interfaces to national research infrastructure including storage and computing (RDSI, NCI, NeCTARetc).

The initial storage component of the RDR was established in 2010. The next steps in the RDR project are to design the architecture that links the storage to computing infrastructure and cataloguing applications, and buy a lot more discs. This architectural work will be undertaken by the eResearch Unit (That’s Peter Sefton, Andrew Leahy and our Intersect eResearch Analyst Peter Bugeia),Information Technology Services, and the University Library.

The RDR will serve several communities:

Community	RDR services
Researchers	A safe place to put working data. A platform for collaboration around data. Archival storage, to meet researcher and departmental obligations to preserve data. A platform for describing and advertising data sets for discovery and re-use. A platform for publishing open access data A platform for protecting confidential or trade-secret data, managing embargoes and disposal dates. Direct connection to scalable computing resources. Minimal extra work for researchers over current interactions with ORS (grant applications, reporting etc).
Office of Research Services	An institutional view of where data resides. Integration with the research lifecycle and existing ORS systems Improved visibility for UWS research
The Library	Curation tools for research-services librarians to describe data sets Descriptions of data collections become part of the library’s holdings.
Information Technology Services	Consolidation of research data into one service model (currently storage provisioning is distributed and ad hoc).
UWS eResearch team	University processes around the RDR will make it easier to identify where eResearch needs are at UWS. Basic data storage is a ‘gateway’servicefor researchers into embracing more advanced computing and data-driven research (ieeResearch).
General community	The public will have an improved view of what research is conducted at UWS. Potential future Australian Government or other reporting on data re-use and citation rates.
External researchers	Discoverable data, people and projects Immediate re-use of open-access data where possible Mediated use data where possible, in collaborative projects or by arrangement with data creators

Fitting in to UWS

The RDR will be integrated with processes for managing the research lifecycle. It is a design goal that all researcher interaction with the system will have as little impact on the researchers as possible, by aligning form-filling and bureaucracy with existing and/or inevitable processes such as grant application or government reporting.

Storage allocation will be tied to stages in the research lifecycle where researchers already interact with the Office of Research Services, such as grant application time. Thus all funded research projects will have appropriate working and archive storage allocated from the project start.

Reporting on published data collections will be aligned with events such as grant-funded projects concluding.

More about the repository software architecture

A repository is not just a software application. It’s a lifestyle. It’s not just for Christmas. And if you build it they almost certainly won’t come. The main repository-like components of the RDR, the storage and catalogue, will be loosely coupled, but there will need to be overall repository governance in place to make sure that data is well looked after. (We’re setting aside discussion of the computing component for now, more on that soon.)

More detail about the RDR

To implement the Research Data Catalogue UWS will be using the ReDBox Research Data Catalogue application working alongside a scalable storage system (we have a post about what kind of storage and how it might be organised coming soon).

Above, we talked above about the need for systems that can describe research data collections. In most organisations we don’t know what data we have, because there is no catalogue or registry with metadata about data collections sitting there ready to be aggregated. The ReDBox application will fill this gap; it is the place where the library team will work with researchers to describe their collections, check the descriptions and finally publish them, with metadata that is as high in quality as we can manage.

Some collection descriptions will come, at least in part, from data capture software systems. It’s worth noting that many data capture projects could be a lot more ephemeral than the repository. They could be project based, or they could use software that does not have a long working life, so one of the core assumptions we’ve made is that for many data capture applications there will be a step when data crosses the curation boundary into the repository, where we know it will be looked after for the appropriate length of time. Data may or may not be moved into the storage component in the repository depending on a number of factors, including its size, the nature of the data capture application and whether it has the right governance in place to ensure that data can be managed to the standards set for the RDR. There’s a lot more to say about this interface, we’re preparing another post with a detail based on a specific implementation being built by Intersect and UWS.

Open questions for the UWS deployment include:

Will the catalogue itself be used for discovery or do we want to publish information about research data holdings via some other means, such as the VITAL-powered institutional repository, or via a discovery layer service?
Will we insist on a single RDR API to program against, or be more relaxed about how various services talk to the storage, catalogue or name authority service directly? At stake here is the integrity of the contract with researchers and the world, the contract that the catalogue matches what data we have.
Experience suggests that trying to make everything go through one tiny little software keyhole won’t work. Governance is governance, software that gets in the way gets routed-around, and it will take people to honour the various contracts involved such as the one that says once you assign a DOI (an ID, as seen in journal publishing) to a data set you don’t change the data.

Opportunities for collaboration

A number of Australian Universities have money under the ANDS Metadata Stores funding stream for projects like the UWS Research Data Catalogue. There will be lots of models for what a metadata store looks like, but if we can find some common ground then we should be able to:

Form a consortium of some kind, of like-minded institutions, using similar software componentry.
Work out which software, documentation, training course requirements are shared across sites.
Pool some of our funding to develop the above.

Simple, apart from the complexities of legal contracts etc. But there might be ways to divide up work, do it at different institutions and contribute it back to the commons; all ANDS-funded software deliverables are open source, and documentation etc can be made open access.

A good place to look for commonality is in the ANDS recommended deliverables.

D1	A working feed of records describing Collections and associated Activities, Parties and Services to Research Data Australia, in the current version of RIF-CS (1.3), demonstrated to meet the quality requirements for RIF-CS records as set by ANDS.
D2	A feed of collections from at least three distinct Faculties (or equivalent organisational units) within the institution to Research Data Australia.
D3	Demonstrated alignment of metadata records about Parties with an institutional name authority (HR or Library), with the authoritative form of the name sourced external to the metadata store, and with new researcher descriptions added to the metadata through regular updates from the name authority.
D4	Demonstrated alignment of metadata records about Parties with the ARDC Party Infrastructure Project, with researcher descriptions contributed to the NLA, and with People Australia identifiers for researchers recorded against researchers.
D5	Demonstrated alignment of metadata records about Activities with institutional and external sources of truth (Research Office, ARC and NHMRC grant registries), with the authoritative description of the Activity sourced external to the metadata store, and with new researcher project added to the metadata through regular updates from the sources of truth.
D6	Demonstrated workflow for registering new Collections in the university; this can include automated update, or semi-automated (notification-based).
D7	A software system to realise deliverables D1–D6 (and D8, D13–D14 if applicable), with robust storage and management of metadata.

To summarise, a comment about these deliverables in the light of the architecture sketched above. D6 is huge. “Demonstrated workflow for registering new Collections in the university” Most of the other deliverables are dealing with data that’s already well described, and needs to be integrated into a metadata store. One could argue that we could quite easily link a collection to a URL about a grant funded project (there’s your activities taken care of), forget about describing people at the national library or trying to set up yet another local ID, and put resources into getting ORCID up and running. But what about the all-important collections? Isn’t that the whole point? Note to other Metadata Stores project people: don’t forget, the workflow for ‘registering’ collections presupposes that you have a way of describing the collections in the first place, and making sure you can manage those collection descriptions even if some of the more ephemeral data sources (such as data capture projects) disappear.

Copyright Peter Sefton and Peter Bugeia, 2012-02-14. Licensed under Creative Commons Attribution-Share Alike 2.5 Australia. <http://creativecommons.org/licenses/by-sa/2.5/au/>

Key Facts
Funding Source/Amount:	ANDS – Data Capture Program – $200k
Lead Organisation/CIs:	University of Western Sydney Hawkesbury Institute of the Environment Prof. Ian Anderson
Timeframe:	Development commenced in December 2011 and the system will go live in the latter half of 2012.
Related Projects:	TERN/OzFlux: The present project is best regarded as supporting the precursor activities that enable the delivery of quality assured data to a facility such as OzFlux.

TL;DR

Good

Bad

But maybe it can do Linked Data?

What we did

More on stretching Omeka

What would an Omeka service look like?

Where did Intersect DIVER come from?

So, what’s the verdict?

Conclusion

And another thing …

eResearch manager’s report

2014-07-28

Introduction

UWS Events – Research Bazaar

Projects

DC21/HIEv Wraps up

New: Major Open Data Collection for the humanities

Adelta is nearly finished

Wonderama

Cr8it project rolls on

AAAA data management

Meet DORA

About this document

Summary

Assumptions

Vision

How does this fit with the UWS research plan?

Objectives 1-3

Objective 4 – Ensure UWS attracts and graduates high quality Higher Degree Research (HDR) students to its areas of research strength.

Resources

Assumed Core Resources

Other Resources

Associates

Funding

Focus areas

Policy Working Group

Communications and Organisational Development

Measures of success

eResearch Projects

AAAA projects: measures of success

Infrastructure Working Group

Intersect Engagement

eResearch Team organisational Development

Metrics

Keynote

Conference Theme, leading to discussions about sustainability

The developer challenge

Do you have data that needs to be displayed on a map?

What will I learn?

Who should come?

When and how will these workshops happen?

About the Trainers

When

Where

Workshop details

Cost

RSVP

What’s Alveo?

Who should attend?

Which day to attend?

Join the Research Bazaar

Q. Why are we here? A. Impact & Integrity

Research Integrity

Advisors…

Research Impact

Research Impact: new ways of working

Q. Why are we here? A. Training and organizational development

We’re building capability by grass-roots training and engagement

Q. Why are we here? A. To help set the agenda for our IT department

Number one take-away from today! To get help or advice

Penrith (Kingswood) – Wonderama

Penrith (Werrington South) – the Research Data Repository

Bankstown

Hawkesbury

Parramatta

A look at the (potential) new airport

Historical images in modern context

What’s in the CKAN?

UWS eResearch