How eResearch-y are you?

How eResearch-y are you? An extremely serious quiz comprising eight multiple-choice questions.
  1. Your latest published article has a graph in it. If the eResearch police asked you to reproduce the plot exactly using the original data you’d:
    A. Check out the code archived with the article, and re-run the make-file, which would not only re-generate the plot using Knitr, but the whole article, which would also be made available as an interactive website using Shiny, with an option to re-run the models on data crowd-transcribed from the logs of 17th-century slave ships.
    B. Redo the diagram in Excel, using the clearly set out method and supplemental material from the article.
    C. Find the data (by borrowing back last year’s laptop from a postgrad), then fiddle around with what you think is the right spreadsheet to make something that looks pretty much like the one in the paper.
    D. Plot? What plot? And what was all that babble in option A?
  2. Turns out that some of the photos and recordings you made when documenting a research site contain images and sounds of a Yeti. If you can provide complete records of where and when you collected this data, you can collect a $1,000,000 prize from a cable TV station. Your next step is to:
    A. Provide the DOI for the dataset you have archived in your institution’s data repository. The repository record, with the data attached, provides all the information required to support your claim.
    B. Scan the relevant pages from your field notebook and annotate them with supporting information specific to the Yeti sighting.
    C. Rummage around the office: you last saw that scrap of paper you scribbled on during the fieldwork in the pile on top of your filing cabinet.
    D. Quickly throw together some handwritten notes and scorch them with a candle so they look old. No, actually, you couldn’t be bothered. Also you don’t believe in Yetis or Santa.
  3. You have so much data to analyse, and your models are getting so complicated, that your laptop is getting hot, so you:
    A. Use Docker to create a 128-node compute cluster in the NeCTAR cloud, get some results, archive all the code, data and outputs with DOIs, and go home early.
    B. Enrol in Intersect’s High Performance Computing (HPC) courses and learn how to run your job on shared infrastructure.
    C. Give it to one of the PhD students to sort out.
    D. We have you mixed up with someone else – your iPad never gets hot unless you watch too much YouTube in the sun.
  4. When archiving data you always:
    A. Take care to use standard file formats that are easily machine-readable, and make sure all code, and as much provenance information as possible, is also archived.
    B. Fill in the metadata fields on the institutional data catalogue application as carefully as you can.
    C. Try to change the worksheet names on your Excel files from Sheet 1 to something more meaningful, if you get time.
    D. Use the shredder in the research office. It’s more fun than the old technique of scrunching up the envelope on which the data were written and trying to get it in the bin for a three-pointer.
  5. The best place to store research data during your project is:
    A. On a secure, backed-up cloud storage server (with data held in an appropriate jurisdiction) which you can access from anywhere with an internet connection, and share with designated collaborators.
    B. On a secure, backed-up drive accessible only from your office.
    C. On a Dr Who USB stick.
    D. You delete your raw data after you’ve analysed it. Although, actually, sometimes raw data doesn’t agree with you, so you cook some up to better fit your conclusions.
 
  6. A data management plan is:
    A. An important tool which facilitates planning for the creation, storage, access and preservation of research data. Creating this at the start of a research project and referring to it as a living document informs the research workflow and specifies how data will be managed.
    B. Something to think about once you’ve collected some data.
    C. More paperwork to bog down the research process, like Ethics. Oh for the good old days when we used to be able to electrocute the students without filling out so many forms.
    D. Data management plan? I’m not even in management, so don’t interrupt me – I’m enjoying my holidays.
  7. Collaborative research is:
    A. Enabled by eResearch technologies and supported by Open Access to published research data.
    B. Maximising the funding universities receive by sharing resources and equipment for a research project.
    C. Popping next door to ask a colleague a question.
    D. Not something you’re interested in. Your data will die with you.
  8. If you wanted to share your completed research dataset with others you would:
    A. Contact the Library or eResearch and discuss publishing the data and related methodology to the institutional data catalogue, which can then also be included in the Research Data Australia discovery portal. The data would be described using appropriate metadata, and linked to related collections, fields of research, people and facilities.
    B. Publish the data on your personal website and ask people to contact you via a Hotmail address for more information.
    C. Email the file to colleagues you think would be interested.
    D. You told us before – your data will die with you.
Your score:

Mostly As – We’d love to talk to you about becoming an eResearch champion. You have embraced the benefits of eResearch technology and methodology and have put comprehensive plans in place for the use and re-use of your valuable data.

Mostly Bs – You understand that technology is a useful tool but you’re hesitant to rely on it for your research. Try putting aside your trust issues and play around with one new tool or habit this week – it might spark an idea or save you valuable time. There are lots of opportunities to attend training or do a self-paced online course to increase your comfort level.

Mostly Cs – It might be time to chat to the eResearch team about joining the 21st century. Although your existing research process may be valid, eResearch boosts the research process through opportunities to add computing power, streamline workflows, and collaborate with like-minded researchers from around the globe.

Mostly Ds – Bah, humbug.

Creative Commons License
How eResearch-y are you? An extremely serious quiz comprising eight multiple-choice questions. by Peter Sefton & Katrina Trewin is licensed under a Creative Commons Attribution 4.0 International License.

Thanks Kim Heckenberg for your input and sorry Alf, we didn’t put in anything about multi-screen immersive visualization.

Is Omeka aDORAble?

So, we have been looking at a few different software packages, putting them through their paces at a series of Tuesday ‘tools days’ hosted by UWS eResearch and asking: “Is this software going to be one of our supported Working Data Repositories for researcher cohorts?” That is, how does it rate as a DORA, a Digital Object Repository for Academe?

Last month we had our biggest ever tools-day event, with external people joining the usual eResearch suspects. Thanks to Jacqueline Spedding from the Dictionary of Sydney, Michael Lynch & Sharyn Wise from UTS and Cindy Wong and Jake Farrell from Intersect for coming along.

Omeka is a lightweight digital repository / website building solution, originally targeting the Galleries, Archives & Museums space.

TL;DR

So what were we wanting to know about Omeka? The external folks came along for a variety of reasons but at UWS we wanted to know the following (with short answers, so you don’t have to read on).

  • Is this something we can recommend for researchers with the kinds of research collections Omeka is known for?

    Answer: almost certainly yes. Unless we turn up any major problems in further testing, this is a good, solid, basic repository for Digital Humanities projects; for image- and document-based collections with limited budgets it looks like an obvious choice.

  • Can Omeka be used to build a semantically-rich website in a research/publishing project like the Dictionary of Sydney?

(The reason we’re asking this is that UWS has a couple of projects with some similarities to the Dictionary, and we’re interested in exploring the options for building and maintaining a big database like this. The Dictionary uses an open-source code base called Heurist. We have some data from Hart Cohen’s Journey to Horseshoe Bend project, exported from an unfinished attempt to build a website using Heurist.)

The verdict? Still working on it, but reasonably promising so far.

  • Beyond its obvious purpose, is this a potential generic Digital Object Repository for Academe (DORA)?

    Answer: maybe. Of all the repository software we’ve tried at tools-days and looked at behind the scenes, this seems to be the most flexible and easily approachable.

Good

Omeka has a lot to recommend it:

  • It’s easy to get up and running.

  • It’s easy to hack, and easy to hack well, since it has plugins and themes that let you customise it without touching the core code. These are easy enough to work with that we had people getting (small) results on the day. More on that below.

  • It uses the Digital Object Pattern (DOP) – i.e. at the heart of Omeka are digital objects called Items, with metadata and attached files.

  • It has an API which just works, and can add items etc, although there are some complexities, more on which below.

  • It has lots of built-in ways to ingest data, including (buggy) CSV import and OAI-PMH harvesting.

Bad

There are some annoyances:

  • The documentation, which at first glance seems fairly comprehensive, is actually quite lacking. Examples of the plugin API are incorrect, and the description of the external API is terse and very short on examples (e.g. it doesn’t actually show how to use your API key, or how pagination works).

  • The API, while complete, is quite painful to use if you want to add anything. To add an item with metadata it’s not as simple as saying {“title”: “My title”} or even {“dc:title”: “My Title”}: you have to make an API call to find the elements called Title in the different element sets, then pick one and use it. Copy-pasting someone else’s example is hard, too: their metadata element 50 may not be the same as yours (see the sketch after this list). That’s nothing a decent API library wouldn’t take care of; the eResearch team is looking for a student who’d like to take the Python API on as a project (and we’ve started improving the Python library).

  • Very limited access control with no way of restricting who can see what by group.

  • By default the MySQL full-text search only indexes words of four letters or more, so you can’t search for CO2 or PTA (Parramatta), both of which are in our test data. This is totally fixable with some tweaking (lowering MySQL’s minimum indexed word length and rebuilding the search index).

  • Measured against our principles, there’s one clear gap: we want to encourage metadata that embraces linked-data principles and uses URIs, in preference to strings, to identify things. So while Omeka scores points for shipping with Dublin Core metadata, it loses out for not supporting linked data. If only it let you have a URI as well as a string value for any metadata field!
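To make that element-lookup dance concrete, here is a minimal sketch of adding an item via the API with Python and requests. It is an untested illustration, not a recipe: the host and key are placeholders, and the element IDs will differ per installation (which is exactly the problem).

```python
import requests

BASE = "http://omeka.example.org/api"  # placeholder Omeka install
KEY = "your-api-key"  # API key from the admin UI, passed as a query parameter

# Step 1: find an element called "Title". There may be several (one per
# element set), and results are paginated, so a real script should page
# through and check the element set before picking one.
elements = requests.get(f"{BASE}/elements", params={"key": KEY}).json()
title_id = next(e["id"] for e in elements if e["name"] == "Title")

# Step 2: add an item, with metadata supplied as element_texts keyed by
# that numeric element ID -- not by a human-readable name.
item = {
    "public": True,
    "element_texts": [
        {"element": {"id": title_id}, "html": False, "text": "My Title"}
    ],
}
resp = requests.post(f"{BASE}/items", params={"key": KEY}, json=item)
print(resp.status_code, resp.json().get("id"))
```

A decent client library would hide both steps behind something like add_item(title="My Title").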

But maybe it can do Linked Data?

Since the hack day we have some more news on Omeka’s coming linked data support. Patrick from the Omeka Team says on their mailing list:

Hi Peter,

Glad you asked!

The API will use JSON-LD.

The Item Add interface as we’re currently imagining it has three options for each property: text input (like what exists now), internal reference (sorta bringing Item Relations into core, just with a better design), and external URI. The additional details, like using a local label for an external URI sound interesting, and we’ll be thinking about if/how that kind of thing might work.

Properties, too, will be much more LoD-friendly. In addition to Dublin Core, the FOAF, BIBO, and other vocabularies will be available both for expressing properties, and the classes available (analogous to the Item Types currently available).

Changes like this (and more!) are at the heart of the changes to design and infrastructure I mentioned in an earlier response. We hope that the additional time will be worth it to be able to address needs like these!

You can watch the progress at the Omeka S repo: https://github.com/omeka/omeka-s

Thanks,

Patrick

This new version of Omeka (Omeka-S) is due in “the Fall semester of 2015”, which is North American for late next year, in (our) Spring. It’s hard to tell from this short post by Patrick, but it looks promising. There are a few different ways the current version of Omeka might support Linked Data; the best way forward is probably the Item Relations plugin.

But what can we do in the meantime?

  • The Item Relations plugin desperately needs a new UI element to do lookups: at the moment you need to know the integer ID of the item you want to link to. Michael Lynch and Lloyd Harischandra both looked at aspects of this problem on the day.

  • Item Relations don’t show up in the API. But the API is extensible, so it should be simple enough to add a resource for item_relations, plus the vocab lookups needed to relate things to each other as (essentially) Subject, Predicate, Object (a hypothetical sketch follows this list). PT’s been working on this as a spare-time project.

  • Item Relations doesn’t allow for a text label on the relation or the endpoint, so while you might want to say someone is the dc:creator of a resource, you only see the “Creator” label and the title of the item you link to. What if you wanted to say “Dr Sefton” or “Petiepie” rather than “Peter Sefton” but still link to the same item?
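Since the Item Relations data model is already (subject item, property, object item), a hypothetical item_relations API resource might accept triples like the following. To be clear, nothing below ships with Omeka: the endpoint and field names are our guesses at what such an extension could look like.

```python
import requests

BASE = "http://omeka.example.org/api"  # placeholder Omeka install
KEY = "your-api-key"

# Hypothetical: relate item 12 to item 34 via a vocabulary property.
# The property_id would come from a vocab lookup -- the very UI/API gap
# discussed above.
triple = {
    "subject_item_id": 12,   # e.g. the resource
    "property_id": 7,        # e.g. dc:creator, found via lookup
    "object_item_id": 34,    # e.g. the person
}
resp = requests.post(f"{BASE}/item_relations",
                     params={"key": KEY}, json=triple)
print(resp.status_code)
```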

What we did

Slightly doctored photo, either that or Cindy attended twice!

Gerry Devine showed off his “PageMaker” semantic CMS. Gerry says:

The SemanticPageMaker (temporary name) is an application that allows for the creation of ‘Linked Data’-populated web pages to describe any chosen entity. Web forms are constructed from a pre-defined set of re-usable semantic tags which, when completed, automatically produce RDFa-enabled HTML and a corresponding JSON-LD document. The application thus allows semantically-rich information to be collected and exposed by users with little or no knowledge of semantic web terms.

I have attached some screenshots from my local dev instance as well as an RDFa/html page and a JSON-LD doc that describes the FACE facility (just dummy info at this stage) – note the JSON-LD doesn’t expose all fields (due to duplicated keys)

A test instance is deployed on Heroku (feel free to register and start creating stuff – might need some pointers though in how to do that until I create some help pages):

https://desolate-falls-4138.herokuapp.com/

Github:

https://github.com/gdevine/SemanticPageMaker

This might be the long-lost missing link: a simple semantic CMS which doesn’t try to be a complete semantic stack with ontologies etc. It just lets you define entities and relations, give each type of entity a URI, relate entities to each other, and be a good Linked Data citizen by providing RDF and JSON data. Perfect for describing research context.
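As an illustration (ours, not Gerry’s: the vocabulary choices and values are invented), the JSON-LD such a page might expose for the FACE facility could be as simple as:

```json
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "partOf": {"@id": "http://purl.org/dc/terms/isPartOf", "@type": "@id"}
  },
  "@id": "https://desolate-falls-4138.herokuapp.com/facilities/face",
  "name": "FACE (Free Air CO2 Enrichment) facility",
  "partOf": "https://example.edu/institutes/hie"
}
```

Each entity gets a URI, and relations point at other URIs rather than strings.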

And during the afternoon, Gerry worked on making his CMS usable for lookups, so that, for example, if we wanted to link an Omeka item to a facility at HIE we could do that via a lookup. We’re looking at building on the Fill My List (FML) project, started by a team at Open Repositories 2014, for a universal URI lookup service with a consistent API over different sources of truth. Since the tools-day Lloyd has installed a UWS copy of FML so we can start experimenting with it across our family of repositories and research contexts.

Lloyd and Michael both worked on metadata lookups. Michael got a proof-of-concept UI going so that a user can use auto-complete to find Items rather than having to copy IDs. Lloyd got some autocomplete happening via a lookup to ORCID via FML.

PT and Jacqueline chatted about rich, semantically-linked datasets like the Dictionary of Sydney. In preparation for the workshop, PT took the data from the Journey to Horseshoe Bend project, which is in a similar format to the Dictionary, put it in a spreadsheet with multiple worksheets, and imported it via a very dodgy Python script.
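The dodgy script isn’t published, but the shape of the approach is roughly this (a sketch only, assuming openpyxl, one worksheet per entity type, a hypothetical filename, and the same element-ID lookup as in the earlier Omeka API example):

```python
import requests
from openpyxl import load_workbook

BASE = "http://omeka.example.org/api"  # placeholder Omeka install
KEY = "your-api-key"
TITLE_ID = 50  # looked up from /elements first; varies per installation

wb = load_workbook("horseshoe_bend.xlsx")  # hypothetical filename
for sheet in wb.worksheets:  # e.g. one worksheet per entity type
    # first row holds the column names; remaining rows are records
    header = [cell.value for cell in next(sheet.iter_rows(max_row=1))]
    for row in sheet.iter_rows(min_row=2, values_only=True):
        record = dict(zip(header, row))
        item = {
            "element_texts": [{
                "element": {"id": TITLE_ID},
                "html": False,
                "text": str(record.get("title", "")),
            }],
        }
        requests.post(f"{BASE}/items", params={"key": KEY}, json=item)
```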

Peter Bugeia investigated how environmental-science data would look in Omeka, by playing with the API to pump in data from the HIEv repository.

Sharyn and Andrew tried to hack together a simple plugin. Challenge: see if we can write a plugin which will detect YouTube links in metadata and embed a YouTube player (as a test case for a more general type of plugin that can show web previews of lots of different kinds of data). They got their hack to the “Hello World, I managed to get something on the screen” stage in 45 minutes, which is encouraging.

Jake looked at map embedding: we had some sample data from UWS in KMZ (compressed Google map layers for UWS campuses) and wondered if it would be possible to show map data inline in an item page. Jake made some progress on this; the blocker isn’t Omeka, it was finding a good way to do the map embedding.

Cindy continued the work she’s been doing with Jake on the Intersect press-button Omeka deployment. They’re using something called Snap Deploy and Ansible.

Jake says:

Through our Snapdeploy service Intersect are planning to offer researchers the ability to deploy their own instance of OMEKA with just a click of a button, with no IT knowledge required. All you need is an AAF log in and Snapdeploy will handle the creation of your NeCTAR Cloud VM and the deployment of OMEKA to that VM for you. We are currently in the beginning stages of adapting the Snapdeploy service to facilitate an Omeka setup and hope to offer it soon. We would also like feedback from you as researchers to let us know if there are any Omeka plug-ins that you think we could include as part of our standard deployment process that would be universally useful to the research community, so that we can ensure our Omeka product offers the functionality that researchers actually need.

David explored the API using an obscure, long-forgotten programming language – “Java”, we think he called it – and reported on the difficulty of grasping it.

More on stretching Omeka

If we were to take Omeka out of its core comfort zone – say, as the working data repository in an engineering lab – there are a number of things we’d want to do:

  • Create some user-facing forms for data uploads. These would need to be simpler than the full admin UI, with lookups for almost everything: people, subject codes, and research context such as facilities.

  • Create (at least) group-level access control, probably per-collection.

  • Build a generic framework for previewing or viewing files of various types. In some cases this is very simple, via the addition of a few lines of HTML; in others we’d want some kind of workflow system that can generate derived files.

  • Fix the things noted above: a better API library and Linked Data support.

What would an Omeka service look like?

If we wanted to offer this at UWS or beyond, and use it for projects outside the DH sphere, what would a supported service look like?

To make a sustainable service, we’d want to:

  • Work out how to provide robust hosting with an optimal number of small Omeka servers per host (is it one? is it ten?).

  • Come up with a generic data management plan: “We’ll host this for you for 12 months. After which if we don’t come to a new arrangement your site will be archived and given a DOI and the web site turned off”. Or something.

Creative Commons License
Is Omeka aDORAble? by Peter Sefton, Andrew Leahy, Gerry Devine and Jake Farrell is licensed under a Creative Commons Attribution 4.0 International License.

Is HIEv aDORAble?

[Update 2014-09-04: added a definition of DORA]

This week we held another of our tool/hack days at UWS eResearch, this time at the Hawkesbury campus with Gerry Devine, the data manager for the Hawkesbury Institute for the Environment (HIE). The tool in question is the DIVER product (AKA DC21 and HIEv).

Where did Intersect DIVER come from?

Intersect DIVER was originally developed by Intersect in 2012 for the University of Western Sydney’s Hawkesbury Institute for the Environment, as a means to automatically capture and secure time-series and other data from the Institute’s extensive field-based facilities and experiments. Known at HIE as “the HIEv”, Intersect DIVER is the Institute’s primary data capture application. For more information see http://intersect.org.au/content/intersect-diver

We wanted to evaluate DIVER against our Principles for eResearch software with a view to using it as a generic DORA working data repository.

Hang on! A DORA? What’s that?

DORA is a term coined by UWS eResearch Analyst David Clarke for a generic Digital Object Repository for Academe (yes, Fedora‘s an example of the species). We expressed it thusly in our principles:

At the core of eResearch practice is keeping data safe (remember: No Data Without Metadata). Different classes of data are safest in different homes, but ideally each data set or item should live in a repository where:

  • It can be given a URI
  • It can be retrieved/accessed via a URI by those who should be allowed to see it, and not by those who should not
  • There are plans in place to make sure the URI resolves to something useful for as long as it is likely to be needed (which may be "as long as possible").
DORA Diagram

The DIVER software is running at HIE, with more than 50 "happy scientists" (as Gerry puts it) using it to manage the research data files, including those automatically deposited from the major research facility equipment.

HIEv Shot

So, what’s the verdict?

Is DIVER a good generic DORA?

The DIVER data model is based entirely on files. That’s quite a different approach from CKAN, which we looked at a few weeks ago, and Omeka, which we’ll look at in a fortnight’s time; both of those use a ‘digital object’ model in which an object has metadata and zero or more files.

DIVER does many things right:

  • It has metadata, so there’s No Data without Metadata (but with some limitations, see below)

  • It has API access for all the main functionality, so researchers doing reproducible research can build recipes to fetch and put data, run models and so on from their language of choice (see the sketch after this list).

  • The API works well out of the box with hardly any fuss.

  • It makes some use of URIs as names for things in the data packages it produces, so published data packages do use URIs to describe the research context.

  • It can extract metadata from some files and make it searchable.
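As a flavour of what those “recipes” can look like, here is a sketch of fetching a file listing with Python. Treat the host, endpoint and parameter names as illustrative assumptions modelled on the HIEv examples we’ve seen, not as quotes from the DIVER documentation:

```python
import requests

BASE = "https://hiev.example.edu"  # placeholder DIVER/HIEv install
TOKEN = "your-api-token"           # per-user API token

# Search the data files, then fetch whichever ones a model run needs.
resp = requests.get(
    f"{BASE}/data_files/api_search.json",
    params={"auth_token": TOKEN, "filename": "ROS_WS_Table1"},
)
resp.raise_for_status()
for record in resp.json():
    print(record.get("filename"), record.get("updated_at"))
```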

But there are some issues that would need to be looked at for deploying DIVER into new places:

  • The metadata model in DIVER is complicated: it has several different, non-standard ways to represent metadata, most of which are not configurable or extensible, and a lot of the metadata is not currently searchable.

  • DIVER has two configurable ‘levels’ of metadata that automatically group files together; at HIE these are "Facility" and "Experiment". This is the only major configuration change you can make to customise an installation: there is no extensible per-installation metadata like CKAN’s simple, generic name/value user-addable fields. This is a very common issue with this kind of software – no matter how many levels of hierarchy there are, a case will come along that breaks the built-in model.

    In my opinion the solution is not to put this kind of contextual stuff into repository software at all. Gerry Devine and I have been trying to address this by working out ways to separate descriptions of research context from the repository, so that the repository worries only about keeping well-described content, while the research context is described by a human-and-machine-readable website, ontology or database as appropriate, with whatever structure the researchers need to describe what they’re doing. Actually, Gerry is doing all the work, building a new semantic CMS application that can describe research context independently of other eResearch apps.

  • There are a couple of hard-wired file-preview functions (for images) and derived files (OCR and speech recognition), but no plugin system for adding new ones, so any new deployment that needed new derived file types would need a customisation budget.

  • The only data format from which DIVER can extract metadata is the proprietary TOA5 format owned by the company that produces the Institute’s data-loggers. NetCDF support would be more useful.

  • There are some user interface issues to address, such as making the default page for a data-file more compact.

Conclusion

There is a small community around the open-source DIVER product, with two deployments using it for very different kinds of research data. To date the DIVER community doesn’t have an agreed roadmap for where the product might be heading or how the issues above might be addressed.

So at this stage I think it is suitable for re-deployment only into research environments which closely resemble HIE, probably including the same kinds of data-logger (I haven’t seen the other installation, so can’t comment on it). It might be possible to develop DIVER into a more generic product, but at the moment there is no obvious business case for doing that rather than adapting a more widely adopted, more generic application. I think the way forward is for the current user communities (of which I consider myself a member) to weigh the benefits of incremental change towards a more generic solution as they maintain and enhance the existing deployments, balancing local feature development against the potential benefit of attracting a broader community of users.

And another thing …

We discovered some holes in our end-to-end workflow for publishing data from HIEv to our Institutional Data Repository, and some gaps in the systems documentation, which we’re addressing as a matter of urgency.

Creative Commons License
Is HIEv aDORAble? by Peter Sefton is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

eResearch manager’s report 2014-07-28

Introduction

Since the last meeting of the UWS eResearch Committee on May 22nd, we have updated the eResearch roadmap to reflect where we are against the plan as set out at the beginning of 2014.

In June I attended the Open Repositories conference and a couple of other events to do with open access to publications and data, including organising an open-data publications text-mining hackfest in Edinburgh.

Looking to the future, the eResearch team has been involved in two internal funding bids in the last week:

  1. Research Portal 2 (P2): to develop a joined-up research presence for the university, like the Research Hub projects at Griffith and JCU.
  2. More end-to-end data management, via more support for the AAAA data management program we’re already running.

UWS Events – Research Bazaar

Now that UWS has all our staff positions filled, we’re making a big push to do more outreach to researchers via a number of channels, including visiting departmental meetings and research forums, and running as many eResearch-relevant training events as we can get takers for. This is all done with the help of the eResearch Communications Working Group, chaired by Susan Robbins from the UWS Library.

To build eResearch capability we’re trying out the Research Bazaar approach, which started in Melbourne with Steve Manos and David Flanders.

What exactly, you might ask, is the ‘Research Bazaar’, aka “ResBaz”? #ResBaz is, first and foremost, a campaign to empower researchers in the use of the University’s core IT services:

  • Empowering researchers to collaborate with one another through the use of research apps on our cloud services.

  • Empowering researchers to share their data with trusted partners via our data services.

  • Empowering researchers to establish their reputation through our parallel computing and supercomputing services.
  • Empowering researchers to invent new ways of experimenting through our emerging technology services.

Our eResearch partners Intersect are helping with this; they offer a number of Learning and Development courses, and we’re talking to them about developing and importing more.

Speaking of importing eResearch training expertise, we ran the first of a series of Research Bazaar events: Mapping for the Digital Humanities, powered by Melbourne eResearcharians Steve Bennet and Fiona Tweedie.

Right at the beginning of July, Alveo, the virtual laboratory for human communication science, was launched by the NSW Chief Scientist Mary O’Kane and UWS Vice-Chancellor Barney Glover with a two-day event, starting with a hackfest day to generate ideas and interest, promote use of the lab and provide some hands-on training. While we didn’t brand this as a Research Bazaar activity, it is certainly in the #resbaz spirit.

Projects

DC21/HIEv Wraps up

The HIEv project, née DC21, is now complete, and HIEv has about 50 regular users at HIE. Thanks to Peter Bugeia at Intersect for project-managing the final stages of the rollout, and to Gerry Devine, HIE data manager, for promoting the software and putting it to good use building dashboards etc.

New features include:

  • Log in using your account at any.university.edu.au, via the Australian Access Federation.
  • Share data securely with a research cohort until you’re ready to publish it to the world for re-use and citation.

New: Major Open Data Collection for the humanities

Our latest project, the Major Open Data Collections project, funded by the Australian National Data Service, is in the establishment phase:

  • Carmi Cronje is working with the ITS Project Management Office to establish the project and its various steering committees, boards etc.
  • The key staff member for the project, the data librarian, has been appointed: Katrina Trewin, currently working in the UWS Library, joins us on August 4th.

Adelta is nearly finished

The Adelta project is nearing completion, with users now testing the service:

  • User Interface work by Intersect is nearly done, pending some discussions with the Library about accessibility requirements.
  • Final bug fixes and tweaks are being applied, as per this milestone.
  • We are working with Sydney development company hol.ly to integrate the service with the Design And Art Online (DAAO) database, so that we have a true linked-data approach, with Adelta authors identified using DAAO URIs. This builds on one of the Developer Competition entries from Open Repositories 2014: the Fill My List URI lookup service.

Wonderama

Andrew Leahy consulted for the Google Atmosphere event (Tuesday 22 July) at the Australian Technology Park, Eveleigh. This was a Wonderama demonstration in collaboration with NGIS (www.ngis.com.au), showcasing some of the NSW state government data hosted on Google’s geo platform.

Cr8it project rolls on

Cr8it is a collaboration between Newcastle, Intersect and UWS to build an application which lives in a Dropbox-like Share/Sync/See file service, so that people can move their research data from sets of files to well-described data collections in a repository.

  • User testing has started on parts of the software to do with selecting, and managing files.
  • Recent development work has focussed on re-factoring the application to make it more testable and easier to build on; once this is done we’re on the home straight to hook it up to the Research Data Repositories at UWS and Newcastle and start publishing data.

We are now seeing a lot of uptake among UWS users of Cloudstor+, the AARNet researcher-ready version of ownCloud on which we are planning to deploy Cr8it; for example, Andrew Leahy reports that a few users a week are adopting it at his suggestion.

AAAA data management

Projects to establish data management practices and infrastructure in the BENS group and the Structures Lab at IIE are continuing, and we are developing new AAAA projects to start soon.

Meet DORA

New eResearch Analyst David Clarke has coined the term DORA – Digital Object Repository for Academe – a name for a generic, service-oriented component for storing research data which adheres to a set of eResearch principles that David and the rest of the team are working on. We are currently evaluating software against the ideal DORA model. David’s happy to talk to you about this, as he has an Open-DORA policy ☺.

Creative Commons License
eResearch manager’s report 2014-07-28 by Peter Sefton is licensed under a Creative Commons Attribution 4.0 International License.

Internal update: UWS eResearch roadmap 2014 Q3 & 4

About this document

This is the mid-year revision of the University of Western Sydney eResearch team’s roadmap for 2014. This document will be consulted at eResearch committee and working-group meetings to track progress throughout the year.

Summary


The timelines below have traffic-light colours to show progress: green means things are going according to plan; yellow means there have been delays or setbacks, but these are being managed and monitored; red means targets were not met. The main ‘red’ area is the Open Access policy – a draft has been developed, has received support from the eResearch committee and the DVCR&D, and is undergoing review in the office of the DVCR&D.

Assumptions

This plan assumes the current level of staffing and resources for the eResearch team and does not make any assumptions about further project funding, apart from the ANDS Major Collections project, which is in its initiation phase.

Vision

The eResearch team vision statement:

Support the objectives of the UWS research plan by creating an eResearch Ready UWS, where information and communications technologies support the collaborative conduct of high-impact, high-integrity research with minimal geographical and organisational constraints. eResearch will assist in the transition to a research culture where IT and communications technologies are integral to all research, from the fundamental underpinnings of data acquisition and creation, management and archiving, to analytical and methodological processes. Our aim is to work with stakeholders within and beyond the university to ensure UWS researchers have the information and communications technology resources, infrastructure, support and skills required, wherever they are on the path to an eResearch ready UWS.

How does this fit with the UWS research plan?

The eResearch plan is aligned with and supports the UWS Research Plan. (Note: that plan is now obsolete; a new one is coming, with a greater emphasis on impact and community engagement, and on broadening research income beyond competitive grants.)

Objectives 1-3

  • Objective 1 – Increase external research income to the University

  • Objective 2 – Increase the number of fields of research at UWS operating above or well above world standard

  • Objective 3 – Increase the number and concentration of funded research partnerships

These objectives depend on UWS having a high-integrity research environment, attractive to researchers, funders and collaborators, in which the institution can support researchers in meeting their obligations under the Australian Code for the Responsible Conduct of Research and funder expectations about data management. Building eResearch infrastructure, via the projects discussed below and the forthcoming ITS research infrastructure roadmap, will help create an environment conducive to successful income generation and improve support for researchers aiming for high research performance.

  • During 2014 eResearch will begin replicating the successful roll-out of end-to-end data management at HIE by creating small, tightly focused projects with clear success criteria, aligned to the research goals of the university (via the AAAA data management project methodology currently in development).

  • eResearch will continue to work closely with eResearch-intensive groups, for example by supporting phase two of Alveo (formerly HCS vLab), a NeCTAR grant ($1.3M, with a total project budget of ~$3M) to set up and implement a virtual laboratory for the multiple partners in the project: Above and Beyond Speech, Language and Music: A Virtual Lab for Human Communication Science.

Objective 4 – Ensure UWS attracts and graduates high quality Higher Degree Research (HDR) students to its areas of research strength.

During 2014 eResearch will be implementing programs to support HDR students, along with early-career researchers and the rest of the research community. This includes the establishment of self-supporting eResearch communities via a trial of the University of Melbourne ‘Research Bazaar’ model.


eResearch will work with our eResearch partner, Intersect, to start delivering a broad range of eResearch training, building on previous training delivered for High Performance Computing (see Communications and Organisational Development). HDR students will be key to this, both as one of the main audiences for training and as trainers, promulgating eResearch techniques and mind-set throughout the university.

Resources

Assumed Core Resources

  • eResearch Manager – Peter Sefton

  • eResearch Technical Advisor (~0.8 FTE) – Andrew Leahy

  • eResearch Support Officer / eResearch Analyst – TBA

  • eResearch Project Implementation Officer / Communications – Cornelia (Carmi) Cronje.

  • Intersect eResearch Analyst – Peter Bugeia

Other Resources

These resources are from other areas of the university and are financed by those cost centres. They are currently on loan to the eResearch team until October 2014.

  • Application Developer, ITS

  • Web Application Developer (provided by ITS – until ITS restructure unfolds)

Associates

The eResearch Associates are employed in key UWS research institutes or schools and work closely with the eResearch team and provide technical expertise to assist researchers.

  • Gerard Devine – HIE Data Manager

  • Jason Ensor – Research Development Officer (Digital Humanities)

  • Nathan Mckinlay – ICT Professional Officer – IIE

  • James Wright – Technical Officer in Bioelectronics & Neuroscience – BENS

Funding

The eResearch team has no formal budget separate from the office of the DVCR&D. Recommendation: consolidate remaining project funds into an eResearch projects account to support projects in the eResearch portfolio.

  • Money that’s in the MS23 financial account ~ $22,244.24

  • RDR budget remaining: ~$100K (subject to confirmation from ITS)

Focus areas

Policy Working Group

The policy working group is chaired by Kerrin Patterson, Associate Director Performance and Quality (Acting), Office of Engagement, Strategy & Quality. The group has identified two priorities:

  • Establishing an Open Access (OA) policy for both research publications and research data.

  • Creating a Research Data Management (RDM) policy.

The working group has made substantial progress on the Open Access (OA) policy, and has asked the Manager, eResearch to review the policy framework at UWS, particularly the Research Code, before starting on the Research Data Management (RDM) policy. Recent changes to Australian Research Council (ARC) funding rules for Discovery grants mean this is now a pressing issue for both the OA and RDM policies at UWS:

A11.5.2 Researchers and institutions have an obligation to care for and maintain research data in accordance with the Australian Code for the Responsible Conduct of Research (2007). The ARC considers data management planning an important part of the responsible conduct of research and strongly encourages the depositing of data arising from a Project in an appropriate publicly accessible subject and/or institutional repository.

Open Access Policy (Q1→Q4):
  • Q1: Draft presented to DVCR
  • Q2: Policy adopted
  • Q3: Support DVCR&D in progressing the policy through the UWS process
  • Q4: Revise materials to support the policy (new PowerPoint slide show; possible statements from Scott Holmes); see communications working group plan

Research Data Management policy (Q1→Q4):
  • Q1: Review of UWS policy, particularly the Research Code
  • Q2: Review of UWS policy complete; Policy WG finish gap-analysis/comparison of UWS policies
  • Q3: Policy WG recommend whether we need an RDM policy and what its scope should be
  • Q4: Policy WG produce draft of RDM policy and/or updates to related policies

Communications and Organisational Development

The Communications working group is chaired by Susan Robbins, Research Services Coordinator for the UWS library. The following table sets out the broad goals for this area.

During 2014 the eResearch team will be working with Intersect to establish an organisational development approach to eResearch under the “Research Bazaar” banner.

Communications plans (Q1→Q4):
  • Q1: Generic matrix to be used for eResearch messaging
  • Q2: Implement for eResearch website
  • Q3: Communications WG publish updated plan; eResearch publish an events calendar
  • Q4: As directed by comms WG

Awareness campaign for OA policy (Q1→Q4, in order):
  • Launch of some sort?
  • Web pages published
  • Webinars and face-to-face briefings
  • Publish web pages about the policy on the main site
  • Set up calendar for webinars and other outreach
  • Library to run OA promotion campaign to get more deposits
  • ORS to include comms about OA in research lifecycle touchpoints

Capability-building in research groups* (Q1→Q4, in order):
  • Planning
  • Produce training resources and communicate that they exist
  • Run 1 #ResBaz** workshop from Melbourne
  • Book in 2 Intersect courses
  • 1 event run at each of HIE, DHRC, MARCS
  • Trial 1 Software Carpentry workshop

Alignment of eResearch with research lifecycle (Q1→Q4, in order):
  • Planning/development
  • Two diagrams: HDR and researchers
  • Produce draft of lifecycle
  • Get feedback on draft from stakeholders (Library, ORS, eResearch, researchers)
  • Physical posters for use by key stakeholders
  • Publish lifecycle on eResearch website
  • Integrate lifecycle into stakeholder websites

Dissemination – conference presentations, journal articles, The Conversation etc. (Q1→Q4, in order):
  • Identify potential topics and co-authors
  • Contact collaborators and commence writing online opinion pieces, blog posts etc.
  • Submit conference abstracts
  • Open Repositories
  • eResearch Australasia
  • Facilitate BoF session

eResearch included in Research Training agenda and materials (Q1→Q4):
  • Q1: Planning
  • Q2: Plan established with ORS
  • Q3: Plan with ORS (Mary Krone, Luc Small)
  • Q4: As per plan

Work with Intersect on establishing Research Bazaar (Q1→Q4, in order):
  • Planning
  • Run as many existing Intersect courses as possible/relevant
  • Initial pilot of Melbourne Uni courses
  • Run existing Intersect courses
  • Expanded pilot of Melbourne Uni courses
  • Software Carpentry
  • Research Bazaar established; program to be maintained jointly by Intersect and the eResearch team

Wonderama internal & organisational development:
  • Developing Wonderama as a platform for the Digital Humanities and the Project For Western Sydney outreach and consulting
  • Developing a consulting/business model ($$)
  • Google Summer of Code
  • PX students UWS Solar Racer
  • CompSci Advanced projects?

Wonderama external and outreach activities (($$) = paid gig):
  • UWS HiTech Fest (careers market)
  • iFly Downunder launch at Panthers (indoor skydiving) ($$)
  • CeBIT conference (SCEM to sponsor?)
  • Google Atmosphere ($$)
  • TBD

** #ResBaz = Research Bazaar

Measures of success

*Capability building: count the number of figures/tables/citations/programs in publications/theses produced using workshop tools and/or programming languages.

eResearch Projects

The following table lists projects which report to the eResearch Projects Working Group. It shows the broad project stage for each project over the year; a separate schedule/dashboard, to be presented to the eResearch Projects committee, will show detailed targets for each.

Adelta (Q1→Q4):
  • Q1: Phase 1 finished
  • Q2: Discuss library hosting of Adelta
  • Q3: Possible integration into the Library search box for greater discoverability


  • Q4: Google Analytics to measure use

Cr8it core app (Q1→Q4):
  • Q1: Negotiate sustainable support offer from Intersect/AARNET
  • Q2: Start of trials
  • Q3: Implementation
  • Q4: Realisation

ANDS Major collection (Q1→Q4):
  • Q1: Scoping complete
  • Q2–Q4: Project running

AAAA Data Management Projects

HIEv (Q1→Q4):
  • Q1–Q4: Realisation (Q2: set up reporting of research-focused metrics)

IIE Structures Lab (Q1→Q4):
  • Q1: Planning, initiation
  • Q2: Implementation
  • Q3–Q4: Realisation

MARCS BENS (Q1→Q4):
  • Q1: Planning, initiation
  • Q2: Implementation
  • Q3–Q4: Realisation

To Be Advised (Digital Humanities):
  • Planning → Initiation → Implementation

To Be Advised (something sciency):
  • Planning → Initiation → Implementation

Establish “AA” data management for facilities (Acquire & Archive):

AMCF (SEM+), SIMS:
  • Planning → Implementation → Realisation

NGS (Sequencing):
  • Planning → Implementation → Realisation

BMRF (NMR):
  • Planning → Implementation

MSF (MassSpec):
  • Planning → Implementation

CBIF (Confocal):
  • Planning → Implementation

AAAA projects: measures of success

Each AAAA data management project will be measured with a variety of metrics. Targets will be agreed with the project stakeholders at project initiation and in the realisation phase, and maintained in a separate AAAA dashboard. These metrics are designed to show not just raw use of the AAAA methodology in terms of users or data sets (both of which are gameable metrics) but the effect of the AAAA program on research performance and ‘eResearch readiness’.

  • R#: Number of researchers who have been inducted/trained and have access to AAAA infrastructure
  • DAR: Datasets Archived in the RDR
  • ACD: Total number of articles in the UWS publications repository citing datasets in the RDR (including via repository metadata)
  • IDMP: Institute or research-cohort Data Management Plan(s) in place
  • GRDMP: Number and value of current grants which reference formal data management plans

Infrastructure Working Group

Infrastructure planning is under discussion with ITS Strategy. A technology roadmap is being produced with the ITS Roadmap Builder tool; it will be published as a separate plan.

Intersect Engagement

The relationship between Intersect and UWS is covered by a member engagement plan (in development for 2014).

eResearch Team Organisational Development

eResearch tool awareness (Q1→Q4):
  • Team familiarity with data capture applications (e.g. CKAN, MyTardis)
  • “Notebook programming”: RStudio, Python Notebooks, ShaderToy
  • Academic authoring tools: LaTeX, Markdown, Pandoc, EPUB etc.
  • TBA

Communications (Q1→Q4):
  • Visual comms/whiteboard training
  • TBA
  • TBA

Software development (Q1→Q4):
  • eResearch tech people to attend a workshop in one language* (over two quarters)
  • Team familiarity with modern programming principles and environments**

Conferences:
  • Australasian Digital Humanities (Perth)
  • Open Repositories (Helsinki)
  • Google I/O (SF)
  • eResearch Australasia (Melb)
  • Google Open Source Summit (SF)
  • OzViz workshop (Bris?)

Metrics

  • *Certificate in Software Carpentry (Python/R)

  • **Team members to complete one MOOC or otherwise demonstrate professional development