How eResearch-y are you?

How eResearch-y are you? An extremely serious quiz comprising eight multiple-choice questions.
  1. Your latest published article has a graph in it. If the eResearch police asked you to reproduce the plot exactly using the original data you’d:
    A. Check out the code archived with the article, and re-run the make-file, which would not only re-generate the plot using Knitr, but the whole article, which would also be made available as an interactive website using Shiny, with an option to re-run the models on data crowd-transcribed from the logs of 17th-century slave ships.
    B. Redo the diagram in Excel, using the clearly set out method and supplemental material from the article.
    C. Find the data (by borrowing back last year’s laptop from a postgrad), then fiddle around with what you think is the right spreadsheet to make something that looks pretty much like the one in the paper.
    D. Plot? What plot? And what was all that babble in option A?
  2. Turns out that some of the photos and recordings you made when documenting a research site contain images and sounds of a Yeti. If you can provide complete records of where and when you collected this data, you can collect a $1,000,000 prize from a cable TV station. Your next step is to:
    A. Provide the DOI for the dataset you have archived in your institution’s data repository. The repository record, with the data attached, provides all the information required to support your claim.
    B. Scan the relevant pages from your field notebook and annotate them with supporting information specific to the Yeti sighting.
    C. Rummage around the office: you last saw that scrap of paper you scribbled on during the fieldwork in the pile on top of your filing cabinet.
    D. Quickly throw together some handwritten notes and scorch them with a candle so they look old. No, actually, you couldn’t be bothered. Also you don’t believe in Yetis or Santa.
  3. You have so much data to analyse, and your models are getting so complicated, that your laptop is getting hot, so you:
    A. Use Docker to create a 128-node compute cluster in the NeCTAR cloud, get some results, archive all the code, data and outputs with DOIs, and go home early.
    B. Enrol in Intersect’s High Performance Computing (HPC) courses and learn how to run your job on shared infrastructure.
    C. Give it to one of the PhD students to sort out.
    D. We have you mixed up with someone else – your iPad never gets hot unless you watch too much YouTube in the sun.
  4. When archiving data you always:
    A. Take care to use standard file formats that are easily machine-readable, and make sure all code, and as much provenance information as possible, is also archived.
    B. Fill in the metadata fields on the institutional data catalogue application as carefully as you can.
    C. Try to change the worksheet names on your Excel files from Sheet 1 to something more meaningful, if you get time.
    D. Use the shredder in the research office. It’s more fun than the old technique of scrunching up the envelope on which the data were written and trying to get it in the bin for a three-pointer.
  5. The best place to store research data during your project is:
    A. On a secure, backed-up cloud storage server (with data held in an appropriate jurisdiction) which you can access from anywhere with an internet connection, and share with designated collaborators.
    B. On a secure, backed-up drive accessible only from your office.
    C. On a Dr Who USB stick.
    D. You delete your raw data after you’ve analysed it. Although, actually, sometimes raw data doesn’t agree with you, so you cook some up to better fit your conclusions.
 
  6. A data management plan is:
    A. An important tool which facilitates planning for the creation, storage, access and preservation of research data. Creating this at the start of a research project and referring to it as a living document informs the research workflow and specifies how data will be managed.
    B. Something to think about once you’ve collected some data.
    C. More paperwork to bog down the research process, like Ethics. Oh for the good old days when we used to be able to electrocute the students without filling out so many forms.
    D. Data management plan? I’m not even in management, so don’t interrupt me – I’m enjoying my holidays.
  7. Collaborative research is:
    A. Enabled by eResearch technologies and supported by Open Access to published research data.
    B. Maximising the funding universities receive by sharing resources and equipment for a research project.
    C. Popping next door to ask a colleague a question.
    D. Not something you’re interested in. Your data will die with you.
  8. If you wanted to share your completed research dataset with others you would:
    A. Contact the Library or eResearch and discuss publishing the data and related methodology to the institutional data catalogue, which can then also be included in the Research Data Australia discovery portal. The data would be described using appropriate metadata, and linked to related collections, fields of research, people and facilities.
    B. Publish the data on your personal website and ask people to contact you via a Hotmail address for more information.
    C. Email the file to colleagues you think would be interested.
    D. You told us before – your data will die with you.
Your score:

Mostly As – We’d love to talk to you about becoming an eResearch champion. You have embraced the benefits of eResearch technology and methodology and have put comprehensive plans in place for the use and re-use of your valuable data.

Mostly Bs – You understand that technology is a useful tool but you’re hesitant to rely on it for your research. Try putting aside your trust issues and play around with one new tool or habit this week – it might spark an idea or save you valuable time. There are lots of opportunities to attend training or do a self-paced online course to increase your comfort level.

Mostly Cs – It might be time to chat to the eResearch team about joining the 21st century. Although your existing research process may be valid, eResearch boosts the research process through opportunities to add computing power, streamline workflows, and collaborate with like-minded researchers from around the globe.

Mostly Ds – Bah, humbug.

Creative Commons License
How eResearch-y are you? An extremely serious quiz comprising eight multiple-choice questions. by Peter Sefton & Katrina Trewin is licensed under a Creative Commons Attribution 4.0 International License.

Thanks Kim Heckenberg for your input and sorry Alf, we didn’t put in anything about multi-screen immersive visualization.

Is Omeka aDORAble?

So, we have been looking at a few different software packages, putting them through their paces at a series of Tuesday ‘tools days’ hosted by UWS eResearch and asking: “Is this software going to be one of our supported Working Data Repositories for researcher cohorts?” That is, how does it rate as a DORA, a Digital Object Repository for Academe?

Last month we had our biggest ever tools-day event, with external people joining the usual eResearch suspects. Thanks to Jacqueline Spedding from the Dictionary of Sydney, Michael Lynch & Sharyn Wise from UTS and Cindy Wong and Jake Farrell from Intersect for coming along.

Omeka is a lightweight digital repository / website building solution, originally targeting the Galleries, Archives & Museums space.

TL;DR

So what were we wanting to know about Omeka? The external folks came along for a variety of reasons but at UWS we wanted to know the following (with short answers, so you don’t have to read on).

  • Is this something we can recommend for researchers with the kinds of research collections Omeka is known for?

    Answer: almost certainly yes. Unless we turn up any major problems in further testing, this is a good, solid, basic repository for Digital Humanities projects; for image- and document-based collections with limited budgets it looks like an obvious choice.

  • Can Omeka be used to build a semantically-rich website in a research/publishing project like the Dictionary of Sydney?

(The reason we’re asking this is that UWS has a couple of projects with some similarities to the Dictionary, and we’re interested in exploring the options for building and maintaining a big database like this. The Dictionary uses an open-source code base called Heurist. We have some data from Hart Cohen’s Journey to Horseshoe Bend project, exported from an unfinished attempt to build a website using Heurist.)

The verdict? Still working on it, but reasonably promising so far.

  • Beyond its obvious purpose, is this a potential generic Digital Object Repository for Academe (DORA)?

    Answer: maybe. Of all the repository software we’ve tried at tools-days and looked at behind the scenes, this seems to be the most flexible and easily approachable.

Good

Omeka has a lot to recommend it:

  • It’s easy to get up and running.

  • It’s easy to hack, and easy to hack well, since it has plugins and themes that let you customise it without touching the core code. These are easy enough to work with that we had people getting (small) results on the day. More on that below.

  • It uses the Digital Object Pattern (DOP) – i.e. at the heart of Omeka are digital objects called Items, with metadata and attached files.

  • It has an API which just works, and can add items etc, although there are some complexities, more on which below.

  • It has lots of built-in ways to ingest data, including (buggy) CSV import and OAI-PMH harvesting.

Bad

There are some annoyances:

  • The documentation, which at first glance seems fairly comprehensive, is actually quite lacking. Examples of the plugin API are incorrect, and the description of the external API is terse and very short on examples (e.g. it doesn’t actually show how to use your API key, or how pagination works).

  • The API, while complete, is quite painful to use if you want to add anything. To add an item with metadata it’s not as simple as saying {“title”: “My title”} or even {“dc:title”: “My Title”}: you have to make an API call to find the elements called Title in the different element sets, then pick one and use it. Copy-pasting someone else’s example is hard, too: their metadata element 50 may not be the same as yours (see the sketch after this list). That’s nothing a decent API library wouldn’t take care of; the eResearch team is looking for a student who’d like to take the Python API on as a project (and we’ve started improving the Python library).

  • Very limited access control with no way of restricting who can see what by group.

  • By default the MySQL full-text search only indexes words of four letters or more, so you can’t search for CO2 or PTA (Parramatta), both of which are in our test data. This is totally fixable with some tweaking (lowering MySQL’s minimum indexed word length and rebuilding the search index).

  • Measured against our principles, there’s one clear gap: we want to encourage metadata that embraces linked-data principles and uses URIs, in preference to strings, to identify things. So while Omeka scores points for shipping with Dublin Core metadata, it loses out for not supporting linked data. If only it let you have a URI as well as a string value for any metadata field!
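To make that element-lookup dance concrete, here is a minimal sketch of adding an item via the API with Python and requests. It is an untested illustration, not a recipe: the host and key are placeholders, and the element IDs will differ per installation (which is exactly the problem).

```python
import requests

BASE = "http://omeka.example.org/api"  # placeholder Omeka install
KEY = "your-api-key"  # API key from the admin UI, passed as a query parameter

# Step 1: find an element called "Title". There may be several (one per
# element set), and results are paginated, so a real script should page
# through and check the element set before picking one.
elements = requests.get(f"{BASE}/elements", params={"key": KEY}).json()
title_id = next(e["id"] for e in elements if e["name"] == "Title")

# Step 2: add an item, with metadata supplied as element_texts keyed by
# that numeric element ID -- not by a human-readable name.
item = {
    "public": True,
    "element_texts": [
        {"element": {"id": title_id}, "html": False, "text": "My Title"}
    ],
}
resp = requests.post(f"{BASE}/items", params={"key": KEY}, json=item)
print(resp.status_code, resp.json().get("id"))
```

A decent client library would hide both steps behind something like add_item(title="My Title").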

But maybe it can do Linked Data?

Since the hack day we have some more news on Omeka’s coming linked data support. Patrick from the Omeka Team says on their mailing list:

Hi Peter,

Glad you asked!

The API will use JSON-LD.

The Item Add interface as we’re currently imagining it has three options for each property: text input (like what exists now), internal reference (sorta bringing Item Relations into core, just with a better design), and external URI. The additional details, like using a local label for an external URI sound interesting, and we’ll be thinking about if/how that kind of thing might work.

Properties, too, will be much more LoD-friendly. In addition to Dublin Core, the FOAF, BIBO, and other vocabularies will be available both for expressing properties, and the classes available (analogous to the Item Types currently available).

Changes like this (and more!) are at the heart of the changes to design and infrastructure I mentioned in an earlier response. We hope that the additional time will be worth it to be able to address needs like these!

You can watch the progress at the Omeka S repo: https://github.com/omeka/omeka-s

Thanks,

Patrick

This new version of Omeka (Omeka-S) is due in “the Fall semester of 2015”, which is North American for late next year, in (our) Spring. It’s hard to tell from this short post by Patrick, but it looks promising. There are a few different ways the current version of Omeka might support Linked Data; the best way forward is probably the Item Relations plugin.

But what can we do in the meantime?

  • The Item Relations plugin desperately needs a new UI element to do lookups: at the moment you need to know the integer ID of the item you want to link to. Michael Lynch and Lloyd Harischandra both looked at aspects of this problem on the day.

  • Item Relations don’t show up in the API. But the API is extensible, so it should be simple enough to add a resource for item_relations, plus the vocab lookups needed to relate things to each other as (essentially) Subject, Predicate, Object (a hypothetical sketch follows this list). PT’s been working on this as a spare-time project.

  • Item Relations doesn’t allow for a text label on the relation or the endpoint, so while you might want to say someone is the dc:creator of a resource, you only see the “Creator” label and the title of the item you link to. What if you wanted to say “Dr Sefton” or “Petiepie” rather than “Peter Sefton” but still link to the same item?
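Since the Item Relations data model is already (subject item, property, object item), a hypothetical item_relations API resource might accept triples like the following. To be clear, nothing below ships with Omeka: the endpoint and field names are our guesses at what such an extension could look like.

```python
import requests

BASE = "http://omeka.example.org/api"  # placeholder Omeka install
KEY = "your-api-key"

# Hypothetical: relate item 12 to item 34 via a vocabulary property.
# The property_id would come from a vocab lookup -- the very UI/API gap
# discussed above.
triple = {
    "subject_item_id": 12,   # e.g. the resource
    "property_id": 7,        # e.g. dc:creator, found via lookup
    "object_item_id": 34,    # e.g. the person
}
resp = requests.post(f"{BASE}/item_relations",
                     params={"key": KEY}, json=triple)
print(resp.status_code)
```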

What we did

Slightly doctored photo, either that or Cindy attended twice!

Gerry Devine showed off his “PageMaker” semantic CMS. Gerry says:

The SemanticPageMaker (temporary name) is an application that allows for the creation of ‘Linked Data’-populated web pages to describe any chosen entity. Web forms are constructed from a pre-defined set of re-usable semantic tags which, when completed, automatically produce RDFa-enabled HTML and a corresponding JSON-LD document. The application thus allows semantically-rich information to be collected and exposed by users with little or no knowledge of semantic web terms.

I have attached some screenshots from my local dev instance as well as an RDFa/html page and a JSON-LD doc that describes the FACE facility (just dummy info at this stage) – note the JSON-LD doesn’t expose all fields (due to duplicated keys)

A test instance is deployed on Heroku (feel free to register and start creating stuff – might need some pointers though in how to do that until I create some help pages):

https://desolate-falls-4138.herokuapp.com/

Github:

https://github.com/gdevine/SemanticPageMaker

This might be the long-lost missing link: a simple semantic CMS which doesn’t try to be a complete semantic stack with ontologies etc. It just lets you define entities and relations, give each type of entity a URI, relate entities to each other, and be a good Linked Data citizen by providing RDF and JSON data. Perfect for describing research context.
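As an illustration (ours, not Gerry’s: the vocabulary choices and values are invented), the JSON-LD such a page might expose for the FACE facility could be as simple as:

```json
{
  "@context": {
    "name": "http://xmlns.com/foaf/0.1/name",
    "partOf": {"@id": "http://purl.org/dc/terms/isPartOf", "@type": "@id"}
  },
  "@id": "https://desolate-falls-4138.herokuapp.com/facilities/face",
  "name": "FACE (Free Air CO2 Enrichment) facility",
  "partOf": "https://example.edu/institutes/hie"
}
```

Each entity gets a URI, and relations point at other URIs rather than strings.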

And during the afternoon, Gerry worked on making his CMS usable for lookups, so that, for example, if we wanted to link an Omeka item to a facility at HIE we could do that via a lookup. We’re looking at building on the Fill My List (FML) project, started by a team at Open Repositories 2014, for a universal URI lookup service with a consistent API over different sources of truth. Since the tools-day Lloyd has installed a UWS copy of FML so we can start experimenting with it across our family of repositories and research contexts.

Lloyd and Michael both worked on metadata lookups. Michael got a proof-of-concept UI going so that a user can use auto-complete to find Items rather than having to copy IDs. Lloyd got some autocomplete happening via a lookup to ORCID via FML.

PT and Jacqueline chatted about rich, semantically-linked datasets like the Dictionary of Sydney. In preparation for the workshop, PT took the data from the Journey to Horseshoe Bend project, which is in a similar format to the Dictionary, put it in a spreadsheet with multiple worksheets, and imported it via a very dodgy Python script.
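The dodgy script isn’t published, but the shape of the approach is roughly this (a sketch only, assuming openpyxl, one worksheet per entity type, a hypothetical filename, and the same element-ID lookup as in the earlier Omeka API example):

```python
import requests
from openpyxl import load_workbook

BASE = "http://omeka.example.org/api"  # placeholder Omeka install
KEY = "your-api-key"
TITLE_ID = 50  # looked up from /elements first; varies per installation

wb = load_workbook("horseshoe_bend.xlsx")  # hypothetical filename
for sheet in wb.worksheets:  # e.g. one worksheet per entity type
    # first row holds the column names; remaining rows are records
    header = [cell.value for cell in next(sheet.iter_rows(max_row=1))]
    for row in sheet.iter_rows(min_row=2, values_only=True):
        record = dict(zip(header, row))
        item = {
            "element_texts": [{
                "element": {"id": TITLE_ID},
                "html": False,
                "text": str(record.get("title", "")),
            }],
        }
        requests.post(f"{BASE}/items", params={"key": KEY}, json=item)
```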

Peter Bugeia investigated how environmental-science data would look in Omeka, by playing with the API to pump in data from the HIEv repository.

Sharyn and Andrew tried to hack together a simple plugin. Challenge: see if we can write a plugin which will detect YouTube links in metadata and embed a YouTube player (as a test case for a more general type of plugin that can show web previews of lots of different kinds of data). They got their hack to the “Hello World, I managed to get something on the screen” stage in 45 minutes, which is encouraging.

Jake looked at map embedding: we had some sample data from UWS in KMZ (compressed Google map layers for UWS campuses) and wondered if it would be possible to show map data inline in an item page. Jake made some progress on this; the blocker isn’t Omeka, it was finding a good way to do the map embedding.

Cindy continued the work she’s been doing with Jake on the Intersect press-button Omeka deployment. They’re using something called Snap Deploy and Ansible.

Jake says:

Through our Snapdeploy service Intersect are planning to offer researchers the ability to deploy their own instance of OMEKA with just a click of a button, with no IT knowledge required. All you need is an AAF log in and Snapdeploy will handle the creation of your NeCTAR Cloud VM and the deployment of OMEKA to that VM for you. We are currently in the beginning stages of adapting the Snapdeploy service to facilitate an Omeka setup and hope to offer it soon. We would also like feedback from you as researchers to let us know if there are any Omeka plug-ins that you think we could include as part of our standard deployment process that would be universally useful to the research community, so that we can ensure our Omeka product offers the functionality that researchers actually need.

David explored the API using an obscure, long-forgotten programming language – “Java”, we think he called it – and reported on the difficulty of grasping it.

More on stretching Omeka

If we were to take Omeka out of its core comfort zone – say, as the working data repository in an engineering lab – there are a number of things we’d want to do:

  • Create some user-facing forms for data uploads. These would need to be simpler than the full admin UI, with lookups for almost everything: people, subject codes, and research context such as facilities.

  • Create (at least) group-level access control, probably per-collection.

  • Build a generic framework for previewing or viewing files of various types. In some cases this is very simple, via the addition of a few lines of HTML; in others we’d want some kind of workflow system that can generate derived files.

  • Fix the things noted above: a better API library and Linked Data support.

What would an Omeka service look like?

If we wanted to offer this at UWS or beyond, and use it for projects outside the DH sphere, what would a supported service look like?

To make a sustainable service, we’d want to:

  • Work out how to provide robust hosting with an optimal number of small Omeka servers per host (is it one? is it ten?).

  • Come up with a generic data management plan: “We’ll host this for you for 12 months. After which if we don’t come to a new arrangement your site will be archived and given a DOI and the web site turned off”. Or something.

Creative Commons License
Is Omeka aDORAble? by Peter Sefton, Andrew Leahy, Gerry Devine and Jake Farrell is licensed under a Creative Commons Attribution 4.0 International License.

Is HIEv aDORAble?

[Update 2014-09-04: added a definition of DORA]

This week we held another of our tool/hack days at UWS eResearch, this time at the Hawkesbury campus with Gerry Devine, the data manager for the Hawkesbury Institute for the Environment (HIE). The tool in question is the DIVER product (AKA DC21 and HIEv).

Where did Intersect DIVER come from?

Intersect DIVER was originally developed by Intersect in 2012 for the University of Western Sydney’s Hawkesbury Institute for the Environment, as a means to automatically capture and secure time-series and other data from the Institute’s extensive field-based facilities and experiments. Known at HIE as “the HIEv”, Intersect DIVER is the Institute’s primary data capture application. For more information see http://intersect.org.au/content/intersect-diver

We wanted to evaluate DIVER against our Principles for eResearch software with a view to using it as a generic DORA working data repository.

Hang on! A DORA? What’s that?

DORA is a term coined by UWS eResearch Analyst David Clarke for a generic Digital Object Repository for Academe (yes, Fedora‘s an example of the species). We expressed it thusly in our principles:

At the core of eResearch practice is keeping data safe (remember: No Data Without Metadata). Different classes of data are safest in different homes, but ideally each data set or item should live in a repository where:

  • It can be given a URI
  • It can be retrieved/accessed via a URI by those who should be allowed to see it, and not by those who should not
  • There are plans in place to make sure the URI resolves to something useful for as long as it is likely to be needed (which may be "as long as possible").
DORA Diagram

The DIVER software is running at HIE, with more than 50 "happy scientists" (as Gerry puts it) using it to manage the research data files, including those automatically deposited from the major research facility equipment.

HIEv Shot

So, what’s the verdict?

Is DIVER a good generic DORA?

The DIVER data model is based entirely on files. That’s quite a different approach from CKAN, which we looked at a few weeks ago, and Omeka, which we’ll look at in a fortnight’s time; both of those use a ‘digital object’ model in which an object has metadata and zero or more files.

DIVER does many things right:

  • It has metadata, so there’s No Data without Metadata (but with some limitations, see below)

  • It has API access for all the main functionality, so researchers doing reproducible research can build recipes to fetch and put data, run models and so on from their language of choice (see the sketch after this list).

  • The API works well out of the box with hardly any fuss.

  • It makes some use of URIs as names for things in the data packages it produces, so published data packages do use URIs to describe the research context.

  • It can extract metadata from some files and make it searchable.
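As a flavour of what those “recipes” can look like, here is a sketch of fetching a file listing with Python. Treat the host, endpoint and parameter names as illustrative assumptions modelled on the HIEv examples we’ve seen, not as quotes from the DIVER documentation:

```python
import requests

BASE = "https://hiev.example.edu"  # placeholder DIVER/HIEv install
TOKEN = "your-api-token"           # per-user API token

# Search the data files, then fetch whichever ones a model run needs.
resp = requests.get(
    f"{BASE}/data_files/api_search.json",
    params={"auth_token": TOKEN, "filename": "ROS_WS_Table1"},
)
resp.raise_for_status()
for record in resp.json():
    print(record.get("filename"), record.get("updated_at"))
```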

But there are some issues that would need to be looked at for deploying DIVER into new places:

  • The metadata model in DIVER is complicated: it has several different, non-standard ways to represent metadata, most of which are not configurable or extensible, and a lot of the metadata is not currently searchable.

  • DIVER has two configurable ‘levels’ of metadata that automatically group files together; at HIE these are "Facility" and "Experiment". This is the only major configuration change you can make to customise an installation: there is no extensible per-installation metadata like CKAN’s simple, generic name/value user-addable fields. This is a very common issue with this kind of software – no matter how many levels of hierarchy there are, a case will come along that breaks the built-in model.

    In my opinion the solution is not to put this kind of contextual stuff into repository software at all. Gerry Devine and I have been trying to address this by working out ways to separate descriptions of research context from the repository, so that the repository worries only about keeping well-described content, while the research context is described by a human-and-machine-readable website, ontology or database as appropriate, with whatever structure the researchers need to describe what they’re doing. Actually, Gerry is doing all the work, building a new semantic CMS application that can describe research context independently of other eResearch apps.

  • There are a couple of hard-wired file-preview functions (for images) and derived files (OCR and speech recognition), but no plugin system for adding new ones, so any new deployment that needed new derived file types would need a customisation budget.

  • The only data format from which DIVER can extract metadata is the proprietary TOA5 format owned by the company that produces the Institute’s data-loggers. NetCDF support would be more useful.

  • There are some user interface issues to address, such as making the default page for a data-file more compact.

Conclusion

There is a small community around the open-source DIVER product, with two deployments using it for very different kinds of research data. To date the DIVER community doesn’t have an agreed roadmap for where the product might be heading or how the issues above might be addressed.

So at this stage I think it is suitable for re-deployment only into research environments which closely resemble HIE, probably including the same kinds of data-logger (I haven’t seen the other installation, so can’t comment on it). It might be possible to develop DIVER into a more generic product, but at the moment there is no obvious business case for doing that rather than adapting a more widely adopted, more generic application. I think the way forward is for the current user communities (of which I consider myself a member) to weigh the benefits of incremental change towards a more generic solution as they maintain and enhance the existing deployments, balancing local feature development against the potential benefit of attracting a broader community of users.

And another thing …

We discovered some holes in our end-to-end workflow for publishing data from HIEv to our Institutional Data Repository, and some gaps in the systems documentation, which we’re addressing as a matter of urgency.

Creative Commons License
Is HIEv aDORAble? by Peter Sefton is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

eResearch manager’s report 2014-07-28

Introduction

Since the last meeting of the UWS eResearch Committee on May 22nd, we have updated the eResearch roadmap to reflect where we are against the plan as set out at the beginning of 2014.

In June I attended the Open Repositories conference and a couple of other events to do with open access to publications and data, including organising an open-data publications text-mining hackfest in Edinburgh.

Looking to the future, the eResearch team has been involved in two internal funding bids in the last week:

  1. Research Portal 2 (P2): to develop a joined-up research presence for the university, like the Research Hub projects at Griffith and JCU.
  2. More end-to-end data management, via more support for the AAAA data management program we’re already running.

UWS Events – Research Bazaar

Now that UWS has all our staff positions filled, we’re making a big push to do more outreach to researchers via a number of channels, including visiting departmental meetings and research forums, and running as many eResearch-relevant training events as we can get takers for. This is all done with the help of the eResearch Communications Working Group, chaired by Susan Robbins from the UWS Library.

To build eResearch capability we’re trying out the Research Bazaar approach, which started in Melbourne with Steve Manos and David Flanders.

What exactly, you might ask, is the ‘Research Bazaar’, aka “ResBaz”? #ResBaz is, first and foremost, a campaign to empower researchers in the use of the University’s core IT services:

  • Empowering researchers to collaborate with one another through the use of research apps on our cloud services.

  • Empowering researchers to share their data with trusted partners via our data services.

  • Empowering researchers to establish their reputation through our parallel computing and supercomputing services.
  • Empowering researchers to invent new ways of experimenting through our emerging technology services.

Our eResearch partners Intersect are helping with this; they offer a number of Learning and Development courses, and we’re talking to them about developing and importing more.

Speaking of importing eResearch training expertise, we ran the first of a series of Research Bazaar events: Mapping for the Digital Humanities, powered by Melbourne eResearcharians Steve Bennet and Fiona Tweedie.

Right at the beginning of July, Alveo, the virtual laboratory for human communication science, was launched by the NSW Chief Scientist Mary O’Kane and UWS Vice-Chancellor Barney Glover with a two-day event, starting with a hackfest day to generate ideas and interest, promote use of the lab and provide some hands-on training. While we didn’t brand this as a Research Bazaar activity, it is certainly in the #resbaz spirit.

Projects

DC21/HIEv Wraps up

The HIEv project, née DC21, is now complete, and HIEv has about 50 regular users at HIE. Thanks to Peter Bugeia at Intersect for project-managing the final stages of the rollout, and to Gerry Devine, HIE data manager, for promoting the software and putting it to good use building dashboards etc.

New features include:

  • Log in using your account at any.university.edu.au, via the Australian Access Federation.
  • Share data securely with a research cohort until you’re ready to publish it to the world for re-use and citation.

New: Major Open Data Collection for the humanities

Our latest project, the Major Open Data Collections project, funded by the Australian National Data Service, is in the establishment phase:

  • Carmi Cronje is working with the ITS Project Management Office to establish the project and its various steering committees, boards etc.
  • The key staff member for the project, the data librarian, has been appointed: Katrina Trewin, currently working in the UWS Library, joins us on August 4th.

Adelta is nearly finished

The Adelta project is nearing completion, with users now testing the service:

  • User Interface work by Intersect is nearly done, pending some discussions with the Library about accessibility requirements.
  • Final bug fixes and tweaks are being applied, as per this milestone.
  • We are working with Sydney development company hol.ly to integrate the service with the Design And Art Online (DAAO) database, so that we have a true linked-data approach, with Adelta authors identified using DAAO URIs. This builds on one of the Developer Competition entries from Open Repositories 2014: the Fill My List URI lookup service.

Wonderama

Andrew Leahy consulted for the Google Atmosphere event (Tuesday 22 July) at the Australian Technology Park, Eveleigh. This was a Wonderama demonstration in collaboration with NGIS (www.ngis.com.au), showcasing some of the NSW state government data hosted on Google’s geo platform.

Cr8it project rolls on

Cr8it is a collaboration between Newcastle, Intersect and UWS to build an application which lives in a Dropbox-like Share/Sync/See file service, so that people can move their research data from sets of files to well-described data collections in a repository.

  • User testing has started on parts of the software to do with selecting, and managing files.
  • Recent development work has focussed on re-factoring the application to make it more testable and easier to build on; once this is done we’re on the home straight to hook it up to the Research Data Repositories at UWS and Newcastle and start publishing data.

We are now seeing a lot of uptake among UWS users of Cloudstor+, the AARNet researcher-ready version of ownCloud on which we are planning to deploy Cr8it; for example, Andrew Leahy reports that a few users a week are adopting it at his suggestion.

AAAA data management

Projects to establish data management practices and infrastructure in the BENS group and the Structures Lab at IIE are continuing, and we are developing new AAAA projects to start soon.

Meet DORA

New eResearch Analyst David Clarke has coined the term DORA – Digital Object Repository for Academe – a name for a generic, service-oriented component for storing research data which adheres to a set of eResearch principles that David and the rest of the team are working on. We are currently evaluating software against the ideal DORA model. David’s happy to talk to you about this, as he has an Open-DORA policy ☺.

Creative Commons License
eResearch manager’s report 2014-07-28 by Peter Sefton is licensed under a Creative Commons Attribution 4.0 International License.

Internal update: UWS eResearch roadmap 2014 Q3 & 4

About this document

This is the mid-year revision of the University of Western Sydney eResearch team’s roadmap for 2014. This document will be consulted at eResearch committee and working-group meetings to track progress throughout the year.

Summary


The timelines below have traffic-light colours to show progress: green means things are going according to plan; yellow means there have been delays or setbacks, but these are being managed and monitored; red means targets were not met. The main ‘red’ area is the Open Access policy – a draft has been developed, has received support from the eResearch committee and the DVCR&D, and is undergoing review in the office of the DVCR&D.

Assumptions

This plan assumes the current level of staffing and resources for the eResearch team and does not make any assumptions about further project funding, apart from the ANDS Major Collections project, which is in its initiation phase.

Vision

The eResearch team vision statement:

Support the objectives of the UWS research plan by creating an eResearch Ready UWS, where information and communications technologies support the collaborative conduct of high-impact, high-integrity research with minimal geographical and organisational constraints. eResearch will assist in the transition to a research culture where IT and communications technologies are integral to all research, from the fundamental underpinnings of data acquisition and creation, management and archiving, to analytical and methodological processes. Our aim is to work with stakeholders within and beyond the university to ensure UWS researchers have the information and communications technology resources, infrastructure, support and skills required, wherever they are on the path to an eResearch ready UWS.

How does this fit with the UWS research plan?

The eResearch plan is aligned with and supports the UWS Research Plan. (Note: that plan is now obsolete; a new one is coming, with a greater emphasis on impact and community engagement, and on broadening research income beyond competitive grants.)

Objectives 1-3

  • Objective 1 – Increase external research income to the University

  • Objective 2 – Increase the number of fields of research at UWS operating above or well above world standard

  • Objective 3 – Increase the number and concentration of funded research partnerships

These objectives depend on UWS having a high-integrity research environment, attractive to researchers, funders and collaborators, in which the institution can support researchers in meeting their obligations under the Australian Code for the Responsible Conduct of Research and funder expectations about data management. Building eResearch infrastructure, via the projects discussed below and the forthcoming ITS research infrastructure roadmap, will help create an environment conducive to successful income generation and improve support for researchers aiming for high research performance.

  • During 2014 eResearch will begin replicating the successful roll-out of end-to-end data management at HIE by creating small, tightly focused projects with clear success criteria, aligned to the research goals of the university (via the AAAA data management project methodology currently in development).

  • eResearch will continue to work closely with eResearch-intensive groups, for example by supporting phase two of Alveo (formerly HCS vLab), a NeCTAR grant ($1.3M, with a total project budget of ~$3M) to set up and implement a virtual laboratory for the multiple partners in the project: Above and Beyond Speech, Language and Music: A Virtual Lab for Human Communication Science.

Objective 4 – Ensure UWS attracts and graduates high quality Higher Degree Research (HDR) students to its areas of research strength.

During 2014 eResearch will be implementing programs to support HDR students, along with early-career researchers and the rest of the research community. This includes the establishment of self-supporting eResearch communities via a trial of the University of Melbourne ‘Research Bazaar’ model.


eResearch will work with our eResearch partner, Intersect, to start delivering a broad range of eResearch training, building on previous training delivered for High Performance Computing (see Communications and Organisational Development). HDR students will be key to this, both as one of the main audiences for training and as trainers, promulgating eResearch techniques and mind-set throughout the university.

Resources

Assumed Core Resources

  • eResearch Manager – Peter Sefton

  • eResearch Technical Advisor (~0.8 FTE) – Andrew Leahy

  • eResearch Support Officer / eResearch Analyst – TBA

  • eResearch Project Implementation Officer / Communications – Cornelia (Carmi) Cronje.

  • Intersect eResearch Analyst – Peter Bugeia

Other Resources

These resources are from other areas of the university and are financed by those cost centres. They are currently on loan to the eResearch team until October 2014.

  • Application Developer, ITS

  • Web Application Developer (provided by ITS – until ITS restructure unfolds)

Associates

The eResearch Associates are employed in key UWS research institutes or schools and work closely with the eResearch team and provide technical expertise to assist researchers.

  • Gerard Devine – HIE Data Manager

  • Jason Ensor – Research Development Officer (Digital Humanities)

  • Nathan Mckinlay – ICT Professional Officer – IIE

  • James Wright – Technical Officer in Bioelectronics & Neuroscience – BENS

Funding

The eResearch team has no formal budget separate from the office of the DVCR&D. Recommendation: consolidate remaining project funds into an eResearch projects account to support projects in the eResearch portfolio.

  • Money that’s in the MS23 financial account ~ $22,244.24

  • RDR budget remaining: ~$100K (subject to confirmation from ITS)

Focus areas

Policy Working Group

The policy working group is chaired by Kerrin Patterson, Associate Director Performance and Quality (Acting), Office of Engagement, Strategy & Quality. The group has identified two priorities:

  • Establishing an Open Access (OA) policy for both research publications and research data.

  • Creating a Research Data Management (RDM) policy.

The working group has made substantial progress on the Open Access (OA) policy, and has asked the Manager, eResearch to review the policy framework at UWS, particularly the Research Code, before starting on the Research Data Management (RDM) policy. Recent changes to Australian Research Council (ARC) funding rules for Discovery grants mean this is now a pressing issue for both the OA and RDM policies at UWS:

A11.5.2 Researchers and institutions have an obligation to care for and maintain research data in accordance with the Australian Code for the Responsible Conduct of Research (2007). The ARC considers data management planning an important part of the responsible conduct of research and strongly encourages the depositing of data arising from a Project in an appropriate publicly accessible subject and/or institutional repository.

Open Access Policy (Q1→Q4):
  • Q1: Draft presented to DVCR
  • Q2: Policy adopted
  • Q3: Support DVCR&D in progressing the policy through the UWS process
  • Q4: Revise materials to support the policy (new PowerPoint slide show; possible statements from Scott Holmes); see communications working group plan

Research Data Management policy (Q1→Q4):
  • Q1: Review of UWS policy, particularly the Research Code
  • Q2: Review of UWS policy complete; Policy WG finish gap-analysis/comparison of UWS policies
  • Q3: Policy WG recommend whether we need an RDM policy and what its scope should be
  • Q4: Policy WG produce draft of RDM policy and/or updates to related policies

Communications and Organisational Development

The Communications working group is chaired by Susan Robbins, Research Services Coordinator for the UWS library. The following table sets out the broad goals for this area.

During 2014 the eResearch team will be working with Intersect to establish an organisational development approach to eResearch under the “Research Bazaar” banner.

Communications plans (Q1→Q4):
  • Q1: Generic matrix to be used for eResearch messaging
  • Q2: Implement for eResearch website
  • Q3: Communications WG publish updated plan; eResearch publish an events calendar
  • Q4: As directed by comms WG

Awareness campaign for OA policy (Q1→Q4, in order):
  • Launch of some sort?
  • Web pages published
  • Webinars and face-to-face briefings
  • Publish web pages about the policy on the main site
  • Set up calendar for webinars and other outreach
  • Library to run OA promotion campaign to get more deposits
  • ORS to include comms about OA in research lifecycle touchpoints

Capability-building in research groups* (Q1→Q4, in order):
  • Planning
  • Produce training resources and communicate that they exist
  • Run 1 #ResBaz** workshop from Melbourne
  • Book in 2 Intersect courses
  • 1 event run at each of HIE, DHRC, MARCS
  • Trial 1 Software Carpentry workshop

Alignment of eResearch with research lifecycle (Q1→Q4, in order):
  • Planning/development
  • Two diagrams: HDR and researchers
  • Produce draft of lifecycle
  • Get feedback on draft from stakeholders (Library, ORS, eResearch, researchers)
  • Physical posters for use by key stakeholders
  • Publish lifecycle on eResearch website
  • Integrate lifecycle into stakeholder websites

Dissemination – conference presentations, journal articles, The Conversation etc. (Q1→Q4, in order):
  • Identify potential topics and co-authors
  • Contact collaborators and commence writing online opinion pieces, blog posts etc.
  • Submit conference abstracts
  • Open Repositories
  • eResearch Australasia
  • Facilitate BoF session

eResearch included in Research Training agenda and materials (Q1→Q4):
  • Q1: Planning
  • Q2: Plan established with ORS
  • Q3: Plan with ORS (Mary Krone, Luc Small)
  • Q4: As per plan

Work with Intersect on establishing Research Bazaar (Q1→Q4, in order):
  • Planning
  • Run as many existing Intersect courses as possible/relevant
  • Initial pilot of Melbourne Uni courses
  • Run existing Intersect courses
  • Expanded pilot of Melbourne Uni courses
  • Software Carpentry
  • Research Bazaar established; program to be maintained jointly by Intersect and the eResearch team

Wonderama internal & organisational development:
  • Developing Wonderama as a platform for the Digital Humanities and the Project For Western Sydney outreach and consulting
  • Developing a consulting/business model ($$)
  • Google Summer of Code
  • PX students UWS Solar Racer
  • CompSci Advanced projects?

Wonderama external and outreach activities (($$) = paid gig):
  • UWS HiTech Fest (careers market)
  • iFly Downunder launch at Panthers (indoor skydiving) ($$)
  • CeBIT conference (SCEM to sponsor?)
  • Google Atmosphere ($$)
  • TBD

** #ResBaz = Research Bazaar

Measures of success

*Capability building: count the number of figures/tables/citations/programs in publications/theses produced using workshop tools and/or programming languages.

eResearch Projects

The following table lists projects which report to the eResearch Projects Working Group. It shows the broad project stage for each project over the year; a separate schedule/dashboard, to be presented to the eResearch Projects committee, will show detailed targets for each.

Adelta (Q1→Q4):
  • Q1: Phase 1 finished
  • Q2: Discuss library hosting of Adelta
  • Q3: Possible integration into the Library search box for greater discoverability


  • Q4: Google Analytics to measure use

Cr8it core app (Q1→Q4):
  • Q1: Negotiate sustainable support offer from Intersect/AARNET
  • Q2: Start of trials
  • Q3: Implementation
  • Q4: Realisation

ANDS Major collection (Q1→Q4):
  • Q1: Scoping complete
  • Q2–Q4: Project running

AAAA Data Management Projects

HIEv (Q1→Q4):
  • Q1–Q4: Realisation (Q2: set up reporting of research-focused metrics)

IIE Structures Lab (Q1→Q4):
  • Q1: Planning, initiation
  • Q2: Implementation
  • Q3–Q4: Realisation

MARCS BENS (Q1→Q4):
  • Q1: Planning, initiation
  • Q2: Implementation
  • Q3–Q4: Realisation

To Be Advised (Digital Humanities):
  • Planning → Initiation → Implementation

To Be Advised (something sciency):
  • Planning → Initiation → Implementation

Establish “AA” data management for facilities (Acquire & Archive):

AMCF (SEM+), SIMS:
  • Planning → Implementation → Realisation

NGS (Sequencing):
  • Planning → Implementation → Realisation

BMRF (NMR):
  • Planning → Implementation

MSF (MassSpec):
  • Planning → Implementation

CBIF (Confocal):
  • Planning → Implementation

AAAA projects: measures of success

Each AAAA data management project will be measured with a variety of metrics. Targets will be agreed with the project stakeholders at project initiation and in the realisation phase, and maintained in a separate AAAA dashboard. These metrics are designed to show not just raw use of the AAAA methodology in terms of users or data sets (both of which are gameable metrics) but the effect of the AAAA program on research performance and ‘eResearch readiness’.

  • R#: Number of researchers who have been inducted/trained and have access to AAAA infrastructure
  • DAR: Datasets Archived in the RDR
  • ACD: Total number of articles in the UWS publications repository citing datasets in the RDR (including via repository metadata)
  • IDMP: Institute or research-cohort Data Management Plan(s) in place
  • GRDMP: Number and value of current grants which reference formal data management plans

Infrastructure Working Group

Infrastructure planning is under discussion with ITS Strategy. A technology roadmap is being produced with the ITS Roadmap Builder tool; it will be published as a separate plan.

Intersect Engagement

The relationship between Intersect and UWS is covered by a member engagement plan (in development for 2014).

eResearch Team Organisational Development

eResearch tool awareness (Q1→Q4):
  • Team familiarity with data capture applications (e.g. CKAN, MyTardis)
  • “Notebook programming”: RStudio, Python Notebooks, ShaderToy
  • Academic authoring tools: LaTeX, Markdown, Pandoc, EPUB etc.
  • TBA

Communications (Q1→Q4):
  • Visual comms/whiteboard training
  • TBA
  • TBA

Software development (Q1→Q4):
  • eResearch tech people to attend a workshop in one language* (over two quarters)
  • Team familiarity with modern programming principles and environments**

Conferences:
  • Australasian Digital Humanities (Perth)
  • Open Repositories (Helsinki)
  • Google I/O (SF)
  • eResearch Australasia (Melb)
  • Google Open Source Summit (SF)
  • OzViz workshop (Bris?)

Metrics

  • *Certificate in Software Carpentry (Python/R)

  • **Team members to complete one MOOC or otherwise demonstrate professional development