Is Omeka aDORAble?
So, we have been looking at a few different software packages, putting them through their paces at a series of Tuesday ‘tools days’ hosted by UWS eResearch, and asking “Is this software going to be one of our supported Working Data Repositories for researcher cohorts?” That is, how does it rate as a DORA, a Digital Object Repository for Academe?
Last month we had our biggest ever tools-day event, with external people joining the usual eResearch suspects. Thanks to Jacqueline Spedding from the Dictionary of Sydney, Michael Lynch & Sharyn Wise from UTS and Cindy Wong and Jake Farrell from Intersect for coming along.
Omeka is a lightweight digital repository / website building solution, originally targeting the Galleries, Archives & Museums space.
TL;DR
So what were we wanting to know about Omeka? The external folks came along for a variety of reasons but at UWS we wanted to know the following (with short answers, so you don’t have to read on).
Is this something we can recommend for researchers with the kinds of research collections Omeka is known for?
Answer: almost certainly yes. Unless we turn up major problems in further testing, this is a good, solid, basic repository for Digital Humanities projects; for image- and document-based collections with limited budgets it looks like an obvious choice.
Can Omeka be used to build a semantically-rich website in a research/publishing project like the Dictionary of Sydney?
(The reason we’re asking is that UWS has a couple of projects with some similarities to the Dictionary, and we are interested in exploring the options for building and maintaining a big database like this. The Dictionary uses an open source code base called Heurist. Anyway, we have some data from Hart Cohen’s Journey to Horseshoe Bend project, which was exported from an unfinished attempt to build a website using Heurist.)
The verdict? Still working on it, but reasonably promising so far.
Beyond its obvious purpose, is this a potential generic Digital Object Repository for Academe (DORA)?
Maybe. Of all the repository software we’ve tried at tools-days and looked at behind the scenes, this seems to be the most flexible and easily approachable.
Good
Omeka has a lot to recommend it:
It’s easy to get up and running.
It’s easy to hack, and easy to hack well, since it has plugins and themes that let you customise it without touching the core code. These are easy enough to work with that we had people getting (small) results on the day. More on that below.
It uses the Digital Object Pattern (DOP) – i.e. at the heart of Omeka are digital objects, called Items, with metadata and attached files.
It has an API which just works and can add items etc., although there are some complexities; more on that below.
It has lots of built-in ways to ingest data, including (buggy) CSV import and OAI-PMH harvesting.
Bad
There are some annoyances:
The documentation, which at first glance seems fairly comprehensive, is actually quite lacking. Examples of the plugin API are incorrect, and the description of the external API is terse and very short on examples (e.g. it doesn’t actually show how to use your API key, or how pagination works).
The API, while complete, is quite painful to use if you want to add anything. To add an item with metadata it’s not as simple as saying {“title”: “My title”} or even {“dc:title”: “My Title”} – you have to make an API call to find elements called Title across the different element sets, then pick one and use its numeric id. And copy-pasting someone else’s example is hard: their metadata element 50 may not be the same as yours. That’s nothing a decent API library wouldn’t take care of; the eResearch team is looking for a student who’d like to take the Python API on as a project (and we’ve started improving the Python library).
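To make the element-lookup dance concrete, here’s a minimal Python sketch. The endpoint paths and payload shape follow the Omeka Classic (2.x) API docs; the base URL, API key and element id 50 are illustrative only, and your installation’s Title element id will probably differ:

```python
# Minimal sketch of adding an item via the Omeka Classic (2.x) REST API.
# Assumptions: /api/items accepts a JSON body with an element_texts array,
# and the API key is passed as a "key" query parameter.
import json
import urllib.parse
import urllib.request


def build_item_payload(title_element_id, title):
    """Build the JSON body for POST /api/items.

    You can't just say {"title": ...}: the title must reference the
    numeric id of a "Title" element, which differs between installations.
    """
    return {
        "public": True,
        "element_texts": [
            {"element": {"id": title_element_id}, "text": title, "html": False}
        ],
    }


def add_item(base_url, api_key, title_element_id, title):
    """POST a new item, once you already know the Title element's id."""
    url = base_url + "/api/items?" + urllib.parse.urlencode({"key": api_key})
    body = json.dumps(build_item_payload(title_element_id, title)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Finding the element id in the first place means another call, e.g.
# GET /api/elements?name=Title, then picking the right element set by hand.
```

A decent client library would hide all of this behind something like `item.title = "My Title"`.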
Very limited access control, with no way of restricting who can see what by group.
By default the MySQL full-text search only indexes words of four letters or more, so you can’t search for CO2 or PTA (Parramatta), both of which are in our test data; totally fixable with some tweaking.
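For reference, the tweak is a MySQL server setting (a sketch, assuming MyISAM full-text tables):

```ini
# my.cnf -- lower the full-text minimum word length from the default 4 to 3
[mysqld]
ft_min_word_len = 3
```

After restarting MySQL the full-text indexes need rebuilding, e.g. `REPAIR TABLE omeka_search_texts QUICK;` – the table name here assumes the default `omeka_` prefix, so check your installation.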
Measured against our principles, there’s one clear gap. We want to encourage the use of metadata to embrace linked-data principles and use URIs to identify things, in preference to strings. So while Omeka scores points for shipping with Dublin Core metadata, it loses out for not supporting linked data. If only it let you have a URI as well as a string value for any metadata field!
But maybe it can do Linked Data?
Since the hack day we have some more news on Omeka’s coming linked data support. Patrick from the Omeka Team says on their mailing list:
Hi Peter,
Glad you asked!
The API will use JSON-LD.
The Item Add interface as we’re currently imagining it has three options for each property: text input (like what exists now), internal reference (sorta bringing Item Relations into core, just with a better design), and external URI. The additional details, like using a local label for an external URI sound interesting, and we’ll be thinking about if/how that kind of thing might work.
Properties, too, will be much more LoD-friendly. In addition to Dublin Core, the FOAF, BIBO, and other vocabularies will be available both for expressing properties, and the classes available (analogous to the Item Types currently available).
Changes like this (and more!) are at the heart of the changes to design and infrastructure I mentioned in an earlier response. We hope that the additional time will be worth it to be able to address needs like these!
You can watch the progress at the Omeka S repo: https://github.com/omeka/omeka-s
Thanks,
Patrick
This new version of Omeka (Omeka-S) is due in “The Fall Semester of 2015”, which is North American for late next year, in Spring. Hard to tell from this short post by Patrick, but this looks promising. There are a few different ways that the current version of Omeka may support Linked Data. The best way forward is probably to use the ItemRelations plugin.
But what can we do in the meantime?
The Item Relations plugin desperately needs a new UI element to do lookups; at the moment you need to know the integer ID of the item you want to link to. Michael Lynch and Lloyd Harischandra both looked at aspects of this problem on the day.
Item Relations don’t show up in the API. But the API is extensible, so it should be simple enough to add a resource for item_relations and allow the vocab lookups etc. needed to relate things to each other as (essentially) Subject–Predicate–Object. PT’s been working on this as a spare-time project.
Item Relations don’t allow for a text label on the relation or the endpoint, so while you might want to say someone is the dc:creator of a resource, you only see the “Creator” label and the title of the item you link to. What if you wanted to say “Dr Sefton” or “Petiepie” rather than “Peter Sefton” but still link to the same item?
What we did
Gerry Devine showed off his “PageMaker” Semantic CMS: Gerry says:
The SemanticPageMaker (temporary name) is an application that allows for the creation of ‘Linked Data’-populated web pages to describe any chosen entity. Web forms are constructed from a pre-defined set of re-usable semantic tags which, when completed, automatically produce RDFa-enabled HTML and a corresponding JSON-LD document. The application thus allows semantically-rich information to be collected and exposed by users with little or no knowledge of semantic web terms.
I have attached some screenshots from my local dev instance as well as an RDFa/html page and a JSON-LD doc that describes the FACE facility (just dummy info at this stage) – note the JSON-LD doesn’t expose all fields (due to duplicated keys)
A test instance is deployed on Heroku (feel free to register and start creating stuff – might need some pointers though in how to do that until I create some help pages):
https://desolate-falls-4138.herokuapp.com/
Github:
https://github.com/gdevine/SemanticPageMaker
This might be the long-lost missing link: a simple semantic CMS which doesn’t try to be a complete semantic stack with ontologies etc. It just lets you define entities and relations, gives each type of entity a URI, lets entities relate to each other, and is a good Linked Data citizen, providing RDF and JSON data. Perfect for describing research context.
And during the afternoon, Gerry worked on making his CMS usable for lookups, so that, for example, if we wanted to link an Omeka item to a facility at HIE we’d be able to do that via a lookup. We’re looking at building on the Fill My List (FML) project, started by a team at Open Repositories 2014: a universal URI lookup service with a consistent API for different sources of truth. Since the tools-day Lloyd has installed a UWS copy of FML so we can start experimenting with it across our family of repositories and research contexts.
Lloyd and Michael both worked on metadata lookups. Michael got a proof-of-concept UI going so that a user can use auto-complete to find Items rather than having to copy IDs. Lloyd got some autocomplete happening via a lookup to Orcid via FML.
PT and Jacqueline chatted about rich semantically-linked data-sets like the Dictionary of Sydney. In preparation for the workshop, PT tried taking the data from the Journey to Horseshoe Bend project, which is in a similar format to the Dictionary, putting it in a spreadsheet with multiple worksheets and importing it via a very dodgy Python script.
Peter Bugeia investigated how environmental-science data would look in Omeka, by playing with the API to pump in data from the HIEv repository.
Sharyn and Andrew tried to hack together a simple plugin. Challenge: see if we can write a plugin which will detect YouTube links in metadata and embed a YouTube player (as a test case for a more general type of plugin that can show web previews of lots of different kinds of data). They got their hack to the “Hello World, I managed to get something on the screen” stage in 45 minutes, which is encouraging.
Jake looked at map embedding: we had some sample UWS data in KMZ (compressed Google map layers for the UWS campuses), and we wondered if it would be possible to show map data inline in an item page. Jake made some progress on this – the blocker wasn’t Omeka, it was finding a good way to do the map embedding.
Cindy continued the work she’s been doing with Jake on the Intersect press-button Omeka deployment. They’re using something called Snap Deploy and Ansible.
Jake says:
Through our Snapdeploy service Intersect are planning to offer researchers the ability to deploy their own instance of OMEKA with just a click of a button, with no IT knowledge required. All you need is an AAF log in and Snapdeploy will handle the creation of your NeCTAR Cloud VM and the deployment of OMEKA to that VM for you. We are currently in the beginning stages of adapting the Snapdeploy service to facilitate an Omeka setup and hope to offer it soon. We would also like feedback from you as researchers to let us know if there are any Omeka plug-ins that you think we could include as part of our standard deployment process that would be universally useful to the research community, so that we can ensure our Omeka product offers the functionality that researchers actually need.
David explored the API using an obscure long forgotten programming language, “Java” we think he called it and reported on the difficulty of grasping it.
More on stretching Omeka
If we were to take Omeka out of its core comfort zone – say, as the working data repository in an engineering lab – there are a number of things we’d want to do:
Create some user-facing forms for data uploads; these would need to be simpler than the full admin UI, with lookups for almost everything: people, subject codes, and research context such as facilities.
Create (at least) group-level access control probably per-collection.
Build a generic framework for previewing or viewing files of various types. In some cases this is very simple, via the addition of a few lines of HTML, in others we’d want to have some kind of workflow system that can generate derived files.
Fix the things noted above: a better API library, Linked Data support.
What would an Omeka service look like?
If we wanted to offer this at UWS or beyond as well as use it for projects beyond the DH sphere, what would a supported service look like?
To make a sustainable service, we’d want to:
Work out how to provide robust hosting with an optimal number of small Omeka servers per host (is it one? is it ten?).
Come up with a generic data management plan: “We’ll host this for you for 12 months. After which if we don’t come to a new arrangement your site will be archived and given a DOI and the web site turned off”. Or something.
Is Omeka aDORAble by Peter Sefton, Andrew Leahy, Gerry Devine, Jake Farrell is licensed under a Creative Commons Attribution 4.0 International License.
Is HIEv aDORAble?
[Update 2014-09-04: added a definition of DORA]
This week we held another of our tool/hack days at UWS eResearch. This time it was at the Hawkesbury Campus, with Gerry Devine, the data manager for the Hawkesbury Institute for the Environment. This week the tool in question is the DIVER product (AKA DC21 and HIEv).
Where did Intersect DIVER come from?
Intersect DIVER was originally developed by Intersect in 2012 for the University of Western Sydney’s Hawkesbury Institute for the Environment as a means to automatically capture and secure time series and other data from the Institute’s extensive field-based facilities and experiments. Known as “the HIEv” at HIE, Intersect DIVER has been adopted as the Institute’s primary data capture application. For more information see http://intersect.org.au/content/intersect-diver
We wanted to evaluate DIVER against our Principles for eResearch software with a view to using it as a generic DORA working data repository.
Hang on! A DORA? What’s that?
DORA is a term coined by UWS eResearch Analyst David Clarke for a generic Digital Object Repository for Academe (yes, Fedora‘s an example of the species). We expressed it thusly in our principles:
At the core of eResearch practice is keeping data safe (remember: No Data Without Metadata). Different classes of data are safest in different homes, but ideally each data set or item should live in a repository where:
- It can be given a URI
- It can be retrieved/accessed via a URI by those who should be allowed to see it, and not by those who should not
- There are plans in place to make sure the URI resolves to something useful as long as it is likely to be needed (which may be "as long as possible").
The DIVER software is running at HIE, with more than 50 "happy scientists" (as Gerry puts it) using it to manage the research data files, including those automatically deposited from the major research facility equipment.
So, what’s the verdict?
Is DIVER a good generic DORA?
The DIVER data model is based entirely on files, which is quite a different approach from CKAN (which we looked at a few weeks ago) or Omeka (which we’re going to look at in a fortnight’s time); both of those use a ‘digital object’ model, where an object has metadata and zero or more files.
DIVER does many things right:
It has metadata, so there’s No Data without Metadata (but with some limitations, see below)
It has API access for all the main functionality, so researchers doing reproducible research can build recipes to fetch and put data, run models and so on from their language of choice.
The API works well out of the box with hardly any fuss.
It makes some use of URIs as names for things in the data packages it produces, so that published data packages do use URIs to describe the research context.
It can extract metadata from some files and make it searchable.
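As a sketch of what that recipe-building looks like from Python: the `api_search` endpoint and `auth_token` parameter follow the DC21/HIEv API documentation, but treat the exact names and the base URL as assumptions to check against your own instance.

```python
# Sketch: querying a DIVER/HIEv instance for data files from Python.
# The endpoint path and auth_token parameter are assumptions based on
# the DC21/HIEv docs; the base URL and filters below are illustrative.
import json
import urllib.parse
import urllib.request


def search_url(base_url, auth_token, **filters):
    """Build the URL for a data-file search (e.g. by filename)."""
    params = {"auth_token": auth_token}
    params.update(filters)
    return base_url + "/data_files/api_search?" + urllib.parse.urlencode(params)


def fetch_matching(base_url, auth_token, **filters):
    """Run the search and return the parsed JSON list of matching files."""
    with urllib.request.urlopen(search_url(base_url, auth_token, **filters)) as resp:
        return json.load(resp)
```

A model-running recipe can then loop over the returned file list, download each file, and push derived results back, all without touching the web UI.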
But there are some issues that would need to be looked at for deploying DIVER into new places:
The metadata model in DIVER is complicated – it has several different, non-standard, ways to represent metadata, most of which are not configurable or extensible, and a lot of the metadata is not currently searchable.
DIVER has two configurable ‘levels’ of metadata that automatically group files together; at HIE they are "Facility" and "Experiment". There’s no extensible per-installation metadata, like CKAN’s simple generic name/value user-addable fields, and the two levels are the only major configuration change you can make to customise an installation. This is a very common issue with this kind of software: no matter how many levels of hierarchy there are, a case will come along that breaks the built-in model.
In my opinion the solution is not to put this kind of contextual stuff into repository software at all. Gerry Devine and I have been trying to address this by working out ways to separate out descriptions of research context from the repository, so the repository can worry only about keeping well-described content and the research context is described by a human-and-machine-readable website, ontology or database as appropriate; with whatever structure the researchers need to describe what they’re doing. Actually Gerry is doing all the work, building a new semantic CMS app that can describe research context independently of other eResearch apps.
There are a couple of hard-wired file preview functions (for images) and derived files (OCR and speech recognition) but no plugin system for adding new ones, so any new deployment that needed new derived file types would need a customisation budget.
The only data format from which DIVER can extract metadata is the proprietary TOA5 format owned by the company that produces the institute’s data-loggers. NETCDF would be more useful.
There are some user interface issues to address, such as making the default page for a data-file more compact.
Conclusion
There is a small community for the open source DIVER product, with two deployments, using it for very different kinds of research data. To date the DIVER community doesn’t have an agreed roadmap for where it might be heading and how the issues above might be addressed.
So at this stage I think it is suitable for re-deployment only into research environments which closely resemble HIE, probably including the same kinds of data-logger (I haven’t seen the other installation so can’t comment on that). It might be possible to develop DIVER into a more generic product, but there is no obvious business case for that at the moment over adapting a more widely adopted, more generic application. I think the way forward is for the current user communities (of which I consider myself a member) to consider the benefits of incremental change towards a more generic solution as they maintain and enhance the existing deployments, balancing local feature development against the potential benefits of attracting a broader community of users.
And another thing …
We discovered some holes in our end-to-end workflow for publishing data from HIEv to our Institutional Data Repository, and some gaps in the systems documentation, which we’re addressing as a matter of urgency.
Is HIEv aDORAble? by Peter Sefton is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
eResearch manager’s report 2014-07-28
eResearch manager’s report
2014-07-28
Introduction
Since the last meeting of the UWS eResearch Committee on May 22nd we have updated the eResearch roadmap to reflect where we are in relation to the plan as it was set out at the beginning of 2014.
In June I attended the Open Repositories conference and a couple of other events to do with open access to publications and data, including organising an open-data publications text-mining hackfest in Edinburgh.
Looking to the future, the eResearch team has been involved in two internal funding bids in the last week:
- Research Portal 2 (P2): to develop a joined up research presence for the university, like the Research hub projects at Griffith and JCU.
- More end-to-end data management, via more support for the AAAA data management program we’re already running.
UWS Events – Research Bazaar
Now that UWS has all our staff positions filled, we’re making a big push to do more outreach to researchers via a number of channels, including visiting departmental meetings and research forums, along with attempting to run as many eResearch-relevant training events as we can get takers for. This is all done with the help of the eResearch Communications Working Group chaired by Susan Robbins from the UWS library.
To build eResearch capability we’re trying out the Research Bazaar approach, which started in Melbourne with Steve Manos and David Flanders.
What exactly, might you ask, is the ‘Research Bazaar’ aka “ResBaz”? #ResBaz is, first and foremost, a campaign to empower researchers in the use of the University’s core IT services:
- Empowering researchers to collaborate with one another through the use of research apps on our cloud services.
- Empowering researchers to share the data with trusted partners via our data services.
- Empowering researchers to establish their reputation through our parallel computing and supercomputing services.
- Empowering researchers to invent new ways of experimenting through our emerging technology services.
Our eResearch partners Intersect are helping with this; they offer a number of Learning and Development courses, and we’re talking to them about developing and importing more.
Speaking of importing eResearch training expertise, we ran the first of a series of Research Bazaar events: Mapping for the Digital Humanities, powered by Melbourne eResearcharians Steve Bennet and Fiona Tweedie.
Right at the beginning of July, Alveo, the virtual laboratory for Communications Science, was launched by NSW Chief Scientist Mary O’Kane and UWS Vice-Chancellor Barney Glover with a two-day event, starting with a hackfest day to generate ideas and interest, promote use of the lab and provide some hands-on training. While we didn’t brand this as a Research Bazaar activity, it is certainly in the #resbaz spirit.
Projects
DC21/HIEv Wraps up
The HIEv project, née DC21, is now complete and HIEv has about 50 regular users at HIE. Thanks to Peter Bugeia at Intersect for project-managing the final stages of the rollout, and to Gerry Devine, HIE data manager, for promoting the software and putting it to good use to build dashboards etc.
New features include:
- Log in using your account at any.university.edu.au using the Australian Access Federation.
- Share data securely with a research cohort until you’re ready to publish it to the world for re-use and citation.
New: Major Open Data Collection for the humanities
Our latest project, the Major Open Data Collections project funded by the Australian National Data Service is in the establishment phase:
- Carmi Cronje is working with the ITS Project Management Office to establish the project and its various steering committees, boards etc.
- The key staff member for the project, the data librarian, has been appointed: Katrina Trewin, currently working in the UWS Library, joins us on August 4th.
Adelta is nearly finished
The Adelta project is nearing completion, with users now testing the service:
- User Interface work by Intersect is nearly done, pending some discussions with the Library about accessibility requirements.
- Final bug fixes and tweaks are being applied, as per this milestone.
- We are working with Sydney development company hol.ly to integrate the service with the Design And Art Online database, so that we have a true linked-data approach, with Adelta authors being identified using DAAO URIs. This builds upon one of the Developer Competition entries from Open Repositories 2014 – the Fill My List URI lookup service.
Wonderama
Andrew Leahy consulted for the Google Atmosphere event (Tue July 22) at the Australian Technology Park, Eveleigh. This was a Wonderama demonstration in collaboration with NGIS www.ngis.com.au, showcasing some of the NSW state government data hosted with Google’s geo platform.
Cr8it project rolls on
Cr8it is a collaboration between Newcastle, Intersect and UWS to build an application which lives in a Dropbox-like file Share/Sync/See service, so that people can move their research data from sets of files to well-described data collections in a repository.
- User testing has started on parts of the software to do with selecting, and managing files.
- Recent development work has been focussing on re-factoring the application to make it more testable and easier to build on; once this is done we’re on the home straight to hook it up to the Research Data Repositories at UWS and Newcastle and start publishing data.
We are now seeing a lot of uptake of Cloudstor+, the AARNeT researcher-ready version of ownCloud (on which we are planning to put Cr8it), among UWS users; for example, Andrew Leahy reports that a few users a week are adopting it at his suggestion.
AAAA data management
Projects to establish data management practices and infrastructure in the BENS group and the Structures Lab at IIE are continuing, and we are developing new AAAA projects to start soon.
Meet DORA
New eResearch Analyst David Clarke has coined the term DORA: Digital Object Repository for Academe, a name for a generic service-oriented component for storing research data, which adheres to a set of eResearch principles David and the rest of the team are working on. We are currently evaluating software against the ideal DORA model. David’s happy to talk to you about this, as he has an Open-DORA policy ☺.
eResearch manager’s report 2014-7-28 by Peter Sefton is licensed under a Creative Commons Attribution 4.0 International License.
Internal update: UWS eResearch roadmap 2014 Q3 & 4
About this document
This is the mid-year revision of the University of Western Sydney eResearch team roadmap for 2014. This document will be consulted at eResearch committee and working-group meetings to track progress throughout the year.
This plan assumes the current level of staffing and resources for the eResearch team and does not make any assumptions about further project funding, apart from the ANDS Major Collections project, which is in its initiation phase.
The eResearch team vision statement:

Support the objectives of the UWS research plan by creating an eResearch Ready UWS, where information and communications technologies support the collaborative conduct of high-impact, high-integrity research with minimal geographical and organisational constraints. eResearch will assist in the transition to a research culture where IT and communications technologies are integral to all research, from the fundamental underpinnings of data acquisition and creation, management and archiving, to analytical and methodological processes. Our aim is to work with stakeholders within and beyond the university to ensure UWS researchers have the information and communications technology resources, infrastructure, support and skills required, wherever they are on the path to an eResearch ready UWS.
The eResearch plan is aligned with and supports the UWS Research plan. (Note this plan is now obsolete; a new one is coming with a greater emphasis on impact and community engagement, and on broadening research income beyond competitive grant income.)
Objective 1 – Increase external research income to the University
Objective 2 – Increase the number of fields of research at UWS operating above or well above world standard
Objective 3 – Increase the number and concentration of funded research partnerships
These objectives depend on UWS having a high-integrity research environment, attractive to researchers, funders and collaborators, in which the institution can support researchers in meeting their obligations under the Australian Code for the Responsible Conduct of Research and funder expectations about data management. Building eResearch infrastructure, via the projects discussed below and the forthcoming ITS research infrastructure roadmap, will help create an environment conducive to successful income generation and improve support for researchers aiming for high research performance.
During 2014 eResearch will begin replicating the successful roll-out of end-to-end data management at HIE by creating small, tightly focused projects with clear success criteria which are aligned to the research goals of the university (via the AAAA data management project methodology currently in development).
eResearch will continue to work closely with eResearch-intensive groups, for example by supporting phase two of Alveo (formerly HCS vLab), a NeCTAR grant ($1.3M, with a total project budget of ~$3M) to set up and implement a Virtual Laboratory for the multiple partners in the project: Above and Beyond Speech, Language and Music: A Virtual Lab for Human Communication Science.
During 2014 eResearch will be implementing programs to support HDR students, along with early-career researchers and the rest of the research community. This includes the establishment of self-supporting eResearch communities via a trial of the University of Melbourne ‘Research Bazaar’ model.
- eResearch Manager – Peter Sefton
- eResearch Technical Advisor (~0.8 FTE) – Andrew Leahy
- eResearch Support Officer / eResearch Analyst – TBA
- eResearch Project Implementation Officer / Communications – Cornelia (Carmi) Cronje
- Intersect eResearch Analyst – Peter Bugeia
The following resources are from other areas of the university and are financed by that cost centre; they are currently on loan to the eResearch team until October 2014.
- Application Developer, ITS
- Web Application Developer (provided by ITS – until ITS restructure unfolds)
The eResearch Associates are employed in key UWS research institutes or schools; they work closely with the eResearch team and provide technical expertise to assist researchers.
- Gerard Devine – HIE Data Manager
- Jason Ensor – Research Development Officer (Digital Humanities)
- Nathan Mckinlay – ICT Professional Officer – IIE
- James Wright – Technical Officer in Bioelectronics & Neuroscience – BENS
The eResearch team has no formal budget separate from the office of the DVCR&D. Recommendation: consolidate remaining project funds into an eResearch projects account to support projects in the eResearch portfolio.
- Money in the MS23 financial account: ~$22,244.24
- RDR budget remaining: ~$100K (subject to confirmation from ITS)
The policy working group is chaired by Kerrin Patterson, Associate Director Performance and Quality (Acting), Office of Engagement, Strategy & Quality. The group has identified two priorities:
- Establishing an Open Access (OA) policy for both research publications and research data.
- Creating a Research Data Management (RDM) policy.
The working group has made substantial progress on the Open Access (OA) policy, and has asked the Manager, eResearch to review the policy framework at UWS, particularly the Research Code, before starting on the Research Data Management (RDM) policy. Recent changes to Australian Research Council (ARC) funding rules for Discovery grants mean this is now a pressing issue for both the OA and RDM policies at UWS:

A11.5.2 Researchers and institutions have an obligation to care for and maintain research data in accordance with the Australian Code for the Responsible Conduct of Research (2007). The ARC considers data management planning an important part of the responsible conduct of research and strongly encourages the depositing of data arising from a Project in an appropriate publicly accessible subject and/or institutional repository.
Milestones across Q1–Q4:

Open Access Policy:
- Draft presented to DVCR
- Policy adopted
- Support DVCR&D in progressing policy thru the UWS process
- Revise materials to support the policy: new Powerpoint slide show, possible statements from Scott Holmes
- See communications working group plan

Research Data Management policy:
- Review of UWS policy, particularly the Research Code
- Review of UWS policy complete
- Policy WG finish gap-analysis/comparison of UWS policies
- Policy WG recommend whether we need an RDM policy and what its scope should be
- Policy working group produce draft of RDM policy and/or updates to related policies
The Communications working group is chaired by Susan Robbins, Research Services Coordinator for the UWS library. The following table sets out the broad goals for this area. During 2014 the eResearch team will be working with Intersect to establish an organisational development approach to eResearch under the “Research Bazaar” banner.
Communications plans:
- Q1: Generic matrix to be used for eResearch messaging
- Q2: Implement for eResearch website
- Q3: Communications WG publish updated plan; eResearch publish an events calendar
- Q4: As directed by comms WG

Awareness campaign for OA policy:
- Launch of some sort?
- Web pages published
- Webinars and face-to-face briefings
- Publish web pages about the policy on main site
- Set up calendar for webinars and other outreach
- Library to run OA promotion campaign to get more deposits
- ORS to include comms about OA in research lifecycle touchpoints

Capability-building in research groups*:
- Planning
- Produce training resources and communicate that they exist
- Run 1 #ResBaz** workshop from Melbourne
- Book in 2 Intersect courses
- 1 event run at each of HIE, DHRC, MARCS
- Trial 1 Software Carpentry

Alignment of eResearch with research lifecycle:
- Q1: Planning/development; two diagrams (HDR and researchers)
- Q2: Produce draft of lifecycle; get feedback on draft from stakeholders (library, ORS, eResearch, researchers)
- Q3: Physical posters for use by key stakeholders; publish lifecycle on eResearch website
- Q4: Integrate lifecycle into stakeholder websites

Dissemination (conference presentations, journal articles, The Conversation etc.):
- Q1: Identify potential topics and co-authors
- Q2: Contact collaborators and commence writing online opinion pieces, blog posts etc.; submit conference abstracts
- Q3: Open Repositories
- Q4: eResearch Australasia; facilitate BOF session

eResearch included in Research Training agenda and materials:
- Q1: Planning
- Q2: Plan established with ORS
- Q3: Plan with ORS (Mary Krone, Luc Small)
- Q4: As per plan

Work with Intersect on establishing Research Bazaar:
- Q1: Planning
- Q2: Run as many existing Intersect courses as possible/relevant; initial pilot of Melbourne Uni courses
- Q3: Run existing Intersect courses; expanded pilot of Melbourne Uni courses; Software Carpentry
- Q4: Research Bazaar established, program to be maintained jointly by Intersect and the eResearch team

Wonderama internal & organisational development:
- Developing Wonderama as a platform for the Digital Humanities and the Project for Western Sydney outreach and consulting
- Developing a consulting/business model ($$)
- Google Summer of Code
- PX students UWS Solar Racer
- CompSci Advanced projects?

Wonderama external and outreach activities (($$) = paid gig):
- UWS HiTech Fest (careers market)
- iFly Downunder launch at Panthers (indoor skydiving) ($$)
- CeBIT conference (SCEM to sponsor?)
- Google Atmosphere ($$)
- TBD
** #ResBaz = Research Bazaar
* Capability building: count number of figures/tables/citations/programs in publications/theses produced using workshop tools and/or programming languages.
The following table lists projects which report to the eResearch Projects Working Group committee. It shows the broad project stage for each project over the year; a separate schedule/dashboard, to be presented to the eResearch Projects committee, will show detailed targets for each.
Adelta:
- Q1: Phase 1 finished
- Q2: Discuss library hosting of Adelta
- Q3: Possible integration into Library search box for greater discoverability
- Q4: Google Analytics to measure use

Cr8it core app:
- Q1: Negotiate sustainable support offer from Intersect/AARNET
- Q2: Start of trials
- Q3: Implementation
- Q4: Realisation

ANDS Major Collection:
- Q1: Scoping complete
- Q2–Q4: Project running

AAAA Data Management Projects:
- HIEv: Realisation throughout the year (Q2: set up reporting of research-focused metrics)
- IIE Structures Lab: Q1 planning, initiation; Q2 implementation; Q3–Q4 realisation
- MARCS BENS: Q1 planning, initiation; Q2 implementation; Q3–Q4 realisation
- To Be Advised (Digital Humanities): planning; initiation; implementation
- To Be Advised (something sciency): planning; initiation; implementation

Establish “AA” data management for facilities (Acquire & Archive):
- AMCF (SEM+) SIMS: planning; implementation; realisation
- NGS (Sequencing): planning; implementation; realisation
- BMRF (NMR): planning; implementation
- MSF (MassSpec): planning; implementation
- CBIF (Confocal): planning; implementation
Each AAAA data management project will be measured with a variety of metrics. Targets will be agreed with the project stakeholders both at project initiation and in the realisation phase, and maintained in a separate AAAA dashboard. These metrics are designed to show not just raw use of the AAAA methodology in terms of users or data sets (both of which are gameable metrics) but to focus on the effect of the AAAA program on research performance and ‘eResearch readiness’.
- R#: Number of researchers who have been inducted/trained and have access to AAAA infrastructure
- DAR: Datasets Archived in the RDR
- ACD: Total number of articles in the UWS publications repository citing datasets in the RDR (including via repository metadata)
- IDMP: Institute or research-cohort Data Management Plan(s) in place
- GRDMP: Number and value of current grants which reference formal data management plans
Infrastructure planning is in discussion with ITS Strategy. A technology roadmap is being produced with the ITS Roadmap Builder tool; this will be published as a separate plan.

The relationship between Intersect and UWS is covered by a member engagement plan (in development for 2014).
eResearch tool awareness:
- Q1: Team familiarity with data capture applications (e.g. CKAN, MyTardis)
- Q2: “Notebook programming” (RStudio, Python notebooks, ShaderToy)
- Q3: Academic authoring tools (LaTeX, Markdown, Pandoc, EPUB etc.)
- Q4: TBA

Communications:
- Visual comms/whiteboard training (further sessions TBA)

Software development:
- eResearch tech people to attend a workshop in one language*
- eResearch tech people to attend a workshop in one language
- Team familiarity with modern programming principles and environments**

Conferences:
- Q1: Australasian Digital Humanities (Perth)
- Q2: Open Repositories (Helsinki); Google I/O (SF)
- Q4: eResearch Australasia (Melb); Google Open Source Summit (SF); OzViz workshop (Bris?)

* Certificate in Software Carpentry (Python/R)
** Team members to complete one MOOC or otherwise demonstrate professional development
Summary

The timelines below have traffic-light colours to show progress. Green means things are going according to plan. Yellow means there have been delays or setbacks, but these are being managed and monitored. Red means targets were not met. The main ‘red’ area is the Open Access policy: a draft has been developed, has received support from the eResearch committee and DVCR&D, and is undergoing review in the office of the DVCR&D.
Assumptions
Vision
How does this fit with the UWS research plan?
Objectives 1-3
Objective 4 – Ensure UWS attracts and graduates high quality Higher Degree Research (HDR) students to its areas of research strength. eResearch will work with our eResearch partner, Intersect, to start delivering a broad range of eResearch training, building on previous training that has been delivered for High Performance Computing (see Communications and Organisational Development). HDR students will be key to this, both as one of the main audiences for training and as trainers, promulgating eResearch techniques and mind-set throughout the university.
Resources
Assumed Core Resources
Other Resources
Associates
Funding
Focus areas
Policy Working Group
Communications and Organisational Development
Measures of success
eResearch Projects
AAAA projects: measures of success
Infrastructure Working Group
Intersect Engagement
eResearch Team organisational Development
Metrics
First Research Bazaar event at UWS, Mapping for humanities
The eResearch team just finished running a two-day session on mapping tools for the humanities, delivered by visiting trainers from the University of Melbourne eResearch team under the Research Bazaar #ResBaz umbrella. ResBaz is about enabling communities of practice for eResearch, rather than building expensive centralized support. We had lots of positive feedback from participants, and a good vibe; you know it’s working when people sit at the computers and keep playing well after the lunch has arrived.
The session served-up two main packages:
- CartoDB – a nice online tool for map building – putting (fancy) dots on online maps. See the slides. CartoDB is available as a paid service, but stay tuned for a version that’s free for .edu.au researchers.
- TileMill – a more comprehensive tool for making publication-quality print and online maps (available as a desktop app).
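Both tools are happiest when point data arrives as GeoJSON rather than a raw spreadsheet. As a rough sketch of that data-preparation step (the column names and the sample row below are made up for illustration, not taken from the workshop material), a few lines of Python can convert a CSV of named places into a GeoJSON FeatureCollection:

```python
import csv
import json
import os
import tempfile

def csv_to_geojson(csv_path):
    """Convert a CSV with 'name', 'lat' and 'lon' columns (hypothetical
    column names, adjust for your data) into a GeoJSON FeatureCollection."""
    features = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            features.append({
                "type": "Feature",
                # GeoJSON coordinate order is [longitude, latitude]
                "geometry": {
                    "type": "Point",
                    "coordinates": [float(row["lon"]), float(row["lat"])],
                },
                "properties": {"name": row["name"]},
            })
    return {"type": "FeatureCollection", "features": features}

# Tiny demo with made-up data: write a sample CSV, then convert it.
demo = os.path.join(tempfile.mkdtemp(), "places.csv")
with open(demo, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["name", "lat", "lon"])
    writer.writerow(["Parramatta", "-33.815", "151.003"])

geojson = csv_to_geojson(demo)
print(json.dumps(geojson, indent=2))
```

From there the .geojson file can be uploaded to CartoDB or added as a layer in TileMill.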
More workshops coming soon – see these offerings from Intersect, our eResearch partner. The Open Refine course in particular is really useful for anyone who deals with spreadsheet or table data.
- 5 August 2014 Cleaning & exploring your data with Open Refine at UWS.
- 5 August 2014: Data Visualisation with Google Fusion Tables at UWS.
We don’t have all the results in from the official feedback survey yet, but the verbal feedback from participants was positive. One thing we’d like to look at for future #ResBaz training is making sure we add a little dash of data management, and consideration of the end-to-end research process, to each workshop.
- Depending on the course, take the time at the start to set people up with Cloudstor+ storage, a git repository or another appropriate management system for working data and a place to publish results, maybe github, maybe figshare, or a discipline specific or institutional repository.
- Keep online notes, maybe using one of the online lab/research notebook platforms – (we’re watching Egon Willighagen’s ongoing review of these systems attentively – please keep it up Egon!).
- At the end of the workshop, publish something. In the case of the maps it would be good to actually work through the process of getting a good print or web version of the map, and making sure all the data and code used to create it are saved and published.
- Oh, and I’d love to be able to offer a prize for the first published map in an article or submitted thesis to come out of the workshop.
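On the working-data point: even before choosing between Cloudstor+, git or figshare, a predictable folder layout goes a long way. A minimal sketch; the directory names here are just one possible convention, not a UWS standard:

```python
import os
import tempfile

def make_working_data_scaffold(root):
    """Create a minimal working-data layout: raw data is kept separate
    from derived data, map outputs and running notes."""
    for sub in ("data/raw", "data/processed", "maps", "notes"):
        os.makedirs(os.path.join(root, sub), exist_ok=True)
    readme = os.path.join(root, "README.md")
    if not os.path.exists(readme):
        with open(readme, "w") as f:
            f.write("# Project working data\n\n"
                    "- data/raw: original files, never edited in place\n"
                    "- data/processed: cleaned/derived data\n"
                    "- maps: exported map images and tiles\n"
                    "- notes: running lab-notebook style notes\n")
    return root

# Demo in a temporary directory.
project = make_working_data_scaffold(tempfile.mkdtemp())
print(sorted(os.listdir(project)))
```

The same layout then works whether it lives in a git repository, on Cloudstor+, or both.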
First Research Bazaar event at UWS, Mapping for humanities by Peter Sefton is licensed under a Creative Commons Attribution 4.0 International License.
Trip report: Peter Sefton @ Open Repositories 2014, Helsinki, Finland
Trip report: Peter Sefton @ Open Repositories 2014, Helsinki, Finland by Peter Sefton is licensed under a Creative Commons Attribution 4.0 International License.
From June 9th–13th I attended the Open Repositories conference way up North in Helsinki. This year I was not only on the main committee for the conference, but was also part of a new extension to the Program Committee, overseeing the Developer Challenge event, which has been part of the conference since OR2008 in Southampton. I think the dev challenge went reasonably well, but it probably requires a re-think for future conferences; more on that below.
In this too-long-you-probably-won’t-read post I’ll run through a few highlights around the conference theme, the keynote and the dev event.
Summary: For me the take-away was that now that we have a repository ecosystem developing, and the OR catchment extends further and further beyond the library, sustainability is the big issue, and conversations around sustainability of research data repositories in particular are going to be key to the next few iterations of this conference. Sustainability might make a good theme or sub-theme. Related to sustainability is risk: how do we reduce the risk of a data equivalent of the serials crisis? If there is such a crisis it won’t look the same, so how will we stop it?
Keynote
The keynote this time was excellent. Neuroscientist Erin McKiernan from Mexico gave an impassioned and informed view of the importance of Open Access: Culture change in academia: Making sharing the new norm (McKiernan, 2014). Working in Latin America McKiernan could talk first-hand about how the scholarly communications system we have now disadvantages all but the wealthiest countries.
There was a brief flurry of controversy on Twitter over a question I asked about the risks associated with commercially owned parts of the scholarly infrastructure and how we can manage those risks. I did state that I thought that Figshare was owned by Macmillan’s Digital Science, but was corrected by Mark Hahnel; Digital Science is an investor, so I guess “it is one of the owners” rather than “owns”. Anyway, my question was misheard as something along the lines of “How can you love Figshare so much when you hate Nature and they’re owned by the same company?”. That’s not what I meant to say, but before I try to make my point again in a more considered way, some context.
McKiernan had shown a slide like this:
My pledge to be open
I will not edit, review, or work for closed access journals.
I will blog my work and post preprints, when possible.
I will publish only in open access journals.
I will not publish in Cell, Nature, or Science.
I will pull my name off a paper if coauthors refuse to be open.
If I am going to ‘make it’ in science, it has to be on terms I can live with.
Good stuff! If everyone did this, the Scholarly Communications process would be forced to rationalize itself much more quickly than is currently happening, and we could skip the endless debates about the “Green Road”, the “Gold Road” and the “Fools Gold Road”. It’s tragic that we’re still debating this using this weird colour-coded-speak twenty years into the OA movement.
Anyway, note the mention of Nature.
What I was trying to ask was: How can we make sure that McKiernan doesn’t find herself, in twenty years time, with a slide that says:
“I will not put my data in Figshare”.
That is, how do we make sure we don’t make the same mistake we made with scholarly publishing? You know, where academics write and review articles, often give up copyright in the publishing process, and collectively we end up paying way over the odds for a toxic mixture of rental subscriptions and author-pays open-access, with some risk the publisher will ‘forget’ to make stuff open.
I don’t have any particular problem with Figshare as it is now; in fact I’m promoting its use at my University, and working with the team here on being able to post data to it from our Cr8it data publishing app. All I’m saying is that we must remain vigilant. The publishing industry has managed to transform itself under our noses from a much-needed distribution service for tangible goods; to a rental service where we get access to The Literature pretty much only if we keep paying; to its new position as The custodian of The Literature for All Time, usurping libraries as the place we keep our stuff.
We need to make sure that the appealing free puppy offered by the friendly people at Figshare doesn’t grow into a vicious dog that mauls our children or eats up the research budget.
So, remember, Figshare is not just for Christmas.
Disclosure: After the keynote, I was invited to an excellent Thai dinner by the Figshare team, along with Erin and a couple of other conference-goers. Thanks for the Salmon and the wine, Mark and the Figshare investors. I also snaffled a few T-shirts from a later event (Disruption In The Publishing Industry: Digital, Analytics & The Future) to give to people back home.
Conference Theme, leading to discussions about sustainability
The conference theme was Towards Repository Ecosystems.
Repository systems are but one part of the ecosystem in 21st century research, and it is increasingly clear that no single repository will serve as the sole resource for its community. How can repositories best be positioned to offer complementary services in a network that includes research data management systems, institutional and discipline repositories, publishers, and the open Web? When should service providers build to fill identified niches, and where should they connect with related services? How might these networks offer services to support organizations that lack the resources to build their own, or researchers seeking to optimize their domain workflows?
Even if I say so myself, the presentation I delivered for the Alveo project (co-authored with others on the team) was highly theme-appropriate: it was all about researcher needs driving the creation of a repository service as the hub of a Virtual Research Environment, where the repository part is important but it’s not the whole point.
I had trouble getting to see many papers, given the dev-wrangling, but there was definitely a lot of eco-system-ish work going on, as reported by Jon Dunn:
Many sessions addressed how digital repositories can fit into a larger ecosystem of research and digital information. A panel on ORCID implementation experiences showed how this technology could be used to tie publications and data in repositories to institutional identity and access management systems, researcher profiles, current research information systems, and dissertation submission workflows; similar discussions took place around DOIs and other identifiers. Other sessions addressed the role of institutional repositories beyond traditional research outputs to address needs in teaching and learning and administrative settings, and issues of interoperability and aggregation among content in multiple repositories and other systems.
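On the identifier theme: one nice property of ORCID iDs is that the final character is an ISO 7064 MOD 11-2 check digit, so a repository form can reject most mistyped iDs before doing any remote lookup. A minimal validator, using ORCID's published example iD:

```python
def orcid_checksum_ok(orcid):
    """Check the ISO 7064 MOD 11-2 checksum on an ORCID iD
    such as '0000-0002-1825-0097' (ORCID's published example)."""
    digits = orcid.replace("-", "")
    if len(digits) != 16:
        return False
    total = 0
    for ch in digits[:-1]:
        if not ch.isdigit():
            return False
        total = (total + int(ch)) * 2
    remainder = total % 11
    check = (12 - remainder) % 11
    # A check value of 10 is written as 'X' in ORCID iDs.
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected

print(orcid_checksum_ok("0000-0002-1825-0097"))  # → True
print(orcid_checksum_ok("0000-0002-1825-0096"))  # → False
```

This only catches typos, of course; confirming that an iD actually exists still needs a call to the ORCID registry.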
One session I did catch (and not just ‘cos I was chairing it) had a presentation by Adam Field and Patrick McSweeney on Micro data repositories: increasing the value of research on the web (Field and McSweeney, 2014). This has direct application to what we need to do in eResearch. Adam reported on their experience setting up bespoke repository systems for individual research projects, with a key ingredient missing in a lot of such systems: maintenance and support from central IT. We’re trying to do something similar at the University of Western Sydney, replicating the success of a working-data repository at one of our institutes (reported at OR2013) across the rest of the university, so I’ll talk more to Adam and Patrick about this.
For me the most important conversation at the conference was around sustainability. We are seeing more research-oriented repositories and Virtual Research Environments like Alveo, and it’s not always clear how these are to be maintained and sustained.
Way back, when OR was mainly about Institutional Publications Repositories (simply called Institutional Repositories, or IRs) we didn’t worry so much about this; the IR typically lived in The Library, the IR was full of documents and The Library already had a mission to keep documents. Therefore the Library can look after the IR. Simple.
But as we move into a world of data repository services there are new challenges:
- Data collections are usually bigger than PDF files, many orders of magnitude bigger in fact, making it much more of an issue to say “we’ll commit to maintaining this ever-growing pile of data”.
- “There’s no I in data repostory (sic)” – i.e. many data repositories are cross-institutional, which means that there is no single institution to sustain a repository and collaboration agreements are needed. This is much, much more complicated than a single library saying “We’ll look after that”.

And as noted above, there are commercial entities like Figshare and Digital Science realizing that they can place themselves right in the centre of this new data-economy. I assume they’re thinking about how to make their paid services an indispensable part of doing research, in the way that journal subscriptions and citation metrics services are, never mind the conflict of interest inherent in the same organization running both.
Some libraries are stepping up and offering data services, for example collaborative work between large US libraries.
The developer challenge
This year we had a decent range of entries for the dev challenge, after a fair bit of tweeting and some friendly matchmaking by yours truly. This is the third time we’ve run the thing with a clearly articulated set of values about what we’re trying to achieve.
All the entrants are listed here, with the winners noted in-line. I won’t repeat them all here, but wanted to comment on a couple.
The people’s choice winner was a collaboration between a person with an idea, Kara Van Malssen from AV Preserve in NY, and a developer from the University of Queensland, Cameron Green, to build a tool to check up on the (surprisingly) varied results given by video characterization software. This team personified the goals of the challenge, creating a new network while scratching an itch, and impressing the conference-goers who gathered with beer and cider to watch the spectacle of ten five-minute pitches.
My personal favorite came from an idea that I pitched (see the ideas page): the Fill My List framework, which is a start on the idea of a ‘Universal Linked Data metadata lookup/autocomplete’. We’re actually picking up this code and using it at UWS; so while the goal of the challenge is not to get free software development for the organizers, that happened in this case (yes, this conflict of interest was declared at the judging table). Again this was a cross-institutional team (some of whom had worked together and some of whom had not). It was nice that two of the participants, Claire Knowles of Edinburgh and Kim Shepard of Auckland Uni, were able to attend a later event on my trip, a hackfest in Edinburgh. There’s a github page with links to demos.
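The lookup/autocomplete idea itself is easy to sketch. The toy version below works over an in-memory list of (label, URI) pairs with made-up example URIs; the actual Fill My List code queries remote Linked Data sources instead, but the ranking idea is the same:

```python
def autocomplete(terms, prefix, limit=5):
    """Return up to `limit` (label, uri) pairs whose label starts with
    `prefix`, case-insensitively, shortest labels first."""
    p = prefix.lower()
    hits = [(label, uri) for label, uri in terms
            if label.lower().startswith(p)]
    return sorted(hits, key=lambda t: (len(t[0]), t[0]))[:limit]

# Toy vocabulary: labels and URIs are illustrative only.
vocab = [
    ("Helsinki", "http://example.org/place/helsinki"),
    ("Helsingborg", "http://example.org/place/helsingborg"),
    ("Sydney", "http://example.org/place/sydney"),
]
print(autocomplete(vocab, "Hels"))
```

A real implementation adds caching and pluggable backends per vocabulary, which is where most of the actual work in such a framework lives.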
But, there’s a problem. The challenge seems to be increasingly hard work to run, with fewer entries arising spontaneously at recent events. I talked this over with members of the committee and others. There seem to be a range of factors:
- The conference may just be more interesting to a developer audience than it used to be. Earlier iterations had a lot more content in the main sessions about ‘what is a(n) (institutional) repository’ and ‘how do I promote my repository and recruit content’, whereas now we see quite detailed technical stuff more often.
- Developers are often heavily involved in the pre-conference workshops, leaving no time to attend a hack day to kick off the conference.
- Travel budgets are tighter, so if developers do end up being the ones sent, they’re expected to pay attention and take notes.
I’m going to be a lot less involved in the OR committee etc. next year, as I will be focusing on helping out with Digital Humanities 2015 at UWS. I’m looking forward to seeing what happens next in the evolution of the developer stream at the OR conference. At least it’s not a clash.
The Open Repositories Conference (OR2015) will take place in Indianapolis, Indiana, USA at the Hyatt Regency from June 8–11, 2015. The conference is being jointly hosted by Indiana University Libraries, University of Illinois Urbana-Champaign Library, and Virginia Tech University Libraries.
References
Field, A., and McSweeney, P. (2014). Micro data repositories: increasing the value of research on the web. http://eprints.soton.ac.uk/364266/.
McKiernan, E. (2014). Culture change in academia: Making sharing the new norm. http://figshare.com/articles/Culture_change_in_academia_Making_sharing_the_new_norm_/1053008.
Mapping for Humanities Researchers
Do you have data that needs to be displayed on a map?
The eResearch team at the University of Western Sydney and its partner, Intersect, are flying in experts from Melbourne University to run a series of workshops designed for postgraduate students and academics.
This event will be a series of short workshops on how to turn cultural and communications research into physical and interactive maps using CartoDB and TileMill.
What will I learn?
Participants will learn all the skills to make a beautiful map; from making their data geospatially compliant through to using a cartography formatting language to tell a story with the map.
Each participant will walk away with the ability to produce beautiful visual maps for their research papers, for their presentations, and even publishing interactive maps on their own website.
Who should come?
The first three workshops will be aimed at postgraduate students and academics (we estimate up to 20 people). NB: we welcome anyone from NSW: industry, other universities and anyone interested in learning more about these mapping tools. The final workshop is aimed at technical support people and trainers.
When and how will these workshops happen?
There will be four workshop sessions (3 hours each), plus an optional session teaching participants how to format their data so it can be wrangled into a map and then stylised with various cartography techniques, helping to ‘tell the story’ of the research and its data.
About the Trainers
Steve Bennett has extensive experience providing tools and training to researchers in both government and academia. Prior to joining the University of Melbourne’s ITS Research Services team, he led data management projects at VeRSI (now V3 Alliance), working with researchers from a wide range of disciplines. An open data enthusiast, Steve has contributed extensively to projects such as Open Street Map and Wikipedia and he is the driving force behind Melbourne’s DataHack meetup group. He has run mapping workshops for The University of Melbourne and Deakin University and his mapping projects have featured on the ABC and The Age. Steve believes that everyone needs maps.
Fiona Tweedie was until recently a research and policy officer for the Australian Charities and Not-for-profits Commission, before joining the University of Melbourne’s ITS Research Services group. As a research community manager for the humanities and social sciences, she is helping to build communities of researchers around tools including Tilemill and CartoDB. With a PhD in Roman history, she knows first hand the need for researchers to produce maps for themselves, and has created her own maps showing patterns of Roman colonisation. She is also an ambassador for the Open Knowledge Foundation, leading the organisation of the Victorian branch of GovHack 2014, a nationwide open data hackfest taking place in July.
When
Monday 21 July Workshops 1 & 2
Tuesday 22 July Workshops 3 & 4
Where
All workshops will be held at UWS’s Parramatta South Campus. Building EB, Level 3, Room 36
Workshop details
Workshop 1: Monday 21 July, 9.30am-12.30pm
CartoDB (visualisation of data on a map, useful to many researchers)
Workshop 2: Monday 21 July, 1.30pm-4.30pm
Introduction to TileMill (basic cartography)
Workshop 3: Tuesday 22 July, 9.30am-12.30pm
Advanced TileMill (working with data to create a complete custom basemap)
Workshop 4: Tuesday 22 July, 1.30pm-4.30pm
Building TileMill servers and technical briefings.
Cost
Free for attendees
RSVP
By Friday 11 July to: http://bit.ly/1nIYnD7
Alveo Launch
Over the last eighteen months the Alveo (formerly Human Communication Science Virtual Laboratory) team have been building a virtual laboratory for Human Communications Science. The lab is being launched Tuesday 1 July 2014, 4:00 pm – 5:00 pm by Professor Mary O’Kane, NSW Chief Scientist & Engineer, and Professor Scott Holmes, DVC R&D, University of Western Sydney.
There are also two training/development events:
Monday 30 June: Alveo HackFest, for developers, programmers and testers (9:30 am – 6:00 pm)
Tuesday 1 July: The first Alveo Users Workshop, for researchers and end-users (9:30 am – 4:00 pm)
What’s Alveo?
Alveo provides on-line infrastructure for accessing human communication data sets (speech, texts, music, video, etc.) and for using specialised tools for searching, analysing and annotating that data.
- Data Discovery Interface: Browse and search collections, view documents and create lists of items for further analysis. The Data Discovery Interface provides the jumping-off point for further analysis using the Galaxy Workflow Engine, the NeCTAR Research Cloud, the R statistical package or any other preferred tool or platform. A fully featured API underpins the Data Discovery Interface, providing opportunities to extend the functionality of the Virtual Laboratory.
- Galaxy Workflow Engine: Initially targeted at genomics researchers, Galaxy is a scientific workflow system which is largely domain agnostic. The Galaxy Workflow Engine provides Alveo users with a user-friendly interface to run a range of text, audio and video analysis tools. Workflows defining a sequence of steps in an analysis can be created and then shared with other researchers.
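For the API-minded: interacting with the lab from a script mostly amounts to sending a key-authenticated HTTP request and parsing the JSON that comes back. The sketch below only builds the request; the base URL, path and header name are assumptions based on the Alveo/HCSvLab documentation, so check the lab's own API docs before relying on them:

```python
from urllib.request import Request

API_BASE = "https://app.alveo.edu.au"  # assumption: your lab instance's URL

def item_list_request(api_key, list_id):
    """Build an authenticated request for one of a user's item lists.
    The '/item_lists/<id>' path and 'X-API-KEY' header are assumptions
    drawn from the Alveo/HCSvLab docs; verify against your instance."""
    url = "%s/item_lists/%s" % (API_BASE, list_id)
    return Request(url, headers={
        "X-API-KEY": api_key,        # each user has a personal key
        "Accept": "application/json",
    })

req = item_list_request("my-secret-key", 1)
print(req.full_url)
```

Opening the request with `urllib.request.urlopen(req)` (and a real key) would return the item list as JSON, ready to feed into whatever analysis tool you prefer.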
Who should attend?
The two days of events will be of interest to researchers working in Human Communications Science; speech technology, computer science, language technology, behavioural science, linguistics, music science, phonetics, phonology, sonics, and acoustics or related fields, as well as computer scientists and eResearch staff who support them.
Which day to attend?
The second day, Tuesday July 1st, will be a gentle introduction to the lab, and would be suitable for any researcher who wants to learn about a new approach to research, involving:
Finding data from the data collections already in the lab, and running it through the existing lab tools for textual and audio analysis
Running repeatable workflows on data both from the lab and elsewhere, via the Galaxy workflow engine
The first day, June 30th, will be a hands-on hackfest experience, where we will assist participants in forming teams to explore the potential of the lab. The aim is to team up programmers and other techies with researchers to introduce them to the potential of the lab:
Tackle some tractable problems, such as generating a word-cloud from a large defined data set, to learn the lab’s interface (API)
Explore the process of importing new stand-alone tools into the lab
Get some advice, or make a start on a research project that might use one of the data collections
Talk to the vLab team about importing new datasets (corpora)
Staff from UWS eResearch and Intersect Australia will be on hand to assist researchers and tech staff in interacting with the lab. If you are an adventurous researcher, please consider attending even if you don’t have the tech skills to deal with scripting, APIs and so on: we will team you up with people who do, and who can help you approach your research problems.
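The word-cloud problem above is a good first exercise because the analysis side is tiny. Once document text has been fetched from the lab (not shown here), the frequency counting that drives a word-cloud layout is a few lines of Python; the sample text and stop-word list below are illustrative only:

```python
import re
from collections import Counter

# Illustrative stop-word list: extend for real use.
STOPWORDS = {"the", "and", "of", "a", "to", "in"}

def word_frequencies(text, top=10):
    """Tokenise, drop stop words, and return the most common words:
    the counts a word-cloud layout would use for font sizes."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS).most_common(top)

sample = ("the lab provides speech and text collections; "
          "the collections support speech analysis and text analysis")
print(word_frequencies(sample, top=3))
```

The interesting part of the hackfest task is the other half: using the lab's API to assemble the "large defined set" of documents in the first place.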
Professor Denis Burnham, Director of The MARCS Institute and Alveo Project Director, cordially invites you to the:
Launch and Reception for Alveo, the multi-institutional virtual laboratory for Human Communication Science
Formal launch by Professor Mary O’Kane, NSW Chief Scientist & Engineer, and Professor Scott Holmes, DVC R&D, University of Western Sydney
Tuesday 1 July 2014, 4:00 pm – 5:00 pm, followed by drinks
Venue: Female Orphan School (Building EZ), University of Western Sydney, Parramatta South campus
RSVP by Wednesday 11 June 2014 to Dr Dominique Estival, Alveo Project Manager, (02) 9772 6596 or d.estival@uws.edu.au
Join the Research Bazaar
This event is part of the #ResBaz movement. Born out of the University of Melbourne, the Research Bazaar is a campaign to empower researchers in the use of the University’s core IT services:
Empowering researchers to collaborate with one another through the use of research apps on cloud services
Empowering researchers to share the data with trusted partners via data services
Empowering researchers to establish their reputation through parallel computing and supercomputing services
Empowering researchers to invent new ways of experimenting through emerging technology services
Visit the Research Bazaar tumblr to learn more about the #ResBaz mission and conference in 2015.
eResearch for UWS Future Research Leaders
Here are some notes for a presentation by members of the eResearch team to the University of Western Sydney Future Research Leaders Program session on Thursday June 6th 2014.
With only a ten-minute slot in which to present, we decided to keep the presentation at a very high level, and what better way to do that than by tying it to the eResearch ‘vision statement’.
Support the objectives of the UWS research plan by creating an eResearch Ready UWS, where information and communications technologies support the collaborative conduct of high-impact, high-integrity research with minimal geographical and organisational constraints. eResearch will assist in the transition to a research culture where IT and communications technologies are integral to all research, from the fundamental underpinnings of data acquisition and creation, management and archiving, to analytical and methodological processes. Our aim is to work with stakeholders within and beyond the university to ensure UWS researchers have the information and communications technology resources, infrastructure, support and skills required, wherever they are on the path to an eResearch ready UWS.
The eResearch vision has three clauses. Let’s go through them one by one.
Q. Why are we here? A. Impact & Integrity
Research Integrity: The UWS website says:
Advisors…
… and you can talk to the eResearch team about data management and planning the ‘e’ part of your research, to maximize the integrity of your research.
Research Impact: The library has a page on measuring research impact. See also Alt Metrics as new ways of measuring impact. But:
Research Impact: new ways of working
Doing eResearch will help with impact via publishing and reuse of data and enabling new modes of research that increase its reach and effectiveness, for example the UWS-led $3M Alveo project .
Q. Why are we here? A. Training and organizational development
We’re building capability by grass-roots training and engagement
UWS eResearch is trying out the Research Bazaar approach created by David Flanders and Steve Manos at Melbourne Uni. The Melbourne approach is to enlist HDR workers to up-skill research groups from the inside out.
Q. Why are we here? A. To help set the agenda for our IT department
We work with ITS, and our eResearch partner Intersect to make sure that the right services are on offer.
Number one take-away from today! To get help or advice, go to MyIT: http://MyIT.uws.edu.au, click through the menus and scroll down to eResearch. Or email eresearch@uws.edu.au
eResearch for UWS Future Research Leaders by Peter Sefton is licensed under a Creative Commons Attribution 4.0 International License.
Touring eResearch @ Western Sydney with Barney Glover
Touring eResearch @ Western Sydney with Barney Glover by Peter Sefton and Andrew Leahy is licensed under a Creative Commons Attribution 4.0 International License.
On Monday March 17th, the eResearch team at UWS hosted a visit from Vice-Chancellor Barney Glover. Barney came to see The Wonderama Project Lab and we took advantage of the precious VC-time to tell him about a few eResearch highlights. As well as Barney we had Andrew Leahy, Peter Sefton, Associate PVCR Deborah Sweeney and Peter Bugeia from Intersect, our eResearch partner. Intersect helped us build much of what we talked about, and continues to assist UWS in driving eResearch uptake.
Wonderama is an interactive multi-screen visual experience. The lab’s primary goal is to push the boundaries of what’s possible with common visualization applications and APIs (programming interfaces) using off-the-shelf hardware, and to have fun along the way. Wonderama is used to host visits by school groups and the occasional corporate gig (thanks, Google and Powerhouse Museum). To this end, eResearch, through Andrew Leahy, enlists final-year B.CompSci, B.ICT and B.Music students to work on projects involving applications such as Google Earth, Second Life, Microsoft World Wide Telescope, and inexpensive controller hardware like hand-held tablets, the Microsoft Kinect and Leap Motion.
We structured the visit around a virtual tour of eResearch at UWS, showing off the display and visualization tech the Lab has built up and, um, borrowed, at the Kingswood campus: through scrounging, being in the right place at the right time to pick up Google hand-me-down equipment, and applications donated to the School of Computing, Engineering and Mathematics (SCEM).
For the tour we used the Wonder Wall, a 6m-wide ultra-widescreen high-definition projection surface. We flew to each campus using a tweaked version of Google Earth that mimics atmospheric haze and time-of-day. Location-based pop-ups were used as speaking points for each campus. The images below are screenshots from the presentation.
Penrith (Kingswood) – Wonderama
Our first stop was Kingswood, where we looked at the Wonderama Lab itself. Andrew spoke about working with undergraduates, and how the transportable immersive Wonderama rigs have been fantastic for outreach and engagement with a wide range of audiences.
Penrith (Werrington South) – the Research Data Repository
Werrington South is the home of the UWS Library, which looks after the Research data repository. This infrastructure was partially funded by the Australian National Data Service. We also talked briefly about some of the High Performance Computing used by the Institute for Infrastructure Engineering (IIE) with assistance from eResearch and Intersect.
This repository is one part of the data management fabric at UWS; its function is Archiving and Advertising data, while other systems look after Acquiring data and providing a platform for researchers to Act on that data.
Bankstown
Bankstown is home of the MARCS institute, where Prof Denis Burnham leads the $3M Human Communications Science Virtual Laboratory.
The lab brings together a growing range of data sets related to human communications, including speech, text and music, in a variety of formats, for use by a huge range of researchers across many disciplines.
Hawkesbury
Flying back to Hawkesbury, we come to the place where eResearch engagement with research has been the deepest, thanks to the Australian National Data Service (ANDS). Here we have the Hawkesbury Institute for the Environment and their HIEv research data system, built for UWS by Intersect. This system started life as ANDS Data Capture project number 21, but we now like to think of it as the Acquire and Act front-end to the AAAA data management picture. Once the data has been captured and the researchers have done research with it, the UWS Research Data Repository, and potentially discipline-specific repositories, take care of Archiving and Advertising for re-use.
Parramatta
Final stop on this tour was Parramatta, with the Sydney CBD peeking out behind the ‘slide’. At Parramatta one of our most exciting engagements is with the newly minted Digital Humanities Research Group, led by Prof Paul Arthur. This group will eventually collaborate with every institute and school at UWS in one way or another. At the moment we’re working on developing a couple of new projects with the DH group; one of the things we want to be able to do is to locate archival and current data in time and space, particularly in Western Sydney.
We have been talking to Sarah Barns, a new Research Fellow at the Institute for Culture and Society, about projects involving geo-temporally located historical imagery; below is a screenshot from Cities In Time, one of Sarah’s previous projects.
A look at the (potential) new airport
From Parramatta on the projected screen we switched to the Liquid Galaxy rig, where Andrew has a portable display made up of seven screens each driven by its own rack-mounted PC.
We went to have a look at the proposed new Sydney airport site at Badgerys Creek. Andrew had loaded one of the many plans for the airport into Google Earth and we were able to fly over it. Sure, you could do this in a browser on your laptop, but the wrap-around display gives a much better sense of place. This would be an ideal rig for hosting planning and strategy meetings, particularly when we get more data loaded: 3D models of various proposals, transport corridors, public health information, real estate price data, historical imagery (such as Not In My Backyard airport protests) and so on. Here Vice-Chancellor Barney Glover and Associate PVCR Prof Deborah Sweeney discuss the shape of the airport. The effect of Google Earth, with the mocked-up image and the Wonderama display, is rather like being in the observation deck of a very stable blimp.
Historical images in modern context
We finished up with another visualization mock-up, this time using Google Street View (via seven coordinated web browser sessions, each with a slightly different perspective) to locate historical images from the UWS Archives in a modern context. Here’s Barney checking out some pictures of the old Hawkesbury Agricultural College. We particularly like the one of the women, apparently from Sydney Uni, cutting the grass at the college some time in the 1920s. With scythes, no less! We’ll have to wait and see if this sparks any new collaboration with that particular institution.
Hey, does this Data taste funny?
Hey, does this Data taste funny? by Andrew Leahy is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
We were slotted after Janette’s talk about ethics approvals, which was all about understanding and managing risk; this made an easy segue into risk around data.
At which point a USB key – figuratively loaded with three years of research data and an almost-completed thesis, and spiked with a small amount of potassium permanganate – was unceremoniously dropped into a beer glass… ooooopppps!
So, Data Management. We know it’s deadly boring, but it’ll make you cry if you don’t get it right. Please think about it as you start planning your research.
The eResearch Data Management and Technology Planning page is a good place to start.
UWS students, refer to your green HDR handbook, page 47, and if you have any IT-related questions please check the UWS MyIT Portal.
Good Luck!
What’s in the CKAN?
What’s in the CKAN? by Peter Sefton and Kim Heckenberg, photos by Andrew Leahy is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
What’s in the CKAN?
On Tuesday the 4th March 2014, the extended UWS eResearch Team and our friend Gerry Devine the Data Manager at Hawkesbury Institute of the Environment (HIE) met on the UWS Hawkesbury campus to have the first of a planned series of ‘Tool Day’ exploration and evaluation sessions.
These days are an opportunity to explore various eResearch applications, ideas and strategies that may directly benefit UWS researchers during the research life cycle. This particular day looked at a back-end eResearch infrastructure tool, but we will also be running researcher-focussed workshops and training sessions, using the Research Bazaar (#resbaz) methodology being developed by Steve Manos, David Flanders and team at the University of Melbourne.
The first application on the list was CKAN, the Comprehensive Knowledge Archive Network, an open-source:
data management system that makes data accessible – by providing tools to streamline publishing, sharing, finding and using data. CKAN is aimed at data publishers (national and regional governments, companies and organizations) wanting to make their data open and available. See more at: http://ckan.org/
We are interested in the potential for CKAN as a data capture and working-data repository solution. In terms of the AAAA data management model we’re developing at UWS, that covers the first two A’s:
- Acquiring data – CKAN can accept data both from web-uploads and via an API.
- Acting on data – CKAN has a discovery interface for finding data sets, simple access control via group-permissions and ways to deal with tabular, spreadsheet-ish data online. It looks like a reasonable general-purpose place to put all kinds of data but particularly CSV-type stuff such as time-series data sets, which CKAN can preview, and plot/graph.
- Archiving data – archiving at UWS is expected to be handled by the institutional Research Data Repository (RDR) or a discipline-specific repository, so we’re looking at how CKAN can be used to identify and describe data sets and post them to an appropriate archival repository.
- Advertising data – the default for disseminating research data in Australia is to make sure data collection descriptions are fed to Research Data Australia, along with making sure that any relevant discipline-specific discovery services are aware of the data too.
Joss Winn at Lincoln in the UK has explored CKAN for research data management. He says:
Before I go into more detail about why we think CKAN is suitable for academia, here are some of the feature highlights that we like:
- Data entry via web UI, APIs or spreadsheet import
- versioned metadata
- configurable user roles and permissions
- data previewing/visualisation
- user extensible metadata fields
- a license picker
- quality assurance indicator
- organisations, tags, collections, groups
- unique IDs and cool URIs
- comprehensive search features
- geospatial features
- social: comments, feeds, notifications, sharing, following, activity streams
- data visualisation (tables, graphs, maps, images)
- datastore (‘dynamic data’) + file store + catalogue
- extensible through over 60 extensions and a rich API for all core features
- can harvest metadata and is harvestable, too
You can take a tour or demo CKAN to get a better idea of its current features. The demo site is running the new/next UI design, too, which looks great.
To start exploring the basic I/O capabilities of the CKAN application, the team separated into groups to perform various tasks. Andrew/Alf’s job was to build an instance of the CKAN environment on a UWS virtual machine running CentOS. The task involved chasing down a current installation guide that actually works; this proved challenging, as the CentOS documentation was six months old. Andrew achieved his mission, and claims to have learned something.
Peter B and Gerry were tasked with uploading data through the CKAN API; we (naively) thought that we might be able to write a quick script to suck data out of HIEv, the working-data repository for Gerry’s institute and push it to the test CKAN instance that Intersect have set up as part of the Research Data Storage Initiative (RDSI). Initial progress was promising, and Gerry and Peter managed to create data sets in CKAN, but getting a file, any file, uploaded into a data set proved beyond us on the day.
Lloyd and Graham explored the PHP CKAN API library, which has not been updated in four years and is not very complete. The library came with a hard-coded URL for a CKAN site (that is, it was set up to always talk to the same CKAN server; normally an API library would take the server as an argument). Lloyd has fixed that and will offer it back to the developer, if we get a chance to test it. At the moment, though, we don’t have much confidence in that code.
(By the following evening we had sorted out the API problems which seemed to be as simple as us trying to use the latest API library against a not-so-new server, and Gerry was able to upload data files to data sets.)
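For reference, creating a data set through CKAN’s Action API boils down to an authenticated JSON POST. Here is a minimal sketch; the CKAN URL, API key and dataset fields are placeholders for illustration, not the actual HIEv/RDSI details:

```python
import json
import urllib.request

CKAN_URL = "http://demo.ckan.org"  # placeholder; substitute your CKAN instance
API_KEY = "your-api-key-here"      # found on your CKAN user profile page

def action_url(base, action):
    """Build the endpoint for a CKAN Action API call, e.g. package_create."""
    return f"{base}/api/3/action/{action}"

def ckan_action(action, payload):
    """POST a JSON payload to the Action API and return its 'result' field."""
    req = urllib.request.Request(
        action_url(CKAN_URL, action),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": API_KEY,
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]

# A minimal data set ("package" in API terms); names must be
# lowercase-with-dashes and unique on the server.
dataset = {
    "name": "hiev-test-dataset",
    "title": "HIEv test data set",
    "notes": "Trial export from a working-data repository.",
}
# ckan_action("package_create", dataset)  # uncomment against a live instance
```

Uploading a file into a data set goes through a second call (`resource_create`) with a multipart upload, which is the step that tripped us up on the day; note also the Packages/Data Sets naming mismatch mentioned in the annoyances list.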
Open Questions about CKAN:
- Are there good ways to package multiple DataSets together for deposit as a data collection?
- How can we follow linked-data principles and avoid using strings to describe things? We’d really like to be able to link data sets to their research context, as discussed on PT’s blog:
It turns out Gerry has been working on describing the research context for his domain, the Hawkesbury Institute for the Environment. Gerry has a draft web site which describes the research context in some detail – all the background you’d like to have to make sense of a data file full of sensor data about life in whole tree chamber number four. It would be great if we could get the metadata in systems in HIEv pointing to this kind of online resource with statements like this:
<this-file> generatedBy https://sites.google.com/site/hievuws/facilities/eucface
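As a sketch of what such a statement could look like in machine-readable form, here is a tiny Python helper that emits an N-Triples line; we are assuming the W3C PROV vocabulary’s `wasGeneratedBy` as one plausible standard predicate for the informal “generatedBy” above, and the file URI is purely illustrative:

```python
def provenance_triple(file_uri, facility_uri):
    """Return an N-Triples statement asserting that the data file
    was generated by the given facility (using PROV-O's
    prov:wasGeneratedBy as the predicate)."""
    generated_by = "http://www.w3.org/ns/prov#wasGeneratedBy"
    return f"<{file_uri}> <{generated_by}> <{facility_uri}> ."

triple = provenance_triple(
    "https://example.org/data/wtc4-sensors.csv",  # hypothetical HIEv file URI
    "https://sites.google.com/site/hievuws/facilities/eucface",
)
print(triple)
```

The point of using URIs on both sides, rather than free-text strings, is that any system seeing the triple can follow the facility link to the full research-context description.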
A couple of CKAN annoyances:
- It’s not great that the API talks about “Packages” while the user interface says “Data Sets”.
- Installation is a bit of a chore; as Andrew puts it, it’s “scary”: you follow a long set of steps and only at the end find out whether it works. The Ubuntu installation is a little more structured, but still, some way-points would be good.
- It seems odd that the default installation does not include a data store, so by default it is only a catalogue; this tripped us up when trying to use the API.
This was our first try at an eResearch Tools Day; here are some notes for ourselves:
- While going out to lunch at the Richmond Club was quintessentially Western Sydney and quite pleasant, it is probably better to eat on-site and not break the flow by all jumping in the eResearch van. Pizza, delivered, next time.
- We do want to invite other eResearch types and, where appropriate, some researchers to some of these days, but we want the first few to be with people we know well so we can refine the format. (As noted above, these are technically focussed days for technical people, all about learning basic infrastructure, not about research questions; there will be other venues for researcher collaboration.)
- It should not take ten days for us to blog about an event – next time we’ll appoint a communications officer.
Linux Desktops for Research Software
UWS eResearch at the DVC’s strategy day
This is a presentation for the UWS Deputy Vice-Chancellor’s Strategy Day, convened by Andrew Cheetham, but since it’s basically a “meet the team, here’s what we do” kind of presentation I thought I’d make it into a blog post as well.
This post also takes the opportunity to try out the work done for UWS eResearch by a group of Professional Experience students in third-year computer science, on an inline web-based slide show viewer [Update 2014-01-06: Note, this code is hosted on github and the students chose a license for the code which contains a rude word, which raised some eyebrows amongst colleagues here at UWS. If you don’t like to look at rude words don’t visit this link, but if you do want to use the code on your own project then you will have to read the license, rude word and all.]
UWS eResearch
We’re here to collaborate. Read more about us …
Notes: Consider this slide going up as a soft-launch of the new UWS eResearch web site.
eResearch mission: The eResearch mission is to:
Notes: We do this via:
Governance & context
Desire paths and goat tracks
Notes
A big part of our job involves working out new routes to
Over the past two years the eResearch unit at UWS has established the basis for an eResearch-ready research community. A small group has been established with skills in:
Ancient history: 2012 – “First steps in establishing data management”
The Future: 2014
Key Challenges:
Case Study: The HCS vLab – drag and drop research!
Notes: This project is worth around $3M, with $1.3M coming from the government. UWS-MARCS leads a multi-institutional team in an effort to connect data, tools and users in a large-scale joining-up of a variety of disciplines under the banner “Human Communication Science”. The two slides here show how a researcher can drag and drop analytical tools to run analysis on audio data. This is one example – the lab contains a huge variety of text, audio and video data and a growing number of tools that can be run on the data.
HCS vLab
Case study: AAAA data management at HIE
Notes: Read more about AAAA data management in this presentation from Open Repositories 2013.
UWS eResearch at the DVC’s strategy day by Peter Sefton is licensed under a Creative Commons Attribution 4.0 International License.