Mapping for Humanities Researchers

Do you have data that needs to be displayed on a map?

The eResearch team at the University of Western Sydney and its partner, Intersect, are flying in experts from the University of Melbourne to run a series of workshops designed for postgraduate students and academics.

This event will be a series of short workshops on how to turn cultural and communications research into physical and interactive maps using CartoDB and TileMill.

What will I learn?

Participants will learn the skills needed to make a beautiful map, from making their data geospatially compliant through to using a cartographic styling language to tell a story with the map.

Each participant will walk away with the ability to produce beautiful visual maps for their research papers and presentations, and even to publish interactive maps on their own website.
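By way of illustration only (this is not part of the workshop materials), here is a minimal Python sketch of the kind of data preparation the workshops cover: turning a plain spreadsheet export (a CSV with latitude and longitude columns) into GeoJSON that tools such as CartoDB and TileMill can display. The file and column names are hypothetical.

```python
import csv
import json

def csv_to_geojson(csv_path: str, geojson_path: str) -> None:
    """Convert a CSV of point records into a GeoJSON FeatureCollection."""
    features = []
    with open(csv_path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            features.append({
                "type": "Feature",
                # GeoJSON expects [longitude, latitude] order.
                "geometry": {
                    "type": "Point",
                    "coordinates": [float(row["longitude"]), float(row["latitude"])],
                },
                # Remaining columns become attributes that can be styled or labelled.
                "properties": {k: v for k, v in row.items()
                               if k not in ("latitude", "longitude")},
            })
    with open(geojson_path, "w", encoding="utf-8") as fh:
        json.dump({"type": "FeatureCollection", "features": features}, fh, indent=2)

# Hypothetical example: a spreadsheet of interview locations.
csv_to_geojson("interviews.csv", "interviews.geojson")
```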

Who should come?

The first three workshops will be aimed at postgraduate students and academics (we estimate up to about 20 people). NB: we welcome anyone from NSW: industry, other universities and anyone interested in learning more about these mapping tools. The final workshop is aimed at technical support people and trainers.

When and how will these workshops happen?

There will be four workshop sessions (three hours each), plus an optional session teaching participants how to format their data so it can be wrangled into a map and then styled with various cartographic techniques to help ‘tell the story’ of the research and its data.

About the Trainers

Steve Bennett has extensive experience providing tools and training to researchers in both government and academia. Prior to joining the University of Melbourne’s ITS Research Services team, he led data management projects at VeRSI (now V3 Alliance), working with researchers from a wide range of disciplines. An open data enthusiast, Steve has contributed extensively to projects such as OpenStreetMap and Wikipedia, and he is the driving force behind Melbourne’s DataHack meetup group. He has run mapping workshops for the University of Melbourne and Deakin University, and his mapping projects have featured on the ABC and in The Age. Steve believes that everyone needs maps.

Fiona Tweedie was until recently a research and policy officer for the Australian Charities and Not-for-profits Commission, before joining the University of Melbourne’s ITS Research Services group. As a research community manager for the humanities and social sciences, she is helping to build communities of researchers around tools including TileMill and CartoDB. With a PhD in Roman history, she knows first-hand the need for researchers to produce maps for themselves, and has created her own maps showing patterns of Roman colonisation. She is also an ambassador for the Open Knowledge Foundation, leading the organisation of the Victorian branch of GovHack 2014, a nationwide open data hackfest taking place in July.

When

Monday 21 July: Workshops 1 & 2

Tuesday 22 July: Workshops 3 & 4

Where

All workshops will be held at UWS’s Parramatta South Campus: Building EB, Level 3, Room 36.

Workshop details

Workshop 1: Monday 21 July, 9.30am-12.30pm
CartoDB (visualisation of data on a map, useful to many researchers)

Workshop 2: Monday 21 July, 1.30pm-4.30pm
Introduction to TileMill (basic cartography)

Workshop 3: Tuesday 22 July, 9.30am-12.30pm
Advanced TileMill (working with data to create a complete custom basemap)

Workshop 4: Tuesday 22 July, 1.30pm-4.30pm
Building TileMill servers and technical briefings.

Cost

Free for attendees

RSVP

By Friday 11 July at: http://bit.ly/1nIYnD7

Research Data Repository Services Delivered in Stage One

Over the last year, the Research Data Repository project has delivered an impressive array of infrastructure and services.

Services that exist now

  • Service: A researcher can request a Research Shared Drive of up to 1 TB, with multiple users and access anywhere on UWS campuses. An FAQ is online.
    How delivered: The request can originate from the researcher or from eResearch; the ITS team then provision the share in accordance with the Support Plan. The request form is online.
  • Service: A researcher can back up their git repository onto the Research Data Store.
    How delivered: Ad hoc, by the eResearch team.
  • Service: A researcher can request a virtual machine.
    How delivered: The request can originate from the researcher or from eResearch; the ITS team then provision the virtual machine in accordance with the relevant SOP.
  • Service: A researcher can deposit their research data in the Research Data Catalogue.
    How delivered: There are two ways to initiate the request: self-service, using an online form, or in discussion with eResearch and the Library Research Services team. Once initiated, the Research Services Coordinator – Library follows Library procedure in creating a new collection record and storing the data collection (as applicable).
  • Service: Library systems can harvest metadata from UWS and web sources of truth on a regular basis. This metadata is stored in the Research Data Catalogue and provides lookup for applications like ReDBox and HIEv.
    How delivered: In accordance with Library procedures.
  • Service: A researcher can use the Data Management Plan Checklist.
    How delivered: Self-service, by obtaining a copy of the checklist online, with support from the eResearch team as needed.
  • Service: A researcher can create a Data Management Plan.
    How delivered: Self-service, by obtaining a copy of the checklist online, with support from the eResearch team as needed. The eResearch team can and do occasionally write Data Management Plans on behalf of researchers, using the same template.
  • Service: A researcher can find information about research data management online.
    How delivered: Self-service, by reading website content and following links for more information, with assistance from the eResearch team.

External services that we are supporting

  • Service: A researcher can obtain a NeCTAR virtual machine of up to 2 cores at a time, for up to 3 months.
    How delivered: eResearch can assist with access and set-up.
  • Service: A researcher can apply for medium and large (high-intensity) virtual machines from NeCTAR.
  • Service: A researcher can get a Cloudstor+ account through AARNET; this is cloud storage for research, located within Australia.
    How delivered: eResearch are actively promoting this service and seeking user evaluations of it.
  • Service: A ReDBox administrator can raise a bug fix or issue with QCIF for resolution.
    How delivered: QCIF provide support, with assistance from the eResearch team.

What infrastructure has been delivered

Infrastructure – Storage
  • 127 TB of high-quality disk for researchers and research-related uses has been deployed. This storage is highly flexible and extensible, and can be utilised as SAN or NAS depending on the need.
      > Migration of all data from the old 70 TB SAN.
  • Established a new service, the Research Shared Drive (SIF share).
      > New FAQ/README with installation instructions, and also best practices in data management.
      > New support plan through close coordination between eResearch and ITS.
      > 10 research teams are currently using the RDS.
  • Storage has been connected to a number of virtual machines for research specific projects and applications.
Collaborative Storage
  • Explored and trialled several collaborative storage solutions, including Oxygen Cloud, WOS cloud, SparkleShare, and OwnCloud.
  • Selected OwnCloud based on experience at other organisations (such as AARNET and Lincoln University in the UK).
  • A trial was conducted whereby a link was made between Dropbox and the Research Shared Drive. The team set up a Dropbox account which can receive a copy of a researcher’s Dropbox and store that data on the same researcher’s Shared Drive. This system is still in the development stages.
  • A trial was conducted whereby a link was made between source code repositories (version control systems) and the Research Data Store. The link is demonstrated by a UWS git server which clones public-access git repositories; by way of example, we cloned the eResearch-apps repository (a minimal sketch of this kind of mirroring follows the ‘Up Next’ list below).
Up Next:
  • Trial a collaborative storage option based on OwnCloud.
  • Establish a mechanism by which a user pushes their git repository to UWS storage.
  • Serve the needs of researchers who use other version control systems such as Mercurial and Subversion.
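As a rough illustration of the mirroring trial and the planned push-to-UWS-storage mechanism described above, the following Python sketch mirrors a git repository onto shared storage. The mount point and repository URL are hypothetical placeholders; this is not the production service.

```python
import subprocess
from pathlib import Path

# Hypothetical mount point for git backups on the Research Data Store.
RDS_MOUNT = Path("/mnt/research-data-store/git-backups")

def mirror_repository(source_url: str) -> Path:
    """Create (or refresh) a bare mirror of a git repository on the shared store."""
    RDS_MOUNT.mkdir(parents=True, exist_ok=True)
    target = RDS_MOUNT / source_url.rstrip("/").split("/")[-1]
    if target.exists():
        # Refresh an existing mirror instead of re-cloning it.
        subprocess.run(["git", "-C", str(target), "remote", "update", "--prune"], check=True)
    else:
        # --mirror keeps all refs (branches and tags) in a bare repository.
        subprocess.run(["git", "clone", "--mirror", source_url, str(target)], check=True)
    return target

# Hypothetical example: mirroring a public repository, as in the eResearch-apps trial.
mirror_repository("https://github.com/example/eresearch-apps.git")
```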
Infrastructure – Compute
  • Four servers have been provisioned for research use: 2 existing servers from HIE and 2 provided through the RDR. Together these form the Research Cluster.
  • The Research Cluster provides 160 processor cores and 1,024 GB of memory.
  • Six previously created virtual machines were successfully migrated onto the Research Cluster.
  • There are now 9 virtual machines in the Research Cluster, with plans to migrate more across from the School of Medicine and other schools and institutes.
  • We can provision up to approximately 40 ‘medium intensity’ virtual machines.
Up Next:
  • Create ‘canned’ virtual machines which come pre-loaded with the tools needed to analyse data.
Infrastructure – Software
  • New packaging software for research data, called CrateIt (Cr8it), was developed. Cr8it was started under two different approaches: the first leveraged a toolset called The Fascinator, and the other incorporated new features into OwnCloud.
  • Document conversion, such as ePub generation, was ported into OwnCloud-Cr8it.
  • Automatic generation of a combined metadata catalogue record plus manifest was started. The manifest will be human- and machine-readable, leveraging work done by the HIEv (DC21) project (a minimal sketch of such a manifest follows the ‘Up Next’ list below).
Up Next:
  • Create a Cr8it trial and roll it out.
  • Flesh out what metadata record needs to be created by the Cr8it packaging process.
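To make the idea of a human- and machine-readable manifest concrete, here is a minimal, hypothetical Python sketch of the kind of record a packaging tool might emit for a folder of research data. The field names and layout are illustrative assumptions, not the actual Cr8it format.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

def build_manifest(data_dir: str, title: str, creator: str) -> dict:
    """Walk a data folder and describe each file with its size and checksum."""
    files = []
    for path in sorted(Path(data_dir).rglob("*")):
        if path.is_file():
            files.append({
                "path": str(path.relative_to(data_dir)),
                "size_bytes": path.stat().st_size,
                "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
            })
    return {
        "title": title,
        "creator": creator,
        "created": datetime.now(timezone.utc).isoformat(),
        "files": files,
    }

# Hypothetical example: the same JSON document is easy for a person to read
# and for a catalogue application to ingest.
manifest = build_manifest("sample_dataset", "Whole Tree Chamber readings", "A. Researcher")
print(json.dumps(manifest, indent=2))
```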
Research Data Catalogue
  • A simple form was developed that a researcher can use to indicate that they have a data set they would like to archive.
  • A pro forma questionnaire has been developed by the Research Services team at the Library. A process for including a new data set was also developed by the Library Research team.
  • Three new procedure documents were created, formalising the ingest of metadata from RHESYS (the University Research Management System) and from external sources such as the ReDBox wiki, the NHMRC and the ARC. Approximately 1,500 researchers and 500 projects are now in the Research Data Catalogue, available via lookup when a new data collection record is created.
  • New Research Data Catalogue entries (30+) were added to Research Data Australia, searchable by anyone with web access.
  • The ReDBox application was set up so that people who create data sets at UWS also have their unique details merged with an existing (or newly created) record in the National Library of Australia database, which is linked to any other data sets or publications which they have created in the same field or under the same name.
  • A new feature in ReDBox was added whereby an administrator can view the results of ingesting records about people and research projects. These results are presented in the form of ingest reports, describing what was ingested, modified, or removed, to support Quality Assurance going forward.
  • A ReDBox support agreement was negotiated with QCIF, which provides bug fixes and technical support until December 2014.
  • A new wizard for creating a data management plan inside the data catalogue is currently being trialled. The idea is that any data management plan which is created will be stored in the catalogue along with the data, and can be exported as a PDF if needed.
Services – Research Data Management
  • A new Data Management Plan Checklist was created.
  • A new Data Management Plan Template was created.
  • An additional page was added to the Office of Research Services pages, which includes:
      > Data Management defined,
      > Data Management best practices,
      > links to RDR services,
      > links to external services and more information as applicable, and
      > standard pro forma language that researchers can use to complete their research application forms.
  • Internal application forms were improved to ask researchers to explain how data management will be addressed, including:
      > the internal grant application for UWS-funded research, and
      > the application form to start a new external grant application through ORS.
  • eResearch interviewed researchers with live projects and created 3 Data Management Plans using the Data Management Plan Template; these plans have been provided to the researchers.
  • eResearch interviewed managers of research facilities and has drafted 4 Data Management Plans thus far, which have been provided to the facility managers.
Up Next:
  • Finalise Data Management Plans for our research facilities. In addition, eResearch is currently assisting with new shared drives for these facilities (this is really business as usual, but it is within the scope of the project).
  • Deposit the Data Management Plans in the Research Data Catalogue.
 

Examples of the growing eResearch Infrastructure at UWS


This work by Toby O’Hara, Peter Bugeia & Peter Sefton is licensed under a Creative Commons Attribution 3.0 Unported License.

In this post, we list some of the basic infrastructure for eResearch that is already set up and in use at the University of Western Sydney. You’ll notice a few themes, such as the use of virtual computing environments, the use of shared storage including the nascent UWS Research Data Repository (RDR), and consultation and coordination. Some of these are stop-gap solutions until the RDR and the Research Computing Environment become fully operational as part of the continuing RDR project.

These examples, from a number of disciplines across our flagship institutes and schools, all show the importance of having dedicated research computing support, consulting and hardware infrastructure: none of the research teams we talk about below would be able to perform their research without services that go well beyond the standard ITS offerings available for administrative computing at UWS.

Centre for Complementary Medicine Research (CompleMed)

This group had started collecting data about plant samples and needed a way to store their data in a central location where it could be shared and reused amongst the team and with other researchers. eResearch were able to set up a virtual server with attached storage from the Research Data Store and install an FTP service on the server, so that at any time a person with a login could download raw data and upload analysed data. The system was set up so that authorised users are managed directly, and people inside and outside the university can gain access. This service is still in use and has the potential to grow; there is interest from other research teams as well. To facilitate continued growth within CompleMed, more storage and more services to enable global collaboration will be needed.
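For a sense of how a collaborator might interact with a service like this, here is a minimal Python sketch using the standard library’s FTP client. The host name, account and file paths are hypothetical placeholders, not the actual CompleMed configuration.

```python
from ftplib import FTP
from getpass import getpass

HOST = "ftp.example.uws.edu.au"   # hypothetical server address
USER = "plantlab"                 # hypothetical account managed by eResearch/ITS

with FTP(HOST) as ftp:
    ftp.login(user=USER, passwd=getpass(f"Password for {USER}: "))
    # Download a raw data file for local analysis.
    with open("raw_samples.csv", "wb") as fh:
        ftp.retrbinary("RETR raw/raw_samples.csv", fh.write)
    # Upload the analysed results back to the shared store.
    with open("analysed_samples.csv", "rb") as fh:
        ftp.storbinary("STOR analysed/analysed_samples.csv", fh)
```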

In the near future the FTP service will be seamlessly migrated to the Research Compute Environment, which is planned infrastructure designed for this purpose (and other needs like it). It is also a good candidate for local cloud storage, another component of the RDR, which is planned for implementation next year.

Centre for Positive Psychology in Education (CPPE)

The research involved CPPE creating, distributing and marking paper surveys, which needed to be scanned and stored. The scans then needed to be ‘read’ by an image reader and translated into data files that could subsequently be analysed. The team needed a central storage point where the scans could be automatically deposited, and computing capability that could run the conversion software. The resulting analysable raw data would also be stored centrally, so that an authorised person could access it. To meet these needs, a portion of the Research Data Store was allocated for storage of the scanned images and the related data, and a virtual server was created to host the software that converts the scanned images into data sets. This virtual server is not yet part of the Research Compute Environment, but would be a good candidate. This set-up is still going strong. eResearch are aware of additional tools that would make the conversion and sharing of data more seamless, and are working to make this possible; providing these would reduce cost and processing time for the researchers.
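The workflow above amounts to a watch-folder pipeline: scans land on shared storage and a server converts them into data files. The following Python sketch shows one simple way such a pipeline could look; the paths, file types and the ‘survey-reader’ command are hypothetical stand-ins for the team’s actual image-reading software.

```python
import subprocess
import time
from pathlib import Path

SCAN_DIR = Path("/mnt/research-data-store/cppe/scans")     # hypothetical shared folder
DATA_DIR = Path("/mnt/research-data-store/cppe/datasets")  # hypothetical shared folder

def convert_new_scans() -> None:
    """Convert any scanned survey that does not yet have a matching data file."""
    DATA_DIR.mkdir(parents=True, exist_ok=True)
    for scan in SCAN_DIR.glob("*.tif"):
        output = DATA_DIR / (scan.stem + ".csv")
        if not output.exists():
            # 'survey-reader' stands in for the actual image-reading software.
            subprocess.run(["survey-reader", str(scan), "--out", str(output)], check=True)

while True:               # simple polling loop running on the virtual server
    convert_new_scans()
    time.sleep(300)       # check for new scans every five minutes
```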

HIE – Hawkesbury Forest Experiment

The Whole Tree Chamber component of the Hawkesbury Forest Experiment collects data about carbon/tree interactions. The sensor measurement data needs to be stored centrally, where multiple members of the team, including researchers who are in the team but outside the university, can analyse and query the data for various research projects. Our implemented solution was a virtual server attached to the Research Data Store, running a Secure FTP (SFTP) service with secure logins. When the Research Computing Environment is ready, this SFTP service is planned to be migrated across. HIE will also be the beneficiary of a new data capture capability: an integrated system for capturing and storing data together with its metadata, in an organised format that facilitates access and reuse. The application will run from the Research Compute Environment and utilise the Research Data Store.
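As a simple illustration of the ‘analyse and query’ side of this arrangement, the Python sketch below summarises one chamber’s sensor readings held on the shared store by day. The file path and column names are hypothetical.

```python
import csv
import statistics
from collections import defaultdict

# Hypothetical location of one chamber's sensor log on the Research Data Store.
SENSOR_FILE = "/mnt/research-data-store/hfe/whole_tree_chamber/chamber01.csv"

def daily_means(path: str) -> dict:
    """Group readings by calendar day and average them."""
    readings = defaultdict(list)
    with open(path, newline="") as fh:
        for row in csv.DictReader(fh):
            day = row["timestamp"][:10]           # e.g. '2014-06-30'
            readings[day].append(float(row["co2_ppm"]))
    return {day: statistics.mean(values) for day, values in readings.items()}

for day, mean_ppm in sorted(daily_means(SENSOR_FILE).items()):
    print(day, round(mean_ppm, 1))
```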

EucFACE is another tree-data component of the Hawkesbury Forest Experiment. Intersect, the NSW eResearch consortium, in consultation with UWS eResearch, assisted in creating a new research data management plan, which contains direction and instruction on all aspects of managing the data, including data capture, storage, documentation, retention, reuse, disposal and archiving.

This data management plan has many elements that can be used across HIE. The next step for us will be to collaboratively implement a cross-institute plan, as a guideline for data held in common as well as for individual research projects. It is a goal of eResearch to create a research data management plan for institutes and schools, tailored to their research methods and data.

HIE have been a strong supporter of eResearch; many of their researchers are enthusiastic about what we are working on, such as centralised collaborative data sharing, easy-to-access collaboration tools, and structured archiving and reuse of data.

HIE – Genomic Life Sciences

This particular example is a large project involving multiple streams within the Genomic Life Sciences group. The stream we assisted with was the collection, storage and retrieval of genetic data. eResearch and ITS consulted with the research team and developed specifications for equipment, which was then procured and implemented. The solution consisted of two large servers, networked together, with virtual workstations configured for retrieving and analysing data. A number of hard disks were also procured, sufficiently large to retain the copious amounts of data being produced.

This is a good example of a group that would benefit from the Research Data Store and the Research Computing Environment. Having storage and servers for research available on request, as a service to researchers, would quickly and conveniently provide for projects such as this one.

This is also an example of the consultative nature of eResearch, and of the crucial recommendations and advice we can offer researchers whose requirements can be met more effectively by advances in technology of which they might not be aware.

Centre for Health Research

eResearch has worked with the Centre for Health Research on a number of projects. Some examples include:

  • Setting up an environment that can store and work with a large amount of data, available as a Public Health Database to be used by several researchers on different projects. This included a central repository for the data, sufficiently large to retain the data and the working copies being used by the different teams. Several servers were also purchased and installed, then carved into workstations for retrieving and analysing the data. This work is still ongoing; last we heard the environment was very popular, and we will be going back to the Centre soon to devise the best approach to increasing its capacity.

  • The Centre are also the custodians of another data set, made available through the Department of Families, Housing, Community Services and Indigenous Affairs. This data is available to researchers inside and outside the School of Medicine. Very soon this data will reside on the Research Data Store, with shared drive access and with well-defined access control and rules for use. eResearch has facilitated not only the technical solution but also the development of a data management plan for this data. In consultation with the appointed data manager and other users of the data, the plan was created to clearly define how the data is to be stored, used, safeguarded and governed.

CHR have been a strong supporter of eResearch as well, and are very interested in the possibilities of collaborative tools and of sharing data locally and globally.

The Australian Speech Corpus (AusTalk) or the Big ASC

AusTalk is a LEAF-funded project, run by the MARCS Institute, intended to collect audio and video samples of people all over Australia speaking and responding to a series of standardised interview questions. There was a great deal of work around assembling a kit of hardware and software, with contributions from the ITS group. eResearch also assisted the AusTalk project manager in creating a data management plan, to spell out the requirements for storing, securing and accessing the data.

Nanoscale Organisation and Dynamics

This research within the Nanoscale Organisation and Dynamics group consists of collecting a large number of nanoscale images and storing them for comparison and analysis. eResearch and ITS determined that large data storage was required, as well as virtual servers and workstations. The data has been added to the Research Data Store, and the first steps have been taken to set up the virtual servers and workstations, which were procured through research funds.

Smaller requests for assistance

eResearch have also been able to facilitate a number of smaller requests for assistance from researchers. These include:

  • Archival storage

  • A persistent identifier for their data, to be used to reference the data

  • Advice on data management best practices

  • Consultation on choosing the best technology solution