How eResearch-y are you?

How eResearch-y are you? An extremely serious quiz comprised of eight multiple-choice questions.
  1. Your latest published article has a graph in it. If the eResearch police asked you to reproduce the plot exactly using the original data you’d:
    1. Check out the code archived with the article, and re-run the make-file, which would not only re-generate the plot using Knitr, but the whole article, which would also be made available as an interactive website using Shiny with an option to re-run the models on data which is crowd-transcribed from the logs of 17th century slave ships.
    2. Redo the diagram in Excel, using the clearly set out method and supplemental material from the article.
    3. Find the data (by borrowing back last year’s laptop from a postgrad), then fiddle around with what you think is the right spreadsheet make something that looks pretty much like the one in the paper.
    4. Plot? What plot? And what was all that babble in option A?
  1. Turns out that some of the photos and recordings you made when documenting a research site contain images and sounds of a Yeti. If you can provide complete records of where and when you collected this data, you can collect a $1,000,000 prize from a cable TV station. Your next step is to:
    1. Provide the DOI to the dataset which you have archived in your institution’s data repository. The repository record with the data attached provides all the information required to support your claim.
    2. Scan the relevant pages from your field notebook and annotate these with supporting information specific to the Yeti sighting.
    3. Rummage around the office: you last saw that scrap of paper you scribbled on during the fieldwork with the pile on top of your filing cabinet.
    4. Quickly throw together some handwritten notes and scorch them with a candle so they look old. No actually, you couldn’t be bothered. Also you don’t believe in Yetis or Santa.
  2. You have so much data to analyse and your models are getting so complicated that your laptop is getting hot, so you:
    1. Use Docker to create a 128 Node compute cluster in the NeCTAR cloud, get some results, archive all the code, data and outputs with DOIs and go home early.
    2. Enrol in Intersect’s High Perfomance Computing (HPC) courses and learn how to run your job on shared infrastructure.
    3. Give it to one of the PhD students to sort out.
    4. We have you mixed up with someone else – your iPad never gets hot unless you watch too much YouTube in the sun.
  1. When archiving data you always:
    1. Take care to use standard file-formats that are easily machine readable, and make sure all code and as much provenance information as possible, are also archived.
    2. Fill in the metadata fields on the institutional data catalogue application as carefully as you can.
    3. Try to change the worksheet names on your Excel files from Sheet 1 to something more meaningful, if you get time.
    4. Use the shredder in the research office. It’s more fun than the old technique of scrunching up the envelope on which the data were written and trying to get it in the bin for a three-pointer.
  1. The best place to store research data during your project is:
    1. On a secure, backed-up, cloud storage server (with data held in an appropriate jurisdiction) which you can access from anywhere with an internet connection, and share with designated collaborators.
    2. On a secure, backed-up drive accessible only from your office.
    3. On a Dr Who USB stick.
    4. You delete your raw data after you’ve analysed it. Although, actually, sometimes raw data doesn’t agree with you; so you cook some up to better fit your conclusions.
 
  1. A data management plan is:
    1. An important tool which facilitates planning for the creation, storage, access and preservation of research data. Creating this at the start of a research project and referring to it as a living document informs the research workflow and specifies how data will be managed.
    2. Something to think about once you’ve collected some data.
    3. More paperwork to bog down the research process, like Ethics. Oh for the good old days when we used to be able to electrocute the students without filling out so many forms.
    4. Data management plan? I’m not even in management so don’t interupt me, I’m enjoying my holidays
  1. Collaborative research is:
    1. Enabled by eResearch technologies and supported by Open Access to published research data.
    2. Maximising the funding universities receive by sharing resources and equipment for a research project.
    3. Popping next door to ask a colleague a question.
    4. Not something you’re interested in. Your data will die with you.
  1. If you wanted to share your completed research dataset with others you would:
    1. Contact the Library or eResearch and discuss publishing the data and related methodology to the institutional data catalogue, which can then also be included in the Research Data Australia discovery portal. The data would be described using appropriate metadata, and linked to related collections, fields of research, people and facilities.
    2. Publish the data on your personal website and ask people to contact you via a hotmail address for more information.
    3. Email the file to colleagues you think would be interested
    4. You told us before – your data will die with you.
Your score:

Mostly As – We’d love to talk to you about becoming an eResearch champion. You have embraced the benefits of eResearch technology and methodology and have put comprehensive plans in place for the use and re-use of your valuable data.

Mostly Bs – You understand that technology is a useful tool but you’re hesitant to rely on it for your research. Try putting aside your trust issues and play around with one new tool or habit this week – it might spark an idea or save you valuable time. There are lots of opportunities to attend training or do a self-paced online course to increase your comfort level.

Mostly Cs – It might be time to chat to the eResearch team about joining the 21st century. Although your existing research process may be valid, eResearch boosts the research process through opportunities to add computing power, streamline workflows, and collaborate with like-minded researchers from around the globe.

Mostly Ds – Bah, humbug.

Creative Commons License
How eResearch-y are you? An extremely serious quiz comprised of eight multiple-choice questions. by Peter Sefton & Katrina Trewin is licensed under a Creative Commons Attribution 4.0 International License.

Thanks Kim Heckenberg for your input and sorry Alf, we didn’t put in anything about multi-screen immersive visualization.

Is Omeka aDORAble?

So, we have been asking looking at a few different software packages, and putting them through their paces at a series of Tuesday ‘tools days’ hosted by UWS eResearch, asking “Is this software going to be one of our supported Working Data Repositories for researcher cohorts?” That is, how does it rate as a DORA, a Digital Object Repository for Academe?

Last month we had our biggest ever tools-day event, with external people joining the usual eResearch suspects. Thanks to Jacqueline Spedding from the Dictionary of Sydney, Michael Lynch & Sharyn Wise from UTS and Cindy Wong and Jake Farrell from Intersect for coming along.

Omeka is a lightweight digital repository / website building solution, originally targeting the Galleries, Archives & Museums space.

TL;DR

So what were we wanting to know about Omeka? The external folks came along for a variety of reasons but at UWS we wanted to know the following (with short answers, so you don’t have to read on).

  • Is this something we can recommend for researchers with the kinds of research collections Omeka is known for?

    Answer: almost certainly yes, unless we turn up any major problems in further testing, this is a good, solid, basic repository for Digital Humanities projects. So, for image and document based collections with limited budgets this looks like an obvious choice.

  • Can Omeka be used to build a semantically-rich website in a research/publishing project like the Dictionary of Sydney?

(The reason we’re asking this, is that UWS has a couple of projects with some similarities to the Dictionary, and we at UWS are interested in exploring what options there are for building and maintaining a big database like this. The Dictionary uses an open source code-based called Heurist. Anyway, we have some data from Hart Cohen’s Journey to Horseshoe Bend project which was exported from an unfinished attempt to build a website using Heurist).

The verdict? Still working on it, but reasonably promising so far.

  • Beyond its obvious purpose, is this a potential generic Digital Object Repository for Academe (DORA)?
  • Maybe. Of all the repository software we’ve tried at tools-days and looked at behind the scenes, this seems to be the most flexible and easily approachable.

Good

Omeka has a lot to recommend it:

  • It’s easy to get up and running.

  • It’s easy to hack, and easy to hack well, since it has plugins and themes that let you customise it without touching the core code. These are easy enough to work with that we had people getting (small) results on the day. More on that below.

  • It uses the Digital Object Pattern (DOP) – ie at the heart of Omeka are digital objects called Items with metadata, and attached files.

  • It has an API which just works, and can add items etc, although there are some complexities, more on which below.

  • It has lots of built-in ways to ingest data, including (buggy) CSV import and OAI-PMH harvesting.

Bad

There are some annoyances:

  • The documentation, which at first glance seems fairly comprehensive is actually quite lacking. Examples of the plugin API are incorrect, and the description of the external API are pretty terse and very short on examples (eg they don’t actually give an example of how to use your API key, or the pagination).

  • The API while complete is quite painful to use if you want to add anything – to add an item with metadata it’s not as simple as saying {“title”: “My title”} or even {“dc:title”: “My Title”} – you have to do an API call to find elements called Title, from the different element sets, then pick one and use that. And copy-pasting someone else’s example is hard: their metadata element 50 may not be the same as yours. That’s nothing a decent API library wouldn’t take care of, the eResearch team is looking for a student who’d like to take the Python API on as a project (and we’ve started improving the Python library).

  • Very limited access control with no way of restricting who can see what by group.

  • By default the MYSQL search is set up to only search for 4 letter words or greater, so you can’t search for CO2 or PTA (Parramatta) both of which are in our test data; totally fixable with some tweaking.

  • Measured against our principles, there’s one clear gap. We want to encourage the use of metadata to embrace linked-data principles and use URIs to identify things, in preference to strings. So while Omeka scores points for shipping with Dublin Core metadata, it loses out for not supporting linked data. If only it let you have a URI as well as a string value for any metadata field!

But maybe it can do Linked Data?

Since the hack day we have some more news on Omeka’s coming linked data support. Patrick from the Omeka Team says on their mailing list:

Hi Peter,

Glad you asked!

The API will use JSON-LD.

The Item Add interface as we’re currently imagining it has three options for each property: text input (like what exists now), internal reference (sorta bringing Item Relations into core, just with a better design), and external URI. The additional details, like using a local label for an external URI sound interesting, and we’ll be thinking about if/how that kind of thing might work.

Properties, too, will be much more LoD-friendly. In addition to Dublin Core, the FOAF, BIBO, and other vocabularies will be available both for expressing properties, and the classes available (analogous to the Item Types currently available).

Changes like this (and more!) are at the heart of the changes to design and infrastructure I mentioned in an earlier response. We hope that the additional time will be worth it to be able to address needs like these!

You can watch the progress at the Omeka S repo: https://github.com/omeka/omeka-s

Thanks,

Patrick

This new version of Omeka (Omeka-S) is due in “The Fall Semester of 2015”, which is North American for late next year, in Spring. Hard to tell from this short post by Patrick, but this looks promising. There are a few different ways that the current version of Omeka may support Linked Data. The best way forward is probably to use the ItemRelations plugin.

But what can we do in the meantime?

  • The Item Relations plugin desperately needs a new UI element to do lookups as at the moment you need to know the integer ID of the item you want to link to. Michael Lynch and Lloyd Harischandra both looked at various aspects of this problem on the day.

  • Item Relations don’t show up in the API. But the API is extensible, so that should be doable, should be simple enough to add a resource for item_realations and allow thevocab lookups etc needed to relate things to each other as (essentially) Subject Predicate Object. PT’s been working on this as a spare-time project.

  • Item Relations doesn’t allow for a text label on the relation or the endpoint, so while you might want to say someone is the dc:creator of a resource, you only see the “Creator” label and the title of the item you link to. What if you wanted to say “Dr Sefton” or “Petiepie” rather than “Peter Sefton” but still link to the same item?

What we did

Slightly doctored photo, either that or Cindy attended twice!

Slightly doctored photo, either that or Cindy attended twice!

Gerry Devine showed off his “PageMaker” Semantic CMS: Gerry says:

The SemanticPageMaker (temporary name) is an application that allows for the creation of ‘Linked Data’-populated web pages to describe any chosen entity. Web forms are constructed from a pre-defined set of re-usable semantic tags which, when completed, automatically produce RDFa-enabled HTML and a corresponding JSON-LD document. The application thus allows semantically-rich information to be collected and exposed by users with little or no knowledge of semantic web terms.

I have attached some screenshots from my local dev instance as well as an RDFa/html page and a JSON-LD doc that describes the FACE facility (just dummy info at this stage) – note the JSON-LD doesn’t expose all fields (due to duplicated keys)

A test instance is deployed on Heroku (feel free to register and start creating stuff – might need some pointers though in how to do that until I create some help pages):

https://desolate-falls-4138.herokuapp.com/

Github:

https://github.com/gdevine/SemanticPageMaker

This might be the long-lost missing link: a simple semantic CMS which doesn’t try to be a complete semantic stack with ontologies etc, it just allows you to define entities realtions and give each type of entity a URI, and let them relate to each other and to be a good Linked Data citizen providing RDF and JSON data. Perfect for describing research context.

And during the afternoon, Gerry worked on making his CMS able to be used for lookups, so for example if we wanted to link an Omeka item to a facility at HIE we’d be able to do that via a lookup. We’re looking at building on work, the Fill My List (FML) project started by a team from Open Repositories 2014 on a universal URI lookup service with a consitent API for different sources of truth. Since the tools-day Lloyd has installed a UWS copy of FML so we can start experimenting with it with our family of repositories and research contexts.

Lloyd and Michael both worked on metadata lookups. Michael got a proof-of-concept UI going so that a user can use auto-complete to find Items rather than having to copy IDs. Lloyd got some autocomplete happening via a lookup to Orcid via FML.

PT and Jacqueline chatted about rich semantically-linked data-sets like the Dictionary of Sydney. In preparation for the workshop, PT tried taking the data from the Journey to Horseshoe Bend project, which is in a similar format to the Dictionary, putting it in a spreadsheet with multiple worksheets and importing it via a very dodgy Python Script.

Peter Bugeia investigated how environmental-science data would look in Omeka, by playing with the API to pump in data from the HIEv repository.

Sharyn and Andrew tried to hack together a simple plugin. Challenge: see if we can write a plugin which will detect YouTube links in metadata and embed a YouTube player (as a test case for a more general type of plugin that can show web previews of lots of different kinds of data). They got their hack to the “Hello World, I managed to get something on the screen” stage in 45 minutes, which is encouraging.

Jake looked at map-embedding: we had some sample data from UWS of KMZ (compressed Google-map-layers for UWS campuses), we wondered if it would be possible to show map data inline in an item page. Jake made some progress on this – the blocker isn’t Omeka it was finding a good way to do the map embedding.

Cindy continued the work she’s been doing with Jake on the Intersect press-button Omeka deployment. They’re using something called Snap Deploy and Ansible.

Jake says:

Through our Snapdeploy service Intersect are planning to offer researchers the ability to deploy their own instance of OMEKA with just a click of a button, with no IT knowledge required. All you need is an AAF log in and Snapdeploy will handle the creation of your NeCTAR Cloud VM and the deployment of OMEKA to that VM for you. We are currently in the beginning stages of adapting the Snapdeploy service to facilitate an Omeka setup and hope to offer it soon. We would also like feedback from you as researchers to let us know if there are any Omeka plug-ins that you think we could include as part of our standard deployment process that would be universally useful to the research community, so that we can ensure our Omeka product offers the functionality that researchers actually need.

David explored the API using an obscure long forgotten programming language, “Java” we think he called it and reported on the difficulty of grasping it.

More on stretching Omeka

If we were to take Omeka out of it’s core comfort zone, like say being the working data repository in an engineering lab there are a number of things we’d want to do:

  • Create some user-facing forms for data uploads these would need to be simpler than the full admin UI with lookups for almost everything, People, Subject codes, research context such as facilities.

  • Create (at least) group-level access control probably per-collection.

  • Build a generic framework for previewing or viewing files of various types. In some cases this is very simple, via the addition of a few lines of HTML, in others we’d want to have some kind of workflow system that can generate derived files.

  • Fix the things noted above: better API library, Linked Data Support,

What would an Omeka service look like?

If we wanted to offer this at UWS or beyond as well as use it for projects beyond the DH sphere, what would a supported service look like?

To make a sustainable service, we’d want to:

  • Work out how to provide robust hosting with an optimal number of small Omeka servers per host (is it one? is it ten?).

  • Come up with a generic data management plan: “We’ll host this for you for 12 months. After which if we don’t come to a new arrangement your site will be archived and given a DOI and the web site turned off”. Or something.

Creative Commons License
Is Omeka aDORAble by Peter Sefton, Andrew Leahy, Gerry Devine, Jake Farrell is licensed under a Creative Commons Attribution 4.0 International License.