Is Omeka aDORAble?

So, we have been looking at a few different software packages and putting them through their paces at a series of Tuesday ‘tools days’ hosted by UWS eResearch, asking “Is this software going to be one of our supported Working Data Repositories for researcher cohorts?” That is, how does it rate as a DORA, a Digital Object Repository for Academe?

Last month we had our biggest ever tools-day event, with external people joining the usual eResearch suspects. Thanks to Jacqueline Spedding from the Dictionary of Sydney, Michael Lynch & Sharyn Wise from UTS and Cindy Wong and Jake Farrell from Intersect for coming along.

Omeka is a lightweight digital repository / website building solution, originally targeting the Galleries, Archives & Museums space.

TL;DR

So what were we wanting to know about Omeka? The external folks came along for a variety of reasons but at UWS we wanted to know the following (with short answers, so you don’t have to read on).

  • Is this something we can recommend for researchers with the kinds of research collections Omeka is known for?

    Answer: almost certainly yes. Unless we turn up any major problems in further testing, this is a good, solid, basic repository for Digital Humanities projects. So, for image- and document-based collections with limited budgets this looks like an obvious choice.

  • Can Omeka be used to build a semantically-rich website in a research/publishing project like the Dictionary of Sydney?

(The reason we’re asking this is that UWS has a couple of projects with some similarities to the Dictionary, and we at UWS are interested in exploring what options there are for building and maintaining a big database like this. The Dictionary uses an open source code base called Heurist. Anyway, we have some data from Hart Cohen’s Journey to Horseshoe Bend project which was exported from an unfinished attempt to build a website using Heurist.)

The verdict? Still working on it, but reasonably promising so far.

  • Beyond its obvious purpose, is this a potential generic Digital Object Repository for Academe (DORA)?

    Answer: maybe. Of all the repository software we’ve tried at tools-days and looked at behind the scenes, this seems to be the most flexible and easily approachable.

Good

Omeka has a lot to recommend it:

  • It’s easy to get up and running.

  • It’s easy to hack, and easy to hack well, since it has plugins and themes that let you customise it without touching the core code. These are easy enough to work with that we had people getting (small) results on the day. More on that below.

  • It uses the Digital Object Pattern (DOP) – ie at the heart of Omeka are digital objects called Items with metadata, and attached files.

  • It has an API which just works, and can add items etc, although there are some complexities, more on which below.

  • It has lots of built-in ways to ingest data, including (buggy) CSV import and OAI-PMH harvesting.

Bad

There are some annoyances:

  • The documentation, which at first glance seems fairly comprehensive, is actually quite lacking. Examples of the plugin API are incorrect, and the descriptions of the external API are pretty terse and very short on examples (eg they don’t actually give an example of how to use your API key, or of pagination).

  • The API, while complete, is quite painful to use if you want to add anything. To add an item with metadata it’s not as simple as saying {“title”: “My title”} or even {“dc:title”: “My Title”}: you have to do an API call to find elements called Title from the different element sets, then pick one and use its ID. And copy-pasting someone else’s example is hard: their metadata element 50 may not be the same as yours. That’s nothing a decent API library wouldn’t take care of; the eResearch team is looking for a student who’d like to take the Python API on as a project (and we’ve started improving the Python library).

  • Very limited access control with no way of restricting who can see what by group.

  • By default the MySQL search is set up to only match words of four letters or more, so you can’t search for CO2 or PTA (Parramatta), both of which are in our test data; totally fixable with some tweaking.

  • Measured against our principles, there’s one clear gap. We want to encourage the use of metadata to embrace linked-data principles and use URIs to identify things, in preference to strings. So while Omeka scores points for shipping with Dublin Core metadata, it loses out for not supporting linked data. If only it let you have a URI as well as a string value for any metadata field!
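The element-lookup dance described in the API bullet above can be sketched in Python. This is a minimal sketch, not the eResearch Python library: it assumes the Omeka 2.x REST conventions as we understand them (element records carrying `id`, `name` and `element_set`; items posted with an `element_texts` list; the API key and `page` passed as query parameters). The helper names `api_url`, `element_ids`, `build_item` and `fetch_json` are hypothetical, and the sample element records are made up for illustration.

```python
import json
import urllib.parse
import urllib.request


def api_url(base, resource, key=None, **params):
    """Build an API URL; the key travels as a query parameter (assumption)."""
    if key:
        params["key"] = key
    query = urllib.parse.urlencode(sorted(params.items()))
    url = f"{base.rstrip('/')}/api/{resource}"
    return f"{url}?{query}" if query else url


def element_ids(elements, element_set_id):
    """Map element names to IDs, restricted to one element set.

    `elements` is a list of dicts shaped like the API's element records.
    """
    return {
        e["name"]: e["id"]
        for e in elements
        if e["element_set"]["id"] == element_set_id
    }


def build_item(fields, ids, public=False):
    """Turn {"Title": "My title"} into the verbose item payload.

    Raises KeyError if a field name has no matching element.
    """
    texts = [
        {"html": False, "text": text, "element": {"id": ids[name]}}
        for name, text in fields.items()
    ]
    return {"public": public, "element_texts": texts}


def fetch_json(url):
    """GET a URL and decode JSON (needs a live Omeka, so not called here)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


# Hypothetical element records: two element sets both define "Title".
sample = [
    {"id": 50, "name": "Title", "element_set": {"id": 1}},
    {"id": 39, "name": "Creator", "element_set": {"id": 1}},
    {"id": 77, "name": "Title", "element_set": {"id": 3}},
]
ids = element_ids(sample, element_set_id=1)
payload = build_item({"Title": "My title"}, ids)
print(json.dumps(payload))
```

Against a real site, the `sample` list would come from something like `fetch_json(api_url(base, "elements", key=key))`, which is exactly why a copied element ID cannot be trusted across installations.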

But maybe it can do Linked Data?

Since the hack day we have some more news on Omeka’s coming linked data support. Patrick from the Omeka Team says on their mailing list:

Hi Peter,

Glad you asked!

The API will use JSON-LD.

The Item Add interface as we’re currently imagining it has three options for each property: text input (like what exists now), internal reference (sorta bringing Item Relations into core, just with a better design), and external URI. The additional details, like using a local label for an external URI sound interesting, and we’ll be thinking about if/how that kind of thing might work.

Properties, too, will be much more LoD-friendly. In addition to Dublin Core, the FOAF, BIBO, and other vocabularies will be available both for expressing properties, and the classes available (analogous to the Item Types currently available).

Changes like this (and more!) are at the heart of the changes to design and infrastructure I mentioned in an earlier response. We hope that the additional time will be worth it to be able to address needs like these!

You can watch the progress at the Omeka S repo: https://github.com/omeka/omeka-s

Thanks,

Patrick

This new version of Omeka (Omeka-S) is due in “The Fall Semester of 2015”, which is North American for late next year, in Spring. Hard to tell from this short post by Patrick, but this looks promising. There are a few different ways that the current version of Omeka may support Linked Data. The best way forward is probably to use the Item Relations plugin.

But what can we do in the meantime?

  • The Item Relations plugin desperately needs a new UI element to do lookups as at the moment you need to know the integer ID of the item you want to link to. Michael Lynch and Lloyd Harischandra both looked at various aspects of this problem on the day.

  • Item Relations don’t show up in the API. But the API is extensible, so it should be simple enough to add a resource for item_relations and allow the vocab lookups etc needed to relate things to each other as (essentially) Subject Predicate Object. PT’s been working on this as a spare-time project.

  • Item Relations doesn’t allow for a text label on the relation or the endpoint, so while you might want to say someone is the dc:creator of a resource, you only see the “Creator” label and the title of the item you link to. What if you wanted to say “Dr Sefton” or “Petiepie” rather than “Peter Sefton” but still link to the same item?
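To make the Subject Predicate Object idea and the labelling question concrete, here is a minimal sketch in Python. It is not how the Item Relations plugin stores data; the `Relation` class, its field names and the `display_text` helper are hypothetical, and the item IDs and titles are invented.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Relation:
    subject_id: int              # item the statement is about
    predicate: str               # vocabulary term, e.g. "dc:creator"
    object_id: int               # item being linked to
    label: Optional[str] = None  # optional display text for this one link


def display_text(relation, titles):
    """Prefer the per-relation label, else fall back to the linked item's title."""
    return relation.label or titles[relation.object_id]


# Invented items: 7 is some resource, 12 is the person record.
titles = {7: "Some resource", 12: "Peter Sefton"}
plain = Relation(7, "dc:creator", 12)
labelled = Relation(7, "dc:creator", 12, label="Dr Sefton")

print(display_text(plain, titles))
print(display_text(labelled, titles))
```

Both relations point at the same item, so the link stays unambiguous while the text shown to the reader can vary per relation.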

What we did

Slightly doctored photo, either that or Cindy attended twice!

Gerry Devine showed off his “PageMaker” Semantic CMS. Gerry says:

The SemanticPageMaker (temporary name) is an application that allows for the creation of ‘Linked Data’-populated web pages to describe any chosen entity. Web forms are constructed from a pre-defined set of re-usable semantic tags which, when completed, automatically produce RDFa-enabled HTML and a corresponding JSON-LD document. The application thus allows semantically-rich information to be collected and exposed by users with little or no knowledge of semantic web terms.

I have attached some screenshots from my local dev instance as well as an RDFa/html page and a JSON-LD doc that describes the FACE facility (just dummy info at this stage) – note the JSON-LD doesn’t expose all fields (due to duplicated keys)

A test instance is deployed on Heroku (feel free to register and start creating stuff – might need some pointers though in how to do that until I create some help pages):

https://desolate-falls-4138.herokuapp.com/

Github:

https://github.com/gdevine/SemanticPageMaker

This might be the long-lost missing link: a simple semantic CMS which doesn’t try to be a complete semantic stack with ontologies etc. It just allows you to define entities and relations, give each type of entity a URI, let them relate to each other, and be a good Linked Data citizen providing RDF and JSON data. Perfect for describing research context.
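Gerry’s note about fields going missing from the JSON-LD because of duplicated keys points at a common wrinkle: a form may hold several values for the same property, but a JSON object can only hold each key once. One way around it, sketched here in Python, is to gather repeated properties into a list. This is an illustration only and makes no claim about how SemanticPageMaker works; the function name `to_json_ld`, the example URI and the property values are all placeholders.

```python
import json
from collections import defaultdict


def to_json_ld(entity_uri, entity_type, pairs, context):
    """Build a JSON-LD dict from (property, value) pairs.

    Repeated properties are gathered into a list rather than
    overwriting each other, as plain dict assignment would.
    """
    grouped = defaultdict(list)
    for prop, value in pairs:
        grouped[prop].append(value)

    doc = {"@context": context, "@id": entity_uri, "@type": entity_type}
    for prop, values in grouped.items():
        doc[prop] = values[0] if len(values) == 1 else values
    return doc


# Dummy values throughout; the URI and names are placeholders.
doc = to_json_ld(
    "http://example.org/facility/1",
    "schema:Place",
    [
        ("schema:name", "Example facility"),
        ("schema:keywords", "climate"),
        ("schema:keywords", "forest"),
    ],
    context={"schema": "http://schema.org/"},
)
print(json.dumps(doc, indent=2))
```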

And during the afternoon, Gerry worked on making his CMS able to be used for lookups, so for example if we wanted to link an Omeka item to a facility at HIE we’d be able to do that via a lookup. We’re looking at building on the Fill My List (FML) project, started by a team from Open Repositories 2014, which is working on a universal URI lookup service with a consistent API for different sources of truth. Since the tools-day Lloyd has installed a UWS copy of FML so we can start experimenting with it with our family of repositories and research contexts.

Lloyd and Michael both worked on metadata lookups. Michael got a proof-of-concept UI going so that a user can use auto-complete to find Items rather than having to copy IDs. Lloyd got some autocomplete happening via a lookup to ORCID via FML.
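The heart of an auto-complete lookup like this is small. The sketch below is in Python and is not Michael’s or Lloyd’s code: it assumes the item IDs and titles are already in hand as (id, title) pairs, and the function name `suggest` and the sample titles are invented.

```python
def suggest(items, typed, limit=5):
    """Return (id, title) pairs whose title contains the typed text."""
    needle = typed.casefold()
    matches = [(i, t) for i, t in items if needle in t.casefold()]
    # Titles that start with the typed text are listed first, then A to Z.
    matches.sort(key=lambda pair: (not pair[1].casefold().startswith(needle), pair[1]))
    return matches[:limit]


# Invented items for illustration.
items = [(3, "Parramatta campus"), (8, "Campus map"), (11, "Hawkesbury")]
for item_id, title in suggest(items, "camp"):
    print(item_id, title)
```

The user picks a title and the form quietly stores the integer ID, which is all the Item Relations plugin needs.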

PT and Jacqueline chatted about rich semantically-linked data-sets like the Dictionary of Sydney. In preparation for the workshop, PT tried taking the data from the Journey to Horseshoe Bend project, which is in a similar format to the Dictionary, putting it in a spreadsheet with multiple worksheets, and importing it via a very dodgy Python script.

Peter Bugeia investigated how environmental-science data would look in Omeka, by playing with the API to pump in data from the HIEv repository.

Sharyn and Andrew tried to hack together a simple plugin. Challenge: see if we can write a plugin which will detect YouTube links in metadata and embed a YouTube player (as a test case for a more general type of plugin that can show web previews of lots of different kinds of data). They got their hack to the “Hello World, I managed to get something on the screen” stage in 45 minutes, which is encouraging.
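The logic of such a plugin is simple enough to sketch. Omeka plugins are written in PHP, so the Python below is only an illustration of the detection and embedding steps, not Sharyn and Andrew’s hack. It assumes just the two most common YouTube URL shapes and 11-character video IDs; the function names and the placeholder video ID are made up.

```python
import re
from html import escape

# Matches the common watch?v= and youtu.be/ forms; other URL shapes are ignored.
YOUTUBE_RE = re.compile(
    r"https?://(?:www\.)?(?:youtube\.com/watch\?v=|youtu\.be/)([A-Za-z0-9_-]{11})"
)


def youtube_ids(metadata_values):
    """Collect distinct video IDs found in a list of metadata strings."""
    ids = []
    for value in metadata_values:
        for video_id in YOUTUBE_RE.findall(value):
            if video_id not in ids:
                ids.append(video_id)
    return ids


def embed_html(video_id):
    """Return an iframe that embeds the player for one video."""
    src = f"https://www.youtube.com/embed/{escape(video_id)}"
    return f'<iframe width="560" height="315" src="{src}" allowfullscreen></iframe>'


# Placeholder metadata values; "abcdefghijk" is not a real video.
values = [
    "See https://www.youtube.com/watch?v=abcdefghijk for the talk",
    "no link here",
    "https://youtu.be/abcdefghijk",
]
for video_id in youtube_ids(values):
    print(embed_html(video_id))
```

A more general preview plugin could keep the same shape: one pattern and one renderer per kind of link.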

Jake looked at map-embedding: we had some sample data from UWS in KMZ format (compressed Google map layers for UWS campuses), and we wondered if it would be possible to show map data inline in an item page. Jake made some progress on this; the blocker wasn’t Omeka, it was finding a good way to do the map embedding.
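Getting at the data inside a KMZ is the easy part, since a KMZ file is a zip archive with a KML (XML) document inside. The Python sketch below is not Jake’s work: it assumes the KML 2.2 namespace (older files may use a different one), and the sample file with its placemark names and coordinates is invented so the code can run without the UWS data.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Assumes KML 2.2; older files may declare a different namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def placemark_names(kmz_bytes):
    """Return the names of all Placemarks in the first .kml file of a KMZ."""
    with zipfile.ZipFile(io.BytesIO(kmz_bytes)) as archive:
        kml_name = next(n for n in archive.namelist() if n.lower().endswith(".kml"))
        root = ET.fromstring(archive.read(kml_name))
    return [name.text for name in root.findall(".//kml:Placemark/kml:name", KML_NS)]


def make_sample_kmz():
    """Build a tiny in-memory KMZ with two invented placemarks."""
    kml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
        "<Placemark><name>Campus A</name>"
        "<Point><coordinates>150.0,-33.0,0</coordinates></Point></Placemark>"
        "<Placemark><name>Campus B</name>"
        "<Point><coordinates>151.0,-34.0,0</coordinates></Point></Placemark>"
        "</Document></kml>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("doc.kml", kml)
    return buffer.getvalue()


print(placemark_names(make_sample_kmz()))
```

Rendering those placemarks on a map inside an item page is the harder problem, and the one Jake ran into.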

Cindy continued the work she’s been doing with Jake on the Intersect press-button Omeka deployment. They’re using something called Snap Deploy and Ansible.

Jake says:

Through our Snapdeploy service Intersect are planning to offer researchers the ability to deploy their own instance of OMEKA with just a click of a button, with no IT knowledge required. All you need is an AAF log in and Snapdeploy will handle the creation of your NeCTAR Cloud VM and the deployment of OMEKA to that VM for you. We are currently in the beginning stages of adapting the Snapdeploy service to facilitate an Omeka setup and hope to offer it soon. We would also like feedback from you as researchers to let us know if there are any Omeka plug-ins that you think we could include as part of our standard deployment process that would be universally useful to the research community, so that we can ensure our Omeka product offers the functionality that researchers actually need.

David explored the API using an obscure, long-forgotten programming language (“Java”, we think he called it) and reported on the difficulty of grasping it.

More on stretching Omeka

If we were to take Omeka out of its core comfort zone, like say being the working data repository in an engineering lab, there are a number of things we’d want to do:

  • Create some user-facing forms for data uploads. These would need to be simpler than the full admin UI, with lookups for almost everything: people, subject codes, and research context such as facilities.

  • Create (at least) group-level access control probably per-collection.

  • Build a generic framework for previewing or viewing files of various types. In some cases this is very simple, via the addition of a few lines of HTML, in others we’d want to have some kind of workflow system that can generate derived files.

  • Fix the things noted above: better API library, Linked Data support.

What would an Omeka service look like?

If we wanted to offer this at UWS or beyond as well as use it for projects beyond the DH sphere, what would a supported service look like?

To make a sustainable service, we’d want to:

  • Work out how to provide robust hosting with an optimal number of small Omeka servers per host (is it one? is it ten?).

  • Come up with a generic data management plan: “We’ll host this for you for 12 months. After which if we don’t come to a new arrangement your site will be archived and given a DOI and the web site turned off”. Or something.

Creative Commons License
Is Omeka aDORAble by Peter Sefton, Andrew Leahy, Gerry Devine, Jake Farrell is licensed under a Creative Commons Attribution 4.0 International License.

3 thoughts on “Is Omeka aDORAble?”

  1. Many thanks for this write-up and evaluation. It seems to me to reflect lots of the questions and issues we are working on, and gives us some very helpful use cases that will clarify what we might work toward.

    Here’s a couple additional thoughts.

    Regarding Item Relations:

    * A nicer UI for lookups would definitely be great, especially something similar to what Exhibit Builder and the soon-to-be-released Posters have. Posters adapted what’s in Exhibit Builder, and so it might be a helpful example of javascript to work from for that.

    * The lesser-used and more complex (funny how those are related!) Record Relations aims to be a generalized version of Item Relations, but it has no UI of its own. It can be used, for example, to connect an Item to a User. Don’t know if that’s a relevant use-case.

    * The labelling issue is a complex one, and quickly gets into weeds. Multiple labels gets into how to display them. Another request that quickly emerges in that conversation is labels in different languages. Thus, layers of database complexity build up quickly.

    * This still keeps us in the realm of _internal_ relations between Items. Is there a need for external relations (say, to a DBpedia URI)?

    Regarding access control / workflow:

    This, too, is something we hear a lot about, and it also quickly becomes very complicated when we think about all the variations in needs. Some people want to control access by tags on items, some by collection, some by exhibit, some by finer-grained distinctions in user roles. That makes a general solution extremely difficult to produce. Different groups are attacking their own needs in different ways, and I think that’s the best way to go. I’d love to see lots of Omeka plugins that demonstrate focused solutions that others can pick up and/or adapt. That’ll lead to faster solutions for the most people, I suspect.

    Again, many thanks for this thoughtful and insightful write-up.

    Patrick Murray-John
    Omeka Dev Team Manager

    • Thanks for dropping by, Patrick. Re access control, I’d love to see some examples as well. This is an area where I have not been able to get very far. Are you aware of any examples of someone doing something, anything, with Access Control Lists that might give us a leg-up?

      As an eResearch manager, looking to roll out software to a large population, I think that if we want to move from per-project Omeka installs (which is still a compelling use-case) to research-cohort installs for working-data, we could do with per-collection access control. That is, create a collection-owner status, and allow collection-owners to add others to a collection with the same basic set of admin roles as Omeka already has (need to think through issues around what ‘public’ might mean in this context etc). Do you have any examples that might help us with this?

      • For a now-idle project, there’s crazy complexity for ACL in a Groups plugin. Permissions are much more about adding items to a group, rather than selective editing permissions for items, but the GroupsAclAssertion class is an example of the kind of approach that might be needed for the more finely nuanced access control, at least in Omeka 2.x.

        Something there _might_ be able to handle the research-cohort setup. That also echoes some of the questions we are wrangling with in Omeka S, but again it’s all on the drawing board; nothing built yet.
