AusPreserves May Monthly Meetup - Metadata Challenges for Digital Object Complexity, a New Item Model, and Validation at Scale

Did you miss the May meetup? I nearly did, and I was so glad I caught it that I promised a short and sweet post on it to follow.

As you may find, I struggled with the “short” part, wanting to do justice to the valuable work highlighted by three captivating presentations. Elements of the lively discussion between presenters and audience are woven in below. The session was recorded, and I hope these notes will complement the recording or inspire you to watch it.

Too Many Daves - Carey Garvie, The National Archives of Australia (NAA)

Carey Garvie opened with this great teaser title (a nod to Dr. Seuss), covering the outcomes of a significant program at the National Archives of Australia to overhaul the identifier system/s for multiple digital representations and complex digital objects.

When all the identifiers for multiple representations, surrogates and derivatives are the same, how do you know which is which?

The existing metadata schema, whilst handling a one-to-one representation well, runs into problems managing complex digital objects such as digital surrogates and multiple representations. These require a one-to-many model that can identify relationships and technical variation, support uniqueness, and link multiple representations.

Audiovisual material needs to be clearly defined so that the complexity and uniqueness of these digital entities are “not hidden behind a single series, number, or identifier”.

The working group formed for the program identified 43 requirements across two main categories of work. The first was, broadly, adapting the separation of intellectual from technical metadata set out in PREMIS into a combined model. The second was a significant overhaul of the unique identification system, to integrate and articulate the new elements and existing properties required to achieve this goal, such as uniqueness, domain and agency, immutability, system agnosticism, URI form, a prescribed/uniform structure and human readability.

The “must do” and “should do” requirements guided actions and priorities toward the overarching goal of interoperable and accessible (searchable) digital assets, achieved through better metadata standards and supported by strong governance.

Fig 1 "Too Many Daves", slide 11, courtesy Carey Garvey, NAA

Some remaining challenges relate to human and/or systems transition:
  • “letting go of entrenched processes” and moving beyond them
  • “avoiding clashes” with other systems (integrations and workflows that share identifiers or prefixing/roles)
  • “what to leave behind and when to draw the line?”: deciding when to retire existing systems and processes.
Whether a system is superseded is in the eye of the beholder, and the decision ultimately needs to be made by and with its core users. If users still rely on systems targeted for retirement, make sure the wind-down plans safeguard the programs and workflows currently in operation.

Fig 2 "Too Many Daves", slide 12, courtesy Carey Garvey, NAA

Queensland State Archives Item Model - Katrin Hulimann-Graham & Elizabeth Hawkins, The Queensland State Archives (QLD State Archives)

Next up, Katrin Hulimann-Graham and Elizabeth Hawkins’ presentation echoed similar issues in adapting older and/or existing systems designed for managing physical items.
Whilst it handles the “intellectual” description well, the configuration of ArchivesSpace isn’t “friendly” to managing the paper-based collections within series. Additionally, at item level, the system was not set up to manage and/or distribute a growing and complex array of digital representations and/or surrogates, each with its own distinct “technical” description and characteristics.

The current systems configuration comprises a Systems Management layer, a Public Access portal and Agent Discovery layers. Improvements were needed to support access and discoverability for all users, including the ability to target any individual digital representation or surrogate. Without a model that describes each representation's technical characteristics uniquely, a human understanding of each representation/surrogate and its relationships could not be achieved systematically. A working group was formed to tackle the problem.

Fig 3 Queensland State Archives Item Model, slide 4, courtesy Katrin Hurlimann Graham and Elizabeth Hawkins

The working group's biggest decision was to identify three new additions to the PREMIS entity model: Function and Mandate, which sustain and reinforce Agency and Series, and a flexible expansion of Representation that individualises the distinct technical expression of each digital representation and supports linked relationships.

The adaptation brought flexibility for representations, whose state and status can change, whilst formalising and securing their links to the stable core Item. It also had the added benefit of freeing up access for users, with clearer statements of use, access and visibility.

Main advantages

  • the new model will integrate with any future transition to preservation models and/or systems (when the organisation is ready)
  • each representation is accessible, with clearer visibility and use provisions, and is linked to its core item

Remaining challenges

  • linkages between representations
  • working at scale, which will be critical to preventing confusion and new legacy problems


A new, more complex model might have met some resistance, but it has been remarkably well adopted, and users have adapted to it quickly.

To paraphrase Katrin and Elizabeth: “It was a great exercise…of reviewing…to answer the questions of what is needed and do our existing systems and models work?”
The result has been more flexibility overall: new, stand-alone entities; reinforced mandates and functions that will improve discoverability for Agency and Public clients; and, ultimately, through the opportunity for a detailed review, a reaffirmation that the Australian Series System is still far and away the best.

A Q&A followed, with the main point of discussion being whether permanent linkages to representations were possible. Carey raised the NAA's exploration of “meaningless identifiers”, as opposed to human-readable IDs, as a challenge in terms of user adoption and acceptance (people have worked with human-readable identifiers for generations, and that works up to a point). Ultimately, a systems-driven approach is not always the best option; however, a systematic solution will best address the issues this creates at scale.
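To make the “Too Many Daves” problem and the meaningless-identifier idea a little more concrete, here is a minimal Python sketch. It is not the NAA's scheme: the `Representation` and `Registry` names are hypothetical, and the opaque identifier is simply a random UUID in URN form. Several representations can share the same human-readable label while each keeps its own key.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Representation:
    """A hypothetical record for one digital representation."""
    pid: str    # opaque key: carries no meaning, so renumbering never forces a change
    label: str  # human-readable label that people search for and cite


def mint(label: str) -> Representation:
    """Mint a new representation with a random UUID URN as its identifier."""
    return Representation(pid=uuid.uuid4().urn, label=label)


@dataclass
class Registry:
    """Hypothetical lookup table keyed on the opaque identifier."""
    records: Dict[str, Representation] = field(default_factory=dict)

    def add(self, rep: Representation) -> str:
        self.records[rep.pid] = rep
        return rep.pid

    def get(self, pid: str) -> Representation:
        return self.records[pid]

    def find_by_label(self, label: str) -> List[Representation]:
        # Several records may share one label: the "too many Daves" case.
        return [r for r in self.records.values() if r.label == label]


if __name__ == "__main__":
    registry = Registry()
    # Two representations of the same item (say, a master and an access copy),
    # both known to people by the same label.
    master = registry.add(mint("Series A / Item 1"))
    access = registry.add(mint("Series A / Item 1"))
    print(len(registry.find_by_label("Series A / Item 1")))  # two "Daves"
    print(master != access)  # but each has its own key
```

Keeping the label as an ordinary attribute means people can keep working with familiar references, while systems link on a key that is never reassigned.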

Speaking of scale.

Tip of the Validation Iceberg - Jay Gattuso, The National Library of New Zealand (NLNZ)

Do check out the recorded session. These notes are only the very “tip” of the challenges Jay's presentation discussed and posed.

The dilemma Jay raised is the huge challenge of interrogating JHOVE validation errors at scale. Identifying what is wrong, and knowing what action may be necessary when validation errors are reported post-deposit, is complex at scale, not least because of limited human resourcing and the skills gaps inherent in such an exercise. For example, where 5% of 20 million files (one million files) report a validation error, but not what that error might be, how do you approach it systematically and efficiently, and with what tools?

To meet this challenge, Jay has been building and testing a semi-automated Validation Testing Framework for NLNZ. In the slide shown below, the green steps add automation to the workflow, whilst the white boxes describe manual steps.

Fig 4 The Tip of the Validation Iceberg, slide 4, courtesy Jay Gattuso, NLNZ
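As a very rough illustration of why grouping matters at this scale, the Python sketch below counts how many files share each distinct validation message, so effort can go to the most common problems first. It is not Jay's framework: it assumes a hypothetical CSV export with `path` and `message` columns, and the sample rows and messages are invented placeholders rather than real JHOVE output.

```python
import csv
import io
from collections import Counter
from typing import List, Tuple

# Invented sample data: one row per file that failed validation.
# The "path" and "message" column names are assumptions for this sketch.
SAMPLE = """path,message
obj1/a.tif,placeholder error A
obj1/b.tif,placeholder error A
obj2/c.pdf,placeholder error B
obj3/d.tif,placeholder error A
"""


def top_messages(csv_text: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most common messages with the number of files reporting each."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = Counter(row["message"] for row in reader)
    return counts.most_common(n)


if __name__ == "__main__":
    # Scale of the example in the talk: 5% of 20 million files.
    print(20_000_000 * 5 // 100)
    for message, count in top_messages(SAMPLE):
        print(f"{count:>8}  {message}")
```

The same grouping could be run per format or per error class; the idea is that a million failures may reduce to a much shorter list of distinct problems to investigate.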

The questions that arose related to tracking, progressing and authorising movement through the process. When decisions are made, who makes which decision, what actions are recorded against the Master, and who records them? And how are they most efficiently recorded? Jay also sought colleagues' insight into how to record, against a digital object, what can justifiably be characterised as a Conservation treatment.

At this point you could feel and see the audience's agreement with the proposition (nods and virtual thumbs up).

What is missing is a technical model/schema, and complementary organisational policy, conforming to digital object requirements and operational expectations, that allows the relevant information to be recorded by the right person, whether engineer or specialist (my terms).

Decision issues start with the tools and where they reside. “It all happens on one laptop.” 

Then, who signs off on changes, and which changes? Jay raised the example of fixing non-ISO date/time separators in items that specify ISO-standard date/time formats, to make them compliant. Is this a recordable change?
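A minimal Python sketch of that example, pairing the fix with a record of what changed. It assumes, purely for illustration, a date written with “/” or “.” separators where the ISO 8601 form uses “-”; the record's field names (`before`, `after`, `agent` and so on) and the identifiers in the usage lines are hypothetical, not taken from PREMIS or any NLNZ system.

```python
import re
from datetime import datetime, timezone
from typing import Dict, Optional

# Hypothetical non-compliant form: "2021/05/18" or "2021.05.18", optionally
# followed by a time part such as "T10:30:00".
BAD_DATE = re.compile(r"^(\d{4})[/.](\d{2})[/.](\d{2})(T.*)?$")


def fix_date_separators(value: str) -> Optional[str]:
    """Return the value with ISO-style "-" separators, or None if no fix applies."""
    m = BAD_DATE.match(value)
    if m is None:
        return None
    year, month, day, rest = m.groups()
    return f"{year}-{month}-{day}{rest or ''}"


def treatment_record(file_id: str, field: str, before: str, after: str,
                     agent: str) -> Dict[str, str]:
    """Build a minimal record of the change (all field names are illustrative)."""
    return {
        "file": file_id,
        "field": field,
        "before": before,
        "after": after,
        "agent": agent,
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    original = "2021/05/18T10:30:00"
    fixed = fix_date_separators(original)
    if fixed is not None:
        print(treatment_record("file-0001", "date", original, fixed, "engineer-1"))
```

Whether such a record is kept at all, and who signs it off, is exactly the policy question Jay raised; the sketch only shows how little data a minimal before/after record needs.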

Another example Jay raised: when JHOVE cannot determine the issue, the solution might be to open the file in the relevant software and re-save it in the same format. This is both a resourcing challenge (having the correct software) and a change (opening and re-saving the file in software writes information into it). If we believe such a change or fix should be documented, where do we record it?

The separation of intellectual and technical data, as pointed out by the earlier presenters, is relevant here too. Is the intellectual, or both the intellectual and the technical, a curatorial responsibility? The engagement between technician and curator (if they are to be one and the same person, a supporting GUI takes time and resources to create) comes to resemble our conservator and curator roles, which helps us describe who records what. Then the problem becomes technical again because of scale: where to record, how, and how much history to record.

When dealing with what Jay classified as going beyond fixity comparisons (testing, determining actions, and signing off on a Treatment that may be characterised as a visual, mathematical or technical change), who owns the decision to record the change? This is information we may be asked to provide in future audits.

All of this, however invisible except when interrogated with specialised tools, imposes change on the Original. Preserving multiple versions may be less a question of storage capacity than of which version you run future preservation and validation testing on.

Jay observed that:
  • adapting PREMIS Representations would help reinforce the Intellectual Entity across different format contexts, and
  • validation is a Conservation treatment.

Digital treatment warrants being recorded in line with preservation practice, so that the digital object can be reconstructed or reconstituted from the preserved record of its treatments.

Is it possible to preserve validation information as a treatment history, managing and storing multiple digital versions of a treated original in (as I imagined it) a linked cluster of files, where inherited and new information identify each version's place in this preserved chain of inheritance to support decision-making? That would be a little different from the existing Derivation Relationship entity.
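One way to picture that chain, as a minimal Python sketch under my own assumptions: each preserved version points to the version it was treated from and notes the treatment applied, so you can walk back to the untreated Original or pick the latest state to test against. The `Version` class and the placeholder checksum strings are hypothetical; this is not a PREMIS structure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Version:
    """One preserved state of a digital object (hypothetical structure)."""
    version_id: str
    checksum: str                       # placeholder fixity value
    parent: Optional["Version"] = None  # the version this one was treated from
    treatment: Optional[str] = None     # what was done to the parent

    def lineage(self) -> List["Version"]:
        """Return the chain from the untreated Original up to this version."""
        chain: List["Version"] = []
        node: Optional["Version"] = self
        while node is not None:
            chain.append(node)
            node = node.parent
        chain.reverse()
        return chain

    def original(self) -> "Version":
        """The untreated Original at the root of the chain."""
        return self.lineage()[0]


if __name__ == "__main__":
    v0 = Version("v0", "checksum-0")
    v1 = Version("v1", "checksum-1", parent=v0, treatment="date separators normalised")
    v2 = Version("v2", "checksum-2", parent=v1, treatment="re-saved in authoring software")
    for v in v2.lineage():
        print(v.version_id, v.treatment or "original")
```

Keeping the chain explicit is one possible answer to the question of what future validation runs on: the Original and every treated state stay addressable.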

Did I get that right? It was an enthralling, fast-paced discussion, with Jay's takeaway questions (slide extract below) resonating with the audience.

Finally, can aligning policy on physical preservation with policy on digital intellectual and technical preservation help improve the systems capture of this information?

Many in the audience thought so. Carey confirmed that the NAA was seeking to do this, and Matt confirmed that at the State Library of NSW these policies are linked as one. At Museums Victoria, whilst the approach is relatively manual by comparison, an alignment of policies similarly works to bridge the digital/physical management and practice divide, and is further strengthened by linked underpinning procedures that align the conservator (technical) and curator (intellectual) roles.

Fig 5 The Tip of the Validation Iceberg, slide 17, courtesy Jay Gattuso, NLNZ

As all three presentations established, critical to progress and success was the additional goal of acting on opportunities to re-examine, test and continuously strengthen existing standards, policies, procedures and processes for capturing intellectual and technical metadata, however long the prior methods have been in place.
