Progress Beats Perfection: Trevor Owens on Digital Preservation

This is a guest post from Scott Richard St. Louis.

Trevor Owens. The Theory and Craft of Digital Preservation. Baltimore: Johns Hopkins University Press, 2018.

Day by day, our digital heritage matures and proliferates, reaching ever-greater levels of unthinkable volume and stunning complexity. For Trevor Owens, Director of Digital Services at the Library of Congress, the challenge of extending access to born-digital material is one without an easy solution. This absence of a simple way forward yields a persistent need for information professionals equipped to spread a keen awareness that their work is never truly done:

“For many executives, policy makers, and administrators new to digital preservation, it seems like the world needs someone to design a super system that can ‘solve’ the problem of digital preservation. The wisdom of the cohort of digital preservation practitioners … who have been doing this work for half a century suggests this solution is an illusory dream … Ensuring long-term access to digital information is not a problem for a singular tool to solve. Rather, it is a complex field with a significant set of ethical dimensions. It’s a vocation. It is only possible through the consistent dedication of resources from our cultural institutions.” (page 2)

Replacing the unobtainable dream of digital preservation with a more actionable intellectual foundation – in other words, replacing the easy fantasy of one-click pathways to eternal access with a vision grounded in the hard choices and unrelenting demands of reality – is the key task of The Theory and Craft of Digital Preservation. In addressing this task, Owens succeeds both in fueling the imagination and in providing readily comprehensible, concrete insights earned through years of experience.

These insights begin with sixteen axioms of digital preservation, so called because Owens considers them “the basis for digital preservation work” (page 4). Four of these axioms contain particularly useful insights for those entirely new to the field of digital preservation. For instance, axiom number four explains that an organization’s level of commitment to digital preservation is primarily discernible not from “their code, [or] their storage architecture,” but instead from the budget: “If an organization is serious about digital preservation, it should be evident in how they spend their money” (page 5). Dependable resource allocation is the wellspring from which all potential for effective long-term digital preservation flows. Memory, though invaluable, has a price.

Axiom number five reminds readers that digital preservation is not synonymous with hoarding. The former depends on “a clear and coherent approach to collection development, arrangement, [and] description” in order to maximize accessibility and discoverability for users, within the constraints of available staff capacity (page 6). Hoarders, on the other hand, compile digital material without any limiting principles of acquisition or consistent approach to establishing order. If something is undiscoverable to users and even to the collecting individual or organization, it is not truly being preserved.

Though hoarding is not digital preservation, Owens states in axiom number fifteen that the “scale and inherent structures of digital information suggest working more with a shovel than with tweezers” (page 8). More explicit explanation might have been useful to help readers new to the field understand where working with a shovel ends and where hoarding begins. At what point does the use of tweezers become a waste of time? When does using a shovel yield one scoop too many? In any case, the key takeaway is that – in a world of limited time and large backlogs of work – it is “often best to focus digital preservation decision making at scale” by relying on the “realities of digital files’ containing significant amounts of contextual metadata” (page 9). The goal here is to think iteratively, providing users with access to a lightly processed digital collection sooner and improving from there, rather than providing access far more slowly to a more meticulously processed collection. Progress beats perfection.

In axiom number sixteen, we learn that “digital preservation requires thinking like a futurist … Our preservation risks and threats are based on the technology stack we currently have and the stack we will have in the future, so we need to look to the future in a way that we didn’t need to with previous media and formats” (page 9). What might users require or desire of a given collection in the future? What technologies will – or will not – be available for exploring such a collection at a later point in time? What can we simplify now in order to serve as judicious long-term stewards of the finite resources available to us? Where must complexity remain for the benefit of our present and future users?

As thought-provoking as these and the other twelve axioms are, they take up only the first few pages of the book. What follows is an exploration of that which digital preservation shares with its analog predecessors, namely a lack of consistency regarding what is even meant when we refer to the act of preserving something: “It’s often taken for granted that before ‘the digital’ there was a nice and tidy singular notion of what preservation was. There was not … Before pinning down what digital preservation is, it’s critical to establish its context in a range of long-standing and divergent lineages of preservation” (page 13). For Owens, these lineages include the artifactual, the informational, and the folkloric:

“In the artifactual frame, we attempt to extend the life of physical media. It is the historical contiguity of the artifact that is the focus of preservation. In the informational frame, we work to clearly establish criteria for copying encoded information from one media forward to the next. In this case, the physical medium is simply a carrier or host for the encoded information. In the folkloric frame, variability and hybridity of information play a key role in how stories and sequences of information preserve but also change and adapt to new circumstances” (page 33).

The boundaries between these separate frameworks are somewhat fluid. Consider this example:

“Surprisingly, a considerable amount of valuable information can be found in the dirt left behind in books by people who have handled them over the years. Using densitometers and very high quality digital images of copies of manuscripts, scholars have been studying the relative dirtiness of individual pages, which in turn can tell us a story about the use of objects … in all seriousness, there is some ongoing interest in studying the smell of different early modern books, as it could help demonstrate aspects of the circulation of these books. These smells have chemical properties that can be recorded … Individual copies of Shakespeare works have informational properties, but they also exist as rare and unique artifacts in their own right … all artifacts have information in them, and information is always entangled in the artifactual qualities of the media it’s encoded on.” (page 20)

A question naturally follows: how might these frameworks inform our thinking about digital objects? Metaphorically, could digital material in its own way have a kind of smell?

To begin applying Owens’ three frameworks to digital objects, it is important to note that “the artifactual historical contiguity approach has been effectively abandoned in the mainstream of digital preservation work. The challenges of trying to keep digital objects functioning on their original mediums are just too massive … There is little hope for a future where a hard drive conservator would fix or treat a fifty-year-old hard drive to make it ready for a patron to access” (pages 55-57). On what, then, does digital preservation focus? “As a result, the mainstream of digital preservation practice is about informational objects, not the underlying material that they are encoded on … It is now generally accepted that if you are able to recover data from such media, the first step in preservation is to copy that information off the media and move it into a contemporary digital storage system” (pages 55-58). Even so, Owens notes that “there is a range of strangely artifactual qualities tucked away in the nooks and crannies of our platform-based informational objects” (page 61). A popular video game proves illustrative:

“Those who grew up with the game Oregon Trail will remember how, over time, the landscape of the game became littered with the tombstone memorials of the various players who went before … What you wrote out for your characters on their tombstones persisted into the games of future players of your particular copy of the game … Piracy resulted in one of these tombstones taking on a cultural life of its own … When players booted up one of these pirated copies, they would find a tombstone in it that said, ‘Here lies andy; peperony and cheese.’ For context on the joke, Tombstone pizza ran a commercial in the 1990s where a sheriff who was about to be executed was asked what he wanted on his tombstone, that is, his gravestone, but responded with the toppings he would like on his Tombstone brand pizza, in this case, pepperoni and cheese. So some player, presumably named Andy, put a misspelled joke about this commercial in his copy of the game and then let others copy his copy of the game. That copy of the game then became the basis for many pirated copies of it, and as a result the monument that player wrote into their local copy of the game has become a commemoration that took on a life of its own as a meme outside the game world.” (pages 65-66)

In summation, the artifactual component of the game’s often-pirated transmission from player to player yielded an informational quirk – the tombstone – that took on a folkloric dimension, as players remembered this joke when recalling their experiences playing the game. All three frames can thus find their way into digital preservation work, often in surprising (and even humorous) ways.

While useful, the three frameworks that Owens describes to organize the lineages of preservation are incomplete for the purposes of deciding what aspects or qualities of digital objects to preserve. Such decisions must rest on the stated mission of the organization doing the collecting. Another example from the video gaming realm is instructive. Consider how many different preservation initiatives World of Warcraft could inspire:

“The game includes a significant amount of artwork, so an organization could acquire it as an aesthetic object, in which case they might make decisions about which aspects of its visual and audio art are most important. The game made several significant advances in game design, so an institution acquiring it as part of a design collection would find its source code to be of particular interest. The game is a hugely successful global business product, so its born-digital corporate records could be valuable for an institution that preserves records of the history of business, technology, and enterprise. The game’s in-game chat system has transmitted billions of informal communications between players, which could be studied by linguists, sociologists, and folklorists interested in the language habits of its global population of players. World of Warcraft can be collected and preserved for multiple purposes. Only by fully articulating your intentions and connecting those to your plans for collection development can you ensure access to collections in the future” (pages 97-98).

The lessons are clear: in a world of increasingly abundant opportunity for undertaking digital preservation initiatives, information professionals need to simplify the possibilities before them by articulating precise, achievable goals and sustainable policies, drawing clear guidance from their institutional missions in order to avoid wasting time and resources (always in short supply).

With the ever-growing volume and humbling complexity of digital objects in mind, what insight does Owens feel compelled to share with a rising generation of preservation experts? The answer is nothing short of a recommendation to prepare for a rapidly evolving self-image and considerable changes in everyday work:

“We are moving away from a world in which an archivist or a cataloger establishes an order and authors a description to a world where archivists and catalogers leverage, wrangle, and make sense of information flows. This is less about applying descriptions or imposing arrangement and more about surfacing descriptions and clarifying and deciding which order inside digital content to privilege. In many cases, roles are also shifting to enable various kinds of users to describe and organize content. In this space, it becomes more and more critical to take the lessons of the More Product, Less Process approach from archival theory and apply that to practices that enable us to work at higher levels of organization and description and let the lower-level aspects of arrangement and description be covered by embedded metadata and the forms of order and structure that come with all kinds of digital objects to begin with.” (page 158)

As focus shifts to these “higher levels” of preservation work, a variety of digital skills – from basic computer programming to user experience research to technology-intensive project management – will only become more important to establishing successful careers in the memory professions.

Owens concludes his excellent book by explaining that digital curators “cannot predict what the future mediums and interfaces will be, or how they will work, but we can select materials from today, work with communities to articulate aspects of them that matter for particular use cases, make perfect copies of them, and then work to hedge our bets on digital technology trends to try and make the next hand-off as smoothly as possible” (pages 199-200). Surely this important work of scholarship and practical guidance will facilitate the hand-offs of our digital heritage from one generation to the next.

Scott Richard St. Louis is a 2021 graduate of the Master of Science in Information program at the University of Michigan, where he focused on digital curation.

