Author’s Note: Over the last few months, I’ve heard from several archives students that they’ve had trouble gaining experience with Encoded Archival Description (EAD) in their classes. Luckily, EAD is something that students and practitioners can easily teach themselves. This post is meant to serve as a guide to learning how EAD works and point readers to resources that can help with metadata creation. In a subsequent post, I will address implementation considerations – which is where the rubber meets the road!
This post would not have been possible without instruction and help from Lori Lindberg, Michael Fox, Bronwen Masemann, Ruth Kitchin Tillman, and the support of my coworkers at Hennepin County Library Special Collections, Edward Hathaway and Bailey Diers.
The Shortest (EVER) Introduction to EAD and XML
You’ve probably heard that EAD is an archival metadata schema expressed in XML. But what does that mean?
Creating metadata that describes archival collections is not bibliographic cataloging. Nor is it the same as creating metadata for digital objects. You can certainly map EAD elements to Dublin Core or MARC (or vice versa), but a finding aid is not the same thing as a record in a catalog or a digital library. Sometimes a finding aid can be as brief as a MARC record, but often, finding aids are longer registers of the content of archival collections. Because they contain narrative text, it’s easy to overlook the structure of finding aids when you’re new to archives work, but they are just as structured as any other metadata record.
EAD is an XML (Extensible Markup Language) document type. That means that the standard is written as a set of XML elements or “tags” and attributes. As a markup language, XML is meant to hold and transport data. This is different from HTML, which focuses on displaying information on the web.
The EAD Structure
Here is a basic outline of the structure of an EAD document, aka “instance”, with links to the EAD3 tag library. Note that this outline does not include the XML declaration or references to any schemas, which are necessary to make valid EAD instances. I’ll have more on that in a subsequent post.
<ead>
  <control>
    <recordid></recordid>
    <filedesc>
      <titlestmt>
        <titleproper></titleproper>
      </titlestmt>
    </filedesc>
    <maintenancestatus></maintenancestatus>
    <maintenanceagency></maintenanceagency>
    <maintenancehistory></maintenancehistory>
  </control>
  <archdesc>
    <did></did>
  </archdesc>
</ead>
Like all XML documents, an EAD instance has a root element: <ead>. This element contains no data but “wraps” all other elements in the instance. Note that each tag opens (e.g., <ead>) and closes (e.g., </ead>). Also note that the tags “nest” within each other. This nesting, or hierarchy, is especially important when we start describing the collection as a whole and the series, subseries, files, and items within it. The hierarchical nature of EAD reflects the intellectual framework archivists use to think about collections.
Within <ead> are <control> and <archdesc>, the two main sections of the EAD instance.
<control> is where you record information about the finding aid itself such as:
- <recordid> Identifier for the EAD instance
- <titleproper> Finding aid title
- <maintenancestatus> Version status of the instance (is it new or revised?)
- <maintenanceagency> Organization responsible for creation/maintenance of the instance
- <maintenancehistory> Information about creating, revising, or updating the instance
Some of these elements in <control> have other optional or required subelements. The important thing to remember, though, is that <control> only contains information about the finding aid, not the archival materials. Also, note that like <ead>, <filedesc> and <titlestmt> don’t contain any data; they just wrap other elements. There are many “wrapper” elements such as these in the EAD schema.
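As a rough illustration, a filled-in <control> section might look something like the sketch below. The identifier, title, agency name, date, and encoder name are all made up for this example, and the subelements and attribute values shown (such as value="new") should be checked against the EAD3 tag library before you rely on them.

```xml
<control>
  <!-- All values below are hypothetical placeholders -->
  <recordid>example-0001</recordid>
  <filedesc>
    <titlestmt>
      <titleproper>Guide to the Example Family Papers</titleproper>
    </titlestmt>
  </filedesc>
  <!-- Is this instance new or revised? -->
  <maintenancestatus value="new"/>
  <maintenanceagency>
    <agencyname>Example Library Special Collections</agencyname>
  </maintenanceagency>
  <!-- One maintenanceevent per creation, revision, or update -->
  <maintenancehistory>
    <maintenanceevent>
      <eventtype value="created"/>
      <eventdatetime standarddatetime="2016-01-15">January 15, 2016</eventdatetime>
      <agenttype value="human"/>
      <agent>Example Encoder</agent>
    </maintenanceevent>
  </maintenancehistory>
</control>
```

Notice that nothing in this block says anything about the archival materials themselves. It only describes the finding aid.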
<archdesc> (Archival Description) is a wrapper element for information about the archival materials. The “level” attribute is required and encoders must record the highest level of material described by the finding aid.
<archdesc> must contain the <did> (Descriptive Identification) element. <did> wraps other elements that contain key identification information for the materials such as collection number, title, creator, dates, language of the material, and physical description. The <did> element within <archdesc> is referred to as the “upper-level” or “high-level” <did> because it describes the materials as a whole.
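To make that concrete, here is a minimal sketch of an <archdesc> with its upper-level <did>. The collection number, title, creator, dates, and extent are invented for illustration, and this is only one of several reasonable ways to encode them.

```xml
<!-- level records the highest level of material described -->
<archdesc level="collection">
  <did>
    <!-- All values below are hypothetical placeholders -->
    <unitid>M/A 0001</unitid>
    <unittitle>Example Family Papers</unittitle>
    <origination>
      <famname>
        <part>Example family</part>
      </famname>
    </origination>
    <unitdate>1890-1950</unitdate>
    <physdesc>3 boxes</physdesc>
    <langmaterial>
      <language langcode="eng">English</language>
    </langmaterial>
  </did>
</archdesc>
```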
The following elements are also often included in <archdesc> and contain some of the most useful information for users:
- <bioghist> Biographical or historical information about creators
- <scopecontent> Description of content coverage and material types
- <arrangement> Intellectual organization of the materials, often written as a list of series
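These three elements sit inside <archdesc>, after the upper-level <did>. A short sketch, with invented narrative text and series names, might look like this:

```xml
<!-- Narrative text below is hypothetical, for illustration only -->
<bioghist>
  <p>The Example family settled in Minneapolis in the 1880s and operated a
  neighborhood grocery for three generations.</p>
</bioghist>
<scopecontent>
  <p>The papers consist of family correspondence and photographs documenting
  daily life and the family business.</p>
</scopecontent>
<arrangement>
  <p>The papers are arranged in two series:</p>
  <list listtype="ordered">
    <item>Correspondence</item>
    <item>Photographs</item>
  </list>
</arrangement>
```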
Finally, you’re probably wondering where in this structure you put that giant box list or inventory you just made. The <dsc> (Description of Subordinate Components) is where that goes! The tag library includes a snippet showing how the <dsc> looks. Note the hierarchy! Each numbered component (<c01> through <c12>) represents an intellectual component of the collection. As a unit, these don’t necessarily correspond exactly to the containers that hold the material, although often a folder does correspond directly to a “file” as an intellectual component. This distinction wasn’t something I really thought about that much before learning EAD, but it is an important concept to grasp in order to describe materials efficiently.
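Here is a small sketch of a <dsc> with two series, one of which contains two files. It is not copied from the tag library; the titles, dates, and box and folder numbers are hypothetical, and the localtype values on <container> are just one common convention.

```xml
<dsc>
  <!-- Series level: an intellectual grouping, not a physical box -->
  <c01 level="series">
    <did>
      <unittitle>Correspondence</unittitle>
      <unitdate>1890-1920</unitdate>
    </did>
    <!-- File level: here each file happens to match one folder -->
    <c02 level="file">
      <did>
        <container localtype="box">1</container>
        <container localtype="folder">1</container>
        <unittitle>Letters from relatives</unittitle>
        <unitdate>1890-1895</unitdate>
      </did>
    </c02>
    <c02 level="file">
      <did>
        <container localtype="box">1</container>
        <container localtype="folder">2</container>
        <unittitle>Business letters</unittitle>
        <unitdate>1900-1920</unitdate>
      </did>
    </c02>
  </c01>
  <c01 level="series">
    <did>
      <unittitle>Photographs</unittitle>
      <unitdate>1905-1950</unitdate>
    </did>
  </c01>
</dsc>
```

The <c02> elements nest inside their parent <c01>, mirroring the way files belong to a series. The <container> elements record where the material physically lives, which is separate from the intellectual hierarchy.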
Tools
You don’t need much to get started marking up finding aids. All you really need is a text editor like Notepad for Windows or TextEdit for Mac. However, I strongly recommend investing in a more robust XML editor. At home, I use an academic version of Oxygen XML Editor, which cost me around $100. It was well worth it. Oxygen validates my files, auto-fills code, performs batch and single transformations, and imports data easily. All of that starts to get into the implementation side of working with EAD, which I’ll speak to in a subsequent post, but trust me, an XML editor is worth the investment. I did test drive Dreamweaver, but I never could get it to do everything that Oxygen can do, likely because it caters to other markup languages. Bottom line: get an editor that caters to XML specifically.
Another tool you may need is an XML validator. Here’s a free validator that is easy to use. XML editors will also validate your code.
EAD 2002 to EAD3
If you’re looking for examples of XML finding aids on the web, chances are you’ll bump into some EAD 2002 – the previous version of the EAD schema. EAD3 was released in 2015, so it’s still very new to the archival community. I recommend becoming acquainted with some of the old elements so you’re not caught off guard when you see them. For example, <control> used to be called <eadheader> in EAD 2002. The EAD3 tag library lists elements that are now obsolete. Ruth Kitchin Tillman’s site EADiva is also a great resource for learning more about changes and corresponding elements in EAD 2002 and EAD3.
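As a quick side-by-side, the sketch below shows roughly how the top of an EAD 2002 header compares with its EAD3 counterpart. The identifier and title are hypothetical, and both fragments are trimmed down to the elements that line up most directly.

```xml
<!-- EAD 2002: finding aid information lives in eadheader -->
<eadheader>
  <eadid>example-0001</eadid>
  <filedesc>
    <titlestmt>
      <titleproper>Guide to the Example Family Papers</titleproper>
    </titlestmt>
  </filedesc>
</eadheader>

<!-- EAD3: the same information lives in control -->
<control>
  <recordid>example-0001</recordid>
  <filedesc>
    <titlestmt>
      <titleproper>Guide to the Example Family Papers</titleproper>
    </titlestmt>
  </filedesc>
  <!-- maintenance elements omitted here for brevity -->
</control>
```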
One of the most important changes from EAD 2002 to EAD3 is that there are more opportunities to better structure your data. For example, previously you could make unstructured statements about dates and physical characteristics with <unitdate> and <physdesc>. You still have these options in EAD3, but now there are alternative elements (<unitdatestructured> and <physdescstructured>) that allow you to break your data into smaller chunks. More granular data makes for better indexing and searching.
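The difference is easier to see with the two approaches next to each other. In this sketch the dates and extent are hypothetical, and the attribute values on <physdescstructured> are one plausible choice that should be confirmed against the tag library.

```xml
<!-- Unstructured: the whole statement is a single string -->
<unitdate>1890-1950</unitdate>
<physdesc>3 boxes</physdesc>

<!-- Structured: each piece of data gets its own element -->
<unitdatestructured>
  <daterange>
    <fromdate standarddate="1890">1890</fromdate>
    <todate standarddate="1950">1950</todate>
  </daterange>
</unitdatestructured>
<physdescstructured coverage="whole" physdescstructuredtype="carrier">
  <quantity>3</quantity>
  <unittype>boxes</unittype>
</physdescstructured>
```

With the structured versions, a system can index the start date, end date, quantity, and unit type separately instead of having to parse them out of free text.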
Resources
Here are some of the EAD resources I’ve found most helpful:
EADiva by Ruth Kitchin Tillman
Honestly, I don’t know what I would do without this website. It’s like the official tag library, but it does a better job of organizing elements and explaining how EAD works.
EAD3 Tag Library from the Society of American Archivists and the Library of Congress
There are HTML and PDF versions of the tag library, plus a hard copy version available to purchase.
XML Tutorial and XML in 10 Points from W3C
Good introductions to XML and how it works generally.
Harold, E. R., & Means, W. S. (2004). XML in a Nutshell (3rd ed.). O’Reilly Media, Inc.
This is a great book that explains XML with easy-to-understand terminology and examples.
Official EAD Site from the Library of Congress
EAD Best Practices at the Library of Congress
I’ve found answers to many odd and specific questions here. It’s written for EAD 2002 at the moment, but you can still use it to write EAD3.
GitHub sites for SAA EAD Roundtable and SAA Technical Subcommittee on Encoded Archival Standards
These are fantastic and you should poke around here periodically. They’ve got examples of EAD3 XML documents and stylesheets to transform EAD 2002 to EAD3 and EAD to HTML. The EAD3 Toolkit is especially useful for those learning the schema or trying to implement EAD3 in their repository.
ArchivesSpace is the new content management system for archives that replaced Archivists’ Toolkit. Try entering some data into the ArchivesSpace sandbox and exporting the EAD to see how the ArchivesSpace fields correspond to EAD elements.
Describing Archives: A Content Standard (DACS)
You can’t talk about EAD without talking about DACS (or ISAD(G) if you’re not in the U.S.). The 25 elements in DACS correspond to elements in EAD3 and tell you how to actually record the archival metadata that you’ll encode with EAD tags.
Happy coding and see you in a few months for Your EAD Primer: Part 2!
Carissa Hansen is the Archives Assistant at Hennepin County Library Special Collections in Minneapolis, MN, an online student in UW-Madison’s Master’s LIS program, and the Community Manager at Hack Library School. At Hennepin County Library Special Collections, she works mainly with archival processing, arrangement, description, and digital collections. At her job, she’s also currently helping to develop a strategy for EAD3 implementation. If you’d like to chat with her, you can find her on Twitter @libchans or email carissakhansen@gmail.com.
Featured image by Marino Gonzalez, licensed under CC BY-NC-ND 2.0.