Earlier this week my colleague Jenifer posted about emerging technologies in archives. In keeping with the themes promoted for International Archives Week, this post will focus on our efforts to preserve digital content in the Institute Archives and why it’s so important.
Preservation has always been a part of archival work, from storing paper documents in acid-free folders and boxes to conserving fragile or damaged items. Digital preservation is much more complex, and it's almost certainly the greatest challenge facing archives today. Between the vast array of file formats (and multiple versions of them) and the sheer amount of data produced every day, digital content poses a variety of difficulties for long-term preservation.
So what is digital preservation? There are a variety of definitions, but to keep it simple, here’s an excerpt from the Digital Preservation Coalition’s glossary. Digital preservation “refers to the series of managed activities necessary to ensure continued access to digital materials for as long as necessary.” In an archival setting, as long as necessary typically means permanently. We want to be able to share digital images, documents, audio & video recordings, and other media types indefinitely, just as we always have for materials in analog formats.
I should also state what digital preservation is NOT. Digitization does not mean preservation. Archivists typically digitize stuff to facilitate remote access. This supports research conducted by those who cannot easily visit our reading room to use collections. In fact, digitized documents add to our preservation workload, because anything we create also needs to be stored, managed, and made accessible to end users.
So how do archivists manage to preserve all this digital material? There's really no one-size-fits-all answer; each repository must find the approach that suits it. Some institutions purchase a preservation system, while others combine a variety of applications that each handle different functions. For example, one step in the process is generating checksums: short values calculated from a file's contents, recorded, and later recalculated to make sure the file hasn't changed over time (a.k.a. fixity checks). If even a single bit of the file changes, the recalculated checksum won't match the recorded one. Another step is called normalization, in which a copy of a file is created in a standard format chosen to improve either preservation or access. These and many other steps help ensure that a file remains trustworthy over time and stays in a format that future technologies can read.
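To make the idea of a fixity check a little more concrete, here is a minimal sketch in Python using only the standard library. It is not how Archivematica or any other preservation system implements fixity checking; the function names (`sha256_of`, `build_manifest`, `check_fixity`), the choice of SHA-256, the manifest layout (a dictionary mapping each file's relative path to its checksum), and the sample filename are all hypothetical, chosen for illustration. The idea is simply: record a checksum for every file when it arrives, then recalculate later and compare.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks
    so large TIFF or WAV files don't have to fit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Hypothetical manifest: map each file's path (relative to root)
    to its checksum, computed at the time of ingest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def check_fixity(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Recalculate checksums and compare them with the stored manifest."""
    current = build_manifest(root)
    return {
        # Present in both, but the bits have changed (e.g. bit rot).
        "changed": [k for k in manifest if k in current and current[k] != manifest[k]],
        # Recorded at ingest but no longer found (e.g. accidental deletion).
        "missing": [k for k in manifest if k not in current],
        # Found now but never recorded in the manifest.
        "new": [k for k in current if k not in manifest],
    }


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        scan = root / "scan_0001.tif"  # hypothetical file name
        scan.write_bytes(b"original scan bytes")
        manifest = build_manifest(root)  # recorded when the file is ingested
        scan.write_bytes(b"corrupted bytes")  # simulate silent corruption
        print(check_fixity(root, manifest))
        # {'changed': ['scan_0001.tif'], 'missing': [], 'new': []}
```

Comparing against a stored manifest, rather than just recomputing checksums, is what lets a check distinguish files that changed from files that went missing or appeared unexpectedly.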
At RPI we're using an open-source system called Archivematica, which aggregates a number of micro-services to perform a variety of preservation functions. These functions are needed to conform to the OAIS reference model, an international standard for digital preservation (ISO 14721). Our Archivematica implementation serves as a dark archive for preservation-quality files in formats such as TIFF, PDF/A, and WAV. We began by uploading a series of digitized George M. Low images into Archivematica; later we'll add scanned copies of RPI publications, including back issues of The Rensselaer Polytechnic, the alumni magazine, and several RPI histories. Ultimately we plan to use the system to preserve born-digital content, which is becoming an ever-larger portion of our acquisitions. Without an active preservation program, digital materials are subject to deterioration (bit rot), loss through human error, and other calamities. The risk is especially high for born-digital collections, which have no analog originals to fall back on.
The future of archives is undoubtedly digital, both for sharing our records and for preserving electronic files. Long-term preservation and access do not come cheap, and the trick will be to develop sustainable systems. While open-source solutions avoid vendor fees, they come at a price in terms of staff time spent maintaining them. For that reason, we've contracted with a vendor to host our preservation system, and the vendor's technical support has been a big help as we face the moving target of digital preservation. As RPI archivists, we value the materials in our care no matter what forms they take, and we're committed to preserving their valuable information for future researchers.