Today, we have access to information and data that would scarcely have seemed possible 15 years ago. Almost everything, it seems, is created and used in the digital realm. Documents such as your history report or the spreadsheet showing last year’s travel budget were most likely generated on a computer. Yet although we use computers for so many things, we often give little thought to preserving what we generate until it is too late. Most people can remember at least one horror story of lost data, whether it happened to them or to a friend: the research paper that was lost when the computer crashed, or the scattered and disorganized family photos that were saved to only one hard drive, which eventually crashed! Stories like these illustrate the potential fragility of digital information. There are several reasons why digital objects are so fragile.
Fragility of Digital Objects
One reason that digital information is fragile is that software, hardware, and other technologies can be superseded very quickly by newer ones and become obsolete. Once newer technologies are accepted as the norm, it can be difficult to use a digital object that exists only in an older format. Although popular programs currently offer some backward compatibility, this is not necessarily the case for less widely used programs and for proprietary formats from small companies. Obsolescence can also affect the media on which digital information is stored. Media that were once widely used, like 3 1/2 inch and 5 1/4 inch floppy disks, are now virtually unusable on new computers. These obsolete media and formats may contain unique information that is very difficult or impossible to recover.
Another problem associated with digital preservation is media degradation. Digital media such as hard drives, magnetic tapes, flash drives, floppy disks, and optical discs store the ones and zeros that make up our digital files, and all of them are subject to degradation. For example, the magnetic material used in hard drives can degrade over time to the point where it loses its magnetic orientation, which may cause a one to flip to a zero. If enough bits flip, the file can become so corrupted that it is unreadable. Some media are less stable than others, but all media will degrade, putting the content stored on them at risk.
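To make the idea of a flipped bit concrete, here is a minimal Python sketch that inverts a single bit in a short piece of text. The function name flip_bit and the sample text are assumptions made up for this illustration; real degradation happens in the storage medium, not in a program.

```python
def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit inverted (hypothetical helper)."""
    byte_index, offset = divmod(bit_index, 8)
    damaged = bytearray(data)
    # XOR with a one-bit mask inverts that bit: a one becomes a zero, or the reverse.
    damaged[byte_index] ^= 1 << offset
    return bytes(damaged)


original = b"history report"
# Bit 6 of the first byte ("h") is a one; flipping it to a zero changes the letter.
damaged = flip_bit(original, 6)

print(original)  # b'history report'
print(damaged)   # b'(istory report'
```

A single flipped bit in plain text changes one character. In a compressed or structured format such as an image or a spreadsheet, the same small change may make the whole file unreadable.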
What is Digital Preservation?
So how do we deal with these problems? One way is through active digital preservation. Digital preservation is the management and maintenance of digital objects (the files, or groups of files, that contain information in digital form) so they can be accessed and used by future users. It is important to start thinking about digital preservation early in the life cycle of a digital object. While traditional print on paper may last relatively unharmed for decades if left untouched, this is not the case with digital objects, which have significantly shorter life spans. By thinking about the preservation of a digital object early on, even while it is being created, we save a great deal of time and stress later, when the alternative would be a rushed attempt to retrieve the information before it is too late. In this sense, digital preservation, and especially preservation action taken early in an object’s life cycle, is important not only for personal data management but also for large repositories that manage many objects. Personal stories of lost data may seem isolated and occasional, but for larger repositories that contain many hundreds or thousands of digital objects, lost data can be a societal catastrophe. There are several strategies used to help preserve digital objects; the principal ones are data redundancy, emulation, and migration.
Digital Preservation Strategies
Backups: One of the best ways to help preserve digital objects is through data redundancy. Data redundancy means, simply put, making sure there are several copies, or backups, of important files. Redundancy mitigates many risks associated with hardware, like a computer crashing or a small flash drive being lost. Adding geographic diversity to your backups, for example by backing up your home computer to a hard drive that you store in another location, further mitigates risks like fires and natural disasters. While backups are an important first step that everyone should consider, places like archives and libraries employ further strategies to ensure important digital content remains accessible through time.
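As a rough illustration of redundancy, the Python sketch below copies one file into two separate folders and confirms that each copy matches the original. The function name back_up, the file name, and the folder names are assumptions invented for this example; the two folders stand in for genuinely separate locations, such as an external drive kept at another address.

```python
import shutil
import tempfile
from pathlib import Path


def back_up(source: Path, backup_dirs: list[Path]) -> list[Path]:
    """Copy source into every backup directory and return the new copies."""
    copies = []
    for directory in backup_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        # copy2 also preserves timestamps where the platform allows it.
        copies.append(Path(shutil.copy2(source, directory / source.name)))
    return copies


# Demonstration inside a temporary directory so nothing real is touched.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    report = root / "history_report.txt"
    report.write_text("My history report", encoding="utf-8")

    copies = back_up(report, [root / "external_drive", root / "offsite"])
    backup_names = sorted(copy.parent.name for copy in copies)
    # A backup is only useful if it is identical to the original.
    all_match = all(copy.read_bytes() == report.read_bytes() for copy in copies)

print(backup_names)
print(all_match)
```

Checking each copy immediately after it is made is a deliberate choice in this sketch: a backup that was written incorrectly offers no protection.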
Bit-level preservation: At the most basic level, digital content is a series of ones and zeros (bits) stored on a disk. If those ones and zeros change in some way, for example through storage degradation (often called bit rot) or unintended changes, that content can become irreparably damaged. Bit-level preservation uses tools that regularly check the strings of ones and zeros, typically by comparing checksums, and, if an error is found, recover an uncorrupted copy from backup. Backups and bit-level preservation work together so that content remains accessible, exactly as deposited, for long periods of time. Further strategies are needed to deal with format obsolescence.
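The check-and-recover cycle described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the method of any particular repository: the function names sha256_of and check_and_repair, the file names, and the choice of SHA-256 as the checksum are all made up for the example.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file as a hex string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_and_repair(path: Path, backup: Path, expected: str) -> str:
    """Compare a file to its recorded checksum; restore from backup on mismatch."""
    if sha256_of(path) == expected:
        return "ok"
    if sha256_of(backup) != expected:
        return "unrecoverable"  # the backup is damaged as well
    shutil.copy2(backup, path)
    return "restored"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    primary, backup = root / "photo.bin", root / "photo_backup.bin"
    primary.write_bytes(b"family photo data")
    shutil.copy2(primary, backup)
    recorded = sha256_of(primary)  # checksum recorded when the file is deposited

    first_check = check_and_repair(primary, backup, recorded)

    primary.write_bytes(b"family photo dat`")  # simulate bit rot in the last byte
    second_check = check_and_repair(primary, backup, recorded)
    repaired = primary.read_bytes()

print(first_check, second_check)  # ok restored
```

Note that the backup is verified against the recorded checksum before it is used, so a damaged backup is never copied over the primary file.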
Migration: Migration is the purposeful act of changing the file format of a particular piece of digital content. For example, a Microsoft Word file (.docx) might be migrated to PDF/A to ensure that the content is not tied to proprietary software and can be opened by a broader array of software. Migration’s strength is its reliance on formats that are heavily used in the community, but it will change the look and feel of some content. Migration is the strategy most used to address format obsolescence at cultural heritage institutions, including the University of Michigan Library.
Emulation: Unlike migration, which changes the content being preserved, emulation keeps the preserved content as-is by using software that imitates the original, obsolete hardware or software to render a digital object. Although emulation tools are currently only in the development stage and are expensive to implement, emulation is the best strategy in cases where it is important to retain the look and feel of content, or where migration is not an option. Video game preservation is an example of a space where emulation tools are beginning to be widely developed and implemented.
One last way to help preserve digital objects is to make sure that as much information as possible is gathered when they are created. This information is called metadata; it can include basic descriptive information about the file as well as information about the file format of the object. The metadata collected about an object helps to place it in context and preserves specific technical information. This is essential for making sure that digital objects are authentic. Authenticity is the assurance that the file hasn’t been added to or modified in any way: the file is the digital object created by the producer, and the content of the digital object was not modified once it was placed in the digital repository. Tracking authenticity is especially important for digital files. Erasures or changes to a paper document are often readily apparent to a person looking at it; changes to a digital file can be easy to make and difficult for a future user to detect. Metadata can also help to track what was done to preserve the object throughout its life cycle, such as migration from one format to another. Metadata can be linked to the digital object or encapsulated with the digital object itself. Encapsulating the metadata with the object, for example by placing the metadata and the object in the same folder in a zip file, ensures that the information stays with the file no matter where it goes. Linking the object to metadata that is stored somewhere else (not with the file) ensures that the information about the file can be recovered even if the object itself is lost.
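Encapsulation can be sketched with Python's standard zipfile module. In this illustration the function names encapsulate and is_unmodified, the metadata fields, the file name metadata.json, and the sample file are all assumptions; real repositories generally follow established metadata standards rather than an ad hoc record like this one.

```python
import hashlib
import io
import json
import zipfile


def encapsulate(name: str, content: bytes, metadata: dict) -> bytes:
    """Package an object and a JSON metadata record together in one zip archive."""
    record = dict(metadata)
    record["filename"] = name
    # The checksum lets a future user confirm the object has not changed.
    record["sha256"] = hashlib.sha256(content).hexdigest()
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(name, content)
        archive.writestr("metadata.json", json.dumps(record, indent=2))
    return buffer.getvalue()


def is_unmodified(package: bytes) -> bool:
    """Check the packaged object against the checksum in its metadata record."""
    with zipfile.ZipFile(io.BytesIO(package)) as archive:
        record = json.loads(archive.read("metadata.json"))
        content = archive.read(record["filename"])
    return hashlib.sha256(content).hexdigest() == record["sha256"]


package = encapsulate(
    "travel_budget.csv",
    b"month,amount\nJanuary,120\n",
    {
        "creator": "Example Person",  # hypothetical descriptive metadata
        "format": "text/csv",
        "events": ["created", "migrated from .xls to .csv"],
    },
)
print(is_unmodified(package))  # True
```

A checksum that travels with the object mainly catches accidental change, since anyone who alters the file could also alter the record beside it. Keeping a second copy of the metadata somewhere else, as in the linked approach, makes deliberate changes harder to hide.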
Digital preservation, whether carried out by an individual managing personal files or by an organization managing whole archives of files, combines policy, process, and technology. Policies ensure that intentional decisions are made about what to preserve and how to preserve it, and that these decisions are applied consistently over time. Processes spell out how the policies are implemented and identify the resources needed. Technological expertise is needed to monitor how digital objects are created and used, to understand how practice affects preservation, and to make the adjustments necessary to adapt to a constantly changing landscape.