Have you ever wondered what happens to the digital files that get donated to the American Heritage Center (AHC)? Or what happens with files that were created on software that no longer exists? Can the AHC deal with 5.25” floppy disks and Zip disks? What about email and websites?
A trip to the Born Digital unit reveals a trove of equipment and software that lets the AHC care for digital files – whether common or obscure. Among its assortment of hardware are drives and players that read both 3.5” and 5.25” floppy disks, Zip disks, Blu-ray discs, SATA and HDD internal computer hard drives, and DAT tapes. Its collection of software programs likewise opens a wide variety of files from the defunct – ClarisWorks and Lotus Word Pro – to the popular – Photoshop and Word – to the emerging – Microsoft Outlook email and WARC website files.
In a strange sense, digital files are very fragile. It might not seem that way given how widespread they are, but it’s very easy to accidentally delete a file or make an unintended edit. It’s also not uncommon for a computer to make an error and corrupt a file so that you can’t open it, or the font suddenly looks like Wingdings characters. It’s the responsibility of the Born Digital unit to protect the digital files in the archive so that researchers can read them, be inspired, and make discoveries now and far into the future.
What steps does the Born Digital unit take to preserve digital files?
Computer errors can come up when you transfer files between folders or across devices. To catch them, the Born Digital unit checks the digital fingerprint (also called a checksum) of each file that gets transferred from a disk, like a CD or a flash drive, and placed onto the workstation computer. If the fingerprint remains the same after the transfer, all is good. If the fingerprint is different, the digital archivist will investigate what went wrong and fix it.
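For readers curious about what a fingerprint check looks like in practice, here is a minimal Python sketch. It is an illustration only, not the AHC's actual workflow: the function names `fingerprint` and `transfer_and_verify` are made up for this example, and SHA-256 is an assumed choice of algorithm.

```python
import hashlib
import shutil
from pathlib import Path


def fingerprint(path: Path, algorithm: str = "sha256") -> str:
    """Return the hex digest (the "digital fingerprint") of a file."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_and_verify(source: Path, destination: Path) -> bool:
    """Copy a file, then confirm the copy's fingerprint matches the original."""
    before = fingerprint(source)
    shutil.copy2(source, destination)  # copies contents and timestamps
    after = fingerprint(destination)
    return before == after
```

If `transfer_and_verify` returns `False`, the copy differs from the original by at least one byte, which is the cue to investigate.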
Digital files have a lot of metadata. Simply put, metadata is data about data. It tells us who created a file, when it was created, what software it was created on, how big it is, and so on. This information gives archivists clues about how to preserve the file, as well as context about how one file might connect to a second file within a folder. The Born Digital unit collects this metadata and organizes it to prepare the file for researcher access.
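As a rough illustration of metadata collection, the Python sketch below gathers the details a file system can report on its own. The function name `basic_metadata` and the field names are assumptions made for this example. Details such as the author or the creating software live inside the file itself and need format-specific tools, so they are not covered here.

```python
import mimetypes
from datetime import datetime, timezone
from pathlib import Path


def basic_metadata(path: Path) -> dict:
    """Collect file-system level metadata for one file."""
    info = path.stat()
    # Guess the type from the file extension only; the contents are not read.
    guessed_type, _ = mimetypes.guess_type(path.name)
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "name": path.name,
        "folder": str(path.parent),
        "size_bytes": info.st_size,
        "last_modified": modified.isoformat(),
        "guessed_type": guessed_type,
    }
```

Running this over every file in a folder and saving the results to a spreadsheet would give a simple inventory that keeps related files together.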
Files need to be in a stable format that can be opened twenty-plus years into the future. This means migrating old or obscure formats from their original type to one that is very commonly used or open source. Open source means that the software code is openly available and, if it becomes necessary, software developers can look at the code to recreate a way to read the file. In practice, this means converting an old .doc file into a .pdf. Microsoft may let you open a .doc file in Word now, but it’s an old format and the company will likely stop supporting it at some point in the future.
At this point, files get renamed as well. Renaming involves replacing spaces and special characters, like an ampersand or an asterisk, with safe characters, like an underscore or a dash. Some special characters or sequences of special characters mean a specific thing to a computer. By replacing them with safe characters, we remove the potential risk of a computer misreading a file.
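The renaming rule described above can be sketched in a few lines of Python. This is one possible approach rather than the unit's exact rule set: the function name `safe_name` is hypothetical, and treating only unaccented letters, digits, dashes, and underscores as safe is an assumption.

```python
import re


def safe_name(filename: str) -> str:
    """Replace spaces and special characters with underscores."""
    stem, dot, extension = filename.rpartition(".")
    if not dot:  # no extension present
        stem, extension = filename, ""
    # Keep letters, digits, dashes, and underscores; replace each run of
    # anything else with a single underscore.
    cleaned = re.sub(r"[^A-Za-z0-9_-]+", "_", stem).strip("_")
    cleaned = cleaned or "unnamed"
    # Extensions keep only letters and digits.
    extension = re.sub(r"[^A-Za-z0-9]+", "", extension)
    return f"{cleaned}.{extension}" if extension else cleaned
```

For example, `safe_name("Budget & Notes *final*.doc")` gives `Budget_Notes_final.doc`. The asterisk is a good case of a character with special meaning: many command-line tools read it as a wildcard that matches any text.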
Sometimes a file won’t open or tell you what kind of format it is. Sometimes you have a corrupted file and you want to dig around for clues to see if you can open any part of it. In these cases, digital forensics tools are used to take a deeper, computer-level look at a file.
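One simple example of a computer-level look is checking a file's first few bytes. Many formats begin with a fixed signature, sometimes called a magic number, that identifies them even when the file name is missing or misleading. The Python sketch below checks a handful of well-known signatures. The function name `sniff_format` and the short list are assumptions for illustration; real forensics tools rely on far larger signature databases.

```python
from pathlib import Path

# A small sample of well-known file signatures ("magic numbers").
SIGNATURES = [
    (b"%PDF-", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"PK\x03\x04", "ZIP container (also used by .docx and .odt)"),
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (older Office formats such as .doc)"),
]


def sniff_format(path: Path) -> str:
    """Guess a file's format from its leading bytes, ignoring its name."""
    with open(path, "rb") as handle:
        header = handle.read(16)
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"
```

A result of `unknown` does not mean the file is lost. It only means the file needs a closer look with more specialized tools.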
Storage and Maintenance
Once a file is stable and renamed, and all the metadata is collected, it is saved as three identical copies that share the same digital fingerprint. The three copies act as backups in case one of them gets deleted or accidentally altered.
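Here is a small Python sketch of the three-copy idea, under stated assumptions: the function names `sha256_of` and `store_three_copies` are invented for this example, SHA-256 is an assumed algorithm, and the three storage locations are ordinary folders rather than the separate servers or drives a real archive would use.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Sequence


def sha256_of(path: Path) -> str:
    """Fingerprint a file (read in one go; fine for small files)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def store_three_copies(source: Path, storage_dirs: Sequence[Path]) -> str:
    """Copy a file into three locations and verify each copy."""
    if len(storage_dirs) != 3:
        raise ValueError("exactly three storage locations are expected")
    expected = sha256_of(source)
    for directory in storage_dirs:
        directory.mkdir(parents=True, exist_ok=True)
        copy_path = directory / source.name
        shutil.copy2(source, copy_path)
        # Every copy must carry the same fingerprint as the original.
        if sha256_of(copy_path) != expected:
            raise IOError(f"copy in {directory} does not match the original")
    return expected  # record this value for later checks
```

The returned fingerprint is worth writing down alongside the file's metadata, because it is the reference value for every future check.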
It might seem at this point that everything is finished and there is no more work to be done. This is not so. Digital files require ongoing maintenance. The files’ digital fingerprints need to be continually checked to show that they haven’t been corrupted or changed. File formats might need to be converted as software versions are updated or as companies go out of business. The servers or hard drives where the three copies are stored need to be replaced every 5-7 years before they die or crash. The care for digital files is an ongoing task.
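The recurring fingerprint check can be pictured with the Python sketch below. It assumes the fingerprint recorded when the copies were first made is available as `expected`. The function names are hypothetical, and restoring a damaged copy from a healthy one is a simplified stand-in for the investigation an archivist would actually carry out.

```python
import hashlib
import shutil
from pathlib import Path
from typing import List, Sequence


def sha256_of(path: Path) -> str:
    """Fingerprint a file (read in one go; fine for small files)."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_damaged(copies: Sequence[Path], expected: str) -> List[Path]:
    """Return the copies that are missing or no longer match."""
    return [c for c in copies if not c.exists() or sha256_of(c) != expected]


def repair_from_healthy(copies: Sequence[Path], expected: str) -> List[Path]:
    """Overwrite damaged copies with an intact one; return what was repaired."""
    damaged = find_damaged(copies, expected)
    healthy = [c for c in copies if c not in damaged]
    if damaged and not healthy:
        raise RuntimeError("no intact copy remains")
    for copy_path in damaged:
        shutil.copy2(healthy[0], copy_path)
    return damaged
```

Run on a schedule, a check like this turns silent corruption into something that gets noticed while at least one good copy still exists.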
The AHC works hard to make sure that digital files are ready for you now and far into the future. Ask us how you can access our many exciting digital collections. To learn more about digital preservation, contact the AHC’s Digital Archivist Rachel Gattermeyer at firstname.lastname@example.org.
– Post contributed by Rachel Gattermeyer.