
Best Practices for Long-Term Preservation

Best Practices for Long-Term Preservation of Electronic Records

Version 0.2

Introduction

Almost all records today are created electronically.  The few that are not are usually scanned into an electronic system for access and/or storage.  Regardless of format, all records need to be retained based on the agency’s retention and disposition schedule. Some records have enduring or historical value that will require that the document be preserved for long periods of time. Electronic records have a life cycle that includes the creation, management, and use of the digital object. Preserving a digital object throughout this life cycle presents unique challenges for records creators and users. This document outlines the threats to digital materials and recommended strategies for long-term preservation and access.

NOTE: Although “agency” and “state agency” are used throughout this discussion for ease of presentation, the scope of the State Archives’ responsibilities for state records extends to those created, maintained, and/or received by all three branches of state government: executive, legislative, and judicial.

If the agency makes a commitment to keep records permanently, then the agency must properly manage and maintain those records in their original form. Texas requires state agencies to create human-readable preservation duplicates of analog records that have permanent value as identified in the records and disposition schedules. As described in:  [not sure if Texas does]

Similarly, if an electronic record is listed as having permanent value on the agency’s retention and disposition schedule, it is the agency’s responsibility to maintain appropriate access to that record over time.  Preservation of an electronic record includes ensuring authenticity of the original record and retaining prescribed metadata.

What is meant by Long-Term?

The practical application of retaining electronic records can be more complicated than the law suggests. Because paper is an eye-readable medium (it can be read with the naked eye) there are no hardware or software dependencies to impact the records adversely.  Paper records can exist for decades, even centuries, in a box on a shelf with no intervention at all. The longevity of digital materials is dependent on a host of elements within a records manager’s control such as logical file naming conventions and choosing stable or sustainable file formats. But there are also elements outside of our control that need to be managed such as storage media, hardware, operating systems, and software applications.

The Open Archival Information System (OAIS) reference model (ISO 14721) defines “long-term” as long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community.[1] This period of time could be as little as a few years or, for records designated as “archival[2],” this period extends indefinitely into the future. As a general best practice, and for the purposes of this document, a “long-term” record is one with a retention period of longer than 10 years.  A record with a 5-10 year retention period could undergo a format change, or need to be copied to new media, but the record can usually be disposed of before these changes become a major problem.

Unfortunately, many public and private organizations are playing “catch-up” as a large amount of data has already been lost to the various threats. Section 2 Threats to Long-term Preservation of Digital Materials discusses these threats in more depth. However, it is important to note that rediscovering and recreating digital information can be expensive. The benefits of preservation may be most compellingly expressed in terms of negative benefits: the costs incurred if data are not preserved. These costs may reflect the time and effort needed to recreate the information or, if it cannot be recreated, the kinds of uses that would then not be possible.[3] Maintaining electronic records long-term is an ever-evolving task. Once a commitment is made to preserving digital materials, it is important to stay apprised of changing technology and standards in order to ensure the longevity of record use; otherwise, it is easy to succumb to data loss.

Threats to Long-term Preservation of Digital Materials

Electronic records face challenges that have not been issues for the preservation of paper records. These new threats will require that record creators and managers vigilantly address these issues in order to ensure the long-term preservation of and access to electronic records. Digital materials cannot wait until they are transferred to the State Archives before the preservation process begins. Instead, agencies have to take an active role in beginning the preservation process.  Part of beginning this process is to be aware of some of the issues that plague electronic records and to take measures to avoid those problems. The following section lists some of the problematic issues related to digital materials.

Application Obsolescence

Traditionally, preservation meant keeping items unchanged and in their original format.  Due to the constant evolution of hardware and software, digital information is vulnerable to obsolescence. Obsolescence occurs when old technology is replaced by a newer version and materials created on the outdated technology are no longer accessible. In today’s competitive market, hardware and software companies come and go at a rapid rate.  As such, much of the hardware and software used today may not be available in the near future, and digital public records may be unreadable by new systems.  File formats and applications must also be compatible with replacement systems. If digital records are not converted to formats that are compatible with new systems, there is a risk of loss through obsolescence.

  • When possible, use standard software or save files you want to keep in open-standard or sustainable file formats. An open-standard or sustainable format is one that increases the likelihood of a record being accessible in the future. To assist you, TSLAC has published File Formats Guidelines for Management and Long-Term Retention of Electronic Records.   If business needs require the use of proprietary or specialized software or formats (such as with GIS applications) then look to converting to open-standard or sustainable formats after the immediate business needs have been fulfilled, or when transferring those records to long-term storage.
  • To prevent records from becoming incompatible with modern systems and applications, records’ users can employ migration. When new hardware is purchased, immediately transfer data from the old hardware. Also, if new software is used, move digital records into the new programs and applications so that the records do not remain in the superseded structures.
  • Migrating digital records is important to retain content, but it is also beneficial to maintain a copy of the native, or original, format.  By maintaining the integrity of the source data, one lessens the chance of losing the data altogether and increases the chances that migration will produce a successful copy.  Researchers have argued that it is easier to recover data from its original source than from its copies, especially after several migrations.

Corruption

Paper materials decay over time, but professional preservation techniques reverse some of the corrosion and slow future processes of degradation.  Electronic records do not decompose like paper and other analog records, but they do face a unique set of preservation issues.  Digital materials are made up of ‘bits’, a series of digits that stand for the material’s information or content. These strings of digits are read by a device such as a computer and displayed or communicated for the human eye (making digital records “machine-readable” as opposed to “eye-readable”). Corruption or alteration of some of these bits can result in the record becoming unreadable and inaccessible.  It is possible for these bits to become corrupted any time the record is transferred to new media or new applications.  Bits can also become corrupted over time, without the records being moved, as storage media age; this process is known as “bit rot.”

Corruption also relates to the vulnerability of digital materials. Ensuring authenticity and trustworthiness in digital records requires consideration of network access, encryption settings,  system protections, and audit trails. It is recommended that agencies run regular virus scans and have systems designed to track and monitor access to records.

  • Applying a checksum, or hash, to a record helps to ensure the authenticity and integrity of digital material over time. A checksum is a value that a mathematical algorithm computes from the contents of a digital object, and it serves as an effectively unique stamp or identifier for that object. If any aspect of the digital object changes, even one bit, then the checksum will not match when it is run again.  Digital objects stored for long periods should have checksums applied, and monitoring software should be in place to run the checksums regularly in order to detect the bit rot mentioned above.
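
As an illustration of the checksum practice described above, the following is a minimal sketch in Python, assuming SHA-256 as the hash algorithm (this document does not prescribe a particular algorithm). The function names and the example file name are hypothetical and are used only for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 checksum of a file as a hexadecimal string."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so that large records do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_unchanged(path: Path, recorded_checksum: str) -> bool:
    """True if the file still matches the checksum recorded earlier."""
    return sha256_of(path) == recorded_checksum


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        record = Path(folder) / "minutes_2014-03-01_v01.txt"  # hypothetical record
        record.write_bytes(b"Board meeting minutes")
        recorded = sha256_of(record)  # keep this value with the record's metadata
        print(is_unchanged(record, recorded))  # True: nothing has changed

        record.write_bytes(b"Board meeting minutez")  # simulate a small alteration
        print(is_unchanged(record, recorded))  # False: the checksum no longer matches
```

In practice the recorded checksum would be stored separately from the record itself, so that a change to the record cannot silently change the stored value as well.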

Usability

There is a proliferation of digital materials in the workplace. Proper management of electronic records has never been more important.  Digital files can become essentially lost and useless if one cannot locate them or if multiple copies are on different drives (or devices) outside of the architecture developed or approved by the agency’s IT staff.  Active management practices of digital files ensure that they can be found for use in your office or in the case of a public information act request.

Employees need to be able to differentiate between the many digital records in order to carry out a productive preservation plan for permanent records. One of the simplest steps public employees can take to improve the management of their digital records is to employ conscientious file-naming customs. Consistent file-naming helps employees organize and locate records.

Mutability of Electronic Records

Electronic records can easily be changed and updated. However, this convenience also presents challenges for record-keepers. An electronic record can be edited with little indication of if and when those changes were made. To create an authentic record, it is important that the electronic record be preserved in a stable manner so that changes are not made either purposely or accidentally. Again, practicing descriptive and consistent file-naming is important to indicating major changes to a record. For each new version, the file can be saved with the version number in the file name.  Additionally, descriptive file naming provides context to the content of the electronic records. This context will give clear indication when major changes to the record have been made.
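
One possible way to apply a descriptive, versioned naming pattern consistently is to generate file names from a small set of fields. The sketch below is only an illustration; the pattern (title, date, version number) and the function name are assumptions, not a convention required by this document.

```python
import re
from datetime import date


def record_file_name(title: str, created: date, version: int, extension: str) -> str:
    """Build a name such as 'budget-report_2015-06-30_v02.pdf'."""
    # Lowercase the title and turn every run of other characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    # Zero-pad the version so that names sort in version order.
    suffix = extension.lower().lstrip(".")
    return f"{slug}_{created.isoformat()}_v{version:02d}.{suffix}"


if __name__ == "__main__":
    print(record_file_name("Budget Report", date(2015, 6, 30), 2, "PDF"))
    # budget-report_2015-06-30_v02.pdf
```

Each major change to the record would be saved under the next version number, leaving earlier versions in place.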

Recordkeeping systems that contain active records should have appropriate controls in place to protect records from accidental change by monitoring who has access to the system (log-ins and user accounts) and the ability to track changes in records stored in the system.  Records should also be stored in a “read only” format, meaning that any changes to a record would have to be saved as a separate draft or copy.  Checksums can be employed as a way of monitoring change in a record once that record has reached a final fixed or published state, or when the record has reached the end of its active life but still needs to be retained long-term.
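
At the level of an individual file, one simple form of “read only” storage is to remove the file’s write permissions. The sketch below shows this in Python using ordinary file-system permissions; it is an assumption about how an office might do this, not a substitute for the access controls of a recordkeeping system, and the function names are hypothetical. File permissions do not prevent changes by system administrators.

```python
import stat
import tempfile
from pathlib import Path

# Write-permission bits for the owner, the group, and all other users.
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path: Path) -> None:
    """Clear every write-permission bit on the file."""
    path.chmod(path.stat().st_mode & ~WRITE_BITS)


def is_read_only(path: Path) -> bool:
    """True if no write-permission bit is set on the file."""
    return (path.stat().st_mode & WRITE_BITS) == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        record = Path(folder) / "final-report_v03.txt"  # hypothetical record
        record.write_text("Final published text")
        make_read_only(record)
        print(is_read_only(record))
```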

Strategies for Digital Preservation

Types of Digital Media

Electronic records are saved on three main types of media—magnetic, optical, and solid state.

Magnetic: Digital information is encoded as microscopic magnetized needles on the surface of the magnetic medium being used.[4] Two commonly used magnetic mediums are disks (hard drives) and tape.

  • Magnetic disks are the most common type of permanent data storage. Disks include a computer’s internal hard drive, which saves your computer programs and documents, and external hard drives that connect to your computer through a port and provide additional storage options. The advantage to magnetic disk storage is that it is relatively inexpensive and it allows for fast access to data. The disadvantage to magnetic disks is that they can be affected by environmental factors including magnetic fields and dust. Over time, hard disks can fail, which can lead to data loss.
  • Magnetic tape can be reel-to-reel or cartridge. Common as a backup medium, it is used primarily for data storage of large amounts of information because it is relatively inexpensive. It is slightly more cumbersome since it provides sequential access, meaning you have to go through all preceding data before finding the information you may need. Sequential access differs from random access, which allows the user to go directly to any piece of data without reading what comes before it. Magnetic tape has a life span of about 15 to 30 years.[5]

Optical: Optical disks use focused lasers to create microscopic holes on the surface of the medium. These holes represent coded data. The lasers are used to both write and read data.[6] CDs, DVDs, and Blu-ray disks are all examples of optical disks.

  • Advantages to optical disks are that they are portable, fairly inexpensive, and
  • Some disks, such as CD-R and DVD-R, do not allow data to be
  • Disadvantages to optical disks include the limited amount of storage space, and they can be expensive compared to other types of storage. Optical disks require drives to read and write on the disks, and therefore there can be compatibility issues, especially at the rate technology changes.  (For example, many new laptops, net-book computers, and tablets lack optical drives.)

Solid State Storage Device (SSD): Solid state media uses flash memory for data storage.[7] Flash memory can be erased and reprogrammed; however, there are limitations on the number of times this can occur before the device will begin to fail.  Examples of SSDs include flash memory cards and USB flash drives. These devices connect to a computer through a card reader or USB port to exchange the saved data.  Many new portable computers and tablets are now using internal solid state hard drives.

  • Advantages to SSDs are that they are durable, they have a longer life expectancy than most optical drives, and they write and retrieve data
  • One disadvantage of internal SSDs is that they are expensive (compared to magnetic disk hard drives); however, with technological advances, SSDs are getting cheaper and have increased their data storage capacity.
  • A more practical problem with external SSDs (flash or “thumb” drives) is their small size, which means they can be lost or misplaced very easily. Therefore, external flash drives are not recommended for long-term storage.

Steps to maintaining digital media

Studies have shown that under optimal conditions the life expectancy of magnetic media ranges from 10 to 20 years and up to 30 years for optical media; however, be cautious of vendors that claim longer life expectancy rates than industry standards.[8] Additionally, wear and tear of these materials can lower their life expectancy.  Recognizing the type of data storage you are using helps in the long-term planning of your electronic records. You may be required to migrate your data from one device to another to ensure that your records remain usable and authentic. Additionally, your office can take the following precautions:

  • Use high quality storage media and batch test newly purchased storage devices to ensure they are not
  • Prohibit eating and drinking in areas where storage devices are being
  • Store these devices in a cool, dry place and keep the area free from dust and other environmental contaminants.

Agencies need to keep in mind that just because the life expectancy of the media itself may be 10-20 years or more, other factors may present risk to records stored on the media.

  • Hardware can change so drives required to read the digital media may disappear. As mentioned previously, many newer laptops and net-book computers are using solid state storage instead of optical disk and no longer have CD/DVD drives.  Tablets and other handheld devices are becoming more and more popular and do not have drives to read CDs or DVDs.
  • Software can change in 10-20 years or more. Applications that are around today may fade away, or be significantly altered or updated over time.  This is a major risk to records that have reached the inactive stage in the lifecycle, that is, records that are required to be retained for several years but are not accessed often.  In many cases these records are stored on removable media, and while agencies may migrate and update records on hard drives and servers when applications are updated or changed, the records being stored on a disk on a shelf are not.  This can lead to the records being inaccessible even though the storage media (the disk) is still viable.

2.1 Cloud Computing and Storage

The National Institute of Standards and Technology (NIST) defines cloud computing as a “model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released  with minimal management effort or service provider interaction.”[9] Cloud computing allows you to retrieve, use, and store records regardless of computing device or location. The service provider maintains the equipment used to create and store the data.  Cloud computing can be used as another source to store a copy of your data.

However, there are several challenges to using cloud storage, including managing all additional copies and syncing them so that the copy saved in the cloud matches the copy stored locally. As long as you have a local copy, you can use cloud storage as part of your data management strategy. Section 3.3 discusses in more detail the benefits and challenges of saving multiple copies. Additionally, cloud vendors do not guarantee data integrity; therefore, it is recommended that there are checksums in place prior to moving files out of the cloud.[10]
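
One way to have checksums in place is to keep a manifest, a list of every file and its checksum, alongside the local copy, and to compare any copy retrieved from the cloud against it. The following Python sketch assumes SHA-256 and works only on local folders; it makes no calls to any cloud vendor’s service, and the function names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder: Path) -> Dict[str, str]:
    """Map each file's path (relative to the folder) to its checksum."""
    return {
        path.relative_to(folder).as_posix(): sha256_of(path)
        for path in sorted(folder.rglob("*"))
        if path.is_file()
    }


def compare_to_manifest(folder: Path, manifest: Dict[str, str]) -> Dict[str, List[str]]:
    """Report files that are missing, unexpected, or changed since the manifest was made."""
    current = build_manifest(folder)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "unexpected": sorted(set(current) - set(manifest)),
        "changed": sorted(
            name for name in manifest
            if name in current and manifest[name] != current[name]
        ),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        local = Path(folder)
        (local / "report.txt").write_bytes(b"annual report")
        manifest = build_manifest(local)  # made before the files are moved
        print(compare_to_manifest(local, manifest))  # all three lists are empty
```

The manifest itself would be kept with the local copy, so that a copy brought back from a vendor can be checked against it.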

Different cloud vendors will provide varying degrees of long-term preservation functionality. It is important to consider what you need from a cloud service in order to have it fit into your preservation plan before choosing a vendor. Some providers will have more robust options for managing and storing metadata, searching content, and verifying checksums. Once a vendor is chosen, the contract should be written in a way that is consistent with your preservation plan.  Additionally, this contract should include clear expectations if data is to be moved from one provider to another. Some cloud vendors make it difficult and costly to move data from one cloud to another.[11]

Preservation of a Digital Record through its Life Cycle

Much like a paper record, digital records have a life cycle. The life cycle of a digital record includes its creation, management, and use and re-use.  It is important to take an active role in preserving the document at each stage of the record’s life cycle.

Creation:

Preservation of an electronic record begins with its creation. The creator should add relevant metadata; save the record in a recommended file format; and give the record a descriptive file name.[12]

As noted in other places in this document, there are many resources available to assist creators in taking steps to preserve their electronic records.

Management:

Electronic record creators may become managers of those records or another employee may become responsible for maintaining records created by others in her office. Effective electronic records management includes understanding the scope of the materials you have to manage; running regular virus checks to ensure the digital records are saved in a safe environment; storing more than one copy in multiple locations; and ensuring that the file formats are still readable on current technology.[13] If an electronic document is about to become obsolete, it is the responsibility of the manager to employ a preservation strategy to migrate or emulate the record.
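
As a starting point for understanding the scope of the materials being managed, a manager could count the files in a folder by file extension and flag the formats that need a closer look. The Python sketch below assumes that the file extension is a reasonable indicator of format, which is not always true. The set of preferred extensions is a hypothetical placeholder and should be replaced with the formats in the agency’s own file format guidance.

```python
import tempfile
from collections import Counter
from pathlib import Path
from typing import Iterable, List

# Placeholder list for illustration only; not taken from any published guideline.
PREFERRED_EXTENSIONS = {".pdf", ".txt", ".csv", ".xml", ".tif"}


def inventory_formats(folder: Path) -> Counter:
    """Count the files under a folder by lowercase file extension."""
    counts: Counter = Counter()
    for path in folder.rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def formats_to_review(counts: Counter, preferred: Iterable[str] = PREFERRED_EXTENSIONS) -> List[str]:
    """List the extensions found that are not in the preferred set."""
    preferred_set = set(preferred)
    return sorted(ext for ext in counts if ext not in preferred_set)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        root = Path(folder)
        for name in ("minutes.pdf", "budget.PDF", "memo.wpd"):  # hypothetical files
            (root / name).write_bytes(b"")
        counts = inventory_formats(root)
        print(dict(counts))
        print(formats_to_review(counts))
```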

LOCKSS

As part of an electronic records management strategy, it is recommended that more than one copy of the record be stored in multiple locations. This model is known as “Lots of Copies, Keep Stuff Safe (LOCKSS).”[14] Preserving electronic records in a distributed manner helps reduce potential technical threats faced by digital materials and ensures longevity. The LOCKSS model was designed to help protect records being stored for the long term.  Maintaining multiple copies of records is not a useful method for daily management of active electronic records. When employing LOCKSS, it is important to have a workflow in place to keep track of the various copies, especially if they are stored on multiple drives or in cloud storage.  The LOCKSS model is separate from system backups for disaster recovery.  Multiple copies can be a part of a larger disaster plan or strategy, but data backups are not handled or maintained in a way to provide proper access to long-term electronic records.  SYSTEM BACKUPS ONLY EXIST FOR DISASTER RECOVERY PURPOSES. (See section 3.5 Disaster Preparedness for more information.)
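
Part of a workflow for keeping track of multiple copies is checking that the copies still agree with one another. The Python sketch below compares the checksums of several copies of one record and reports any copy that differs from the majority. It is a simplified illustration under the assumption that all copies are reachable as files; it is not the polling protocol used by the LOCKSS software, and the function names are hypothetical.

```python
import hashlib
import tempfile
from collections import Counter
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_divergent_copies(copies: List[Path]) -> List[Path]:
    """Return the copies whose checksum differs from the one most copies share."""
    if not copies:
        raise ValueError("No copies were given.")
    checksums = {copy: sha256_of(copy) for copy in copies}
    winner, votes = Counter(checksums.values()).most_common(1)[0]
    # Without a strict majority there is no safe way to pick the good copy.
    if votes * 2 <= len(copies):
        raise ValueError("No majority among the copies; review them manually.")
    return [copy for copy, checksum in checksums.items() if checksum != winner]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        root = Path(folder)
        copies = [root / f"copy{n}.txt" for n in (1, 2, 3)]  # hypothetical locations
        for copy in copies:
            copy.write_bytes(b"permanent record")
        copies[2].write_bytes(b"permanent recorD")  # simulate corruption of one copy
        print([copy.name for copy in find_divergent_copies(copies)])  # ['copy3.txt']
```

A copy reported as divergent can then be replaced from one of the copies that still agree.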

Use and Re-Use:

Users can assist records managers in preserving digital materials by providing feedback if there is trouble finding or accessing an electronic record. The symbiotic relationship between the user and record manager will help ensure that digital materials are cared for in a manner that promotes a long life span of the record.

For more information about digital preservation during a record’s life cycle, visit the Digital Preservation Education for NC State Government Employees webpage on Digital Preservation Best Practices and Guidelines, located here: http://digitalpreservation.ncdcr.gov/

Additionally, you can print the “State employee checklist for digital preservation” one-sheet to assist with the daily management of electronic records. http://digitalpreservation.ncdcr.gov/checklist_dig_pres.pdf

Electronic Records as Public Records

Public records need to be regularly backed up, especially those records with permanent retention. Files can be backed up at a remote location or on a network drive.  If digital records are copied onto an external format such as a CD or microform, multiple copies should be made and stored in different locations.  Having multiple copies reduces the risk of losing the primary content of the records.  For most permanent records, preservation duplicates are required and must be stored in an off-site location. These preservation duplicates must be in a human-readable format, either paper hard copy or microfilm.  The State Archives stores preservation duplicates for Texas agencies and local governments. For more information on human-readable preservation duplicates, please see: www.ncdcr.gov/archives

Disaster Preparedness

Another essential piece to the planning process is disaster preparedness and response.

Since agencies’ permanent records are considered essential, disaster protections should be among the first established.  Consult an IT professional to ensure that all electronic records are being backed up regularly. Additionally, each agency should have policies in place to outline specifications for data backups, including how often backup files are made and how long those backup files are kept.  However, IT backups are designed to aid in a recovery situation, not to ensure preservation or permanence. Disaster preparedness is just one of the initial steps to long-term preservation of electronic records.

Conclusion

The key to compliant and responsible record-keeping is planning.  Digital preservation is based on risk and access management—guaranteeing future usability of and accessibility to digital content. This process warrants attention to the issues discussed in this document, among others.  Agencies’ unique concerns should be worked through in the preservation planning process according to priority.

Endnotes

[1] ISO 14721:2012(E). 2012. Space data and information transfer systems – Open archival information system – Reference model. Geneva: ISO. p.1-1.

[2] Texas Government Code Chapter 441, Subchapter L, Section 441.180(2) defines an archival state record as “a state record of enduring value that will be preserved on a continuing basis by the Texas State Library and Archives Commission or another state agency until the state archivist indicates that based on a reappraisal of the record it no longer merits further retention.”

[3] “Sustainable Economics for a Digital Planet: Ensuring Long-Term Access to Digital Information.” The Blue Ribbon Task Force on Sustainable Digital Preservation and Access, Feb. 2010. Web.

[4] Electronic Records Management Guidelines: Digital Media. Minnesota Archives. March 2012. http://www.mnhs.org/preserve/records/electronicrecords/erdigital.html

[5] Ibid.

[6] White, R. (2008). How computers work (9th ed.). Indianapolis, IN: Que Pub.

[7] Electronic Records Management Guidelines: Digital Media. Minnesota Archives. March 2012. http://www.mnhs.org/preserve/records/electronicrecords/erdigital.html

[8] Electronic Records Management Guidelines: Digital Media. Minnesota Archives. March 2012. http://www.mnhs.org/preserve/records/electronicrecords/erdigital.html

[9] Peter Mell and Timothy Grance, The NIST Definition of Cloud Computing (Draft), NIST, January 2011. http://csrc.nist.gov/publications/drafts/800-145/Draft-SP-800-145_cloud-definition.pdf

[10] Report on Digital Preservation and Cloud Services (Public). Minnesota Historical Society and Instrumental, Inc. March 2013. http://www.mnhs.org/preserve/records/docs_pdfs/Instrumental_MHSReportFinal_Public_v2.pdf

[11] Session Laws of North Carolina 2011-391 (HB 22).

[12] North Carolina Department of Cultural Resources. Digital Preservation Best Practices and Guidelines: I create Files. Digital Preservation Education for NC State Government Employees. http://digitalpreservation.ncdcr.gov/

[13] Ibid.

[14] Stanford University. Lots of copies, keep stuff safe. http://www.lockss.org/

Texas State Library and Archives Commission | 1201 Brazos St., Austin TX 78701 | 512-463-5455 | ref@tsl.texas.gov | P.O. Box 12927, Austin TX 78711-2927