NARA tests grid computing for preservation
The National Archives and Records Administration is running a prototype grid computing network to test how well it can preserve electronic materials.
The National Archives and Records Administration is running a prototype grid computing network to test how well the emerging technology can preserve electronic materials, according to Reagan Moore, the data and knowledge systems group director of the San Diego Supercomputer Center, which is participating in the project.
Moore introduced the grid, called the Persistent Archives Prototype, at a NARA-sponsored electronic-records symposium held earlier this week.
The prototype has three separate nodes, one at NARA, one at the San Diego Supercomputer Center and one run by the University of Maryland's Institute for Advanced Computer Studies. The San Diego Supercomputer Center is a research unit of the University of California, San Diego.
The purpose of the prototype is to test how well a data grid can manage multiple copies of electronic materials residing in separate locations with different types of hardware, Moore said. Grid computing lets users tap computer processing power and large databases over networks without having to know the configurations of each individual system. Thus far, the technology, based on grid standards, has been used mostly by the scientific research community.
According to Moore, grid computing could offer some distinct advantages to the archival community as well. One advantage would be the ability to easily back up copies of material across multiple remote locations, essential for disaster recovery planning. The grid frees administrators from hardware constraints. Different locations can use their choice of hardware and operating systems, since the grid software works across multiple platforms.
Although storage vendors offer virtualization software to back up material over different types of storage systems, the NARA data grid is different in that it performs "data virtualization rather than storage virtualization," Moore said. Each piece of material is given a unique identifier, which remains the same regardless of the type of file system or physical hardware the data resides on.
A storage repository system identifies file names, storage locations, access constraints and other attributes of the data. With grid computing, "We manage each of those independent of storage systems," Moore said.
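The idea of data virtualization described above can be illustrated with a minimal Python sketch. It is not the Storage Resource Broker's interface; the class names, site labels and paths are all hypothetical, chosen only to show a catalog that keeps a fixed identifier and its attributes separate from wherever the copies physically sit.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Replica:
    site: str           # hypothetical site label, e.g. "NARA"
    physical_path: str  # site-specific location; format may differ per system


@dataclass
class Record:
    logical_name: str
    access: str
    # The identifier is assigned once and never derived from a path or device.
    identifier: str = field(default_factory=lambda: uuid.uuid4().hex)
    replicas: list = field(default_factory=list)


class Catalog:
    """Toy catalog: attributes live here, independent of any storage system."""

    def __init__(self):
        self._records = {}

    def register(self, logical_name, access="public"):
        record = Record(logical_name, access)
        self._records[record.identifier] = record
        return record.identifier

    def add_replica(self, identifier, site, physical_path):
        self._records[identifier].replicas.append(Replica(site, physical_path))

    def locate(self, identifier):
        # Return every known copy as (site, physical_path) pairs.
        return [(r.site, r.physical_path) for r in self._records[identifier].replicas]

    def attributes(self, identifier):
        record = self._records[identifier]
        return {"logical_name": record.logical_name, "access": record.access}


# Illustrative usage with made-up names and paths.
catalog = Catalog()
record_id = catalog.register("sample-email-collection", access="restricted")
catalog.add_replica(record_id, "NARA", "/archive/a/0001")
catalog.add_replica(record_id, "SDSC", "hpss://pool7/0001")
print(catalog.locate(record_id))
```

A caller only ever holds the identifier; which site answers, and what file system sits underneath, is a detail of the catalog.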
On the NARA system, each of the three locations runs server software developed by the San Diego Supercomputer Center, called the Storage Resource Broker. All queries go through one central portal, while the material itself is stored on multiple nodes throughout the grid.
The abstraction of the hardware layer might also help with the ever-present problem of hardware obsolescence, Moore said. Unique identifiers can speed the migration process to newer hardware, since the material will not have to be reorganized to work with the new file systems and hardware devices.
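A small Python sketch can show why a stable identifier simplifies migration. It assumes a plain dictionary as the catalog, and the identifier, site labels and paths are invented for illustration; none of this reflects how the prototype is actually implemented.

```python
def migrate(catalog, identifier, old_site, new_site, new_path):
    """Repoint one copy to new hardware; the identifier itself is untouched."""
    replicas = catalog[identifier]
    if old_site not in replicas:
        raise KeyError(f"{identifier} has no copy at {old_site}")
    del replicas[old_site]           # retire the entry for the old device
    replicas[new_site] = new_path    # record where the copy lives now
    return identifier


# Hypothetical catalog: identifier -> {storage location label: physical path}
catalog = {
    "rec-0001": {
        "site-a/tape": "/vol3/0001.dat",
        "site-b/disk": "/data/archive/0001.dat",
    }
}

same_id = migrate(catalog, "rec-0001", "site-a/tape", "site-a/disk", "/pool/new/0001.dat")
print(same_id, sorted(catalog[same_id]))
```

Only the catalog entry changes during the move, so anything that refers to the material by identifier keeps working without being reorganized.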
NARA has been developing this prototype since 1999, Moore said. The grid holds a variety of sample information, from e-mail to physics research data. The three locations are tied together through a number of high-speed networks, including Internet2, TeraGrid, the Energy Sciences Network and the California Research and Education Network. Each organization runs its own independent storage system, ranging in size from 160GB to more than 1TB.