USGS juggles data requests
SIOUX FALLS, S.D.—The Earth Resources Observation System Data Center here is struggling to meet the high demand for its archive of satellite remote sensing data. Global change researchers want historical imagery and related spatial data sets. "Our challenge is to make sure we preserve all that," said Stuart W. Doescher, program manager for systems engineering at the Geological Survey site.
Doescher, a 24-year veteran of the Great Plains data center, said three daily shifts
have been working for three straight years at real-time satellite rates to convert a
nine-track tape archive of satellite telemetry data and images to digital cassette tape.
Soon they must start planning the conversion from digital cassette tape to
next-generation media, because the tape drives probably will be available only for about
another five years.
"In this business, you're always migrating," Doescher said.
Another storage problem that EROS officials have yet to reckon with will occur when
data now archived on high-density IBM Corp. 3480 tapes has to be moved to the next
generation of high-density media.
If they decide to combine data from four 3480 tapes onto single 3490E tapes,
they'll have new indexing and file-naming problems, which will be compounded if the
center selects digital linear tape, Doescher said.
In any case, the media conversion will require more intelligent software than the
center now has. "The software needs to be intelligent enough to ask not only, is the
data on this tape, but is it at this location?" said Ken Gacke, systems engineer with
contractor Raytheon STX Corp., working on site at the EROS Data Center.
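The indexing problem Gacke describes can be illustrated with a minimal sketch. It assumes four old cartridges are packed onto each new one and that every file is tracked by its original tape and name; the `Location` record, the `NEW0001`-style labels and the function names are hypothetical and do not describe the center's actual software.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    tape_id: str      # label of the new, higher-capacity cartridge (hypothetical scheme)
    file_number: int  # position of the file on that cartridge, counted from 1


def consolidate(source_tapes: dict[str, list[str]],
                per_target: int = 4) -> dict[tuple[str, str], Location]:
    """Map (old_tape, file_name) -> Location after packing per_target old tapes per new tape."""
    catalog: dict[tuple[str, str], Location] = {}
    old_ids = sorted(source_tapes)
    for start in range(0, len(old_ids), per_target):
        new_id = f"NEW{start // per_target + 1:04d}"
        position = 0
        for old_id in old_ids[start:start + per_target]:
            for name in source_tapes[old_id]:
                position += 1
                # Keying on the old tape as well as the name keeps identical
                # file names from different cartridges from colliding.
                catalog[(old_id, name)] = Location(new_id, position)
    return catalog


def is_at(catalog: dict[tuple[str, str], Location], old_tape: str,
          file_name: str, tape_id: str, file_number: int) -> bool:
    """Answer both questions: is the file on this tape, and at this position?"""
    return catalog.get((old_tape, file_name)) == Location(tape_id, file_number)
```

Keeping the old tape label in the key is one way around the file-naming problem: two files with the same name from different 3480 cartridges still resolve to distinct positions on the combined tape.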
Demand for the center's data is so great that workers each month mail out about
6,000 CD-ROMs and transmit more than 140,000 data sets over the network, Doescher said.
His staff has been trying to match the storage media to the demand by shuffling data
between a robotic tape silo and the faster RAID storage devices in the center's
Computer Room 1.
EROS officials have been disappointed by the performance of mass storage systems in a
Web environment. "Web users want immediate gratification," said Tony Butzer, principal
Raytheon systems engineer. They are unaccustomed to the latency inherent in the
center's marquee storage unit: a 5,000-cartridge STK 4400 near-line silo from Storage
Technology Corp. of Louisville, Colo.
"We've had to make policy decisions as to what goes on RAID and what stays
near-line and what goes in the basement," Butzer said. The climate-controlled
basement houses more than 215 terabytes of satellite remote sensing data on IBM 3480 and
digital cassette tape cartridges.
The center carefully monitors the choke points of its networks, Web servers and
information systems. A 500M satellite image could be sent in less than a minute over a
Sonet OC-3 connection, Doescher said, but over a 56-Kbps connection, the same File
Transfer Protocol exchange would take 24 hours.
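Doescher's comparison can be checked with simple arithmetic. The sketch below assumes the 500M image is 500 million bytes and uses nominal line rates; the efficiency parameter is an illustrative assumption standing in for protocol overhead. On those assumptions the OC-3 transfer takes roughly 26 seconds and the ideal 56-Kbps transfer about 20 hours, so the 24-hour figure presumably allows for overhead.

```python
# Nominal line rates in bits per second (standard published figures).
LINK_RATES_BPS = {
    "56-Kbps line": 56_000,
    "T1": 1_544_000,
    "T3": 44_736_000,
    "Sonet OC-3": 155_520_000,
}

IMAGE_BYTES = 500 * 10**6  # assumption: "500M" read as 500 million bytes


def transfer_seconds(size_bytes: int, rate_bps: float, efficiency: float = 1.0) -> float:
    """Ideal time to move size_bytes over a link, scaled by an assumed efficiency in (0, 1]."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (rate_bps * efficiency)


if __name__ == "__main__":
    for name, rate in LINK_RATES_BPS.items():
        secs = transfer_seconds(IMAGE_BYTES, rate)
        print(f"{name:>12}: {secs:10.1f} s ({secs / 3600:.2f} h)")
```

Calling `transfer_seconds(IMAGE_BYTES, 56_000, efficiency=0.8)` gives a result close to a full day, which suggests how overhead could account for the gap between the ideal figure and the one quoted.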
Recently the center added about 1.5 terabytes of Data General Corp. Clariion RAID
storage arrays to keep up with the demand, Doescher said.
Web requests for data have been the wildcard in balancing the growing workload. Three
years ago, a Sun Microsystems Inc. Sun 4 Model 470 Sparc server burned out after
processing 18 continuous hours of Web requests.
Another factor that drives usage is the distributed Global Land Information System,
which lets Web users browse 108G of images representing most of the center's stored
satellite data.
"We may have to come up with mechanisms to sort out the people who are really
interested in the data from those who are casually putting it up," Doescher said.
As often happens with Web requests, the center may start processing a large request
only to find nobody at the other end to receive the results later. "They go on to
something else, but we never get notified, so we keep processing," Gacke said.
EROS officials blame the Web infrastructure for that particular problem, but they
realistically accept that notification and dequeuing protocols are far down the list of
Internet priorities.
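One way to picture the missing dequeuing step is a queue that drops requests whose clients have stopped checking in. The sketch below is purely illustrative: it assumes clients could send a periodic heartbeat, which the article indicates the Web did not provide, and the `RequestQueue` class and its method names are hypothetical.

```python
from collections import deque


class RequestQueue:
    """First-in, first-out queue that skips requests abandoned by their clients."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self._pending = deque()  # request ids in arrival order
        self._last_seen = {}     # request id -> time of the client's last heartbeat

    def submit(self, request_id: str, now: float) -> None:
        self._pending.append(request_id)
        self._last_seen[request_id] = now

    def heartbeat(self, request_id: str, now: float) -> None:
        # Only requests still waiting in the queue can be refreshed.
        if request_id in self._last_seen:
            self._last_seen[request_id] = now

    def next_live(self, now: float):
        """Pop and return the oldest request whose client is still checking in, else None."""
        while self._pending:
            request_id = self._pending.popleft()
            last = self._last_seen.pop(request_id)
            if now - last <= self.timeout_s:
                return request_id
            # The client went quiet, so the request is dropped rather than processed.
        return None
```

With a 60-second timeout, a request whose client last checked in two minutes ago would be discarded before any tape was mounted, while one refreshed 20 seconds ago would be processed normally.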
Software licensing is another worry. Hierarchical storage management software costs are
rising as the data archive grows. "We're looking for licensing that has a lot
higher upper bound before we're charged a premium," Butzer said.
The proprietary nature of storage management software also is worrisome. "Proprietary
formats make data migration very difficult. You end up using FTP or some other
mechanism that might not be as optimized as that provided by the storage
manufacturer," Butzer said.
In dealing with 10- or 20-year storage problems, vendor stability is a big factor.
"It's a pretty big effort to throw out one software vendor for another,"
Butzer said.
The systems engineering staff also must cope with the growing demand for seamless data
sets. "A lot of our data products are based on the swath of sensors or political
boundaries," said Butzer. "But scientists are interested in land areas that cut across
political boundaries."
Scientists want to select elements from complementary data sets representing any place
on the Earth's surface and fuse them together seamlessly. "There are going to be
a lot of grand challenges with that," Butzer said.
To manage and distribute its digital and photographic archive, the center currently
operates a Silicon Graphics Inc. Challenge XL Web FTP server with six CPUs and 768M of
RAM.
Network connections into and out of the center include a NASA OC-3 connection, 21 T1
circuits on the Interior Department's backbone network, and several T3 connections to
the Great Plains Network that serves western and midwestern states.
The EROS systems staff, now busy bringing in storage systems to handle even more data,
has moved a pair of STK PowderHorn tape silos into the newest computer room. The
PowderHorns, each equipped with seven D-3 tape drives, will store data from the Landsat 7
satellite set for launch April 15.