Patience is a virtue when building data warehouses
"It's the biggest issue," said Pat Garvey, director of the Environmental
Protection Agency's Envirofacts data warehouse team. "Don't think too big,
and don't start off too grandiose. Keep expectations lowered."
The much-lauded Envirofacts data warehouse has been online since 1995. From the
EPA's six national mainframe systems, the warehouse application extracts information
on about a million sites handling or discharging potentially harmful substances, and pulls
regulatory, spatial and demographic data into a warehouse built on software from Oracle
Corp.
Data from the 40G warehouse is accessible to anyone with a Web browser, at EPA's
site at http://www.epa.gov/enviro. Users
include EPA staff members, emergency management teams, business and industry, community
advocates and real estate brokers.
Managers got Envirofacts off the ground by putting together a multidisciplinary team
from three EPA units, Garvey said. "We had somebody from the Superfund program,
someone from hazardous waste and someone from the water office," he said.
As the warehouse expanded, other EPA units saw how well it worked and wanted to add
data from their systems to the database. "Success breeds success, and others wanted
to join in," Garvey said.
At the Kennedy Space Center, Ron Phelps, program manager for NASA's Insight data
warehouse, also built his warehouse by starting modestly and expanding it incrementally.
"Everything I did in the first phase carried over into the second or third phases,
so I never threw anything away," he said. "I've been able to build on
everything we've put in place in the first two phases."
Insight, an intranet warehouse, integrates Space Shuttle maintenance information from
six contractor and agency databases. NASA engineers analyze the data to make sure that the
maintenance work, which is done by hundreds of contractors, is being processed properly.
They then produce reports on their findings.
While planning and assembling a data warehouse, project managers must maintain an
intellectual distance, remembering that a data warehouse is, in the end, a tool, both
government and industry officials said.
"Data warehousing is more than anything a solution to a business problem,"
said Ramon Barquin, founder and former president of the Data Warehousing Institute in
Washington and now a data warehousing consultant in Bethesda, Md. "A data warehouse
is not just another database. It is a collection of data to support a decision-making
process."
The first step in the process is developing a data warehouse strategy, Barquin said.
"It's not so much a question of should you develop a data warehouse (just
about everybody should because they need that kind of platform for analysis) but how
you should do it. You can very easily go about it the wrong way and push back your
agency's agenda by a couple of years," he said.
A key to developing a strategy is staying focused on this fundamental fact: Users are
the center of the data warehouse universe.
"The No. 1 factor in ensuring that a data warehouse is successful is making sure
that it is a tool for users," Barquin said. "If the data warehouse is built by
the information technology people, off in their little ivory tower and disconnected from
users, it basically becomes a dead end."
Jim Davis, data warehouse program manager at SAS Institute Inc. of Cary, N.C., agreed
that users must be brought into the planning process.
"Make sure that it's a joint effort between IT and the business
community," he said. "Make sure the warehouse is not totally driven by the IT
folks. Don't build a user interface based on what IT thinks the user is going to have
an easy time with. Involve the user in the planning."
At Randolph Air Force Base in San Antonio, developers of the Interactive Demographics
Analysis System (IDEAS), a new data warehouse that puts Air Force personnel demographics
on the Web, made certain that the interface was easy to use, intuitive and completely
mouse-driven.
"We've worked really hard to ensure that the user doesn't have to touch
the keyboard at all," said Air Force technical sergeant Eddie Stevens, head of the
IDEAS team. "If they can touch the keyboard, they can enter something
incorrectly."
For Stevens, a point-and-click environment was critical for the success of IDEAS, whose
users range from high school students doing research papers to reporters from Air Force
publications and Air Force brass needing personnel statistics. IDEAS, posted at
http://www.afpc.randolph.af.mil under the Personal Statistic link, runs on a Compaq
ProLiant 6000 server and uses data warehousing software from SAS.
It's vital to understand who your users are and what data they need to do their
jobs, Phelps said. "One of the hardest things for me was getting the user community
to tell me what data they actually needed and how they needed to see it," he said.
To this end, every nascent data warehouse needs a strong project manager.
"The project manager has to have a vision that's direct and flexible,"
Garvey said. "It's difficult to keep expectations lowered and keep people
engaged. So you have to keep on making incremental successes very quickly throughout the
project life. The mini-success stories are just crucial."
The small-is-beautiful approach, of course, applies principally as the project is
getting up and running. Once fully operational, a data warehouse, by its very nature,
is going to grow, and grow and grow.
Warehouse builders should be certain that the hardware and software platforms
they've chosen at the start can manage the vast amounts of data that will be pulled
in.
"Always plan for expansion and enhancement: expansion because you want to keep
driving more data into the warehouse and enhancement because you want to be able to access
easily and completely the data within the warehouse," Garvey said.
One way to do that is to pick a family of technologies and stay with it.
"If you're using different computer-aided software engineering tools,
different documentation tools and different database modeling tools, then you're in a
situation where seamlessness isn't really there," Garvey said. "Staying
within the vendor family diminishes a lot of the chaos of matching up tools and other
vendors."