This data warehouse creates a virtual Noah's Ark

 

TUCSON, Ariz.—Under the banner of the National Gap Analysis Project,
the Geological Survey has for the last 12 years been mobilizing an army of biological
detectives to discover plant and animal species that are endangered or are about to be,
that should be thriving but aren’t.


Their detection tool is a data warehouse.


To work as USGS envisions, the warehouse needs detailed maps showing the distribution
of vertebrates and plants in the United States. Each map needs multiple layers showing not
only data about animals and vegetation but also about the land: latitude, longitude,
elevation, climate, rainfall and temperature range. Each state has its own Gap project.


The Arizona Gap warehouse draws from an unprecedented range of sources: century-old
handwritten cards about museum specimens; more than 100 databases from state fish and game
departments and from federal departments such as Defense and Agriculture; Global
Positioning System readings taken by University of Arizona graduate students; and even
information from elementary school students.


USGS’ Biological Resources Division, which coordinates the project, is
piggybacking its research on the states’ efforts.


The university’s School of Renewable Natural Resources is home to Arizona Gap.
Graduate students in the Advanced Resources Technology (ART) program study geographic
information systems in cooperation with USGS ecologist Michael Kunzmann. Graduate students
generate some of the new data as well as normalize or transform data in legacy databases.


On the Web at http://nbii.srnr.arizona.edu/nbs/gap/gapdata.html,
citizens, researchers and policy-makers can ask questions, search the metadata repository
and retrieve information from more than 100 databases maintained by state, federal and
university groups.


In the works is software to build a user profile database that will let individuals
subscribe to specific information. As new information is added or old information is
changed, the software will search across libraries and return to users only data that
affects the information in their profiles.
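The subscription idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not describe how the planned software is built, and the `Profile` class, the `users_to_notify` function and the topic labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """Hypothetical user profile: a name plus the topics the user follows."""
    user: str
    topics: frozenset


def users_to_notify(profiles, changed_topics):
    """Return {user: [matching topics]} for profiles touched by an update."""
    changed = frozenset(changed_topics)
    hits = {}
    for profile in profiles:
        overlap = profile.topics & changed
        if overlap:  # skip users whose profile is unaffected by the change
            hits[profile.user] = sorted(overlap)
    return hits


# Illustrative data only; not taken from the Gap warehouse.
profiles = [
    Profile("planner", frozenset({"land cover", "peregrine falcon"})),
    Profile("teacher", frozenset({"grassland birds"})),
]
print(users_to_notify(profiles, {"land cover", "rainfall"}))
```

Only the first user is returned here, because only that profile shares a topic with the update.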


One of the biggest problems in building such a vast warehouse, Kunzmann said, is
“there’s not enough money and not enough scientists, and we cannot work fast
enough to monitor all the standards and keep up with changes in land cover.”


Part of the solution, he said, is to get everyone from scientists to kindergartners to
contribute to the effort. “Even the most basic information about animal sightings is
useful,” Kunzmann said. “It tells us where to look for the animals.”


The resource suppliers contribute data via the same Web site that they visit to
retrieve data. A software harvesting agent presents them with a list of queries and
determines who they are, whether first-time contributors, students or peer-review
scientists. The agent keeps a user profile database. It asks what kind of data they want
to submit and which database the new data would augment.


Other harvesting tools maintain quality control by determining whether the data is in
range, compared with existing data.
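A range check of this kind might look like the following Python sketch. The article does not say how the harvesting tools define "in range," so the rule here (the span of existing values plus a small margin), the `in_range` name and the sample elevations are assumptions made for illustration.

```python
def in_range(value, existing, tolerance=0.1):
    """Check a new value against the span of existing values.

    The span is widened by `tolerance` (a fraction of its width) on each
    side, so a slightly higher or lower reading is not rejected outright.
    """
    low, high = min(existing), max(existing)
    margin = (high - low) * tolerance
    return low - margin <= value <= high + margin


# Hypothetical elevations, in meters, already on file for one species.
elevations = [700, 950, 1200]
print(in_range(1230, elevations))  # close to the existing span
print(in_range(2400, elevations))  # far outside; flag for review
```

A value that fails the check would not necessarily be wrong; a sketch like this can only flag it for a person to review.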


If the form is clear enough, contributors can even create the metadata themselves,
Kunzmann said.


The new data goes to a database of harvested resources, maintained in the ART computer
laboratory. It also goes to a broker knowledge base on Gap’s Sun Microsystems Ultra
Enterprise 4000 parallel server with 4G of RAM and 38G of storage, where Kunzmann and his
grad students fine-tune old models and create new ones.


Each database supplier maintains its database and can draw newly harvested data from
the Gap store. That part of the process is not automated, however. “You need a person
trained in library science and the disciplines, such as ornithology, to make
judgments,” Kunzmann said.


Databases from some suppliers, such as DOD, are generalized to different levels,
depending on the user. A fairly fuzzy level of data is freely available to all. Particular
versions are available to a list of contractors or agencies supplied by DOD.
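One simple way to produce a "fuzzy" version of location data is to round coordinates more coarsely for less-trusted users. The Python sketch below assumes that approach; the article does not say how the DOD data is actually generalized, and the level names, precision values and sample coordinates are hypothetical.

```python
# Hypothetical access levels mapped to decimal places kept in coordinates.
PRECISION = {"public": 0, "contractor": 2, "owner": 4}


def generalize(lat, lon, level):
    """Round a coordinate pair to the precision allowed for `level`."""
    digits = PRECISION[level]
    return round(lat, digits), round(lon, digits)


# Illustrative point only; not a real record.
point = (32.2319, -110.9501)
for level in PRECISION:
    print(level, generalize(*point, level))
```

At zero decimal places a point is located only to the nearest degree, which is one way sensitive sites could be kept vague for general users.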


Maps showing archaeological sites and endangered species are similarly limited.
Problems have arisen, for example, from release of nesting information about peregrine
falcons.


Database owners now must request the newly harvested information but eventually will be
able to subscribe.


Each of the supplier databases is replicated on Arizona Gap’s Web and GIS server,
a Sun UltraSparc 60 with 256M of RAM and 37.2G of storage.


Getting the information out to users who have ordinary telephone lines can be
problematic.


“We did tapes, but the formats keep changing,” Kunzmann said. “We
don’t know whether we’re dealing with 4-mm or 8-mm or quarter-inch
cartridge,” or what operating system is involved.


Two graduate students work half-time to transform legacy data sets and update the
metadata to comply with Federal Geographic Data Committee metadata standards set in 1994.


“Our biggest problem isn’t hardware or software, and it isn’t how easy
the data is to retrieve,” Kunzmann said. “If you don’t have good quality
control and good metadata, you cannot rely on the quality. The whole thing is
useless.”


Metadata even covers such seemingly marginal information as what vehicle the data
collector drove.


“A researcher on horseback in a national park or forest would find more grassland
birds than one in an automobile because the horses would flush the birds out,”
Kunzmann said.


Data also is subject to constant change. When ornithologists recently split one bird
species into two species, each instance of the brown towhee and its Latin name had to be
changed in each database to the canyon or California towhee.
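A split is harder to apply than a simple rename, because each old record has to be assigned to one of the two new species. The Python sketch below assumes the assignment can be made from the state where the bird was recorded and that anything else goes to a person for review. The record layout, the state-based rule and the `apply_split` name are hypothetical, and the Latin name would need the same treatment.

```python
# Hypothetical rule table: old common name -> {state: new common name}.
SPLIT = {
    "brown towhee": {"CA": "California towhee", "AZ": "canyon towhee"},
}


def apply_split(record, split=SPLIT):
    """Return a copy of `record` renamed according to the split table."""
    by_state = split.get(record["common_name"].lower())
    if by_state is None:
        return record  # species was not split; leave the record alone
    new_name = by_state.get(record["state"])
    if new_name is None:
        # No rule for this location: do not guess, flag for a specialist.
        return {**record, "needs_review": True}
    return {**record, "common_name": new_name}


print(apply_split({"common_name": "Brown towhee", "state": "AZ"}))
print(apply_split({"common_name": "Brown towhee", "state": "NM"}))
```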


Some legacy data sets were produced in programs such as the Geographic Resources
Analysis Support System, a Unix GIS tool from the Army Corps of Engineers, and Idrisi, a
GIS created at Clark University in Massachusetts. Others were created in or translated
into formats compatible with Arc/Info for Unix or Microsoft Windows NT from Environmental
Systems Research Institute Inc. of Redlands, Calif.


“You can translate from one format to another,” Kunzmann said, “but you
don’t get the same look and feel, and there’s not necessarily a one-to-one
relationship in what’s presented and how.”


When a user goes to the Web site and queries the warehouse, a computer tool applies a
set of rules to determine the best algorithms to retrieve the data.


The tool works like the nearest-neighbor algorithms used in library programs, which
take into account misspellings and other irregularities, he said.


For example, a user might ask for plant communities. The tool might respond that it
found no plant communities but did find vegetation communities. Or a user might ask for
imagery about a particular bird, when what he wanted was a photo of it. The software would
ask whether he wanted satellite imagery or photographs.
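The behavior described above can be approximated in Python. This sketch is not the Gap tool: it substitutes the standard library's `difflib` string similarity for whatever nearest-neighbor method the warehouse uses, and the catalog entries, the synonym table and the `suggest` function are hypothetical.

```python
import difflib

# Hypothetical catalog of searchable topics and a small synonym table.
CATALOG = ["vegetation communities", "satellite imagery", "photographs"]
SYNONYMS = {"plant communities": "vegetation communities"}


def suggest(query, catalog=CATALOG, synonyms=SYNONYMS):
    """Return catalog entries that best match a possibly inexact query."""
    q = query.strip().lower()
    if q in catalog:
        return [q]  # exact hit
    if q in synonyms:
        return [synonyms[q]]  # known alternative wording
    # Fall back to similarity matching, which tolerates misspellings.
    return difflib.get_close_matches(q, catalog, n=3, cutoff=0.6)


print(suggest("plant communities"))      # resolved through the synonym table
print(suggest("vegetaton communities"))  # misspelling, resolved by similarity
```

String similarity handles misspellings but not synonyms, which is why the sketch keeps a separate synonym table for a case such as "plant" versus "vegetation."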


A query triggers a Common Gateway Interface script that locates the data sets, starts
ESRI’s Arc/View 3.0 for SunSoft Solaris with Spatial Analyst and other ESRI
extensions, and loads the data sets. Images then go back to the requester.


“We cannot count on everyone having the software,” Kunzmann said, and there
is no funding to buy licenses needed to let users manipulate data online.


When a user requests entire data sets, the request triggers another CGI script, which
uploads the data from the correct database to a File Transfer Protocol site on the Web
server. The user then downloads the data from the FTP site.


“If our budget was based on how much information we give away, I’d have a
great budget,” Kunzmann said. Data sets can be as large as several gigabytes and take
hours to download. The ART lab’s 100Base-T LAN connects to the university’s
fiber-optic 100Base-T WAN and uses its three to four available T1 lines.
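The "hours to download" figure is easy to check with back-of-the-envelope arithmetic. The Python sketch below assumes the standard T1 rate of 1.544 megabits per second, a transfer that has the lines to itself and no protocol overhead, so it gives a best case. The function name and the 2-gigabyte example size are illustrative.

```python
T1_BITS_PER_SECOND = 1.544e6  # nominal T1 line rate


def download_hours(size_bytes, t1_lines=1):
    """Best-case transfer time in hours over one or more T1 lines."""
    seconds = size_bytes * 8 / (t1_lines * T1_BITS_PER_SECOND)
    return seconds / 3600


size = 2e9  # an assumed 2-gigabyte data set
for lines in (1, 3, 4):
    print(f"{lines} T1 line(s): {download_hours(size, lines):.1f} hours")
```

Under these assumptions a 2-gigabyte data set needs close to three hours on a single T1 line and about one hour on three. A user on an ordinary telephone line would wait far longer.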


Early this month, the Gap Web server hard drive crashed irretrievably. The data was
safely backed up on tape, but not the operating system or the CGI scripts or the
subdirectories, Kunzmann said.


Because there is no money for a mirrored server, Kunzmann spent the two weeks after he
got a new drive restoring the operating system, scripts, subdirectories, software and
data.


Lack of hardware is a problem, he said. The warehouse has grown huge and slow, and the
demand is fast outgrowing the ART lab’s resources. Lack of staff is pinching even
harder.


Average service time for a grad student is nine months, Kunzmann said. He can pay $180
per week, but industry will pay $50,000 per year.


The project is fulfilling its mission—helping preserve plants and animals—but
it’s also fast becoming a crucial tool in land planning.


A yellow area might call for informal consultation. A red area would mean,
“You’re going to have to have a Section 7 [document], a consultation with the
Environmental Protection Agency, and there’ll be public hearings,” Kunzmann
said.
