Let Web bots do the grunt work

If you want to know the World Wide Web today, talk to a robot.


So-called bots are a cheap source of labor if you'd like to stay up-to-date on various
kinds of information but don't have the time to search hundreds of Web home pages every
day.


Also known as Web-walkers, spiders or wanderers, these software agents fall into two
types: personal bots set up and operated by end users, and bots offered by Internet
service providers.


You can tweak personal bots to act as your own executive information system, culling
information from Web sites and Usenet newsgroups. Caution: These bots are notoriously
wasteful of network resources, processor time and storage.


Bots operated by service providers are more common. These bots--or at least the
databases they feed--can be queried by clients located anywhere on the Internet, often at
no charge. The clients usually must connect through a forms-driven interface on a Web
browser. Unless a bot is sponsored by the government or another organization, the cost of
its service is recouped from access charges or advertising.


To call a Web bot a wanderer is a misnomer, because most bots stay put on a single
Internet server and merely send out queries to other Web servers. The other servers don't
need to know whether they've been contacted by a bot or by a person with a Web browser.
They respond in the same way, sending back a copy of a home page file.
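To see why a server cannot easily tell the two apart, it helps to look at the request itself. The sketch below is illustrative only: the host name, path and agent strings are made up, and it builds the text of a minimal HTTP GET request without sending anything over the network.

```python
def build_get_request(host: str, path: str, agent: str) -> str:
    """Return the text of a minimal HTTP/1.0 GET request."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        f"User-Agent: {agent}",
        "",  # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


# Hypothetical host and agent names, chosen only for illustration.
bot_request = build_get_request("www.example.gov", "/index.html", "ExampleBot/0.1")
browser_request = build_get_request("www.example.gov", "/index.html", "Mozilla/2.0")
```

The two requests differ only in the optional User-Agent line, which is why the server answers both the same way.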


Unless bots are set to search for artwork, most will simply ignore any graphics files
associated with the home page or signal the server to stop sending the graphics.


Bots actually are a series of programs working together. One sends network queries,
another compares the culled materials against filters, and others parse specific requests
and present information in a desired format.
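That division of labor can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the page addresses and text are invented, and the network stage is replaced by an in-memory table so the example runs without a connection.

```python
from typing import Dict, List

# Stand-in for the network stage: a tiny in-memory "Web" of made-up pages.
PAGES: Dict[str, str] = {
    "http://a.example/": "Treasury budget report for fiscal planning",
    "http://b.example/": "Recipes and gardening tips",
    "http://c.example/": "New budget guidance for state agencies",
}


def fetch(url: str) -> str:
    """Query stage: return the page text, or an empty string if missing."""
    return PAGES.get(url, "")


def matches(text: str, keywords: List[str]) -> bool:
    """Filter stage: keep a page if any keyword appears as a whole word."""
    words = text.lower().split()
    return any(k.lower() in words for k in keywords)


def present(url: str, text: str, width: int = 30) -> str:
    """Presentation stage: one line per page with a shortened summary."""
    summary = text if len(text) <= width else text[:width].rstrip() + "..."
    return f"{url}  {summary}"


def run_bot(urls: List[str], keywords: List[str]) -> List[str]:
    """Chain the stages: fetch each page, filter it, format the survivors."""
    report = []
    for url in urls:
        text = fetch(url)
        if matches(text, keywords):
            report.append(present(url, text))
    return report


report = run_bot(sorted(PAGES), ["budget"])
```

Keeping each stage in its own function means a different filter or output format can be swapped in without touching the rest.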


You can assign a bot to download and inspect large numbers of Web pages automatically,
then take action based on what is found. That usually means searching for further
references via Hypertext Markup Language (HTML) pointers and downloading files from those
pages to repeat the search process. Or it can mean adding specific references to your
database or indexing words and pointers for future full-text searches.
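A rough sketch of that download, follow and index loop appears below. It is not the method used by any of the services named in this article. The three-page site is invented and held in memory, and the names SITE, LinkAndTextParser and crawl are hypothetical.

```python
import re
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin

# A made-up three-page site, held in memory so no network access is needed.
SITE = {
    "http://example.gov/": '<a href="/budget.html">Budget</a> <a href="news.html">News</a>',
    "http://example.gov/budget.html": '<p>Budget tables</p> <a href="/">Home</a>',
    "http://example.gov/news.html": "<p>Agency news</p>",
}


class LinkAndTextParser(HTMLParser):
    """Collect HTML pointers (href values) and the words on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def crawl(start, max_pages=10):
    """Visit pages breadth-first and build a word -> set-of-URLs index."""
    index = {}
    seen = {start}
    queue = deque([start])
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        parser = LinkAndTextParser()
        parser.feed(SITE.get(url, ""))
        for word in parser.words:
            index.setdefault(word, set()).add(url)
        for href in parser.links:
            target = urljoin(url, href)  # resolve relative pointers
            if target not in seen:       # never queue a page twice
                seen.add(target)
                queue.append(target)
    return index


index = crawl("http://example.gov/")
```

Looking up a word in the index, for example index["budget"], returns the set of pages that contain it, which is the basis for a later full-text search. The max_pages limit keeps the bot from wandering indefinitely.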


The Lycos "Catalog of the Internet" maintained by Lycos Inc. and Carnegie
Mellon University is one of the best bot-fed reference tools available today. Similar
tools are Architext Software Inc.'s Excite database and Brian Pinkerton's WebCrawler,
sponsored by America Online Inc.


Pointers to these and other search services can be found on the Internet Search page
maintained by Netscape Communications Corp. at http://home.netscape.com/internet-search.html.
 


If you want to operate your own bot, it's easiest to work with a provider who can
customize sets of bots for your needs. For example, BBN Corp. of Cambridge, Mass.,
formerly Bolt Beranek and Newman Inc., has software for creating a personal information
newspaper, called PINpaper, on your Web browser using Internet content as well as in-house
resources. You can display and share this discovered information on public pages.


The Treasury Department has built some test PINpaper pages, according to BBN officials.


BBN customizes each bot system, so pricing information is sketchy, but basic modules
run about $10,000. A full system for multiple users, including server hardware, could cost
$60,000.


The problem with running your own bots is that they can get away from you. Other server
owners, for example, might resent your bot tying up their systems with extensive requests
for documents. You might run up charges of unknown magnitude if your bot hits a commercial
on-line service. And your search could even be stymied by nonsense pages that display
hundreds of popular reference words just to see how many connections they get.


Because bots are such busybodies, they pig out on resources. It's good to set them to
run at night when systems generally are underutilized.
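One simple way to enforce a night-only schedule is to have the bot check the clock before it starts work. The sketch below assumes an off-peak window of 10 p.m. to 6 a.m.; those hours and the function name is_off_peak are illustrative choices, not a recommendation from any vendor.

```python
from datetime import datetime, time


def is_off_peak(now: datetime,
                start: time = time(22, 0),
                end: time = time(6, 0)) -> bool:
    """Return True if 'now' falls inside the off-peak window.

    The window may wrap past midnight, as the assumed default does.
    """
    t = now.time()
    if start <= end:
        return start <= t < end
    return t >= start or t < end


late_evening = is_off_peak(datetime(1996, 1, 15, 23, 30))
mid_afternoon = is_off_peak(datetime(1996, 1, 15, 14, 0))
```

A bot could call is_off_peak(datetime.now()) before each batch of requests and pause until the window opens.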


If you're just interested in resource discovery, leave the searching to the commercial
bots like Lycos or Architext's Excite database. Lycos probably is the most extensive of
the advertiser-supported search services, but Excite allows more liberal search terms.


If you prefer to try running your own small bot, experiment with Harvest, Boulder or
WebWatch. With these tools, you can establish an organized bookmark system that will
extract data or update pointers by itself.


A good approach for a government office with a limited budget is to install Harvest on
a Unix server and then set up custom pages that autolink to services like that of Lycos
for extended searches.


If you're concerned about ill-behaved bots visiting your server, look up information
about bot exclusion at http://info.webcrawler.com/mak/projects/robots/norobots.html.
You can create a file called robots.txt, stored at the top level of your server, that
most visiting bots will retrieve first. Commands in that file define your bot access policy.
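As an illustration, here is what a small exclusion policy might look like and how a well-behaved bot could consult it. The exclusion file is conventionally named robots.txt. The directory names, server address and agent name below are made up. The check uses the robots.txt parser in Python's standard library, one of several tools that understand the format.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: every bot is asked to stay out of two directories.
POLICY = """\
User-agent: *
Disallow: /private/
Disallow: /cgi-bin/
"""

parser = RobotFileParser()
parser.parse(POLICY.splitlines())


def allowed(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the policy lets the named bot fetch this URL."""
    return parser.can_fetch(agent, url)


home_ok = allowed("http://www.example.gov/index.html")
memo_ok = allowed("http://www.example.gov/private/memo.html")
```

Compliance is voluntary. The file states your policy, but a bot that ignores it has to be blocked by other means.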


There are other robot services on the Internet, including bots that log onto Internet
Relay Chat servers, keep a channel open and respond to input from other users by sending
text strings or server commands. The Eggdrop bot created by Robey Pointer is one popular
variant. Information is available at http://www.gobills.com/eggdrop.README.txt.
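The core of such a bot is a routine that reads each line from the server and decides what, if anything, to send back. The sketch below is a simplified illustration and not how Eggdrop itself is written. The trigger word, nicknames and channel are invented, and only the keepalive and channel-message cases are handled.

```python
from typing import Optional


def respond(line: str, trigger: str = "!hello") -> Optional[str]:
    """Return the line a simple IRC bot would send back, or None."""
    line = line.rstrip("\r\n")
    # Servers send PING periodically; the client must echo it as PONG.
    if line.startswith("PING"):
        return "PONG" + line[4:]
    # Channel messages look like ":nick!user@host PRIVMSG #channel :text"
    parts = line.split(" ", 3)
    if len(parts) == 4 and parts[0].startswith(":") and parts[1] == "PRIVMSG":
        sender = parts[0][1:].split("!", 1)[0]
        target = parts[2]
        text = parts[3][1:] if parts[3].startswith(":") else parts[3]
        if text.strip() == trigger and target.startswith("#"):
            return f"PRIVMSG {target} :Hello, {sender}"
    return None


pong = respond("PING :irc.example.net")
greeting = respond(":alice!a@host.example PRIVMSG #test :!hello")
```

A working bot would wrap this routine in a loop that reads lines from a socket connected to the chat server and writes back any non-empty reply.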
 

