When people search for federal information online, the vast majority reach first for search engines like Google or Yahoo.

Only 4 percent of visitors to www.nih.gov, for instance, got there by typing the URL into their browser's address line, according to a ComScore research study released last year. The rest arrived by typing nih.gov into a search engine, usually Google's, and then clicking on the results.

This has set up an interesting dynamic between search engine companies and the federal government. The feds want their sites to appear high on the list of results delivered. Google, Yahoo and the other search engines want to have satisfied searchers. The more content that is searchable, the better, and the happier everybody is.

"Users performing a search think, 'I've been diagnosed with cancer, and I need information.' They don't think about their information sources," said J.L. Needham, who represents Google's public sector content partnership. "But if people can't find something, they blame it on Google, not the government."

To boost their rankings on search lists, agencies have been working with Google to develop sitemaps, which are Extensible Markup Language-based lists of Web addresses that point to database records.

A sitemap can take a couple of forms, Needham said. At its simplest, it can be a list of URLs submitted through Google's Webmaster Tools Web site at www.google.com/webmasters/tools.

Much of the government's information on the Web is uncrawlable because it is embedded in databases, Needham said.

"Some estimates are that as much as 90 percent of government information is not accessible through Web search engines," he said. "We estimate that at about 50 percent." A sitemap makes this information visible to the search engines.

But does this request for sitemaps put Google in the tricky position of telling the federal government what to do?

No, said Chris Sherman, executive editor of Searchengineland.com. "It's voluntary. Web sites don't have to do it," he said.
"I don't think any of the search engines are dictating anything. Their concern is to get as much content as they can. As good as search engines have become, there are still some barriers."

Most government Web sites do quite well in search engine rankings, he said. A sitemap will boost a site's ranking if the site has a lot of content stored in databases. "Databases are tough for search engines to crack," he said.

Historically, search engines have looked with suspicion on content providers, Sherman said. "Now they're saying, 'We want your content, and this is how to get it.'"

The sitemap protocol is an industry standard, supported by Google, Yahoo and Microsoft. The actual development of a sitemap doesn't take much more than a day or so.

And federal Webmasters don't seem to mind complying with the protocol. If anything, it's a labor of love.

Setting up the sitemap for www.plainlanguage.gov took Miriam Vincent between eight and 10 hours. Vincent is an attorney at the Social Security Administration, but she volunteers time to the Plain Language Action and Information Network, an interagency working group of federal employees who promote the use of plain language for all government communications. Vincent describes herself as the site's Webwright (the -wright suffix indicating a careful craftsman, as in wheelwright or shipwright), not its master.

Before Vincent instituted the sitemap, a search in Google for one of the site's specific examples of plain language "wouldn't show up on the first page or first two pages" of results, she said.
The site's examples of language, both plain and obfuscating, are some of its most popular features, and they eluded search engines.

Since Vincent implemented the sitemap, she has seen some increase in Web traffic. Now when users type "plain language" into Google, plainlanguage.gov is the first result. Type in "plain language" and "engineer jargon," and the site is still the first result.

Vincent has to do a short copy-paste step when she updates the database, but some other federal Web sites have managed to automate the process entirely, dynamically generating an XML file, she said.

It took the Energy Department's Office of Scientific and Technical Information 12 hours to create its sitemap using the Google protocol, said Walt Warnick, OSTI's director.

"We've spent more time talking about what we did regarding the sitemap protocol than we did executing it," Warnick said.

When osti.gov began offering sitemaps several years ago, the agency saw a huge increase in traffic. "The first day that Yahoo offered up our material for search, our traffic increased so much that we could not keep up with it," Warnick said.

Dennis Rodrigues, chief of the online information branch for the National Institutes of Health, called the sitemap project a win-win for federal Web sites and search engines. Rodrigues coordinates sites for 27 separate agencies under the health agency's umbrella.

"I think a lot of the bread-and-butter stuff agencies have on the Web sites [was] already carefully indexed," Rodrigues said. The bulk of searches sent to NIH Web sites are for health problems, such as cancer, diabetes and heart disease. But it would be harder for someone to find information on a particular gene or protein, he said.
The information would be buried in a database.

Rodrigues said developing sitemaps is more about creating "a better quality of the site's index and covering all the disparate, eclectic information." The goal of the project is to boost the quality of search results, rather than the quantity.

"As federal providers, we have a lot of concern about whether or not the public is going to be able to find our information, especially about health information," Rodrigues said. "We know with the ever-growing volume of information on the Web, it's easy to become lost in a sea of data."
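
For readers curious what such a file looks like, the simplest kind of sitemap the article describes, a list of URLs pointing to database records, can be produced with a few lines of code. The sketch below is illustrative only and is not the method used by any agency mentioned here. The site name (www.example.gov), the URL pattern, the record IDs and the function name build_sitemap are all hypothetical; the only parts taken from the sitemap protocol published at sitemaps.org are the namespace and the urlset, url and loc elements.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemap protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string that lists each URL in a <loc> element."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing.
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical record IDs, standing in for rows read from a database.
record_ids = [101, 102, 103]
# Hypothetical URL pattern for the page that displays one record.
urls = [f"https://www.example.gov/records?id={rid}" for rid in record_ids]

sitemap_xml = build_sitemap(urls)
print(sitemap_xml)
```

Regenerating the file from the database each time records change is one plausible way to get the fully automated process the article mentions, in place of a manual copy-paste step.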
PRIMO FINDS: NIH's Dennis Rodrigues says the goal is to boost the quality of search results rather than the quantity.
Rick Steele