Is the Public Outcry Coming From Your Constituents or Bots?
It’s getting harder to discern fact from fiction online, and public officials will need to fight bots with bots.
BERKELEY, Calif. — In the weeks leading up to the Federal Communications Commission’s fateful vote to repeal net neutrality, opponents consistently highlighted the many comments fraudulently submitted to the commission by automated web robots (commonly known as “bots”). One state attorney general expressed concern that citizens’ identities were used to lend credence to the bot-generated comments. While the submissions ultimately did not play a major role in the commission’s decision, the comment pool will likely be a key piece of evidence in the court cases over the repeal.
In a more extreme example of bots complicating an agency’s work, Judge Jon Tigar of the Northern District of California referred a case to the local U.S. Attorney’s Office for investigation: the claims process of a privacy class action he was adjudicating was so fouled with automated submissions that it had to be addressed before the case could proceed.
For government agencies seeking to manage large comment pools and use stakeholders’ input when making key decisions, these examples should raise alarm. These are some of the most challenging scenarios: sensitive topics, on which many citizens have strong opinions, clouded by fraudulent data submitted by hundreds or even thousands of active bots. How can the responsible agency or commission manager tell whether their submission form has been infiltrated by bots, or whether the comments represent legitimate citizen feedback?
Most managers will encounter one of two types of bot comments, beyond the basic ones that spam filters catch. The first, cruder type of bot simply jams the system with the same name (or minor variations on it) and the same comment repeated many times. The second type, increasingly common, comprises sophisticated bots designed to be far less obvious.
With these bots, you usually can’t detect their presence from a single message. Nowadays, this class of bots can approximate humans fairly well in their generated text, having learned from the billions of words of human-generated content available online. Poor or excellent grammar and speech patterns won’t immediately reveal whether the comment came from an impatient, less educated, or hurried constituent, or a bot designed to fool the system. Unless the bot makes an obvious mistake—such as failing to replace a template placeholder like {STATE} with the name of an actual state—an individual message just doesn’t give you enough to go on.
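To make that one easy check concrete, here is a minimal sketch in Python; the placeholder pattern and sample comments are illustrative assumptions, not drawn from any particular comment system.

```python
import re

# Flag comments that still contain an unreplaced mail-merge placeholder
# such as {STATE} or {FIRST_NAME}; pattern and samples are illustrative.
PLACEHOLDER = re.compile(r"\{[A-Z_]+\}")

def has_unfilled_placeholder(comment: str) -> bool:
    return bool(PLACEHOLDER.search(comment))

comments = [
    "As a resident of {STATE}, I oppose this rule.",
    "I support the proposal because it helps my small business.",
]
print([c for c in comments if has_unfilled_placeholder(c)])
# -> ['As a resident of {STATE}, I oppose this rule.']
```

A check like this catches only the sloppiest bots, which is exactly why the patterns discussed next matter more.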
This is where a sophisticated agency manager will want to examine the pattern of messages. Tell-tale patterns can emerge from the content of the messages as well as from the message metadata—for example, the timestamps, sender IP addresses, or email domains.
Within the content, indicators of a bot’s presence include repeated text, ranging from individual misspelled words to idiosyncratic turns of phrase and entire sections of duplicate prose. Moving up in sophistication, comparing the sentiment of each message both to submissions in prior comment pools and to the context of the issue can reveal hidden subversion of the process; a repeated pattern of extreme or lukewarm sentiment may indicate bot activity.
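As a rough illustration of duplicate and near-duplicate detection, the sketch below compares comments by the overlap of their word trigrams (Jaccard similarity). The sample comments and the 0.6 threshold are assumptions for demonstration; production tools scale the same idea with techniques such as locality-sensitive hashing.

```python
from itertools import combinations

def shingles(text: str, n: int = 3) -> set:
    # Lowercase, strip punctuation crudely, and form word trigrams.
    words = "".join(ch if ch.isalnum() or ch.isspace() else " "
                    for ch in text.lower()).split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

# Sample pool; IDs and text are invented for demonstration.
comments = {
    "c1": "I strongly oppose the repeal of these net neutrality rules.",
    "c2": "I strongly oppose the repeal of these net neutrality rules!",
    "c3": "Please keep the existing rules in place for consumers.",
}

# Flag pairs whose trigram overlap exceeds a threshold; exact duplicates
# score 1.0, and light paraphrases score close to it.
shingled = [(cid, shingles(text)) for cid, text in comments.items()]
for (id1, s1), (id2, s2) in combinations(shingled, 2):
    score = jaccard(s1, s2)
    if score > 0.6:
        print(f"{id1} ~ {id2}: similarity {score:.2f}")
```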
In message metadata, bots can reveal themselves in myriad ways. If a vast number of comments share the exact same browser and operating system versions, timestamps, email addresses, email domains, and so on, it’s a strong sign of bot activity. Sometimes the repetition is more subtle, such as comments spaced out by precisely the same amount of time. Or the revealing facts may have nothing to do with repetition, such as when comments are coming from odd locations; if a California-based agency is getting lots of inbound comments from Arkansas, that can be a sign of foul play.
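Both kinds of metadata tells can be surfaced with a few lines of analysis. The sketch below, with invented timestamps and user-agent strings, counts repeated user agents and looks for suspiciously uniform gaps between submissions.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (timestamp, user_agent) pairs pulled from a
# comment system's logs. Values are invented for illustration.
submissions = [
    ("2018-01-05T10:00:00", "Mozilla/5.0 (X11; Linux) Bot/1.0"),
    ("2018-01-05T10:05:00", "Mozilla/5.0 (X11; Linux) Bot/1.0"),
    ("2018-01-05T10:10:00", "Mozilla/5.0 (X11; Linux) Bot/1.0"),
    ("2018-01-05T10:12:31", "Mozilla/5.0 (Windows NT 10.0) Chrome/63"),
]

# 1. Many submissions sharing one exact user-agent string is suspicious.
agent_counts = Counter(agent for _, agent in submissions)
print(agent_counts.most_common(1))

# 2. Perfectly uniform gaps between submissions suggest a scheduler,
#    not people. Compute inter-arrival times and look for repeats.
times = sorted(datetime.fromisoformat(ts) for ts, _ in submissions)
gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
print(Counter(gaps))  # e.g. a 300-second gap recurring again and again
```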
State agency managers looking to detect these patterns and trends in large comment pools will find modern technology essential. Humans simply can’t identify such patterns reliably or efficiently; in other words, you need a bot to find a bot. Review software levels the playing field, with tools such as duplicate and near-duplicate detection (for repeated content), sentiment analysis based on natural language processing (for mismatched sentiment), and data visualizations (for bringing metadata to life). Whether examining one message or millions, focused on the content or the metadata, these tools are indispensable in helping agency leaders make quick work of large comment pools.
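For flavor, here is a toy version of the sentiment piece: a lexicon-based scorer standing in for the trained NLP models real review software would use. The word lists and comments are illustrative only.

```python
from statistics import mean

# Toy lexicon-based scorer; word lists and comments are illustrative.
# Real review software would use a trained NLP model instead.
POSITIVE = {"support", "fair", "helpful", "good"}
NEGATIVE = {"oppose", "unfair", "harmful", "bad"}

def sentiment(comment: str) -> int:
    words = [w.strip(".,;!?") for w in comment.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

pool = [
    "I strongly oppose this unfair and harmful rule.",
    "I strongly oppose this unfair and harmful rule.",
    "This seems fair; I support the change.",
]
scores = [sentiment(c) for c in pool]
print(scores, round(mean(scores), 2))  # -> [-3, -3, 2] -1.33
```

A pool whose scores cluster tightly at one extreme, especially relative to prior comment periods, is the kind of mismatch worth a human’s attention.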
Hordes of robots jamming citizen comment periods may inspire dread for the conscientious agency manager (and fellow citizens). However, taking care to understand the distinctions and patterns in bot-submitted comments will give the sophisticated manager an edge. With the right tools and a thoughtful review process, the agency can separate fact from fiction, and get to the heart of constituents’ thoughts on the issues of the day.
Jonathan Kerry-Tyerman is a vice president at Berkeley, California-based Everlaw.