Easing analysis of crowdsourced data
The Civic CrowdAnalytics web application automates analysis of unstructured crowdsourced data using natural language processing and machine learning.
Determining the will of the people isn’t always easy, even when governments ask for -- and receive -- input.
Specifically, there is no easy way to analyze and synthesize crowdsourced opinions. And as governments expand their use of crowdsourcing, the need grows for data analytics tools that can replace this manually intensive process.
Today, most crowdsourced input is processed manually. Because such analysis is so time-consuming, many government entities have shied away from crowdsourcing initiatives altogether. Further, the lack of efficient analysis tools makes it difficult to determine whether citizens’ input is reflected in final policy.
To address the issue, researchers created Civic CrowdAnalytics, a web application that automates analysis of unstructured crowdsourced data using natural language processing (NLP) and machine learning. Built on application programming interfaces from Hewlett Packard Enterprise’s big data platform Haven OnDemand, it lets users submit datasets and analyze them in various ways.
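The article doesn’t publish the application’s code, but the building blocks it describes are REST calls to Haven OnDemand. The sketch below shows roughly what a sentiment request might have looked like in Python; the endpoint path, parameter names and response shape are assumptions drawn from the service’s old public documentation, and the service has since been retired, so the snippet is illustrative only.

```python
import requests

# Illustrative only: Haven OnDemand has been retired, and the endpoint,
# parameter names and response fields below are assumptions based on the
# service's old public documentation.
HOD_SENTIMENT_URL = "https://api.havenondemand.com/1/api/sync/analyzesentiment/v1"
API_KEY = "YOUR_API_KEY"  # placeholder credential

def analyze_sentiment(text):
    """Send one crowdsourced comment for sentiment analysis, return the JSON reply."""
    resp = requests.post(HOD_SENTIMENT_URL, data={"apikey": API_KEY, "text": text})
    resp.raise_for_status()
    return resp.json()

result = analyze_sentiment("The proposed bike lanes would make downtown much safer.")
# The documented reply included an "aggregate" block with a sentiment label
# ("positive", "negative" or "neutral") and a confidence score.
print(result.get("aggregate", {}))
```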
After the data is submitted, the application categorizes it using concept extraction. Users train the algorithm by labeling a portion of the submissions with main categories and subcategories, then letting the algorithm categorize the rest of the data. Next, the data is analyzed for positive or negative sentiment, as expressed by particular words and expressions. Civic CrowdAnalytics displays the number of occurrences of each sentiment and the associations between similar ideas.
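The researchers’ exact models aren’t described in the article, but the label-then-classify workflow is a standard supervised pipeline. As a minimal sketch, making no claim about the actual implementation, the snippet below reproduces that workflow with scikit-learn: a handful of hand-labeled submissions (invented here for illustration) train a classifier that then categorizes the remaining submissions and tallies occurrences per category.

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A few hand-labeled submissions seed the classifier, mirroring the step
# where users label main categories before the algorithm takes over.
# Texts and category names here are invented for illustration.
labeled_texts = [
    "Add protected bike lanes on University Avenue",
    "Extend library hours on weekends",
    "More frequent shuttle service to the train station",
    "Fund after-school reading programs",
]
labels = ["transportation", "services", "transportation", "services"]

# TF-IDF features plus logistic regression: a simple stand-in for the
# application's trained categorization model.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(labeled_texts, labels)

# The trained model then categorizes the remaining, unlabeled submissions.
unlabeled = [
    "Bike parking near the transit center is always full",
    "The community center needs longer opening hours",
]
predicted = model.predict(unlabeled)

# Tally how often each category occurs, akin to the app's display of
# occurrence counts across categories.
print(Counter(predicted))
```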
Using Civic CrowdAnalytics, researchers analyzed a crowdsourced policymaking process in Palo Alto, Calif., to gauge the application’s effectiveness and to determine the extent to which crowdsourced citizen input ultimately shapes policy.
Those tests found that while NLP methods show promise, they aren’t quite ready for wide deployment. The effort and time involved in training the algorithm and its relatively low accuracy rate (80 percent) offset the speed at which Civic CrowdAnalytics can mine a large dataset.
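To make the accuracy trade-off concrete, a held-out evaluation like the one below is the standard way such a figure is measured. It reuses the hypothetical `model` pipeline from the earlier sketch and assumes `texts` and `categories` hold a city’s labeled submissions; it is not the researchers’ evaluation code.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# `texts` and `categories` are assumed to hold a city's hand-labeled
# submissions (hypothetical data); `model` is the pipeline from the
# earlier categorization sketch.
train_x, test_x, train_y, test_y = train_test_split(
    texts, categories, test_size=0.2, random_state=0
)
model.fit(train_x, train_y)

# An accuracy around 0.8 on the held-out labels would match the figure
# the researchers reported for Civic CrowdAnalytics.
print("accuracy:", accuracy_score(test_y, model.predict(test_x)))
```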
“But the larger the dataset, the more meaningful it is to train the algorithm in the beginning,” researchers wrote. “Furthermore, once the algorithm is trained, it can analyze several datasets about similar topics with improved performance, so the city can use it constantly in analyzing their civic datasets.”
Training NLP methods for analyzing crowdsourced data would be improved if users “share their data and results online, so that other cities and actors could use the already trained algorithm for similar topics,” researchers concluded.
As for whether crowdsourcing opinions should be an integral part of policymaking, the researchers said they “found an interesting and unexpected result: the city government in Palo Alto mirrors more or less the online crowd’s voice while citizen representatives filter rather than mirror the crowd’s will.”
Second, “the results suggest that whether citizen voices are incorporated into the policy depend on the amount and the sentiment of their suggestions. When citizens have more demands with a stronger tone, the government pays more attention.” Conversely, citizen representatives were not swayed by the volume or tone of citizens’ demands.
“It remains unclear why certain suggestions from the crowd are adapted to the policy, whereas some are not,” researchers said.