Using crowds to teach AI to search smarter
A team at the University of Texas at Austin is using crowdsourced input to train its machine-learning algorithms to create a more intelligent search engine.
As anyone who searched the internet in the late 1990s can attest, language-based search engines have grown steadily smarter and now deliver far more relevant results. Search engines still struggle when it comes to classifying some types of content, though, especially images, videos and language that uses slang or jargon.
To help address such shortcomings, a team at the University of Texas at Austin is using crowdsourced input to train its machine-learning algorithms to create a more intelligent search engine.
“A lot of the machine learning today is built on the idea that the best way to transfer human knowledge into realizing intelligent systems is to have people provide lots of examples of what they want the system to do,” said Matthew Lease, associate professor in the School of Information. “The system then induces patterns and figures out how to generalize that to new unseen examples.”
Since the accuracy of a machine-learning system is often driven by the quantity of training data, sometimes more than by the algorithms themselves, Lease said that “anything that changes the scale of data that we can get is a game changer.” That is where crowdsourcing comes in.
By using people to read articles in medical journals and breaking news stories to extract and label the key details – events, people and places – the researchers can give the machine-learning system more examples of correctly labeled content. “Crowdsourcing is a way that we have found to really ramp up the scale of labeled data that we collect,” he said.
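To make the idea concrete, the sketch below shows one common way such crowd annotations are represented for training an extraction model: workers mark entity spans (events, people, places) in a sentence, and those spans are converted to per-token tags. The data structures and example sentence are illustrative assumptions, not details of the UT Austin system.

```python
# A minimal sketch (not the UT Austin pipeline) of crowd-labeled extraction data:
# a worker marks entity spans in a sentence, and the spans are converted to
# per-token BIO tags that a tagging model can be trained on.

def spans_to_bio(tokens, spans):
    """Convert labeled token spans to BIO tags.

    spans: list of (start_token, end_token_exclusive, label) tuples.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags

# Hypothetical crowd annotation of a news sentence.
tokens = ["Flooding", "hit", "Houston", "after", "Hurricane", "Harvey", "."]
worker_spans = [(2, 3, "PLACE"), (4, 6, "EVENT")]

print(list(zip(tokens, spans_to_bio(tokens, worker_spans))))
# [('Flooding', 'O'), ('hit', 'O'), ('Houston', 'B-PLACE'), ...]
```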
Conceding that crowds of lay people are less accurate than experts, Lease added that there are ways to check their work. “You may already have some examples of what you want, and you simply check how well the data you are collecting agrees with that,” he said. When the data from the crowd disagrees with the algorithm, it can be flagged for a human to review.
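A rough sketch of that quality check might look like the following: score each worker against a small set of items with known ("gold") answers, and flag items where the crowd and the model disagree so an expert can review them. The function names and data here are illustrative assumptions, not the team's actual code.

```python
# A minimal sketch of checking crowd work against gold examples and flagging
# crowd/model disagreements for expert review. Illustrative only.

def worker_accuracy(worker_labels, gold_labels):
    """Fraction of gold-labeled items the worker got right."""
    checked = [item for item in gold_labels if item in worker_labels]
    if not checked:
        return None
    correct = sum(worker_labels[item] == gold_labels[item] for item in checked)
    return correct / len(checked)

def flag_disagreements(crowd_labels, model_labels):
    """Items where the crowd's label and the model's prediction differ."""
    return [item for item, label in crowd_labels.items()
            if item in model_labels and model_labels[item] != label]

gold = {"doc1": "PERSON", "doc2": "PLACE"}
worker = {"doc1": "PERSON", "doc2": "EVENT", "doc3": "PLACE"}
model = {"doc2": "PLACE", "doc3": "PLACE"}

print(worker_accuracy(worker, gold))      # 0.5 -> this worker is half right on gold items
print(flag_disagreements(worker, model))  # ['doc2'] -> route to a human reviewer
```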
Lease's team was able to train a neural network so it could accurately extract relevant information in unannotated texts, improving upon existing tagging and training methods. They also found they could estimate the quality of each crowdsourcer's work, which was not only useful for error analysis, but also let them identify the best person to annotate each particular text.
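One common, simple way to estimate a crowdsourcer's quality without gold labels is to treat the majority vote across workers as a proxy for the correct answer and score each worker by how often they agree with it. The sketch below shows that idea only as an assumption; the team's published method is more sophisticated.

```python
# A rough sketch of annotator-quality estimation via agreement with the
# majority vote. Illustrative, not the method from Lease's paper.
from collections import Counter, defaultdict

def majority_vote(labels_by_item):
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in labels_by_item.items()}

def worker_agreement(annotations):
    """annotations: list of (worker, item, label) tuples."""
    labels_by_item = defaultdict(list)
    for _, item, label in annotations:
        labels_by_item[item].append(label)
    consensus = majority_vote(labels_by_item)

    hits, totals = defaultdict(int), defaultdict(int)
    for worker, item, label in annotations:
        totals[worker] += 1
        hits[worker] += (label == consensus[item])
    return {worker: hits[worker] / totals[worker] for worker in totals}

annotations = [
    ("w1", "s1", "EVENT"), ("w2", "s1", "EVENT"), ("w3", "s1", "PLACE"),
    ("w1", "s2", "PERSON"), ("w2", "s2", "PERSON"), ("w3", "s2", "PERSON"),
]
print(worker_agreement(annotations))  # {'w1': 1.0, 'w2': 1.0, 'w3': 0.5}
```

Scores like these can be used both for error analysis and for deciding which worker should annotate which kind of text.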
Lease said he even envisions tying crowdsourcing to search engines in real time. Taking a picture of a plate of food and searching for the calorie count, for example, isn't yet possible because computer vision technology isn't advanced enough.
But with crowdsourcing, “we can let the machine learning system take us 80 percent of the way and then in real time reach out to the crowd to close the loop,” he said. “The user of the application might not even know … what is being done by AI and what is being done by the crowd.”
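That hybrid flow can be sketched as a confidence-based fallback: the model answers on its own when it is confident, and the query is routed to a crowd worker otherwise. The classify() and ask_crowd() functions and the threshold below are placeholders assumed for illustration, not real APIs from the project.

```python
# A minimal human-in-the-loop sketch: answer with the model when confident,
# otherwise close the loop with the crowd. All functions are placeholders.

CONFIDENCE_THRESHOLD = 0.8  # illustrative cutoff

def classify(image):
    """Placeholder model: returns (label, confidence)."""
    return ("pasta", 0.55)

def ask_crowd(image):
    """Placeholder for posting the task to a crowdsourcing platform."""
    return "spaghetti carbonara"

def answer_query(image):
    label, confidence = classify(image)
    if confidence >= CONFIDENCE_THRESHOLD:
        return label            # the model handles it end to end
    return ask_crowd(image)     # fall back to a human in real time

print(answer_query("dinner.jpg"))  # falls back to the crowd in this sketch
```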
Lease has received grants from the National Science Foundation, the Institute of Museum and Library Services and the Defense Advanced Research Projects Agency to improve search engines by integrating crowdsourcing.