Algorithms can be gamed, but experienced humans can help
Researchers studying patent examiners found that machine learning is biased toward finding textually similar innovations, but domain expertise is needed to surface the most relevant prior art.
Researchers studying artificial intelligence and machine learning looked at the work of patent examiners to determine the degree to which ML systems can be “gamed” by individuals.
ML algorithms used to review resumes or insurance claims are typically trained to look for specific phrases -- in the case of a job applicant, for example, phrases that signal competence or experience. A savvy job hunter could exploit that bias by listing relevant but false degrees or certifications; an insurance applicant could deliberately omit any mention of prior accidents.
Could ML algorithms spot and correct for deliberately false inputs?
For their study, the researchers from the University of Maryland and Harvard Business School turned to the U.S. Patent and Trademark Office, whose patent examiners “face a time-consuming challenge of accurately determining the novelty and non-obviousness of a patent application,” they wrote. To help examiners find relevant prior art faster, USPTO uses ML to “read” the text of patent applications and compare it to textually similar innovations in ever-expanding databases of prior art -- all the existing patents, patent applications, publications and documentation related to the applicant’s product. That process allows examiners to determine whether there is a “silver bullet” -- an already-patented invention whose existence would essentially kill the application.
To make their innovations appear new, some unscrupulous applicants try to game the system by including extraneous information or omitting relevant citations. They can also create hyphenated words and assign new meanings to existing words to better explain their novel and non-obvious inventions. The introduction of new vocabularies makes it “exceptionally difficult for ML to make reliable predictions about a future that is unfamiliar to its training dataset,” the researchers said.
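To illustrate the kind of text-similarity matching described here, and why invented vocabulary undermines it, consider a minimal sketch using off-the-shelf TF-IDF and cosine similarity. The corpus, wording and library choices below are assumptions for illustration only; they are not a description of USPTO's actual system or of the models used in the study.

```python
# Purely illustrative sketch: a toy TF-IDF / cosine-similarity search over a tiny
# "prior art" corpus. It shows (1) how text-based retrieval surfaces the most
# textually similar documents and (2) how invented vocabulary can push an
# application outside the reach of a model trained on existing text.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

prior_art = [
    "A rechargeable lithium-ion battery pack with thermal management",
    "Method for wireless charging of portable electronic devices",
    "A battery enclosure providing passive cooling via heat sinks",
]

vectorizer = TfidfVectorizer(stop_words="english")
corpus_vectors = vectorizer.fit_transform(prior_art)  # index the prior-art corpus


def best_match(application_text: str) -> float:
    """Return the highest similarity between the application and any prior-art document."""
    query_vector = vectorizer.transform([application_text])
    return cosine_similarity(query_vector, corpus_vectors).max()


# A plainly worded application overlaps heavily with the indexed prior art ...
plain = "Rechargeable battery pack using heat sinks for thermal management"
# ... while the same invention described with invented, hyphenated coinages shares
# almost no vocabulary with the corpus, so its similarity collapses toward zero.
gamed = "Recharge-capable energy-cell module with thermo-sink dissipation"

print(f"plainly worded application: {best_match(plain):.2f}")
print(f"strategically reworded:     {best_match(gamed):.2f}")
```

In this toy setup the reworded application scores near zero against every prior-art document, even though the underlying invention is unchanged -- a simplified stand-in for the "unfamiliar vocabulary" problem the researchers describe.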
The researchers found that it is “practically impossible” to train an ML algorithm to spot incomplete or inconsistent patent applications on its own -- even though the technology can learn from and correct for the manipulations it detects.
The ML benefited strongly, they said, from collaboration with humans -- but not just any humans. Examiners with broad skills and deep knowledge of a specific domain, those who can draw on relevant outside information and those with “vintage-specific skills” -- in this case, long experience using ML technologies -- were better able to mitigate bias stemming from applicant manipulation.
“The promise of ML technology in the patent examination context lies in its ability to make superior predictions by identifying a narrower, more relevant distribution of prior art,” the researchers concluded. “However, when patent applications are characterized by (plausibly strategic) input incompleteness, the ML technology may be more likely to make biased predictions without domain specific expertise.”
The full paper, “Machine Learning and Human Capital Complementarities: Experimental Evidence on Bias Mitigation,” by Prithwiraj Choudhury of Harvard Business School, and Evan Starr and Rajshree Agarwal of the University of Maryland’s Robert H. Smith School of Business, can be found here.