Project Oxford APIs offer facial, voice recognition capabilities

By Derek Major


May 14, 2015

Microsoft’s Project Oxford offers application programming interfaces and software development kits that let developers add intelligent services to their apps.

If you’ve tried out Microsoft’s How-Old.net only to find its guess for your age was way off, you know facial recognition technology has a long way to go. But the cloud-based Project Oxford, the ‘brains’ behind How-Old.net, has some tools developers can use now to easily build intelligence -- like facial and speech recognition -- into their apps.

Project Oxford’s application programming interfaces (APIs) and software development kits (SDKs) allow developers to build intelligent services that leverage Microsoft's natural data understanding into their solutions.

The face API uses algorithms to detect and recognize human faces in images and can estimate gender and age. Developers can use it to check whether two images of faces are similar enough to be the same person (authentication) or to compare an image against a user-provided face database (identification). It could be the foundation of a security application or help law enforcement agencies identify victims and criminals.
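To make the distinction between those two uses concrete, here is a minimal Python sketch. It is not Microsoft's algorithm or the Project Oxford API: it assumes each face has already been reduced to a list of numbers (a hypothetical "feature vector"), and the function names and the 0.9 threshold are illustrative choices only.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-zero vector cannot match anything
    return dot / (norm_a * norm_b)


def verify(face_a, face_b, threshold=0.9):
    """Authentication: are two faces similar enough to be the same person?"""
    # The threshold is an assumed value, not one published by Microsoft.
    return cosine_similarity(face_a, face_b) >= threshold


def identify(face, database, threshold=0.9):
    """Identification: return the best-matching name in a face database,
    or None if no stored face clears the threshold."""
    best_name, best_score = None, threshold
    for name, known_face in database.items():
        score = cosine_similarity(face, known_face)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name
```

Verification answers a yes-or-no question about one pair of images, while identification searches a whole collection and may come back empty, which is why the second function can return None.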

Earlier this week, the Department of State posted a for facial recognition software that could match a photo on a passport or visa to the face of the person in possession of it. And in San Diego, the Encinitas Union School District started a new pilot program that uses facial recognition software for school-issued iPads to simplify the login process for students (although that particular

In Hyderabad, India, city police will soon use facial recognition software on cameras located around the city to potentially identify terror suspects, property offenders and other criminals by matching images with a data bank of images of known offenders.

The Project Oxford speech API uses algorithms to convert spoken language into text and text into audio -- functionality that could be used for a wide range of citizen-facing services, such as initial call center screening. Files of spoken audio are transmitted to Microsoft's servers in the cloud, and a single text result is returned. There is also a client library that can be hosted locally, which allows for real-time streaming of audio and returns text as it is generated.

The vision API extracts visual features from an input image -- categorizing the image, identifying dominant colors and even flagging image elements that might be inappropriate. The API can also read text from an image and create thumbnail versions of the original image that are tailored to specific needs.

There’s also a Language Understanding Intelligent Service, offered as an invitation-only beta, that helps applications understand what users mean when they say or type something using natural, everyday language.

The services are currently available for limited free usage in beta (20 transactions per minute). They work across programming platforms and languages -- from Windows and Windows Phone to iOS and Android, Microsoft said. To try them out, a developer must have an Azure account.
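Developers experimenting with the beta would need to stay under that 20-transactions-per-minute cap. One way to do so is a small client-side throttle, sketched below in Python. The class and method names are hypothetical and are not part of any Project Oxford SDK, and the sketch assumes the limit is counted over a rolling 60-second window, which Microsoft has not specified here.

```python
import time
from collections import deque


class MinuteThrottle:
    """Track recent calls and report how long to wait before the next one."""

    def __init__(self, max_calls=20, window_seconds=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self.clock = clock          # injectable so the logic can be tested
        self._stamps = deque()      # timestamps of calls still inside the window

    def wait_time(self):
        """Return seconds to wait before another call is allowed (0.0 if none)."""
        now = self.clock()
        # Forget calls that have aged out of the rolling window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        # Otherwise wait until the oldest remaining call leaves the window.
        return self.window_seconds - (now - self._stamps[0])

    def record(self):
        """Note that a call was just made."""
        self._stamps.append(self.clock())
```

An app would call wait_time() before each request, sleep for the returned number of seconds if it is not zero, then call record() once the request has been sent.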
