Researchers at Microsoft say they've made advances in speech recognition that give the company's technology the lowest word error rate in the industry, at 6.3 percent, edging out IBM, which recently said it had achieved 6.6 percent.
It's a symbolic milestone for Microsoft, although IBM and other vendors will undoubtedly make further improvements given the competitive stakes. The 6.3 percent figure was obtained by testing Microsoft's system against the industry-standard NIST 2000 Switchboard set, and comes down to advancements in deep neural networks and other AI-related technologies, Microsoft said on its research website:
Some researchers now believe these technologies could soon reach a point where computers can understand the words people are saying about as well as another person would, which aligns with Microsoft’s strategy to provide more personal computing experiences through technologies such as its Cortana personal assistant, Skype Translator and speech- and language-related cognitive services.
The speech research is also significant to Microsoft’s overall artificial intelligence (AI) strategy of providing systems that can anticipate users’ needs instead of responding to their commands, and to the company’s overall ambitions for providing intelligent systems that can see, hear, speak and even understand, augmenting how humans work today.
Indeed, speech recognition has the potential to dramatically change the way we work, says Constellation VP and principal analyst Alan Lepofsky, who leads the company's research in the Future of Work.
"These days it's quite common to use your voice as an input method for your phone," Lepofsky says. "We ask Siri or Google Now questions using natural language such as, 'who won the baseball game last night?' or 'what's the weather outside?' We can create email replies or answer text messages by dictating the content. What if we could do the same at work?"
"Shouldn't a sales rep be able to ask their computer, 'who are the most important accounts I need to speak with today?'" he says. "Shouldn't we be able to perform actions simply using our voice, such as 'please book a meeting with Jim next week'?"
Of course, there are issues at work such as privacy, Lepofsky adds. "There are certainly things that cannot be said out loud. No one is going to say out loud to their computer, 'please contact Steve about our pending acquisition of Acme Corporation,'" he says. "There is also just the simple factor of politeness. What would the workplace be like if everyone was just speaking out loud to their computers?"
Such a notion was impossible 20 years ago, when the lowest word error rate for speech recognition systems was 43 percent.
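For readers unfamiliar with the metric, word error rate is conventionally the number of word substitutions, deletions and insertions needed to turn a system's transcript into the correct one, divided by the number of words in the correct transcript. The short Python sketch below illustrates that arithmetic only. It is not Microsoft's or NIST's scoring tooling, and the function name and sample transcripts are made up for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts: one wrong word and one dropped word out of seven.
wer = word_error_rate(
    "who won the baseball game last night",
    "who won the basketball game night",
)
print(f"{wer:.1%}")
```

By this measure, a 6.3 percent rate means roughly six errors for every 100 words spoken.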
It's important to note that Microsoft's benchmark comes from a research project and hence doesn't yet reflect its commercial products' performance. But what's surprising is how close it is to human-level speech recognition. As a ZDNet report notes, IBM has estimated humans have a four percent word recognition error rate. (Although perhaps it matters who's talking.)