There is no question that we live in the information age, and devices are the key tools of the information-age worker. A life without them is hardly imaginable at this point, both at work and off work. Kids ask their parents how people met before they had mobile phones, or what a fax is. I recently ran into that generation gap when our 8-year-old daughter asked me whether a rotary phone is really a phone and, if so, how to use it.
The irony is that to this day almost all input into our devices happens through keyboards. We need keyboards so the devices understand us. Think back to early computing: the punch card was the medium of choice to store information and programs and to load them onto devices.
So the keyboard came along as well and, as one of the earliest testaments to the importance of backward compatibility, borrowed its design from the mechanical typewriter. I am not sure who made the decision, but it came with a short-term pro and a mega long-term con. The short-term pro: the layout was easy for people to use, as they were used to it from the mechanical typewriter. The mega long-term con (that we suffer from to this day): the key layout of the typewriter was ultimately designed so that people could not type too fast. Correct – not type too fast. A lot of research into early typewriter keyboard layouts went into creating a layout that would avoid the ‘blue screen of death’ of its day – the mechanical jam of the typewriter’s hammers. One can imagine that the productivity impact of such a crash was significant: unclog the hammers, clean your fingers and get back to typing, a little slower this time. Still a faster recovery than the one from the PC blue screen of death. But even in the earliest computer times, there was no need to throttle human typing speed.
And humans are extraordinary at adapting and learning. Ever seen an adept teenager typing on a T9 keyboard, beating many people who type on a regular keyboard day in and day out in both accuracy and speed? Or take the most recent trend to solve the input problem: the swipe across the keyboard. It saves the time needed to lift fingers and lets software help work out what was supposed to be typed. How fast that can be is shown by the recent Guinness world record that Microsoft established for Windows Phone 8.1.
And now we are seeing the rise of voice. It first became popular in the late '90s, but it never took over the PC, not even now that voice recognition is part of Windows 8 at no additional charge. The reason might be the multitasking nature of the PC: voice recognition only gets really good when it knows the context of the voice being heard, and PCs are used for multiple things at the same time. Smartphones, though, are usually used in only one context at a time (even though they can multitask), and that makes voice recognition much easier to master. And of course the form factor and the disappearance of the physical keyboard went hand in hand with the rise of voice.
As mentioned, Siri made the start, but interestingly enough you see very few iPhone users using voice as their dominant input method. It is largely used as a replacement for typing a search, often in a social setting. Coupled with the prestige and coolness factor of Siri, the search results are often entertaining. Then came Google with Now, and that moved the yardstick quite a bit. In my unscientific and unrepresentative samples, I see Android users talking to their phones for text input more than iPhone users. I even know a lawyer (in fairness, one trained in dictation) who handles almost all smartphone input via voice.
And now it’s Microsoft with Cortana. Almost as a tradition, Microsoft is not early in the game but a late follower, and with that it has the chance to get things right and to differentiate from existing products. Being able to interact with Cortana via keyboard as well, not just voice, is definitely an improvement: it takes into account that people expect answers not only in a spoken context but also in settings where they cannot speak (e.g. in a meeting). From the developer angle, opening up Cortana APIs for specific jargon, words and context is also a differentiating move. Moreover, Cortana can take notes and turn them into reminders. Through pure coincidence I had lunch with the Cortana PM team at the Build conference, and it was interesting to see and learn how well planned the differentiating features were. Having Cortana proactively tell you, for example, the latest weather forecast because she noticed you always ask for it around 7 AM is just another example.
At the end of the day voice tools like Apple’s Siri, Google Now and Microsoft Cortana need to get voice recognition, context and then intent right.
- Getting the voice side right is pretty much table stakes.
- A great search engine helps to get the context right, as Google has shown. Here Microsoft may have an advantage over Apple, but is unlikely to have one over Google. On the other hand, having the largest email and calendaring platform with Office is a huge bonus.
- Getting the intent right depends largely on the context, and there smartphones, which capture information on location, movement and applications, are a very important help.
How Microsoft manages to create additional value and differentiation for Cortana beyond that, we will have to see.
At the end of the day voice recognition is all about getting the prediction of intent right. When it hits the sweet spot it is unbelievably cool. When it misses by a little, the results are – silly. And in order to increase prediction quality, voice recognition providers need to encroach on areas that are usually kept under the cloak of privacy. Getting the mix right without becoming creepy is the real art.
When will we know voice recognition has arrived? When we see no more QWERTY keyboards being sold and no keyboard accessory business. Cortana will certainly be a system that fights the keyboard. Will we see keyboards come back with a faster-to-type design? We will see. For now we can certainly say that we will use our vocal cords more often than our fingertips.
----
P.S. Siri and Cortana are female; for Google Now, the voice is determined by user setup. Cortana was borrowed from the Halo games, so she has a physical appearance. How Microsoft got that past concerns about stereotypes and gender thinking is something I am still pondering.