Relationships are built on communication, and the relationship between humans and machines is no different. Each generation of technology is defined by the way it communicates with its human users: from the levers and cogs of yesteryear, to the keyboards and mice of the 80s and 90s, and on to the touchscreens, trackpads, and gestures of the noughties (00s) and tens (10s). As our relationship with digital technology matures, so does our communication, and as we move into the 20s the defining form of communication looks set to be voice.
“One problem in this industry has been the separation of point-problem and point-solution providers. Until now, it has been a struggle to unify all of them onto one platform, but voice does a great job at it because, regardless of what the end device interface is, if you are able to create the architecture right then you are able to control all these devices with a single experience,” says Siddharth Bannerjee, CEO of Cosine Labs, a product development company that provides smart-living solutions for residential, hospitality, and commercial buildings.
Recent advances in voice-control, primarily forged by Amazon’s Alexa, Apple’s Siri, and Google’s Assistant, have created a seemingly endless market for voice as a human-machine interface. In addition to vehicles and mobile devices, today almost every type of building is promising to find its voice in the connected age.
“It now seems natural for voice to be in speakers, TVs, and set-top boxes but, amazingly, today we are getting more and more customers asking for voice to be enabled in appliances such as water filters, heaters, and all kinds of other use cases,” says Hariharan Bojan, CEO of Sirena Technologies, which specializes in bringing Alexa to end products for third parties.
Voice-control is now about much more than simply turning the lights on and off; it is about having a conversation with the device to achieve a greater level of control with greater ease. You could tell your lighting system to “set the lights for dinner” and it might ask, “is it a romantic, social, or family dinner?” then illuminate accordingly. Through such a dialogue, it should be possible to reach your desired control settings through a painless, hands-free experience that is natural to the human mind.
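The clarifying exchange described above can be sketched as a tiny dialogue step: if the request is ambiguous, ask a follow-up question; once answered, resolve it to concrete settings. This is a minimal illustration only; the scene names, brightness values, and function are invented for the sketch and are not taken from any actual lighting product's API.

```python
# Toy sketch of a clarifying lighting dialogue (hypothetical scene names).
SCENES = {
    "romantic": {"brightness": 20, "color": "warm white"},
    "social":   {"brightness": 70, "color": "neutral"},
    "family":   {"brightness": 90, "color": "bright white"},
}

def handle_command(command, answer=None):
    """Return a follow-up question if the request is ambiguous,
    otherwise the resolved scene settings."""
    if "dinner" in command and answer is None:
        return "Is it a romantic, social, or family dinner?"
    return SCENES.get(answer, SCENES["family"])

print(handle_command("set the lights for dinner"))
print(handle_command("set the lights for dinner", answer="romantic"))
```

The point of the sketch is the shape of the interaction: one short question per ambiguity, so the exchange stays "digestible by the brain," as Thomas puts it below.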
“A lot of what we do is to create that conversation. Conversation means short, so it is digestible by the brain. Conversation also means we can interrogate at any point,” says Deepak Thomas, Chief Marketing Officer at Klove Chef, an AI-enabled recipe platform. “For example, I could say ‘how much chicken did you say, I forgot’ but if that conversation is not possible in a step-by-step guide then you’re lost and frustrated. The device should be able to say ‘that’s OK, it was 2.5 ounces in half-inch cubes’. That is conversation.”
To date, few would describe their voice-control experiences as “natural”. In fact, the early iterations of voice-control are remembered for their funny and frustrating mistakes, where even the calmest people with the most recognizable accents are left shouting “no, Alexa, that’s not what I said”. However, alongside voice-recognition improvements, new features are emerging that allow devices to gauge your mood and meaning by the tone of your voice.
“As customers continue to use Alexa more often, they want her to be more conversational and can get frustrated when Alexa gets something wrong,” Amazon wrote in a blog post announcing new tone recognition features. “When she recognizes that you’re frustrated with her, Alexa can now try to adjust, just like you or I would do.”
Human-to-human conversation isn’t perfect but it is natural to us. It seems that ‘what you or I would do’ is the ultimate goal of this new generation of voice-control systems. Last year, Amazon introduced a feature that prompts Alexa to respond quietly if a command is whispered to it. “It’s magical when you walk in late to your bedroom and your wife is asleep,” Rohit Prasad, chief scientist for Alexa, said in an interview with OneZero. “Of course, the cost of a mistake is very high there,” he added.
For the user, this will seem like the assistant just understands you better, more like a human would. Behind the scenes, however, Amazon has created two additional deep neural networks to better analyze your voice, beyond the ones used to decipher your command. One network aims to recognize words that indicate frustration, such as “no” or “stop”, while the other will analyze the tone of your voice. “The tonality is a big feature. You could have said yes [in response to Alexa], but be sarcastic, right?” says Prasad.
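As an illustration only, the two signals Prasad describes (frustration keywords plus voice tone) could be combined along these lines. In Amazon's system each signal comes from a deep neural network; in this sketch the keyword list, tone score, and threshold are all invented stand-ins, not Amazon's actual models or API.

```python
# Hypothetical combination of two signals: keyword spotting and a tone score.
# The "tone score" stands in for a neural network's output, a float in
# [0, 1] where 1.0 means the speaker sounds clearly annoyed.
FRUSTRATION_WORDS = {"no", "stop", "wrong", "ugh"}

def is_frustrated(transcript, tone_score, threshold=0.5):
    """Flag frustration if either signal fires."""
    words = set(transcript.lower().split())
    keyword_hit = bool(words & FRUSTRATION_WORDS)
    # Tone catches what keywords miss: a sarcastic "yes" contains no
    # frustration word, but its tone score can still tip the decision.
    return keyword_hit or tone_score >= threshold

print(is_frustrated("no alexa that's not what i said", tone_score=0.2))  # True
print(is_frustrated("yes", tone_score=0.9))  # True (sarcastic yes)
print(is_frustrated("yes", tone_score=0.1))  # False
```

Combining the two signals is what lets the system handle Prasad's sarcasm example: the words say "yes" while the tone says otherwise.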
While the ability to speak sarcastically to Alexa and her counterparts is appealing, it is just the tip-of-the-tongue for what voice-tone recognition can provide. In the healthcare sector, Toronto-based startup Winterlight Labs is training the technology to detect signs of dementia and other mental illnesses. Meanwhile, many major hotel chains see voice-integration as a way to fight back against the Airbnb travel-accommodation trend, which has already captured 20% of the lodging market.
Just as touchscreens have largely replaced the now old-fashioned keyboards and mice to enable more user-friendly mobile devices, voice now promises to be the new normal for human-machine interaction in the built environment. There is no need to walk across the room or pull a device out of your bag or pocket; with voice-control you can simply ask for what you want, and with new advances in the technology the systems should understand what you really mean, just like a human would.
The continued evolution of voice seems the inevitable next step in making human-machine communication more natural for the user. While voice still has limitations, they should diminish quickly as systems continue to improve. Looking forward, we should expect voice-control to co-exist with touchscreens and keyboards across their various applications, at least until machine learning improves enough to understand what we’re thinking.