Smart Home Interfaces: Voice, Gesture, or Other?


The smart home has been more of a hope than a reality. Futurists envisioned that consumers would save time and money at home thanks to a proliferation of smart, connected devices, but platform fragmentation and user interface (UI) issues slowed user adoption. First-generation interfaces were limited. Smartphones, tablets, and other touch interfaces largely failed because they provided little convenience for users at home. Consumers grew frustrated because even simple control actions required multiple steps, such as unlocking a device and opening a dedicated app, to accomplish anything.

Then came Alexa. Amazon’s artificial intelligence (AI)-powered Alexa devices have shown how the voice interface is ideal for the smart home and how virtual digital assistants (VDAs) are an ideal universal controller for smart devices within the smart home. VDAs are actively listening, so a user can tell the smart home what to do even from across a room. All of this means that the smart home is finally on the verge of becoming a reality for the mass market.

But is voice the UI that will reign over all? Tractica believes that while voice will be an important UI within the smart home, other AI-fueled UIs, such as gesture and biometrics, will also play important roles. This amalgamation of UIs means there will likely be multiple platform players within any given home, reducing the chance that any one provider dominates the smart home UI. Touch/text interfaces, which can incorporate AI in other use cases, are not considered an AI-driven UI for the smart home. This blog post lays out a forecast for smart home UIs and an overview of the key AI-fueled UIs. For more detail, you can read Tractica’s full white paper on this topic.

Strong Growth Forecast

Tractica forecasts that the installed base of smart home devices leveraging AI-fueled UI interfaces will grow from 43.7 million devices in 2017 to 860.2 million devices in 2022. Voice UI will represent an outsized proportion of smart home device UIs for the next few years, but Tractica estimates that, by 2020, more than 25% of smart home devices will use gesture, biometrics, or multimodal UIs. By 2022, the percentage of smart home devices using non-voice UIs will grow to nearly 35%. Tractica believes the global smart home market will support a diverse range of UI platform options, and that no one platform will dominate the global market.
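The headline figures imply a steep growth trajectory. As a quick sanity check (my own calculation, not a Tractica figure), the compound annual growth rate implied by growing from 43.7 million devices in 2017 to 860.2 million in 2022 can be computed as:

```python
def cagr(start, end, years):
    """Compound annual growth rate implied by a start value, an end
    value, and the number of years between them."""
    return (end / start) ** (1.0 / years) - 1.0

# Installed base of smart home devices with AI-fueled UIs (millions)
growth = cagr(43.7, 860.2, 2022 - 2017)
print(f"Implied CAGR: {growth:.1%}")  # roughly 81% per year
```

That works out to an installed base nearly doubling every year over the forecast period.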

User Interface: Voice

It is important to understand that voice UI is most compelling in private, as opposed to public, settings: in homes, private cars, and enclosed offices. Mass transit, open offices, and other public spaces are not ideal for the voice interface because of the lack of privacy and the amount of ambient/conflicting noise. In the smart home, the voice UI provides convenience; thanks to far-field microphones, voice UI allows users to control a device from anywhere within earshot. For example, a user can adjust lights, temperature, or entertainment without moving to manual control panels. Voice UI is also ideal for hands-free computing, allowing users to follow recipes or cooking instructions easily.

While the benefits are compelling, challenges remain for voice UI. The main issue is understanding human language. At the front end, while IoT speaker sensors continue to improve, accurate speech recognition can be hampered by mumbling, accented speech, and ambient noise. At the back end, natural language processing (NLP) algorithms are challenged by the context of human communication. Sarcasm and irony are difficult for machines to interpret. Most voice UIs today do not consider the tone of one’s voice, though strides are being made in emotion recognition.

Another challenge for voice UI will be security and privacy. Today, most voice UIs are designed to “wake” to a spoken keyword, regardless of who speaks it, and most are not equipped to perform speaker recognition. Voice UIs will have to evolve to recognize specific users and restrict commands accordingly.
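One way such per-user gating could work is to compare a voiceprint embedding of the incoming utterance against embeddings enrolled by household members, and accept the wake word only when a match clears a similarity threshold. The sketch below is purely illustrative: the `SpeakerGate` class and its threshold are assumptions, and in a real system the embeddings would come from a trained speaker-recognition model rather than raw float vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

class SpeakerGate:
    """Hypothetical wake-word gate: only enrolled speakers whose
    voiceprint clears the threshold are allowed to issue commands."""

    def __init__(self, threshold=0.85):
        self.threshold = threshold
        self.enrolled = {}  # user name -> voiceprint embedding

    def enroll(self, name, embedding):
        self.enrolled[name] = embedding

    def identify(self, embedding):
        """Return the best-matching enrolled user, or None if nobody
        clears the similarity threshold."""
        best_name, best_score = None, self.threshold
        for name, enrolled in self.enrolled.items():
            score = cosine_similarity(embedding, enrolled)
            if score > best_score:
                best_name, best_score = name, score
        return best_name
```

A gate like this would let a household reject a stranger (or a TV advertisement) speaking the wake word, addressing exactly the gap described above.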

User Interface: Gesture

Intuitive and universally understood, gesture UI holds promise as a major component of the smart home, with the potential to someday offer consumers a remote-less environment.

Gesture control is based on computer vision (CV) algorithms and relies heavily on camera capabilities. Usually, gesture UI apps have to stitch together multiple camera views to interpret gestures accurately in 3D. These challenges are particularly acute for hand gestures: a 3D detector may have to track as many as 30 different points on the human hand, which can combine into trillions of possible poses. Single-camera 3D systems are highly dependent on lighting conditions, and the information available can vary significantly with the light source. CV algorithms must account for shading, color, and texture, all of which shift considerably as lighting changes.

There are also limitations in terms of consistency and repeatability of results. Though current systems provide reasonable accuracy, they are not 100% reliable, and a correct gesture producing the wrong result could be disastrous for home automation.

Latency can also be an issue. It is possible to reduce lag by capturing images at a very high frame rate, although current cameras are limited in how many frames they can capture. Most are limited to 60 frames per second (fps) or 120 fps, whereas 240 fps and higher is desirable to achieve acceptable latency. Such cameras are expensive, although prices are expected to decrease over time.
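The frame-rate numbers above translate directly into capture delay: a gesture can begin just after a frame is taken, so in the worst case it waits a full frame interval before the camera even sees it. A quick calculation (my own illustration, not from the source) shows why 240 fps is so much more attractive than 60 fps:

```python
def frame_interval_ms(fps):
    """Worst-case delay, in milliseconds, before a camera at the given
    frame rate captures a gesture that starts between frames."""
    return 1000.0 / fps

for fps in (60, 120, 240):
    print(f"{fps:>3} fps -> up to {frame_interval_ms(fps):.1f} ms before capture")
```

At 60 fps the capture delay alone can approach 17 ms, before any CV processing even starts; at 240 fps it drops to about 4 ms.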

However, progress is being made to overcome these challenges. Clay is a now-available software development kit (SDK) that can accurately track hand gestures on hardware equipped with a smartphone-quality camera and a 64-bit processor. According to a blog post in Electronic House, a Clay-enabled device “learns to distinguish a user’s hands from the background and track its 3D position and pose entirely from 2D information. Most importantly, this process is fast and efficient; on an iPhone 7, it takes roughly 12 milliseconds to process a gesture from sensor input to the app, using just 9% of [central processing unit] CPU power.” Clay is offering gesture UI for automotive and augmented reality (AR)/virtual reality (VR) applications and says consumer electronics applications are coming soon.

Several smart home players are offering gesture UIs today, including Ubiquilux (lighting), singlecue Gen 2 (Nest thermostat), SWIPE from FIBARO, Aura, Otodo, and Piccolo.

User Interface: Biometrics

The biometric UI holds the promise of effortless interaction, and it will come into play in the smart home primarily in the form of sleep technology and smart beds.

Next-Generation User Interface: Emotion Analysis

The next AI-driven UI for the smart home could be emotion analysis, based on facial recognition, speech, voice tone, and biometrics. Innovators are already exploring this space. Beyond Verbal envisions a VDA that listens to your conversations and, based on tone and content, interprets your emotional state and acts as your health coach. The Massachusetts Institute of Technology’s (MIT) EQ-Radio “can infer a person’s emotions using wireless signals. It transmits an RF signal and analyzes its reflections off a person’s body to recognize his emotional state (happy, sad, etc.). The key enabler underlying EQ-Radio is a new algorithm for extracting the individual heartbeats from the wireless signal at an accuracy comparable to on-body [electrocardiogram] ECG monitors.”

In a future blog post, Tractica will outline some of the key smart home use cases for which these AI-driven UIs make sense.
