IoT Blog

How Internet of Things Voice Recognition Will Transform the Technology Landscape

Voice-activated commands are perhaps the most persistent concept across science fiction movies and TV shows, so much so that the effortlessness they represent became shorthand for “the future.”

Well, that future has essentially arrived. People have become accustomed to speaking aloud to their phones: dictating text messages, asking Siri to run online searches and opening apps by voice. But this is just the tip of the iceberg, as voice is evolving into an integral component of the Internet of Things (IoT). As we grow more connected, voice will become the unifying link between ourselves and our devices.

Audio as an IoT Interface

Chances are, you’ve already used voice commands. After all:

  • 50% of US households use voice to access online content;
  • 62% of adults aged 25-34 use voice to access online content; and
  • 75% of adults aged 18-24 use voice to access online content.

Not only are voice commands already important today, but demand for them is also growing. Pretty soon, we’ll have people coming of age who know voice commands as the dominant way to access information, whether that’s telling the TV what show to put on, asking what the weather will be like next week or ordering groceries for tomorrow’s dinner.

These devices are becoming both more sophisticated and more ubiquitous, with voice-recognition technology like Siri, Alexa, Cortana, and Google Assistant leading the way. Voice-enabled IoT opens up a lot of unique possibilities, and in some of these use cases Bluetooth audio technology is critical: it lets the audio device connect wirelessly to a smartphone or other internet gateway so that commands can be processed in the cloud. A device equipped with Bluetooth typically sends its audio to a smartphone first, and the smartphone then transmits that audio to the cloud using its cellular or Wi-Fi connection. Placing the voice-recognition engine in the cloud eliminates the need for dedicated recognition resources in the device and creates a more efficient way to process audio and generate responses. The application possibilities span a wide variety of categories, including:

  • Consumer Electronics. The devices that actually power your smart home will be voice-activated, even without the assistant. Turn on the TV, lower the heat, start vacuuming the floor or pre-heat the oven. You will have many possibilities at your fingertips.
  • Connected Entertainment. Start a video on your phone, and tell it to mirror on your TV. Tell YouTube to look for videos. Tell the room that you want to hear Uptown Funk, and Bruno Mars will fill the space of your house. 
  • The Connected Car. Your car’s infotainment system is quickly becoming voice-enabled for safety and convenience. One day soon, you’ll hop into a self-driving car and simply state your destination...and it’ll take you there! 

In these examples, the car, consumer electronics and entertainment devices are all using Bluetooth to transfer your words to your phone or a gateway, which connects to the cloud.
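
To make that flow concrete, here is a minimal Python sketch of the three hops: device to phone, phone to cloud, and the recognized command coming back. Everything in it is an assumption made for illustration. The class names (VoiceDevice, PhoneGateway, CloudRecognizer) are hypothetical, the Bluetooth and network links are simulated with plain method calls, and the “recognizer” is only a lookup table, so it shows the shape of the architecture rather than any real product’s API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CloudRecognizer:
    """Hypothetical stand-in for a cloud voice-recognition engine."""
    known_audio: Dict[bytes, str]

    def recognize(self, audio: bytes) -> str:
        # A real service would run speech recognition; this just looks up the clip.
        return self.known_audio.get(audio, "")


@dataclass
class PhoneGateway:
    """Hypothetical smartphone (or other gateway) that relays audio to the cloud."""
    cloud: CloudRecognizer
    hops: List[str] = field(default_factory=list)

    def forward(self, audio: bytes) -> str:
        # Simulated uplink over the phone's cellular or Wi-Fi connection.
        self.hops.append("phone -> cloud")
        return self.cloud.recognize(audio)


@dataclass
class VoiceDevice:
    """Hypothetical Bluetooth audio device with no recognition engine of its own."""
    gateway: PhoneGateway

    def handle_utterance(self, audio: bytes) -> str:
        # Simulated Bluetooth transfer of the captured audio to the phone.
        self.gateway.hops.append("device -> phone")
        return self.gateway.forward(audio)


# Placeholder bytes stand in for a recorded audio clip.
cloud = CloudRecognizer(known_audio={b"<audio: turn on the TV>": "turn on the TV"})
phone = PhoneGateway(cloud=cloud)
device = VoiceDevice(gateway=phone)

command = device.handle_utterance(b"<audio: turn on the TV>")
print(command)
print(phone.hops)
```

Note that the device itself holds no recognition logic at all. That is the design choice the cloud approach makes possible: the device only captures and forwards audio.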

The possibilities surrounding voice-controlled IoT devices are many and exciting. But some of the most transformative forms of voice recognition technology could be those that are only now on the brink of development. One example is the hospital pillow, an object that most people would not consider very high-tech. A connected pillow, activated by voice, would be great for patients with poor mobility or limited use of their limbs. They could just say “call nurse” or “turn on TV” or “close the blinds.” In a connected hospital, your smart pillow could serve as your remote control for comfort, because the pillow could be connected via Bluetooth to a gateway or phone that’s connected to the cloud. For those with permanent mobility issues, such as people living with quadriplegia, IoT voice control could transform their quality of life in some extremely powerful ways.

There’s a virtually limitless number of applications for voice-controlled, IoT-connected devices. If you are a company that makes devices with IoT applications, you’ll need to consider adding voice controls, or risk being left behind. But what are the technical requirements when transferring audio over Bluetooth? 

Effective IoT Voice Recognition Requires a Clean, Clear Sound

If you’ve never worked in sound, you may not realize why microphones usually have a fuzzy covering or a round filter (known in the industry as a “pop filter”). Those exist for a reason: they are designed to block out distortion from breath and wind, because extra noises interfere with sound quality. Perhaps you’ve noticed that video chat sound quality is often far better when you have headphones with a mic. Otherwise, you’re left to use the computer’s built-in mic, which can pick up ambient sounds such as traffic, birds and your television in the background.

But what’s annoying during a video chat becomes a business-critical matter when dealing with voice-activated IoT over Bluetooth. Commands must be clear and understandable, or the entire voice recognition chain breaks down. You need a high-quality voice codec, as well as echo and noise cancellation algorithms. You also need the ability to separate the actual voice command from the ambient sound. That task is surprisingly challenging for a machine, even though the human brain does it automatically. Even for people it’s a learned ability: profoundly deaf individuals who hear for the first time thanks to bone conduction hearing aid technology will spend months learning how to “listen,” because it’s very difficult to separate a single sound from the cacophony of background noises.

Any voice-controlled application over Bluetooth needs to achieve this feat of distinguishing a voice from the background noises. You also need modules that ensure a clean, clear sound, so it can be picked up and interpreted by the voice recognition service. Once interpretation occurs, that command is transformed into a signal that’s passed along to the device. And this must happen quickly! Modern Bluetooth codecs like mSBC compress data to enable faster transmission but can still reproduce a high-quality audio signal. Not all codecs are equal though, and the quality of sound depends upon the quality of the codec. 
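
To put a rough number on that compression, the sketch below compares uncompressed speech with an mSBC stream. The frame parameters are assumptions, not figures from this article: they reflect mSBC as commonly described for Bluetooth wideband speech (16 kHz mono audio, 120 samples encoded into a 57-byte frame), and they should be checked against the Hands-Free Profile specification before being relied on. Transport headers and padding are ignored.

```python
# Assumed parameters (not from this article; verify against the HFP specification).
SAMPLE_RATE_HZ = 16_000      # wideband speech sampling rate
BITS_PER_SAMPLE = 16         # uncompressed PCM sample size
SAMPLES_PER_FRAME = 120      # audio samples covered by one mSBC frame
FRAME_BYTES = 57             # encoded size of one mSBC frame, excluding headers


def pcm_bitrate_bps() -> int:
    """Bit rate of the raw, uncompressed mono audio."""
    return SAMPLE_RATE_HZ * BITS_PER_SAMPLE


def msbc_bitrate_bps() -> float:
    """Bit rate of the encoded stream: bits per frame times frames per second."""
    return FRAME_BYTES * 8 * SAMPLE_RATE_HZ / SAMPLES_PER_FRAME


def compression_ratio() -> float:
    """How many times smaller the encoded stream is than raw PCM."""
    return pcm_bitrate_bps() / msbc_bitrate_bps()


print(f"raw PCM: {pcm_bitrate_bps() / 1000:.1f} kbit/s")
print(f"mSBC:    {msbc_bitrate_bps() / 1000:.1f} kbit/s")
print(f"ratio:   {compression_ratio():.2f}:1")
```

Under those assumptions the codec carries the same speech in roughly a quarter of the bits, which is why it can travel over a narrow Bluetooth voice link quickly.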

A default Bluetooth audio codec has a latency of about 150 milliseconds, although that can vary in either direction by as much as 100 milliseconds. That’s fast, but the aptX family improves on it considerably: aptX-LL has a latency of about 40 milliseconds, with a variation of only +/- 10 milliseconds. Maintaining a constant latency ensures consistency of service.
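
A quick worked example shows what those figures mean in practice. The sketch below takes the nominal latency and +/- variation quoted above and computes the best case, the worst case and the total spread for each codec. The CodecLatency class is a hypothetical helper written for this illustration, and the numbers are the article’s own, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodecLatency:
    """Hypothetical helper: a nominal latency plus a +/- variation, in milliseconds."""
    name: str
    nominal_ms: int
    variation_ms: int

    @property
    def best_ms(self) -> int:
        return self.nominal_ms - self.variation_ms

    @property
    def worst_ms(self) -> int:
        return self.nominal_ms + self.variation_ms

    @property
    def spread_ms(self) -> int:
        # Width of the window in which audio may arrive.
        return self.worst_ms - self.best_ms


# Figures quoted in the paragraph above.
default_codec = CodecLatency("default codec", nominal_ms=150, variation_ms=100)
aptx_ll = CodecLatency("aptX-LL", nominal_ms=40, variation_ms=10)

for codec in (default_codec, aptx_ll):
    print(f"{codec.name}: {codec.best_ms}-{codec.worst_ms} ms "
          f"(spread {codec.spread_ms} ms)")
```

On these figures the default codec can land anywhere in a 200-millisecond window, while aptX-LL stays within a 20-millisecond one, and its worst case equals the default codec’s best case.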

To enable this connectivity, you need the right modules that can work with Bluetooth or Wi-Fi devices. Modules like the BC118, BC127, and BC188 come with regulatory certifications as well as integrated antennas for long-range communication (say, between smart construction helmets on large work sites). There are three main factors when it comes to enabling voice-controlled IoT:

  1. How to improve audio quality; 
  2. How to maintain constant latency; and
  3. How to extend the range of your designs.

To be part of the voice-controlled IoT, you have to start with a few things. You have to start by understanding that the IoT is going to be hands-free, a pure extension of our digital/physical overlap. You have to understand how your customers will integrate your product into their voice-controlled lives. You have to start with the right modules that give you the speed, complexity, and consistent communication you need. And to do that, you can Start with Sierra: watch our on-demand webinar, “Why Bluetooth Audio Matters for Wireless Applications,” to learn more.