
voice first

The seemingly simple act of commanding consumer devices by voice is a choice that nearly 118 million Americans now make every day, according to a recent report from eMarketer, the digital marketing research firm.

While the voice interface is convenient for users, its implementation comes at the potential loss of individual privacy. The reason? Always-on, always-connected voice-first devices such as Amazon Alexa and Google Home require a wall plug and an internet connection to powerful cloud processors. That makes it possible for cloud companies, however benignly, to collect data on personal habits, locations and conversations that were never intended for sharing.

Move processing to the edge

To address concerns over user privacy, device designers are attempting to do more of the audio processing within the consumer device rather than sending users’ voices to the cloud. Moving more processing to the edge is a trend across the Internet of Things (IoT) industry, not just for voice data but for other types of sensitive or proprietary data as well.

Yet designers have had limited success, because the conventional approach to always-listening edge processing is notoriously inefficient: it digitizes and processes 100% of incoming sound data even though up to 90% of that data is irrelevant noise. This digitize-first approach wastes vast amounts of system power digitizing and analyzing the audio signal as it searches for a wake word when no speech is even present, making it impractical for small, battery-operated devices.

Workarounds don’t work

Tackling this power issue is critical to keeping private data secure. Unfortunately, it’s also exceptionally difficult. Design engineers have tried workarounds to decrease power consumption in an always-listening system, including duty cycling and reducing the power of each individual component in the audio signal chain that handles the data.
The reality is that these approaches don’t address the root cause of the problem: too much data.

To truly tackle the problem, we need to shift from a component solution to a system solution. By moving to a more efficient edge architecture that intelligently minimizes the amount of data that moves through the system, we can focus the system’s energy resources on analyzing voice rather than on searching for a wake word in irrelevant noise.

Analyze, THEN digitize

It’s time to move away from the digitize-first approach that has dominated voice wake-up device architecture since the invention of voice-first applications.

Inspired by the way the human brain efficiently filters incoming information, differentiating, for example, a dog bark from a baby’s cry, an ultra-low-power analog machine learning technology is changing this paradigm. For the first time, device designers can use low-power analog machine learning to determine which data are important for further processing and analysis before the data are digitized.

Leveraging an analyze-first architecture, a new analog neuromorphic semiconductor platform allows the higher-power processing components in the system to stay asleep until voice has actually been detected; only then does it wake them to listen for a possible wake word.

Delivering a post-microphone audio chain that draws as little as 25µA of current while always listening and collecting preroll data, this analyze-first architecture allows designers to extend battery lifetime significantly.
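To put the battery-lifetime claim in rough numbers, the Python sketch below compares the average current of a digitize-first chain, which runs continuously, with an analyze-first chain, in which a low-power analog detector listens all the time and wakes the digital chain only while speech is present. Only the 25µA front-end figure comes from the text; the battery capacity, the 1 mA digital-chain current and the 10% speech fraction are hypothetical placeholders. The model also ignores wake-up latency, preroll buffering overhead and battery self-discharge, so treat it as a back-of-the-envelope estimate rather than a measurement.

```python
"""Back-of-the-envelope battery-life comparison: digitize-first vs analyze-first.

Only ANALOG_FRONT_END_UA comes from the article; every other number is a
hypothetical placeholder chosen for illustration.
"""

# Hypothetical inputs (assumptions, not measured values)
BATTERY_MAH = 100.0        # hypothetical battery capacity, in mAh
CHAIN_UA = 1000.0          # hypothetical ADC + DSP + wake-word chain current when running, in uA
SPEECH_FRACTION = 0.10     # assumed share of time speech is present (about 90% noise)

# Always-listening post-microphone chain current quoted in the text
ANALOG_FRONT_END_UA = 25.0


def digitize_first_ua(chain_ua: float) -> float:
    """Digitize-first: the digital chain runs 100% of the time, speech or not."""
    return chain_ua


def analyze_first_ua(front_end_ua: float, chain_ua: float, speech_fraction: float) -> float:
    """Analyze-first: the analog detector always listens, and the digital
    chain is awake only for the fraction of time that speech is present."""
    return front_end_ua + speech_fraction * chain_ua


def battery_life_hours(capacity_mah: float, average_ua: float) -> float:
    """Idealized battery life: capacity divided by average current (uA -> mA)."""
    return capacity_mah / (average_ua / 1000.0)


if __name__ == "__main__":
    baseline = digitize_first_ua(CHAIN_UA)
    gated = analyze_first_ua(ANALOG_FRONT_END_UA, CHAIN_UA, SPEECH_FRACTION)
    print(f"digitize-first: {baseline:.0f} uA -> {battery_life_hours(BATTERY_MAH, baseline):.0f} h")
    print(f"analyze-first:  {gated:.0f} uA -> {battery_life_hours(BATTERY_MAH, gated):.0f} h")
    print(f"improvement: {baseline / gated:.1f}x")
```

With these placeholder values the analyze-first chain averages 125 µA against 1,000 µA, or roughly 8x the battery life (800 h versus 100 h). As the always-on front-end current shrinks relative to the digital chain, the improvement approaches 1 / SPEECH_FRACTION, which is 10x when 90% of the time is noise.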
Longer battery life can mean smart earbuds that run for weeks instead of hours, or a battery-powered smart speaker that runs for months instead of weeks.

More importantly, analyze-first processing is the difference between today’s always-listening devices, which indiscriminately record and send all sound data to the cloud, and devices with the localized intelligence to select and send only the relevant data, reducing the user’s vulnerability to the loss of private data.

Balance convenience with privacy

The trade-off between making our lives easier and keeping our personal information private is a choice we are asked to make throughout our day in a hundred different ways. Bringing more audio processing capability to the mobile device without draining the battery is the first step toward delivering more secure voice-first solutions. But to succeed in this effort, we must shift to a bio-inspired architecture that determines, at the earliest point in the signal chain, which data are important and require further processing. Once we move to the analyze-first approach, only a small fraction of the tens of zettabytes of data collected by the forthcoming generation of always-on IoT devices will require further processing in the device and in the cloud.

A better balance between cloud and edge processing is a better balance between convenience and privacy, and that’s a win for everyone.

About the Author

Tom Doyle is CEO and founder of Aspinity. He brings over 30 years of experience in operational excellence and executive leadership in analog and mixed-signal semiconductor technology to Aspinity. Prior to Aspinity, Tom was group director of Cadence Design Systems’ analog and mixed-signal IC business unit, where he managed the deployment of the company’s technology to the world’s foremost semiconductor companies.
Previously, Tom was founder and president of the analog/mixed-signal software firm Paragon IC solutions, where he was responsible for all operational facets of the company, including sales and marketing, global partners/distributors, and engineering teams in the US and Asia. Tom holds a B.S. in Electrical Engineering from West Virginia University and an MBA from California State University, Long Beach. For more information, please visit https://www.aspinity.com/Technology. Aspinity is a member of MEMS Sensors Industry Group (MSIG), a SEMI technology community that enables the MEMS and sensor industry to address common challenges, innovate and accelerate business results.
Every day, it seems, a new portable voice-first device comes to market. From smart speakers small enough to fit in your pocket to tiny wireless earbuds and voice-activated TV remote controls, we increasingly use voice to play music, select TV shows, turn on the lights or interact with our smart thermostats. While the popularity of voice-first interfaces has spawned massive diversity in device types, portable devices all have one thing in common: they’re battery-powered, and that could be a problem for consumers who are tired of frequently recharging or replacing batteries.

Change the Architecture, Reduce the Power

The issue lies in the traditional hardware architectures of today’s voice-first devices, which are notoriously inefficient when it comes to power consumption. Such devices rely on a “digitize-first” model of processing voice data, in which the heaviest power consumers, like the analog-to-digital converter (ADC) and the digital signal processor (DSP), do all the heavy lifting up front, right at the start of the audio signal chain. They continuously digitize and analyze 100% of the ambient sound data as they search for a wake word, even when no speech is present and the only sound is noise. Because speech occurs randomly and sporadically, that continuous digitization of sound wastes up to 90% of battery power.

To tackle the battery drain in portable voice-first devices, we need look no further than the human brain, which processes sound very efficiently. Imagine that you are outside your house having a conversation with your neighbor. You can focus on what your neighbor is saying because your brain differentiates between sounds that it should send to the deeper brain for speech processing and sounds that it shouldn’t bother processing further (e.g., dog barks, sirens or car traffic). The brain spends minimal energy up front to decide whether it should spend additional energy on processing down the line.
In other words, it saves the most power-intensive processing for the important sounds.

We can mimic the brain’s approach to signal processing with a new “analyze-first” architecture for voice-first devices. This approach requires ultra-low-power analog processing technology that can differentiate voice from noise before the sound data is digitized. It keeps the higher-power components in a voice-first system, such as the wake-word engine, in a low-power mode when only noise is present, and wakes the higher-power chips in the system, e.g., the DSP or ADC, only when it detects speech. Like the brain, a voice-first system built on an analyze-first architecture conserves energy most of the time, saving the heavy lifting, i.e., wake-word listening, for times when speech is present.

Because the analyze-first approach to always-on listening analyzes the analog microphone signal prior to digitization, it saves considerable power in battery-powered voice-first devices. This architectural shift is well worth the investment because it can reduce the system’s power consumption in a battery-powered voice-first device by up to 10x. That’s the difference between a portable smart speaker that runs for a month on battery instead of a week, or smart earbuds that last a whole day instead of a few hours on a single charge. Longer battery life in portable voice-first devices generates more goodwill among consumers, creating another key differentiator for manufacturers engaged in the ultra-competitive race for more users.

For more information on the analyze-first architectural approach to voice-first devices, please view our video.

Tom Doyle is CEO and founder of Aspinity. He brings over 30 years of experience in operational excellence and executive leadership in analog and mixed-signal semiconductor technology to Aspinity.
For more information, visit www.aspinity.com. Aspinity is a member of SEMI-MEMS Sensors Industry Group, which connects the MEMS and sensors supply network, allowing members to address common industry challenges and explore new markets.