Deep spoken keyword spotting: An overview

I López-Espejo, ZH Tan, JHL Hansen, J Jensen - IEEE Access, 2021 - ieeexplore.ieee.org
Spoken keyword spotting (KWS) deals with the identification of keywords in audio streams
and has become a fast-growing technology thanks to the paradigm shift introduced by deep …

MicroNets: Neural network architectures for deploying TinyML applications on commodity microcontrollers

C Banbury, C Zhou, I Fedorov… - … of machine learning …, 2021 - proceedings.mlsys.org
Executing machine learning workloads locally on resource constrained microcontrollers
(MCUs) promises to drastically expand the application space of IoT. However, so-called …

Keyword Transformer: A self-attention model for keyword spotting

A Berg, M O'Connor, MT Cruz - arXiv preprint arXiv:2104.00769, 2021 - arxiv.org
The Transformer architecture has been successful across many domains, including natural
language processing, computer vision and speech recognition. In keyword spotting, self …

Broadcasted residual learning for efficient keyword spotting

B Kim, S Chang, J Lee, D Sung - arXiv preprint arXiv:2106.04140, 2021 - arxiv.org
Keyword spotting is an important research field because it plays a key role in device
wake-up and user interaction on smart devices. However, it is challenging to minimize errors while …

Streaming keyword spotting on mobile devices

O Rybakov, N Kononenko, N Subrahmanya… - arXiv preprint arXiv …, 2020 - arxiv.org
In this work we explore the latency and accuracy of keyword spotting (KWS) models in
streaming and non-streaming modes on mobile phones. NN model conversion from non …

Widening access to applied machine learning with TinyML

VJ Reddi, B Plancher, S Kennedy, L Moroney… - arXiv preprint arXiv …, 2021 - arxiv.org
Broadening access to both computational and educational resources is critical to diffusing
machine-learning (ML) innovation. However, today, most ML resources and experts are …

MatchboxNet: 1D time-channel separable convolutional neural network architecture for speech commands recognition

S Majumdar, B Ginsburg - arXiv preprint arXiv:2004.08531, 2020 - arxiv.org
We present MatchboxNet, an end-to-end neural network for speech command
recognition. MatchboxNet is a deep residual network composed of blocks of 1D time …

Lightweight neural architecture search for temporal convolutional networks at the edge

M Risso, A Burrello, F Conti, L Lamberti… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
Neural Architecture Search (NAS) is quickly becoming the go-to approach to optimize the
structure of Deep Learning (DL) models for complex tasks such as Image Classification or …

Small-footprint keyword spotting on raw audio data with sinc-convolutions

S Mittermaier, L Kürzinger… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
Keyword Spotting (KWS) enables speech-based user interaction on smart devices.
Always-on and battery-powered application scenarios for smart devices put constraints on hardware …

Wav2KWS: Transfer learning from speech representations for keyword spotting

D Seo, HS Oh, Y Jung - IEEE Access, 2021 - ieeexplore.ieee.org
With the expanding development of on-device artificial intelligence, voice-enabled devices
such as smart speakers, wearables, and other on-device or edge processing systems have …