
A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs

Tambe, T; Yang, EY; Ko, GG; Chai, Y; Hooper, C; Donato, M; Whatmough, PN; ... Wei, GY; (2022) A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs. IEEE Journal of Solid-State Circuits 10.1109/JSSC.2022.3179303. (In press). Green open access

Abstract

The proliferation of personal artificial intelligence (AI)-assistant technologies with speech-based conversational AI interfaces is driving the exponential growth in the consumer Internet of Things (IoT) market. As these technologies are being applied to keyword spotting (KWS), automatic speech recognition (ASR), natural language processing (NLP), and text-to-speech (TTS) applications, it is of paramount importance that they provide uncompromising performance for context learning in long sequences, which is a key benefit of the attention mechanism, and that they work seamlessly in polyphonic environments. In this work, we present a 25-mm² system-on-chip (SoC) in 16-nm FinFET technology, codenamed SM6, which executes end-to-end speech-enhancing attention-based ASR and NLP workloads. The SoC includes: 1) FlexASR, a highly reconfigurable NLP inference processor optimized for whole-model acceleration of bidirectional attention-based sequence-to-sequence (seq2seq) deep neural networks (DNNs); 2) a Markov random field source separation engine (MSSE), a probabilistic graphical model accelerator for unsupervised inference via Gibbs sampling, used for sound source separation; 3) a dual-core Arm Cortex-A53 CPU cluster, which provides on-demand single instruction, multiple data (SIMD) fast Fourier transform (FFT) processing and performs various application logic (e.g., the expectation–maximization (EM) algorithm and 8-bit floating-point (FP8) quantization); and 4) an always-on M0 subsystem for audio detection and power management. Measurement results demonstrate efficiency ranges of 2.6–7.8 TFLOPs/W and 4.33–17.6 Gsamples/s/W for FlexASR and MSSE, respectively; MSSE denoising performance allowing a 6× smaller ASR model to be stored on-chip with negligible accuracy loss; and 2.24-mJ energy consumption while achieving real-time throughput, end-to-end, and per-frame ASR latencies of 18 ms.
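
The abstract names Gibbs sampling over a Markov random field as the inference method used for sound source separation, but it does not describe the model itself. As a rough illustration of the general technique only, the following Python sketch runs a Gibbs sampler on a generic two-source, Potts-style MRF over a grid of time-frequency bins. The function name `gibbs_separate`, the Gaussian likelihood, the 4-neighbour grid, and all parameters are assumptions made for this example; none of them is taken from the SM6 design.

```python
import math
import random


def gibbs_separate(obs, means, sigma=1.0, beta=1.0, sweeps=30, seed=0):
    """Sample a binary source-label mask for a 2-D grid of observations.

    obs   : list of rows of floats (e.g. log-magnitude per time-frequency bin)
    means : (mu0, mu1), assumed mean observation for each of the two sources
    sigma : assumed shared Gaussian noise standard deviation
    beta  : Potts coupling; larger values favour smoother masks
    """
    rng = random.Random(seed)
    rows, cols = len(obs), len(obs[0])
    # Initialise each label from the nearest source mean.
    labels = [[0 if abs(x - means[0]) <= abs(x - means[1]) else 1 for x in row]
              for row in obs]
    for _ in range(sweeps):
        for r in range(rows):
            for c in range(cols):
                # Unnormalised log-probability of each label at this site.
                logp = []
                for k in (0, 1):
                    unary = -((obs[r][c] - means[k]) ** 2) / (2.0 * sigma ** 2)
                    agree = 0
                    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        rr, cc = r + dr, c + dc
                        if 0 <= rr < rows and 0 <= cc < cols and labels[rr][cc] == k:
                            agree += 1
                    logp.append(unary + beta * agree)
                # Clamp the log-odds so math.exp cannot overflow.
                d = max(-50.0, min(50.0, logp[0] - logp[1]))
                # Conditional probability of label 1 given data and neighbours.
                p1 = 1.0 / (1.0 + math.exp(d))
                labels[r][c] = 1 if rng.random() < p1 else 0
    return labels
```

Calling `gibbs_separate(obs, (mu0, mu1))` returns one sampled mask with the same shape as `obs`; averaging the masks from several sweeps would give a soft estimate of which source dominates each bin. Each site is resampled from its conditional distribution given only its neighbours and its own observation, and that locality is the property that makes Gibbs sampling on an MRF amenable to parallel hardware.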

Type: Article
Title: A 16-nm SoC for Noise-Robust Speech and NLP Edge AI Inference With Bayesian Sound Source Separation and Attention-Based DNNs
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/JSSC.2022.3179303
Publisher version: https://doi.org/10.1109/JSSC.2022.3179303
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Task analysis, Computational modeling, Internet of Things, Source separation, Decoding, System-on-chip, Recurrent neural networks
UCL classification: UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Physics and Astronomy
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10150658