Articulatory Synthesis for Data Augmentation in Phoneme Recognition

Krug, PK; Birkholz, P; Gerazov, B; van Niekerk, DR; Xu, A; Xu, Y; (2022) Articulatory Synthesis for Data Augmentation in Phoneme Recognition. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH. (pp. 1228-1232). International Speech Communication Association (ISCA): Incheon, Korea. Green open access

Abstract

While numerous studies on automatic speech recognition have been published in recent years describing data augmentation strategies based on time or frequency domain signal processing, few works exist on the artificial extensions of training data sets using purely synthetic speech data. In this work, the German KIEL corpus was augmented with synthetic data generated with the state-of-the-art articulatory synthesizer VOCALTRACTLAB. It is shown that the additional synthetic data can lead to a significantly better performance in single-phoneme recognition in certain cases, while at the same time, the performance can also decrease in other cases, depending on the degree of acoustic naturalness of the synthetic phonemes. As a result, this work can potentially guide future studies to improve the quality of articulatory synthesis via the link between synthetic speech production and automatic speech recognition.

Type:	Proceedings paper
Title:	Articulatory Synthesis for Data Augmentation in Phoneme Recognition
Event:	Interspeech 2022
Open access status:	An open access version is available from UCL Discovery
DOI:	10.21437/Interspeech.2022-10874
Publisher version:	https://www.isca-speech.org/archive/interspeech_20...
Language:	English
Additional information:	This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords:	Automatic speech recognition, phoneme recognition, articulatory speech synthesis, data augmentation
UCL classification:	UCL
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Speech, Hearing and Phonetic Sciences
URI:	https://discovery-pp.ucl.ac.uk/id/eprint/10158547
