Xu, Anqi; Van Niekerk, Daniel R.; Gerazov, Branislav; Krug, Paul Konstantin; Birkholz, Peter; Prom-On, Santitham; Halliday, Lorna F. (2024). Artificial vocal learning guided by speech recognition: What it may tell us about how children learn to speak. Journal of Phonetics (in press).
Accepted version: Xu_etAl_JPhon2024_accepted.pdf. Access restricted to UCL open access staff until 19 November 2024.
Abstract
It has long been a mystery how children learn to speak without formal instruction. Previous research has used computational modelling to help solve the mystery by simulating vocal learning with direct imitation or caregiver feedback, but has encountered difficulty in overcoming the speaker normalisation problem, namely, discrepancies between children's vocalisations and those of adults due to age-related anatomical differences. Here we show that vocal learning can be successfully simulated via recognition-guided vocal exploration without explicit speaker normalisation. We trained an articulatory synthesiser with three-dimensional vocal tract models of an adult and two child configurations of different ages to learn monosyllabic English words consisting of CVC syllables, based on coarticulatory modelling and two kinds of auditory feedback: (i) acoustic features to simulate universal phonetic perception (or direct imitation), and (ii) a deep-learning-based speech recogniser to simulate native-language phonological perception. Native listeners were invited to evaluate the learned synthetic speech, with natural speech as a baseline reference. Results show that the English words trained with the speech recogniser were more intelligible than those trained with acoustic features, sometimes close to natural speech. The successful simulation of vocal learning in this study suggests that a combination of coarticulatory modelling and native-language phonological perception may also be critical for real-life vocal production learning.
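The core idea of recognition-guided vocal exploration can be illustrated with a toy sketch: articulatory parameters are perturbed at random, and a change is kept only if a recogniser scores the resulting production higher for the intended word. Everything below is a hypothetical simplification for illustration, not the authors' implementation: the "synthesiser" just passes parameters through (a real system would render audio from 3D vocal tract models), and the "recogniser" score is a distance to a fixed target vector standing in for an ASR posterior probability.

```python
import random

# Toy "articulatory synthesiser": identity mapping from articulatory
# parameters to a produced signal. A real articulatory synthesiser would
# render audio from vocal tract configurations.
def synthesise(params):
    return params

# Stand-in target category; a real recogniser would instead return the
# posterior probability of the intended word given the audio, which is
# what abstracts away speaker-specific (child vs. adult) acoustics.
TARGET = [0.3, 0.7, 0.5]

def recogniser_score(signal):
    # Higher is better: negative squared distance to the target category.
    return -sum((s - t) ** 2 for s, t in zip(signal, TARGET))

def vocal_exploration(n_iters=2000, step=0.05, seed=0):
    """Hill-climbing exploration: perturb parameters with Gaussian noise
    and keep only changes that the recogniser scores higher."""
    rng = random.Random(seed)
    params = [rng.random() for _ in TARGET]
    best = recogniser_score(synthesise(params))
    for _ in range(n_iters):
        candidate = [p + rng.gauss(0, step) for p in params]
        score = recogniser_score(synthesise(candidate))
        if score > best:
            params, best = candidate, score
    return params, best

params, score = vocal_exploration()
```

Note that the learner never compares its output to an adult waveform directly; the only feedback is the recogniser's category score, which is the property that sidesteps explicit speaker normalisation.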