An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech

Gao, Y; Zhang, X; Xu, Y; Zhang, J; Birkholz, P; (2020) An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech. In: Meng, H and Xu, B and Zheng, T, (eds.) Proceedings of Interspeech 2020. (pp. 1913-1917). International Speech Communication Association (ISCA): Shanghai, China. Green open access

Abstract

The complex f0 variations in continuous speech make automatic tone recognition difficult in a language like Mandarin Chinese. In this study, we tested the use of the target approximation model (TAM) for continuous tone recognition on two datasets. TAM simulates f0 production from an articulatory point of view and so allows the underlying pitch targets to be recovered from the surface f0 contour. In the first dataset, the f0 contour of each tone, represented by 30 equidistant points, was simulated by TAM. Tone classification with a support vector machine (SVM) showed that the estimated three-dimensional TAM parameters characterized tone patterns about as well as the 30 f0 values did. TAM was further tested on the second dataset, which contains more complex tonal variations. With an equal or smaller number of features, the TAM parameters performed better than the coefficients of the cosine transform and slightly worse than the statistical f0 parameters for tone recognition. Furthermore, we investigated a bidirectional LSTM (BLSTM) neural network for modelling the sequential tonal variations, which proved more powerful than the SVM classifier. The BLSTM system incorporating TAM and statistical f0 parameters achieved the best accuracy, 87.56%.
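
The abstract does not give the TAM equations. As a rough illustration of how three parameters can generate a 30-point f0 contour, the sketch below uses the quantitative target approximation (qTA) form known from the earlier literature: a third-order critically damped approach to a linear pitch target. This is an assumption and may differ from the TAM variant used in the paper. The function name `qta_contour`, the parameter names (`m` for target slope, `b` for target height, `lam` for rate of approach), and all numeric values are hypothetical and chosen only for demonstration.

```python
import math


def qta_contour(m, b, lam, duration, f0_start,
                d1_start=0.0, d2_start=0.0, n_points=30):
    """Sample f0(t) = (m*t + b) + (c1 + c2*t + c3*t^2) * exp(-lam*t).

    Assumed qTA form, not necessarily the paper's TAM variant.
    m, b     : slope and height of the linear pitch target
    lam      : rate at which the contour approaches the target
    f0_start : f0 at syllable onset; d1_start and d2_start are its
               first and second time derivatives at onset
    """
    # Coefficients fixed by the initial conditions at t = 0.
    c1 = f0_start - b
    c2 = d1_start + c1 * lam - m
    c3 = (d2_start + 2.0 * c2 * lam - c1 * lam ** 2) / 2.0

    step = duration / (n_points - 1)
    contour = []
    for i in range(n_points):
        t = i * step
        target = m * t + b
        transient = (c1 + c2 * t + c3 * t * t) * math.exp(-lam * t)
        contour.append(target + transient)
    return contour


# Hypothetical example: a static target at 100 (arbitrary f0 units),
# approached from an onset value of 90 over a 0.2 s syllable.
contour = qta_contour(m=0.0, b=100.0, lam=40.0, duration=0.2, f0_start=90.0)
```

Under these assumptions the contour starts at the onset value and converges toward the target by the end of the syllable. Estimating the three parameters from an observed contour would be the inverse problem, for example a least-squares fit, which is not shown here.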

Type:	Proceedings paper
Title:	An Investigation of the Target Approximation Model for Tone Modeling and Recognition in Continuous Mandarin Speech
Event:	Interspeech 2020
Open access status:	An open access version is available from UCL Discovery
DOI:	10.21437/Interspeech.2020-2823
Publisher version:	https://doi.org/10.21437/Interspeech.2020-2823
Language:	English
Additional information:	This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
UCL classification:	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Speech, Hearing and Phonetic Sciences
URI:	https://discovery-pp.ucl.ac.uk/id/eprint/10118765
