Huckvale, M;
Liu, Z;
Buciuleac, C;
(2023)
Automated voice pathology discrimination from audio recordings benefits from phonetic analysis of continuous speech.
Biomedical Signal Processing and Control
, 86
(Part B)
, Article 105201. 10.1016/j.bspc.2023.105201.
Preview |
Text
Huckvale_1-s2.0-S1746809423006341-main.pdf Download (1MB) | Preview |
Abstract
In this paper we evaluate the hypothesis that automated methods for diagnosis of voice disorders from speech recordings would benefit from contextual information found in continuous speech. Rather than basing a diagnosis on how disorders affect the average acoustic properties of the speech signal, the idea is to exploit the possibility that different disorders will cause different acoustic changes within different phonetic contexts. Any differences in the pattern of effects across contexts would then provide additional information for discrimination of pathologies. We evaluate this approach using two complementary studies: the first uses a short phrase which is automatically annotated using a phonetic transcription, the second uses a long reading passage which is automatically annotated from text. The first study uses a single sentence recorded from 597 speakers in the Saarbrucken Voice Database to discriminate structural from neurogenic disorders. The results show that discrimination performance for these broad pathology classes improves from 59% to 67% unweighted average recall when classifiers are trained for each phone-label and the results fused. Although the phonetic contexts improved discrimination, the overall sensitivity and specificity of the method seems insufficient for clinical application. We hypothesise that this is because of the limited contexts in the speech audio and the heterogeneous nature of the disorders. In the second study we address these issues by processing recordings of a long reading passage obtained from clinical recordings of 60 speakers with either Spasmodic Dysphonia or Vocal fold Paralysis. We show that discrimination performance increases from 80% to 87% unweighted average recall if classifiers are trained for each phone-labelled region and predictions fused. We also show that the sensitivity and specificity of a diagnostic test with this performance is similar to other diagnostic procedures in clinical use. In conclusion, the studies confirm that the exploitation of contextual differences in the way disorders affect speech improves automated diagnostic performance, and that automated methods for phonetic annotation of reading passages are robust enough to extract useful diagnostic information.
Type: | Article |
---|---|
Title: | Automated voice pathology discrimination from audio recordings benefits from phonetic analysis of continuous speech |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1016/j.bspc.2023.105201 |
Publisher version: | https://doi.org/10.1016/j.bspc.2023.105201 |
Language: | English |
Additional information: | © 2023 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
Keywords: | Voice disorders,Continuous speech, Machine learning, Diagnosis, Health applications |
UCL classification: | UCL UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Speech, Hearing and Phonetic Sciences |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10181492 |
Archive Staff Only
View Item |