Huckvale, M;
(2018)
Neural Network Architecture That Combines Temporal and Summative Features for Infant Cry Classification in the Interspeech 2018 Computational Paralinguistics Challenge.
In:
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2018.
(pp. pp. 137-141).
International Speech Communication Association (ISCA): Hyderabad, India.
Preview |
Text
is2018crying-final.pdf - Published Version Download (318kB) | Preview |
Abstract
This paper describes the application of a novel deep neural network architecture to the classification of infant vocalisations as part of the Interspeech 2018 Computational Paralinguistics Challenge. Previous approaches to infant cry classification have either applied a statistical classifier to summative features of the whole cry, or applied a syntactic pattern recognition technique to a temporal sequence of features. In this work we explore a deep neural network architecture that exploits both temporal and summative features to make a joint classification. The temporal input comprises centi-second frames of low-level signal features which are input to LSTM nodes, while the summative vector comprises a large set of statistical functionals of the same frames that are input to MLP nodes. The combined network is jointly optimized and evaluated using leave-one-speaker-out cross-validation on the challenge training set. Results are compared to independently-trained temporal and summative networks and to a baseline SVM classifier. The combined model outperforms the other models and the challenge baseline on the training set. While problems remain in finding the best configuration and training protocol for such networks, the approach seems promising for future signal classification tasks.
Type: | Proceedings paper |
---|---|
Title: | Neural Network Architecture That Combines Temporal and Summative Features for Infant Cry Classification in the Interspeech 2018 Computational Paralinguistics Challenge |
Event: | Annual Conference of the International Speech Communication Association, INTERSPEECH 2018 |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.21437/Interspeech.2018-1959 |
Publisher version: | https://doi.org/10.21437/Interspeech.2018-1959 |
Language: | English |
Additional information: | This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions. |
UCL classification: | UCL UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Speech, Hearing and Phonetic Sciences |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10060444 |
Archive Staff Only
View Item |