UCL Discovery Stage

Using a deep neural network to speed up a model of loudness for time-varying sounds

Schlittenlacher, Josef; Turner, Richard E; Moore, Brian CJ; (2020) Using a deep neural network to speed up a model of loudness for time-varying sounds. In: Proceedings of the International Symposium on Auditory and Audiological Research (pp. 133-140). International Symposium on Auditory and Audiological Research.

Published Version (PDF, 373kB): abigailkressner-367-article-edited--revised-final.pdf

Abstract

The “time-varying loudness” (TVL) model calculates “instantaneous loudness” every 1 ms. This is used to generate predictions of short-term loudness, the loudness of a short segment of sound such as a word in a sentence, and of long-term loudness, the loudness of a longer segment of sound such as a whole sentence. The calculation of instantaneous loudness is computationally intensive, which makes real-time implementation of the TVL model difficult. To speed up the computation, a deep neural network (DNN) was trained to predict instantaneous loudness using a large database of speech sounds and artificial sounds (tones alone and tones in white or pink noise), with the predictions of the TVL model as the reference (providing the “correct” answer, specifically the loudness level in phons). A multilayer perceptron with three hidden layers was found to be sufficient; more complex DNN architectures did not yield higher accuracy. After training, the deviations between the predictions of the TVL model and those of the DNN were typically less than 0.5 phons, even for types of sounds that were not used for training (music, rain, animal sounds, washing machine). The DNN calculates instantaneous loudness more than 100 times faster than the TVL model.
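The abstract describes the DNN as a fast surrogate: a multilayer perceptron with three hidden layers that maps one frame of input to a loudness level in phons, judged by its deviation from the TVL model's output. The sketch below illustrates only the shape of that idea in plain Python. The input features, the hidden-layer width of 64, the ReLU activations, the initialisation and all function names (`build_mlp`, `predict_phons`, `max_abs_deviation`) are hypothetical assumptions, not details taken from the paper, and the weights are untrained, so the output is not a real loudness estimate.

```python
import math
import random


def relu(x):
    return x if x > 0.0 else 0.0


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: out[j] = act(sum_i weights[j][i] * inputs[i] + biases[j])."""
    outputs = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, inputs)) + b
        outputs.append(activation(z) if activation else z)
    return outputs


def init_layer(n_in, n_out, rng):
    # He-style uniform initialisation (a hypothetical choice, not from the paper).
    limit = math.sqrt(6.0 / n_in)
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def build_mlp(n_features, hidden_sizes=(64, 64, 64), seed=0):
    """Hypothetical MLP: three hidden layers plus one linear output unit (phons)."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden_sizes, 1]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def predict_phons(mlp, features):
    """Forward pass: ReLU on hidden layers, linear output = loudness level in phons."""
    x = features
    for k, (w, b) in enumerate(mlp):
        is_output = k == len(mlp) - 1
        x = dense(x, w, b, activation=None if is_output else relu)
    return x[0]


def max_abs_deviation(predicted, reference):
    """Largest absolute difference (in phons) between surrogate and reference outputs."""
    return max(abs(p - r) for p, r in zip(predicted, reference))


if __name__ == "__main__":
    mlp = build_mlp(n_features=32)
    frame = [0.1] * 32  # placeholder feature vector for one 1-ms frame (assumption)
    print(predict_phons(mlp, frame))
    print(max_abs_deviation([60.0, 70.2], [60.3, 70.0]))
```

In practice the weights would be fitted by regression against TVL-model outputs for many frames; a per-frame deviation measure such as `max_abs_deviation` is one simple way to check a criterion like the 0.5-phon agreement reported in the abstract. The speed-up comes from replacing the model's detailed per-frame calculation with a fixed number of multiply-adds.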

Type: Proceedings paper
Title: Using a deep neural network to speed up a model of loudness for time-varying sounds
Location: Nyborg
Open access status: An open access version is available from UCL Discovery
Publisher version: https://proceedings.isaar.eu/index.php/isaarproc/a...
Language: English
Additional information: © The Authors 2023. Original content in this paper is licensed under the terms of the Creative Commons Attribution 3.0 Unported (CC BY 3.0) Licence (https://creativecommons.org/licenses/by/3.0/).
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Speech, Hearing and Phonetic Sciences
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10181600