Using a deep neural network to speed up a model of loudness for time-varying sounds

Schlittenlacher, Josef; Turner, Richard E; Moore, Brian CJ (2020) Using a deep neural network to speed up a model of loudness for time-varying sounds. In: Proceedings of the International Symposium on Auditory and Audiological Research (pp. 133-140). International Symposium on Auditory and Audiological Research. Green open access.

Text: abigailkressner-367-article-edited--revised-final.pdf (Published Version, 373kB)

Abstract

The “time-varying loudness (TVL)” model calculates “instantaneous loudness” every 1 ms, and this is used to generate predictions of short-term loudness (the loudness of a short segment of sound, such as a word in a sentence) and of long-term loudness (the loudness of a longer segment of sound, such as a whole sentence). The calculation of instantaneous loudness is computationally intensive, and real-time implementation of the TVL model is difficult. To speed up the computation, a deep neural network (DNN) was trained to predict instantaneous loudness using a large database of speech sounds and artificial sounds (tones alone and tones in white or pink noise), with the predictions of the TVL model as the reference (providing the “correct” answer, specifically the loudness level in phons). A multilayer perceptron with three hidden layers was found to be sufficient; more complex DNN architectures did not yield higher accuracy. After training, the deviations between the predictions of the TVL model and the predictions of the DNN were typically less than 0.5 phons, even for types of sounds that were not used for training (music, rain, animal sounds, washing machine). The DNN calculates instantaneous loudness more than 100 times faster than the TVL model.
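
The approach described in the abstract can be illustrated with a minimal sketch. The abstract specifies only a multilayer perceptron with three hidden layers, an output in phons, and the TVL model's predictions as the training reference. Everything else below is an assumption made for illustration: the input size of 40 features per 1-ms frame, the hidden-layer width of 64, the ReLU activation, the random (untrained) weights, and the function names `init_mlp`, `predict_phons`, and `mean_abs_deviation`. It is not the authors' implementation.

```python
import numpy as np


def init_mlp(n_inputs, hidden_sizes=(64, 64, 64), seed=0):
    """Create random weights for an MLP with three hidden layers and one output.

    The layer widths are hypothetical; the abstract does not state them.
    """
    rng = np.random.default_rng(seed)
    sizes = [n_inputs, *hidden_sizes, 1]
    params = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        # He-style initialisation, a common default for ReLU layers.
        w = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = np.zeros(fan_out)
        params.append((w, b))
    return params


def predict_phons(params, features):
    """Forward pass: ReLU hidden layers, then a linear output per frame."""
    h = np.asarray(features, dtype=float)
    for w, b in params[:-1]:
        h = np.maximum(h @ w + b, 0.0)
    w, b = params[-1]
    return (h @ w + b)[:, 0]


def mean_abs_deviation(reference_phons, predicted_phons):
    """Mean absolute difference (in phons) between reference and prediction."""
    ref = np.asarray(reference_phons, dtype=float)
    pred = np.asarray(predicted_phons, dtype=float)
    return float(np.mean(np.abs(ref - pred)))


# One feature vector per 1-ms frame, so 1000 frames cover one second of sound.
# The features here are random placeholders, not real spectral data.
params = init_mlp(n_inputs=40)
frames = np.random.default_rng(1).normal(size=(1000, 40))
predicted = predict_phons(params, frames)
```

In a real setting the weights would be fitted so that `predicted` matches the TVL model's loudness level for each frame, and a measure such as `mean_abs_deviation` could then be compared against the 0.5-phon figure reported in the abstract. The speed advantage follows from the forward pass being a handful of matrix multiplications per frame.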

Type:	Proceedings paper
Title:	Using a deep neural network to speed up a model of loudness for time-varying sounds
Location:	Nyborg
Open access status:	An open access version is available from UCL Discovery
Publisher version:	https://proceedings.isaar.eu/index.php/isaarproc/a...
Language:	English
Additional information:	© The Authors 2023. Original content in this paper is licensed under the terms of the Creative Commons Attribution 3.0 Unported (CC BY 3.0) Licence (https://creativecommons.org/licenses/by/3.0/).
UCL classification:	UCL
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences
	UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Brain Sciences > Div of Psychology and Lang Sciences > Speech, Hearing and Phonetic Sciences
URI:	https://discovery-pp.ucl.ac.uk/id/eprint/10181600
