Analysing Longitudinal Social Science Questionnaires: Topic modelling with BERT-based Embeddings

Sharifian-Attar, Vida; De, Suparna; Jabbari, Sanaz; Li, Jenny; Moss, Harry; Johnson, Jon; (2023) Analysing Longitudinal Social Science Questionnaires: Topic modelling with BERT-based Embeddings. In: 2022 IEEE International Conference on Big Data (Big Data). IEEE: Osaka, Japan. Green open access

Analysing Longitudinal Social Science Questionnaires.pdf - Accepted Version


Abstract

Unsupervised topic modelling is a useful, unbiased mechanism for labelling the topics of complex longitudinal questionnaires that cover multiple domains, such as social science and medicine. Manual tagging of such complex datasets increases the likelihood of incorrect or inconsistent labels and is a barrier to scaling the processing of longitudinal questionnaires to provide question banks for data collection agencies. To this end, we propose a tailored BERTopic framework that takes advantage of its novel sentence embedding to create interpretable topics, and extend it with an enhanced visualisation for comparing the topic model labels with the tags manually assigned to the question literals. The resulting topic clusters uncover instances of mislabelled question tags, and also show the semantic shifts and evolution of the topics across the time span of the longitudinal questionnaires. The tailored BERTopic framework outperforms existing topic modelling baselines on the quantitative evaluation metrics of topic coherence and diversity, while also being 18 times faster than the next best-performing baseline.
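
The abstract names topic diversity as an evaluation metric but does not define it. As a minimal sketch, assuming the common definition (the fraction of unique words among the top-k words of all topics), it could be computed as below. The function name `topic_diversity`, the `top_k` parameter, and the example topics are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def topic_diversity(topics: Sequence[Sequence[str]], top_k: int = 10) -> float:
    """Fraction of unique words among the top_k words of every topic.

    Assumed definition, not confirmed by the abstract: 1.0 means no word
    is shared between topics; values near 0 mean heavy overlap.
    """
    # Flatten the top_k words of each topic into a single list.
    top_words = [word for topic in topics for word in topic[:top_k]]
    if not top_words:
        raise ValueError("need at least one topic word")
    return len(set(top_words)) / len(top_words)


# Hypothetical topics, invented for illustration only.
example_topics = [
    ["health", "doctor", "illness"],
    ["school", "teacher", "exam"],
    ["health", "income", "job"],
]
diversity = topic_diversity(example_topics, top_k=3)
```

Here "health" appears in two topics, so 8 of the 9 top words are unique and the score falls slightly below 1. A higher score suggests the topics overlap less in vocabulary; it says nothing on its own about whether each topic is coherent.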

Type: Proceedings paper
Title: Analysing Longitudinal Social Science Questionnaires: Topic modelling with BERT-based Embeddings
Event: 2022 IEEE International Conference on Big Data (Big Data)
Dates: 17 Dec 2022 - 20 Dec 2022
Open access status: An open access version is available from UCL Discovery
DOI: 10.1109/bigdata55660.2022.10020678
Publisher version: https://doi.org/10.1109/BigData55660.2022.10020678
Language: English
Additional information: This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Analytical models, Computational modeling, Social sciences, Semantics, Data visualization, Manuals, Coherence
UCL classification: UCL
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10165551
