Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data


McElroy, E; Wood, T; Bond, R; Mulvenna, M; Shevlin, M; Ploubidis, GB; Hoffmann, MS (2024) Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data. BMC Psychiatry, 24, Article 530. 10.1186/s12888-024-05954-2. Green open access

Text
Using natural language processing to facilitate the harmonisation of mental health questionnaires a validation study using r.pdf - Published Version

Abstract

Background: Pooling data from different sources will advance mental health research by providing larger sample sizes and allowing cross-study comparisons; however, the heterogeneity in how variables are measured across studies poses a challenge to this process.

Methods: This study explored the potential of using natural language processing (NLP) to harmonise different mental health questionnaires by matching individual questions based on their semantic content. Using the Sentence-BERT model, we calculated the semantic similarity (cosine index) between 741 pairs of questions from five questionnaires. Drawing on data from a representative UK sample of adults (N = 2,058), we calculated a Spearman rank correlation for each of the same pairs of items, and then estimated the correlation between the cosine values and Spearman coefficients. We also used network analysis to explore the model’s ability to uncover structures within the data and metadata.

Results: We found a moderate overall correlation (r = .48, p < .001) between the two indices. In a holdout sample, the cosine scores predicted the real-world correlations with a small degree of error (MAE = 0.05, MedAE = 0.04, RMSE = 0.064), suggesting the utility of NLP in identifying similar items for cross-study data pooling. Our NLP model could detect more complex patterns in our data; however, it required manual rules to decide which edges to include in the network.

Conclusions: This research shows that it is possible to quantify the semantic similarity between pairs of questionnaire items from their metadata, and that these similarity indices correlate with how participants would answer the same two items. This highlights the potential of NLP to facilitate cross-study data pooling in mental health research. Nevertheless, researchers are cautioned to verify the psychometric equivalence of matched items.
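The quantities named in the abstract (a cosine index between item embeddings, a Spearman rank correlation between item responses, and the MAE, MedAE and RMSE of predictions) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names are hypothetical, the embedding vectors would in practice come from a sentence-embedding model such as Sentence-BERT rather than being typed in by hand, and any numbers fed to these functions here are toy values, not study data.

```python
import math
from statistics import mean, median


def cosine(u, v):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def error_metrics(predicted, observed):
    """MAE, MedAE and RMSE between predicted and observed values."""
    abs_errors = [abs(p - o) for p, o in zip(predicted, observed)]
    return {
        "MAE": mean(abs_errors),
        "MedAE": median(abs_errors),
        "RMSE": math.sqrt(mean(e * e for e in abs_errors)),
    }
```

In a workflow of the kind the abstract describes, `cosine` would be applied to the embeddings of each pair of question texts, `spearman` to the participants' responses on the same pair of items, and the two resulting lists of pairwise values would then be correlated with each other. How the holdout predictions were produced is not stated on this page, so `error_metrics` only shows how the three reported error measures are defined.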

Type:	Article
Title:	Using natural language processing to facilitate the harmonisation of mental health questionnaires: a validation study using real-world data
Location:	England
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1186/s12888-024-05954-2
Publisher version:	https://doi.org/10.1186/s12888-024-05954-2
Language:	English
Additional information:	This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Keywords:	Retrospective data harmonisation, Harmonisation, Meta-analysis, Data pooling
UCL classification:	UCL
	UCL > Provost and Vice Provost Offices > School of Education
	UCL > Provost and Vice Provost Offices > School of Education > UCL Institute of Education
	UCL > Provost and Vice Provost Offices > School of Education > UCL Institute of Education > IOE - Social Research Institute
URI:	https://discovery-pp.ucl.ac.uk/id/eprint/10195624
