UCL Discovery Stage

Early Prediction of Neoplasms Using Machine Learning: A Study of Electronic Health Records from the Ministry of National Guard Health Affairs in Saudi Arabia

Alfayez, A; Lai, A; Kunz, H (2022) Early Prediction of Neoplasms Using Machine Learning: A Study of Electronic Health Records from the Ministry of National Guard Health Affairs in Saudi Arabia. Studies in Health Technology and Informatics, 289, pp. 37-40. 10.3233/SHTI210853. Green open access

Full text: SHTI-289-SHTI210853.pdf - Published Version (131kB)

Abstract

The early detection and treatment of neoplasms, malignant neoplasms in particular, can save lives. However, identifying those most at risk of developing neoplasms remains challenging. Electronic Health Records (EHR) provide a rich source of “big” data on large numbers of patients. We hypothesised that, in the period preceding a definitive diagnosis, there exists a series of ordered healthcare events captured within EHR data that characterises the onset and progression of neoplasms and that can be exploited to predict future neoplasm occurrence. Using data from the EHR of the Ministry of National Guard Health Affairs (MNG-HA), a large healthcare provider in Saudi Arabia, we aimed to discover health event patterns present in EHR data that predict the development of neoplasms in the year prior to diagnosis. After data cleaning, pre-processing, and application of the inclusion and exclusion criteria, 5,466 patients were available for model construction: 1,715 cases and 3,751 controls. Two predictive models were developed, one using a Decision Tree (DT) and one using Random Forests (RF). Age, gender, ethnicity, and ICD-10 chapter (broad disease classification) codes were used as predictor variables, and the presence or absence of neoplasms as the output variable. The factors common to all models that were associated with a diagnosis of neoplasms one or more years after their occurrence were: (1) age at neoplasm/event diagnosis; (2) gender; and a patient medical history of (3) diseases of the blood and blood-forming organs and certain disorders involving immune mechanisms, and (4) diseases of the genitourinary system. Model performance assessment showed that the RF had the higher Area Under the Curve (AUC = 0.76), whereas the DT was less complex. This study demonstrates that EHR data can be used to predict future neoplasm occurrence.
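To illustrate the performance metric reported in the abstract, the following is a minimal sketch of how an AUC can be computed from case/control labels and model scores. It is not the authors' code: the helper name `roc_auc` and the toy labels and scores are hypothetical, and only the case and control counts are taken from the abstract.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen case receives a
    higher score than a randomly chosen control (ties count as 0.5)."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Hypothetical toy data (NOT from the study): 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
auc = roc_auc(labels, scores)

# Class balance implied by the counts stated in the abstract.
n_cases, n_controls = 1715, 3751
case_fraction = n_cases / (n_cases + n_controls)

print(f"toy AUC = {auc:.3f}, case fraction = {case_fraction:.3f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls, so the reported 0.76 for the RF lies between the two.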

Type: Article
Title: Early Prediction of Neoplasms Using Machine Learning: A Study of Electronic Health Records from the Ministry of National Guard Health Affairs in Saudi Arabia
Open access status: An open access version is available from UCL Discovery
DOI: 10.3233/SHTI210853
Publisher version: http://dx.doi.org/10.3233/SHTI210853
Language: English
Additional information: © 2022 The authors and IOS Press. This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0).
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Institute of Health Informatics > Clinical Epidemiology
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10142115
Downloads since deposit: 1,887