UCL Discovery Stage

The DetectDeviatingCells algorithm was a useful addition to the toolkit for cellwise error detection in observational data

Viviani, Laura; White, Ian R; Williamson, Elizabeth J; Carpenter, James; van der Meulen, Jan; Cromwell, David A (2023) The DetectDeviatingCells algorithm was a useful addition to the toolkit for cellwise error detection in observational data. Journal of Clinical Epidemiology, 157, pp. 35-45. doi: 10.1016/j.jclinepi.2023.02.015. Green open access.

Abstract

OBJECTIVE: We evaluated the error detection performance of the DetectDeviatingCells (DDC) algorithm, which flags data anomalies at observation (casewise) and variable (cellwise) level in continuous variables, and compared it with other approaches in a simulated dataset.

STUDY DESIGN AND SETTING: We simulated height and weight data for hypothetical individuals aged 2-20 years and altered a proportion of the height values according to predetermined error patterns. We applied the DDC algorithm and other error-detection approaches (descriptive statistics, plots, fixed-threshold rules, and classic and robust Mahalanobis distance), and compared their error detection performance using sensitivity, specificity, likelihood ratios, predictive values and ROC curves.

RESULTS: At our chosen thresholds, error detection specificity was excellent across all scenarios for all methods, and sensitivity was higher for multivariable and robust methods. The DDC algorithm performed similarly to other robust multivariable methods. Analysis of ROC curves suggested that all methods had comparable performance for gross errors (e.g. wrong measurement unit), but that the DDC algorithm outperformed the others for more complex error patterns (e.g. transcription errors that are still plausible, although extreme).

CONCLUSIONS: The DDC algorithm has the potential to improve error detection processes for observational data.
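The abstract names classic Mahalanobis distance as one comparison method and summarises performance with sensitivity, specificity, likelihood ratios and predictive values. The Python sketch below illustrates only those two pieces. It is not the DDC algorithm, which among other steps predicts each cell from correlated variables using robust estimates and is available, for example, as DDC() in the R package cellWise. Assumptions made here, not taken from the paper: two variables (height, weight), a chi-square cutoff at the 0.99 level (the study's actual thresholds are not stated on this page), and a small hypothetical dataset with one height recorded in metres instead of centimetres. All function names are illustrative.

```python
import math
from statistics import mean


def mahalanobis_sq_2d(points):
    """Classic squared Mahalanobis distance of each (x, y) point, e.g. (height, weight).

    Uses the sample mean and sample covariance (denominator n - 1). Both are
    non-robust, so gross errors inflate them; robust variants replace them.
    """
    n = len(points)
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2  # assumed > 0 (variables not perfectly collinear)
    # Inverse of [[sxx, sxy], [sxy, syy]] is [[syy, -sxy], [-sxy, sxx]] / det
    return [
        (syy * (x - mx) ** 2 - 2 * sxy * (x - mx) * (y - my) + sxx * (y - my) ** 2) / det
        for x, y in points
    ]


def chi2_2df_quantile(p):
    # With 2 degrees of freedom the chi-square CDF is 1 - exp(-x / 2),
    # so the quantile has the closed form -2 * ln(1 - p)
    return -2.0 * math.log(1.0 - p)


def flag_cases(points, p=0.99):
    """Casewise flags: True where the squared distance exceeds the chi-square cutoff."""
    cutoff = chi2_2df_quantile(p)
    return [d2 > cutoff for d2 in mahalanobis_sq_2d(points)]


def _ratio(num, den):
    # num / den, or inf (num > 0) / nan (otherwise) when the denominator is zero
    if den == 0:
        return math.inf if num > 0 else math.nan
    return num / den


def detection_metrics(flagged, is_error):
    """Treat a flag as a positive test for a true error and summarise the 2x2 table."""
    pairs = list(zip(flagged, is_error))
    tp = sum(1 for f, e in pairs if f and e)
    fp = sum(1 for f, e in pairs if f and not e)
    fn = sum(1 for f, e in pairs if not f and e)
    tn = sum(1 for f, e in pairs if not f and not e)
    sens = _ratio(tp, tp + fn)
    spec = _ratio(tn, tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "LR+": _ratio(sens, 1 - spec),
        "LR-": _ratio(1 - sens, spec),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
    }


def example_data():
    # Hypothetical, deterministic (height cm, weight kg) pairs on a noisy line;
    # NOT the paper's simulation, just enough to exercise the functions above
    points = [(100 + 2 * i, 15 + i + (i % 3 - 1)) for i in range(30)]
    is_error = [False] * 30
    points[5] = (1.10, points[5][1])  # gross error: height entered in metres
    is_error[5] = True
    return points, is_error


if __name__ == "__main__":
    points, is_error = example_data()
    flagged = flag_cases(points, p=0.99)
    for name, value in detection_metrics(flagged, is_error).items():
        print(f"{name}: {value:.3f}")
```

A gross unit error like this one is easy for any method. Because the classic mean and covariance are pulled toward the erroneous value, plausible-but-extreme errors can be masked, which is the gap that robust and cellwise methods such as DDC aim to close. Note also that with n cases the classic squared distance of a single point cannot exceed (n - 1)^2 / n, so very small samples can never reach a 0.99 chi-square cutoff.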

Type: Article
Title: The DetectDeviatingCells algorithm was a useful addition to the toolkit for cellwise error detection in observational data
Location: United States
Open access status: An open access version is available from UCL Discovery
DOI: 10.1016/j.jclinepi.2023.02.015
Publisher version: https://doi.org/10.1016/j.jclinepi.2023.02.015
Language: English
Additional information: ©2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Keywords: DetectDeviatingCells, Mahalanobis distance, data quality, outlier, robust statistics, error detection
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Inst of Clinical Trials and Methodology
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10167513