Viviani, Laura;
White, Ian R;
Williamson, Elizabeth J;
Carpenter, James;
van der Meulen, Jan;
Cromwell, David A;
(2023)
The DetectDeviatingCells algorithm was a useful addition to the toolkit for cellwise error detection in observational data.
Journal of Clinical Epidemiology, 157, pp. 35-45. 10.1016/j.jclinepi.2023.02.015.
Abstract
OBJECTIVE: We evaluated the error detection performance of the DetectDeviatingCells (DDC) algorithm, which flags data anomalies at observation (casewise) and variable (cellwise) level in continuous variables. We compared its performance to other approaches in a simulated dataset.

STUDY DESIGN AND SETTING: We simulated height and weight data for hypothetical individuals aged 2-20 years. We changed a proportion of height values according to pre-determined error patterns. We applied the DDC algorithm and other error-detection approaches (descriptive statistics, plots, fixed-threshold rules, classic and robust Mahalanobis distance), and we compared error detection performance using sensitivity, specificity, likelihood ratios, predictive values and ROC curves.

RESULTS: At our chosen thresholds, error detection specificity was excellent across all scenarios for all methods, and sensitivity was higher for multivariable and robust methods. The DDC algorithm performance was similar to other robust multivariable methods. Analysis of ROC curves suggested that all methods had comparable performance for gross errors (e.g. wrong measurement unit), but the DDC algorithm outperformed the others for more complex error patterns (e.g. transcription errors that are still plausible, although extreme).

CONCLUSIONS: The DDC algorithm has the potential to improve error detection processes for observational data.
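The evaluation described in the abstract flags suspicious values and then scores those flags against the known error positions. The sketch below illustrates that process in minimal, stdlib-only Python. It covers one comparator only: the classical Mahalanobis distance on two variables with a chi-square cutoff. It is neither the DDC algorithm nor the robust Mahalanobis variant. Several choices are hypothetical and serve only as illustration: the toy height/weight model, the 5% "metres instead of centimetres" error, the 0.975 cutoff and all function names. None of them reproduces the authors' simulation design.

```python
import math
import random


def chi2_2df_quantile(p):
    """p-quantile of a chi-square distribution with 2 degrees of freedom.

    With 2 df the CDF is 1 - exp(-x / 2), so the quantile is -2 ln(1 - p).
    """
    return -2.0 * math.log(1.0 - p)


def mean_cov_2d(xs, ys):
    """Sample means, variances and covariance of two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return mx, my, sxx, syy, sxy


def mahalanobis_sq_2d(x, y, mx, my, sxx, syy, sxy):
    """Squared Mahalanobis distance of (x, y) via the explicit 2x2 inverse."""
    det = sxx * syy - sxy * sxy
    dx, dy = x - mx, y - my
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


def flag_classical_mahalanobis(xs, ys, p=0.975):
    """Casewise flags: True where the squared distance exceeds the chi2(2) cutoff.

    Uses classical (non-robust) estimates, so the errors themselves can
    inflate the covariance and mask one another.
    """
    params = mean_cov_2d(xs, ys)
    cutoff = chi2_2df_quantile(p)
    return [mahalanobis_sq_2d(x, y, *params) > cutoff for x, y in zip(xs, ys)]


def diagnostic_metrics(flagged, is_error):
    """Sensitivity, specificity, likelihood ratios and predictive values."""
    pairs = list(zip(flagged, is_error))
    tp = sum(1 for f, e in pairs if f and e)
    fp = sum(1 for f, e in pairs if f and not e)
    fn = sum(1 for f, e in pairs if not f and e)
    tn = sum(1 for f, e in pairs if not f and not e)
    nan = float("nan")
    sens = tp / (tp + fn) if tp + fn else nan
    spec = tn / (tn + fp) if tn + fp else nan
    return {
        "sensitivity": sens,
        "specificity": spec,
        "LR+": sens / (1 - spec) if spec != 1 else math.inf,
        "LR-": (1 - sens) / spec if spec != 0 else math.inf,
        "PPV": tp / (tp + fp) if tp + fp else nan,
        "NPV": tn / (tn + fn) if tn + fn else nan,
    }


if __name__ == "__main__":
    rng = random.Random(2023)
    n = 1000
    # Toy adult data (assumption): height in cm, weight in kg, positively correlated.
    heights = [rng.gauss(170, 10) for _ in range(n)]
    weights = [0.9 * h - 90 + rng.gauss(0, 8) for h in heights]
    # Hypothetical gross error: about 5% of heights recorded in metres instead of cm.
    is_error = [rng.random() < 0.05 for _ in range(n)]
    recorded = [h / 100 if e else h for h, e in zip(heights, is_error)]
    flags = flag_classical_mahalanobis(recorded, weights)
    for name, value in diagnostic_metrics(flags, is_error).items():
        print(f"{name}: {value:.3f}")
```

The mean and covariance here are estimated from the contaminated data, so gross errors can inflate them and hide each other. Robust estimators such as the minimum covariance determinant are the usual remedy. This is consistent with the abstract's finding that multivariable and robust methods had higher sensitivity.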
Type: | Article |
---|---|
Title: | The DetectDeviatingCells algorithm was a useful addition to the toolkit for cellwise error detection in observational data |
Location: | United States |
Open access status: | An open access version is available from UCL Discovery |
DOI: | 10.1016/j.jclinepi.2023.02.015 |
Publisher version: | https://doi.org/10.1016/j.jclinepi.2023.02.015 |
Language: | English |
Additional information: | ©2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
Keywords: | DetectDeviatingCells, Mahalanobis distance, data quality, outlier, robust statistics, error detection |
UCL classification: | UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > Inst of Clinical Trials and Methodology |
URI: | https://discovery-pp.ucl.ac.uk/id/eprint/10167513 |