
Synthetic data in medical research

Kokosi, Theodora; Harron, Katie; (2022) Synthetic data in medical research. BMJ Medicine, 1 (1), Article e000167. 10.1136/bmjmed-2022-000167. Green open access

e000167.full.pdf - Published Version


Abstract

Introduction

Demand to access high quality data at the individual level for medical and healthcare research is growing. Electronic health record data collected on whole populations can help to generate real world evidence and can be used for a range of secondary purposes, including testing new hypotheses and developing and evaluating different methodological and statistical approaches. Secondary analysis of primary research data, such as from clinical trials [1], is also valuable, for example, to conduct meta-analyses of individual participant data.

However, several complex privacy requirements make accessing these data challenging [2]. Information contained in electronic health records or in clinical trial data is highly sensitive, and access to these datasets can be an expensive and lengthy process [3]. Data privacy and protection regulations are the main barriers to accessing these data for healthcare and medical research [4]. Anonymisation (where potentially identifiable variables are removed) is one way to make data available; however, intensive anonymisation can degrade the data to the extent that they are no longer fit for purpose [5]. For example, adding random noise to the data reduces precision and leads to wider confidence intervals. Several reidentification attempts on anonymised data have been successful and have harmed the trust of the public and regulators in such methods [6, 7]. For instance, one study showed that patients could be identified by matching information from publicly available patient level data, attributing information obtained from newspapers, and contacting those patients directly [6].

Use of information from clinical trials and electronic health records of large populations has the potential to benefit medical and healthcare research, which makes seeking new approaches to data access imperative. One solution is to use so-called synthetic data, or artificial data, which provide a realistic representation of the original data source.
Synthetic data look like the original data source, without containing any information on any real individuals. Synthetic data can attempt to preserve some of the statistical properties of the original data source (eg, distributions of continuous data, proportions of categorical data, correlations between variables, and other model parameters).
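
As a rough illustration of what preserving statistical properties can mean in practice, the sketch below fits simple summary statistics to an "original" table and samples new rows from them. It is not the authors' method: the function name `synthesise`, the multivariate normal model for the continuous columns, the independent sampling of the categorical column, and the simulated age, blood pressure, and group variables are all assumptions made for this example. Matching summary statistics in this way does not by itself establish that the output protects privacy.

```python
import numpy as np


def synthesise(continuous, categorical, n_samples, seed=0):
    """Sample synthetic rows from summary statistics of the original data.

    continuous:  (n, p) float array of continuous variables.
    categorical: (n,) array of category labels.
    Assumption: continuous columns are modelled as multivariate normal and
    the categorical column is sampled independently of them.
    """
    rng = np.random.default_rng(seed)

    # Preserve means, variances and correlations of the continuous columns.
    mean = continuous.mean(axis=0)
    cov = np.atleast_2d(np.cov(continuous, rowvar=False))
    synth_cont = rng.multivariate_normal(mean, cov, size=n_samples)

    # Preserve the observed proportions of each category.
    levels, counts = np.unique(categorical, return_counts=True)
    synth_cat = rng.choice(levels, size=n_samples, p=counts / counts.sum())
    return synth_cont, synth_cat


# Simulated stand-in for an original dataset (hypothetical, not real patients).
rng = np.random.default_rng(42)
n = 5000
age = rng.normal(50, 12, n)
sbp = 100 + 0.5 * age + rng.normal(0, 10, n)
original_cont = np.column_stack([age, sbp])
original_cat = rng.choice(["A", "B", "C"], size=n, p=[0.5, 0.3, 0.2])

synth_cont, synth_cat = synthesise(original_cont, original_cat, n_samples=n)

# Compare the correlation between the two continuous variables.
orig_corr = np.corrcoef(original_cont, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synth_cont, rowvar=False)[0, 1]
print(f"correlation: original={orig_corr:.3f}, synthetic={synth_corr:.3f}")

# Compare the proportion of one category.
orig_prop_a = np.mean(original_cat == "A")
synth_prop_a = np.mean(synth_cat == "A")
print(f"proportion A: original={orig_prop_a:.3f}, synthetic={synth_prop_a:.3f}")
```

Because the categorical column is drawn independently here, any association between group and the continuous variables in the original data would be lost; a more realistic generator would need to model such relationships explicitly.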

Type: Article
Title: Synthetic data in medical research
Open access status: An open access version is available from UCL Discovery
DOI: 10.1136/bmjmed-2022-000167
Publisher version: https://doi.org/10.1136/bmjmed-2022-000167
Language: English
Additional information: This work is licensed under a Creative Commons Attribution 4.0 International License. The images or other third-party material in this article are included in the Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
UCL classification: UCL
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > UCL GOS Institute of Child Health > Population, Policy and Practice Dept
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences > Faculty of Population Health Sciences > UCL GOS Institute of Child Health
UCL > Provost and Vice Provost Offices > School of Life and Medical Sciences
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10156537