UCL Discovery Stage

A Framework for Auditable Synthetic Data Generation

Houssiau, Florimond; Cohen, Samuel N; Szpruch, Lukasz; Daniel, Owen; Lawrence, Michaela G; Mitra, Robin; Wilde, Henry; (2022) A Framework for Auditable Synthetic Data Generation. ArXiv: Ithaca, NY, USA. Green open access

auditablesyntheticdatapaper.pdf - Submitted Version

Abstract

Synthetic data has gained significant momentum thanks to sophisticated machine learning tools that enable the synthesis of high-dimensional datasets. However, many generation techniques do not give the data controller control over what statistical patterns are captured, leading to concerns over privacy protection. While synthetic records are not linked to a particular real-world individual, they can reveal information about users indirectly, which may be unacceptable for data owners. There is thus a need to empirically verify the privacy of synthetic data, a particularly challenging task in high-dimensional data. In this paper, we present a general framework for synthetic data generation that gives data controllers full control over which statistical properties the synthetic data ought to preserve, what exact information loss is acceptable, and how to quantify it. The benefits of the approach are that (1) one can generate synthetic data that results in high utility for a given task, while (2) empirically validating that only statistics considered safe by the data curator are used to generate the data. We thus show the potential for synthetic data to be an effective means of releasing confidential data safely, while retaining useful information for analysts.
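
The abstract does not specify an algorithm, so the following is only a minimal Python sketch of the two ideas it names: generating records from curator-approved statistics alone, and empirically checking that nothing else influences the output. Everything here is an assumption made for illustration and is not taken from the paper: the choice of per-column value counts as the "safe" statistics, the hypothetical function names (`safe_statistics`, `synthesize`, `leaky_synthesize`, `audit`), and the same-seed comparison used as the audit.

```python
import random
from collections import Counter


def safe_statistics(records, columns):
    """Per-column value counts: the only statistics assumed to be approved."""
    return {c: Counter(r[c] for r in records) for c in columns}


def synthesize(records, columns, n, seed=0):
    """Sample n synthetic records using only the approved statistics."""
    stats = safe_statistics(records, columns)
    rng = random.Random(seed)
    out = []
    for _ in range(n):
        row = {}
        for col in sorted(stats):
            values = sorted(stats[col])
            weights = [stats[col][v] for v in values]
            row[col] = rng.choices(values, weights=weights, k=1)[0]
        out.append(row)
    return out


def leaky_synthesize(records, columns, n, seed=0):
    """A deliberately unsafe generator: it copies one real record verbatim."""
    return [dict(records[0])] + synthesize(records, columns, n - 1, seed)


def audit(generator, records_a, records_b, columns, n=50, seed=0):
    """Illustrative audit: two datasets that agree on the approved statistics
    must yield identical output under the same seed if the generator
    depends on those statistics only."""
    if safe_statistics(records_a, columns) != safe_statistics(records_b, columns):
        raise ValueError("datasets must agree on the approved statistics")
    return generator(records_a, columns, n, seed) == generator(records_b, columns, n, seed)


# Two toy datasets with identical column counts but different joint structure.
columns = ["age", "smoker"]
dataset_a = [{"age": "young", "smoker": "yes"}, {"age": "old", "smoker": "no"}]
dataset_b = [{"age": "young", "smoker": "no"}, {"age": "old", "smoker": "yes"}]

passes = audit(synthesize, dataset_a, dataset_b, columns)
leaks = not audit(leaky_synthesize, dataset_a, dataset_b, columns)
```

In this toy setting the generator that reads only the column counts passes the check, while the one that copies a real record fails it, because the copied record differs between the two datasets even though their approved statistics match. A single failing pair proves a leak, but a passing pair is only evidence, not proof, that no other information is used.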

Type: Working / discussion paper
Title: A Framework for Auditable Synthetic Data Generation
Open access status: An open access version is available from UCL Discovery
Publisher version: https://doi.org/10.48550/arXiv.2211.11540
Language: English
Additional information: This version is the version of record. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords: Synthetic Data, Privacy, Generative models, Auditing
UCL classification: UCL
UCL > Provost and Vice Provost Offices > UCL BEAMS
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences
UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI: https://discovery-pp.ucl.ac.uk/id/eprint/10201612
Downloads since deposit: 30