On Integrating the Number of Synthetic Data Sets m into the a priori Synthesis Approach

Jackson, J; Mitra, R; Francis, B; Dove, I; (2022) On Integrating the Number of Synthetic Data Sets m into the a priori Synthesis Approach. In: Privacy in Statistical Databases. PSD 2022. (pp. 205-219). Springer International Publishing: Cham, Switzerland. Green open access

Abstract

The synthesis mechanism given in [4] uses saturated models, along with overdispersed count distributions, to generate synthetic categorical data. The mechanism is controlled by tuning parameters, which can be tuned according to a specific risk or utility metric. Thus expected properties of synthetic data sets can be determined analytically a priori, that is, before they are generated. While [4] considered the case of generating m=1 data set, this paper considers generating m>1 data sets. In effect, m becomes a tuning parameter and the role of m in relation to the risk-utility trade-off can be shown analytically. The paper introduces a pair of risk metrics, τ3(k,d) and τ4(k,d), that are suited to m>1 data sets; and also considers the more general issue of how best to analyse m>1 categorical data sets: average the data sets pre-analysis or average results post-analysis. Finally, the methods are demonstrated empirically with the synthesis of a constructed data set which is used to represent the English School Census.
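
The two analysis strategies mentioned in the abstract (average the data sets pre-analysis, or average results post-analysis) can be illustrated with a minimal sketch. It is not the mechanism of [4] itself: it assumes each cell of a small, made-up contingency table is replaced by a negative binomial draw whose mean is the original count and whose variance is mean + sigma * mean^2, with sigma standing in for a tuning parameter. The function names, the example counts, and the choice to leave zero cells at zero are all illustrative assumptions.

```python
import numpy as np


def synthesize(counts, sigma, m, rng):
    """Draw m synthetic versions of a flattened contingency table.

    Assumption (not taken from the paper): each cell is negative binomial
    with mean equal to the original count and variance mean + sigma * mean**2.
    Cells with an original count of zero are left at zero.
    """
    counts = np.asarray(counts, dtype=float)
    synth = np.zeros((m, counts.size), dtype=np.int64)
    pos = counts > 0
    r = 1.0 / sigma                 # negative binomial shape parameter
    p = r / (r + counts[pos])       # success probability giving mean = count
    synth[:, pos] = rng.negative_binomial(r, p, size=(m, int(pos.sum())))
    return synth


def share_pre(synth, cell):
    """Average the m tables first, then compute the cell's share of the total."""
    mean_table = synth.mean(axis=0)
    return float(mean_table[cell] / mean_table.sum())


def share_post(synth, cell):
    """Compute the cell's share in each table, then average the m shares."""
    return float(np.mean(synth[:, cell] / synth.sum(axis=1)))


rng = np.random.default_rng(2022)
original = [120, 45, 8, 0]          # hypothetical cell counts
synth = synthesize(original, sigma=0.1, m=5, rng=rng)

pre = share_pre(synth, cell=0)
post = share_post(synth, cell=0)
print("pre-analysis averaging: ", pre)
print("post-analysis averaging:", post)
```

For a statistic that is linear in the cell counts, such as a single cell total, the two strategies coincide. For a ratio such as the share above they generally differ when m>1, which is why the order of averaging matters in this sketch.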

Type:	Proceedings paper
Title:	On Integrating the Number of Synthetic Data Sets m into the a priori Synthesis Approach
Event:	International Conference on Privacy in Statistical Databases - PSD 2022
ISBN-13:	9783031139444
Open access status:	An open access version is available from UCL Discovery
DOI:	10.1007/978-3-031-13945-1_15
Language:	English
Additional information:	This version is the author accepted manuscript. For information on re-use, please refer to the publisher’s terms and conditions.
Keywords:	Synthetic data, privacy, categorical data, risk metrics, contingency tables
UCL classification:	UCL > Provost and Vice Provost Offices > UCL BEAMS > Faculty of Maths and Physical Sciences > Dept of Statistical Science
URI:	https://discovery-pp.ucl.ac.uk/id/eprint/10159958
