When Machine Learning Models Leak: An Exploration of Synthetic Training Data

Manel Slokom*, Peter Paul de Wolf, Martha Larson

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

We investigate an attack on a machine learning classifier that predicts the propensity of a person or household to move (i.e., relocate) in the next two years. The attack assumes that the classifier has been made publicly available and that the attacker has access to information about a certain number of target individuals. The attacker might also have information about another set of people with which to train an auxiliary classifier. We show that the attack is possible for target individuals regardless of whether they were contained in the original training set of the classifier, although it is somewhat less successful for individuals who were not. Based on this observation, we investigate whether training the classifier on a data set synthesized from the original training data, rather than on the original training data directly, would help to mitigate the attack. Our experimental results show that it does not, leading us to conclude that new approaches to data synthesis must be developed if synthesized data is to resemble “unseen” individuals closely enough to help block machine learning model attacks.
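To make the attack setting concrete, the sketch below illustrates one common form of attribute inference of the kind the abstract describes: an attacker who knows a target's non-sensitive attributes queries the public model with each candidate value of the sensitive attribute and trains an auxiliary classifier on the resulting scores. This is a minimal sketch under stated assumptions, not the authors' experimental code; the synthetic data-generating process, feature layout, and use of scikit-learn are all illustrative.

```python
# Hypothetical attribute inference sketch; variable names, features, and
# data generation are illustrative assumptions, not the paper's setup.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# --- A "public" target model predicting propensity to move ---
n, d = 2000, 8                       # records and non-sensitive features
X = rng.normal(size=(n, d))          # non-sensitive attributes
s = rng.integers(0, 2, size=n)       # sensitive attribute the attacker wants
# The move label depends on s, so the trained model can leak it.
y = ((X[:, 0] + s + rng.normal(scale=0.5, size=n)) > 0.5).astype(int)

target_model = RandomForestClassifier(random_state=0)
target_model.fit(np.column_stack([X, s]), y)  # trained with the sensitive attribute

# --- Attack features: query the model with both candidate sensitive values ---
def attack_features(model, X_part):
    p0 = model.predict_proba(np.column_stack([X_part, np.zeros(len(X_part))]))[:, 1]
    p1 = model.predict_proba(np.column_stack([X_part, np.ones(len(X_part))]))[:, 1]
    return np.column_stack([X_part, p0, p1])

# Auxiliary data: another set of people whose sensitive values are known.
X_aux, s_aux = X[:1000], s[:1000]
attack_model = RandomForestClassifier(random_state=0)
attack_model.fit(attack_features(target_model, X_aux), s_aux)

# --- Infer the sensitive attribute for held-out target individuals ---
X_tgt, s_tgt = X[1000:], s[1000:]
guess = attack_model.predict(attack_features(target_model, X_tgt))
print("attack accuracy:", (guess == s_tgt).mean())
```

In the mitigation experiment the paper reports on, the target model would instead be fit on records synthesized from the original training data; in this sketch that corresponds to replacing the inputs of the first fit call with synthetic records while leaving the attack unchanged.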

Original language: English
Title of host publication: Privacy in Statistical Databases - International Conference, PSD 2022, Proceedings
Editors: Josep Domingo-Ferrer, Maryline Laurent
Publisher: Springer
Pages: 283-296
Number of pages: 14
ISBN (Print): 9783031139444
DOIs
Publication status: Published - 2022
Event: International Conference on Privacy in Statistical Databases, PSD 2022 - Paris, France
Duration: 21 Sept 2022 – 23 Sept 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13463 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: International Conference on Privacy in Statistical Databases, PSD 2022
Country/Territory: France
City: Paris
Period: 21/09/22 – 23/09/22

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise, as indicated in the copyright section: the publisher is the copyright holder of this work, and the author uses Dutch legislation to make this work public.

Keywords

  • Attribute inference
  • Machine learning
  • Propensity to move
  • Synthetic data

