Data vs. Model Machine Learning Fairness Testing: An Empirical Study

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Although several fairness definitions and bias mitigation techniques exist in the literature, all existing solutions evaluate the fairness of Machine Learning (ML) systems after the training stage. In this paper, we take the first steps towards evaluating a more holistic approach by testing for fairness both before and after model training. We evaluate the effectiveness of the proposed approach and position it within the ML development lifecycle, using an empirical analysis of the relationship between model-dependent and model-independent fairness metrics. The study uses 2 fairness metrics, 4 ML algorithms, 5 real-world datasets and 1600 fairness evaluation cycles. We find a linear relationship between data and model fairness metrics when the distribution and the size of the training data change. Our results indicate that testing for fairness prior to training can be a "cheap" and effective means of catching a biased data collection process early, detecting data drifts in production systems, and minimising execution of full training cycles, thus reducing development time and costs.
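
The distinction between a data (model-independent) and a model (model-dependent) fairness metric can be illustrated with a minimal sketch. The abstract does not name the two metrics used in the study, so statistical parity difference is assumed here purely as an example; the toy data, the variable names and the hard-coded predictions (standing in for the output of any trained classifier) are hypothetical and not taken from the paper.

```python
from typing import Sequence


def statistical_parity_difference(outcomes: Sequence[int],
                                  protected: Sequence[int]) -> float:
    """Return P(outcome=1 | unprivileged) - P(outcome=1 | privileged).

    protected[i] is 1 for the privileged group and 0 for the unprivileged one.
    A value of 0 means both groups receive the positive outcome equally often.
    """
    priv = [o for o, p in zip(outcomes, protected) if p == 1]
    unpriv = [o for o, p in zip(outcomes, protected) if p == 0]
    if not priv or not unpriv:
        raise ValueError("both groups must be non-empty")
    return sum(unpriv) / len(unpriv) - sum(priv) / len(priv)


# Hypothetical toy data (assumption, not from the paper).
protected = [1, 1, 1, 1, 0, 0, 0, 0]
train_labels = [1, 1, 1, 0, 1, 0, 0, 0]

# Stand-in for the predictions of a trained model (assumption).
model_predictions = [1, 1, 1, 1, 0, 0, 0, 0]

# Data fairness: computed on the labels, before any training.
data_spd = statistical_parity_difference(train_labels, protected)

# Model fairness: the same metric computed on the model's predictions.
model_spd = statistical_parity_difference(model_predictions, protected)

print(f"data SPD:  {data_spd:+.2f}")
print(f"model SPD: {model_spd:+.2f}")
```

The data metric needs only the labels and the protected attribute, which is why it can be checked before a full training cycle is run; the model metric requires a trained model's output.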

Original language: English
Title of host publication: Proceedings - 2024 ACM/IEEE 46th International Conference on Software Engineering
Subtitle of host publication: Companion, ICSE-Companion 2024
Publisher: IEEE
Pages: 366-367
Number of pages: 2
ISBN (Electronic): 9798400705021
Publication status: Published - 2024
Event: ACM/IEEE 46th International Conference on Software Engineering - Lisbon, Portugal
Duration: 14 Apr 2024 to 20 Apr 2024
Conference number: 46
https://conf.researchr.org/home/icse-2024

Publication series

Name: Proceedings - International Conference on Software Engineering
ISSN (Print): 0270-5257

Conference

Conference: ACM/IEEE 46th International Conference on Software Engineering
Abbreviated title: ICSE '24
Country/Territory: Portugal
City: Lisbon
Period: 14/04/24 to 20/04/24

Keywords

  • Datacentric AI
  • Empirical Software Engineering
  • ML Fairness Testing
  • SE4ML
