A Data Perspective on Ethical Challenges in Voice Biometrics Research

Anna Leschanowsky, Casandra Rusti*, Carolyn Quinlan, Michaela Pnacek, Lauriane Gorce, Wiebke Hutiri

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

Speaker recognition technology, deployed in sectors such as banking, education, recruitment, immigration, law enforcement, and healthcare, relies heavily on biometric data. However, the ethical implications and biases inherent in the datasets driving this technology have not been fully explored. Through a longitudinal study of nearly 700 papers published at the ISCA Interspeech Conference between 2012 and 2021, we investigate how dataset use has evolved alongside the widespread adoption of deep neural networks. Our study identifies the most commonly used datasets in the field and examines their usage patterns. The analysis reveals significant shifts in data practices since the advent of deep learning: a small number of datasets dominate speaker recognition training and evaluation, and the majority of studies evaluate their systems on a single dataset. For four key datasets (Switchboard, Mixer, VoxCeleb, and ASVspoof), we conduct a detailed analysis of metadata and collection methods to assess ethical concerns and privacy risks. Our study highlights numerous challenges related to sampling bias, re-identification, consent, disclosure of sensitive information, and security risks in speaker recognition datasets, and emphasizes the need for more representative, fair, and privacy-aware data collection in this domain.
Original language: English
Pages (from-to): 118-131
Number of pages: 14
Journal: IEEE Transactions on Biometrics, Behavior, and Identity Science
Volume: 7
Issue number: 1
Publication status: Published - 2025

Bibliographical note

Green Open Access added to the TU Delft Institutional Repository as part of the Taverne project 'You share, we take care!' (https://www.openaccess.nl/en/you-share-we-take-care).
Except where otherwise indicated in the copyright section, the publisher is the copyright holder of this work, and the author uses Dutch legislation to make this work public.

Keywords

  • biometrics (access control)
  • data handling
  • data transparency
  • ethical aspects
  • human voice
  • privacy
  • speaker recognition
