Improving child speech recognition with augmented child-like speech

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

State-of-the-art automatic speech recognition (ASR) systems perform suboptimally on child speech, and the scarcity of child speech data limits the development of child speech recognition (CSR). We therefore studied child-to-child voice conversion (VC), drawing on existing child speakers in the dataset via monolingual VC and on additional (new) child speakers via cross-lingual (Dutch-to-German) VC. The results showed that cross-lingual child-to-child VC significantly improved child ASR performance. Experiments on how the quantity of cross-lingual child-to-child VC-generated data affects fine-tuned (FT) ASR models gave the best results with two-fold augmentation for our FT-Conformer and FT-Whisper models, which reduced WERs by approximately 3% absolute compared to the baseline, and with six-fold augmentation for the model trained from scratch, which improved by 3.6% absolute WER. Moreover, using a small amount of "high-quality" VC-generated data achieved results similar to those of our best FT models.
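To make the fine-tuning setup concrete, below is a minimal, hypothetical sketch of fine-tuning Whisper on original child speech mixed with two-fold VC-augmented data, using the Hugging Face transformers and datasets libraries. This is not the authors' actual pipeline: the dataset paths, column names, model size, and hyperparameters are illustrative assumptions, and the VC-generated audio is assumed to have been produced beforehand.

```python
# Hypothetical sketch: fine-tune Whisper on child speech plus two-fold
# cross-lingual VC-augmented data (the ratio reported as best for FT models).
# Paths, column names, and hyperparameters are illustrative assumptions.
from dataclasses import dataclass

import torch
from datasets import Audio, concatenate_datasets, load_dataset
from transformers import (
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    WhisperForConditionalGeneration,
    WhisperProcessor,
)

processor = WhisperProcessor.from_pretrained(
    "openai/whisper-small", language="dutch", task="transcribe"
)
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

# Original child speech plus VC-generated child-like speech. Each (assumed)
# folder holds audio files and a metadata.csv with a "transcription" column,
# following the datasets "audiofolder" convention.
real = load_dataset("audiofolder", data_dir="data/child_real", split="train")
vc_aug = load_dataset("audiofolder", data_dir="data/child_vc_2x", split="train")
train = concatenate_datasets([real, vc_aug]).cast_column(
    "audio", Audio(sampling_rate=16_000)
)

def prepare(batch):
    # Compute log-mel input features and tokenize the reference transcript.
    audio = batch["audio"]
    batch["input_features"] = processor(
        audio["array"], sampling_rate=audio["sampling_rate"]
    ).input_features[0]
    batch["labels"] = processor.tokenizer(batch["transcription"]).input_ids
    return batch

train = train.map(prepare, remove_columns=train.column_names)

@dataclass
class Collator:
    processor: WhisperProcessor

    def __call__(self, features):
        # Pad audio features and label token ids separately.
        batch = self.processor.feature_extractor.pad(
            [{"input_features": f["input_features"]} for f in features],
            return_tensors="pt",
        )
        label_batch = self.processor.tokenizer.pad(
            [{"input_ids": f["labels"]} for f in features], return_tensors="pt"
        )
        # Mask padding with -100 so it is ignored by the loss.
        labels = label_batch["input_ids"].masked_fill(
            label_batch["attention_mask"].ne(1), -100
        )
        # The tokenizer already added the decoder start token; drop it because
        # the model prepends it again when shifting labels right.
        if (labels[:, 0] == self.processor.tokenizer.bos_token_id).all():
            labels = labels[:, 1:]
        batch["labels"] = labels
        return batch

args = Seq2SeqTrainingArguments(
    output_dir="whisper-child-ft",
    per_device_train_batch_size=16,
    learning_rate=1e-5,
    warmup_steps=500,
    max_steps=4000,
    fp16=torch.cuda.is_available(),
)

Seq2SeqTrainer(
    model=model, args=args, train_dataset=train, data_collator=Collator(processor)
).train()
```

The six-fold augmentation setting used for the model trained from scratch would follow the same pattern, only with a larger VC-generated split concatenated into the training set.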
Original language: English
Title of host publication: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Publisher: Interspeech
Pages: 5183-5187
Number of pages: 5
Volume: 2024
DOIs
Publication status: Published - 2024
Event: INTERSPEECH 2024 - Kos, Greece
Duration: 1 Sept 2024 – 5 Sept 2024

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X

Conference

Conference: INTERSPEECH 2024
Country/Territory: Greece
City: Kos
Period: 1/09/24 – 5/09/24

Keywords

  • Child speech recognition
  • Child-to-child voice conversion
  • Cross-lingual voice conversion
  • Data augmentation
