Towards Identity Preserving Normal to Dysarthric Voice Conversion

Wen-Chin Huang, Bence Mark Halpern, Lester Phillip Violeta, Odette Scharenborg, Tomoki Toda

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision-making processes and alleviation of patient stress, and (2) data augmentation for dysarthric speech recognition. This is an especially challenging task since the converted samples should capture the severity of dysarthric speech while being highly natural and retaining the speaker identity of the normal speaker. To this end, we adopted a two-stage framework, which consists of a sequence-to-sequence model and a nonparallel frame-wise model. Objective and subjective evaluations were conducted on the UASpeech dataset, and results showed that the method was able to yield reasonable naturalness and capture severity aspects of the pathological speech. On the other hand, the similarity to the normal source speaker's voice was limited and requires further improvement.

Original language: English
Title of host publication: Proceedings of the ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of Publication: Piscataway
Publisher: IEEE
Pages: 6672-6676
Number of pages: 5
ISBN (Electronic): 978-1-6654-0540-9
ISBN (Print): 978-1-6654-0541-6
Publication status: Published - 2022
Event: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Singapore, Singapore
Duration: 23 May 2022 to 27 May 2022

Conference

Conference: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Country/Territory: Singapore
City: Singapore
Period: 23/05/22 to 27/05/22

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • voice conversion
  • pathological speech
  • dysarthric speech
  • sequence-to-sequence modeling
  • autoencoder
