Towards Identity Preserving Normal to Dysarthric Voice Conversion

Wen-Chin Huang; Bence Mark Halpern; Lester Phillip Violeta; Odette Scharenborg; Tomoki Toda

doi:10.1109/ICASSP43922.2022.9747550

Multimedia Computing

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision-making processes and the alleviation of patient stress, and (2) data augmentation for dysarthric speech recognition. This is an especially challenging task since the converted samples should capture the severity of dysarthric speech while being highly natural and possessing the speaker identity of the normal speaker. To this end, we adopted a two-stage framework, which consists of a sequence-to-sequence model and a nonparallel frame-wise model. Objective and subjective evaluations were conducted on the UASpeech dataset, and results showed that the method was able to yield reasonable naturalness and capture severity aspects of the pathological speech. On the other hand, the similarity to the normal source speaker’s voice was limited and requires further improvement.

Original language	English
Title of host publication	Proceedings of the ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of Publication	Piscataway
Publisher	IEEE
Pages	6672-6676
Number of pages	5
ISBN (Electronic)	978-1-6654-0540-9
ISBN (Print)	978-1-6654-0541-6
DOIs	https://doi.org/10.1109/ICASSP43922.2022.9747550
Publication status	Published - 2022
Event	ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Singapore, Singapore Duration: 23 May 2022 → 27 May 2022

Conference

Conference	ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Country/Territory	Singapore
City	Singapore
Period	23/05/22 → 27/05/22

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

voice conversion
pathological speech
dysarthric speech
sequence-to-sequence modeling
autoencoder

Access to Document

10.1109/ICASSP43922.2022.9747550

Towards_Identity_Preserving_Normal_to_Dysarthric_Voice_Conversion: Final published version, 998 KB

Cite this

@inproceedings{593cb1dce5ab4a8b9eb1908bbc26a2e9,
  title = "Towards Identity Preserving Normal to Dysarthric Voice Conversion",
  abstract = "We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision making processes and alleviation of patient stress, (2) data augmentation for dysarthric speech recognition. This is an especially challenging task since the converted samples should capture the severity of dysarthric speech while being highly natural and possessing the speaker identity of the normal speaker. To this end, we adopted a two-stage framework, which consists of a sequence-to-sequence model and a nonparallel frame-wise model. Objective and subjective evaluations were conducted on the UASpeech dataset, and results showed that the method was able to yield reasonable naturalness and capture severity aspects of the pathological speech. On the other hand, the similarity to the normal source speaker{\textquoteright}s voice was limited and requires further improvements.",
  keywords = "voice conversion, pathological speech, dysarthric speech, sequence-to-sequence modeling, autoencoder",
  author = "Wen-Chin Huang and Halpern, {Bence Mark} and Violeta, {Lester Phillip} and Odette Scharenborg and Tomoki Toda",
  note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.; ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ; Conference date: 23-05-2022 Through 27-05-2022",
  year = "2022",
  doi = "10.1109/ICASSP43922.2022.9747550",
  language = "English",
  isbn = "978-1-6654-0541-6",
  pages = "6672--6676",
  booktitle = "Proceedings of the ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)",
  publisher = "IEEE",
  address = "United States",
}

Huang, W-C, Halpern, BM, Violeta, LP, Scharenborg, O & Toda, T 2022, Towards Identity Preserving Normal to Dysarthric Voice Conversion. in Proceedings of the ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 9747550, IEEE, Piscataway, pp. 6672-6676, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 23/05/22. https://doi.org/10.1109/ICASSP43922.2022.9747550

Towards Identity Preserving Normal to Dysarthric Voice Conversion. / Huang, Wen-Chin ; Halpern, Bence Mark; Violeta, Lester Phillip et al.
Proceedings of the ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). Piscataway: IEEE, 2022. p. 6672-6676 9747550.

TY - GEN
T1 - Towards Identity Preserving Normal to Dysarthric Voice Conversion
AU - Huang, Wen-Chin
AU - Halpern, Bence Mark
AU - Violeta, Lester Phillip
AU - Scharenborg, Odette
AU - Toda, Tomoki
N1 - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
PY - 2022
Y1 - 2022
N2 - We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision making processes and alleviation of patient stress, (2) data augmentation for dysarthric speech recognition. This is an especially challenging task since the converted samples should capture the severity of dysarthric speech while being highly natural and possessing the speaker identity of the normal speaker. To this end, we adopted a two-stage framework, which consists of a sequence-to-sequence model and a nonparallel frame-wise model. Objective and subjective evaluations were conducted on the UASpeech dataset, and results showed that the method was able to yield reasonable naturalness and capture severity aspects of the pathological speech. On the other hand, the similarity to the normal source speaker’s voice was limited and requires further improvements.
AB - We present a voice conversion framework that converts normal speech into dysarthric speech while preserving the speaker identity. Such a framework is essential for (1) clinical decision making processes and alleviation of patient stress, (2) data augmentation for dysarthric speech recognition. This is an especially challenging task since the converted samples should capture the severity of dysarthric speech while being highly natural and possessing the speaker identity of the normal speaker. To this end, we adopted a two-stage framework, which consists of a sequence-to-sequence model and a nonparallel frame-wise model. Objective and subjective evaluations were conducted on the UASpeech dataset, and results showed that the method was able to yield reasonable naturalness and capture severity aspects of the pathological speech. On the other hand, the similarity to the normal source speaker’s voice was limited and requires further improvements.
KW - voice conversion
KW - pathological speech
KW - dysarthric speech
KW - sequence-to-sequence modeling
KW - autoencoder
UR - http://www.scopus.com/inward/record.url?scp=85128659545&partnerID=8YFLogxK
U2 - 10.1109/ICASSP43922.2022.9747550
DO - 10.1109/ICASSP43922.2022.9747550
M3 - Conference contribution
SN - 978-1-6654-0541-6
SP - 6672
EP - 6676
BT - Proceedings of the ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PB - IEEE
CY - Piscataway
T2 - ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Y2 - 23 May 2022 through 27 May 2022
ER -
