The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

Luke Prananta; Bence Mark Halpern; Siyuan Feng; Odette Scharenborg

doi:10.21437/Interspeech.2022-190

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

Luke Prananta, Bence Mark Halpern, Siyuan Feng, Odette Scharenborg

Multimedia Computing

Research output: Contribution to journal › Conference article › Scientific › peer-review

6 Citations (Scopus)

17 Downloads (Pure)

Abstract

In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. We compare key components of existing methods as part of a rigorous ablation study to find the most effective solution to improve dysarthric speech recognition. We find that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthric speech recognition results comparable to those obtained when using state-of-the-art GAN-based voice conversion methods as measured using a phoneme recognition task. Additionally, our proposed solution of a combination of MaskCycleGAN-VC and time stretching is able to improve the phoneme recognition results for certain dysarthric speakers compared to our time stretched baseline.

Original language	English
Pages (from-to)	36-40
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2022-September
DOIs	https://doi.org/10.21437/Interspeech.2022-190
Publication status	Published - 2022
Event	23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of Duration: 18 Sept 2022 → 22 Sept 2022

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

dysarthric speech recognition
generative adversarial networks
time stretching
voice conversion

Access to Document

10.21437/Interspeech.2022-190

prananta22_interspeechFinal published version, 4.25 MB

Cite this

@article{9d9665c2cbc64779b9ecd1f424580165,

title = "The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition",

abstract = "In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. We compare key components of existing methods as part of a rigorous ablation study to find the most effective solution to improve dysarthric speech recognition. We find that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthric speech recognition results comparable to those obtained when using state-of-the-art GAN-based voice conversion methods as measured using a phoneme recognition task. Additionally, our proposed solution of a combination of MaskCycleGAN-VC and time stretching is able to improve the phoneme recognition results for certain dysarthric speakers compared to our time stretched baseline.",

keywords = "dysarthric speech recognition, generative adversarial networks, time stretching, voice conversion",

author = "Luke Prananta and Halpern, {Bence Mark} and Siyuan Feng and Odette Scharenborg",

note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.; 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 ; Conference date: 18-09-2022 Through 22-09-2022",

year = "2022",

doi = "10.21437/Interspeech.2022-190",

language = "English",

volume = "2022-September",

pages = "36--40",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition. / Prananta, Luke; Halpern, Bence Mark; Feng, Siyuan et al.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2022-September, 2022, p. 36-40.

Research output: Contribution to journal › Conference article › Scientific › peer-review

TY - JOUR

T1 - The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

AU - Prananta, Luke

AU - Halpern, Bence Mark

AU - Feng, Siyuan

AU - Scharenborg, Odette

N1 - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2022

Y1 - 2022

N2 - In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. We compare key components of existing methods as part of a rigorous ablation study to find the most effective solution to improve dysarthric speech recognition. We find that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthric speech recognition results comparable to those obtained when using state-of-the-art GAN-based voice conversion methods as measured using a phoneme recognition task. Additionally, our proposed solution of a combination of MaskCycleGAN-VC and time stretching is able to improve the phoneme recognition results for certain dysarthric speakers compared to our time stretched baseline.

AB - In this paper, we investigate several existing and a new state-of-the-art generative adversarial network-based (GAN) voice conversion method for enhancing dysarthric speech for improved dysarthric speech recognition. We compare key components of existing methods as part of a rigorous ablation study to find the most effective solution to improve dysarthric speech recognition. We find that straightforward signal processing methods such as stationary noise removal and vocoder-based time stretching lead to dysarthric speech recognition results comparable to those obtained when using state-of-the-art GAN-based voice conversion methods as measured using a phoneme recognition task. Additionally, our proposed solution of a combination of MaskCycleGAN-VC and time stretching is able to improve the phoneme recognition results for certain dysarthric speakers compared to our time stretched baseline.

KW - dysarthric speech recognition

KW - generative adversarial networks

KW - time stretching

KW - voice conversion

UR - http://www.scopus.com/inward/record.url?scp=85140077807&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2022-190

DO - 10.21437/Interspeech.2022-190

M3 - Conference article

AN - SCOPUS:85140077807

SN - 2308-457X

VL - 2022-September

SP - 36

EP - 40

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022

Y2 - 18 September 2022 through 22 September 2022

ER -

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

Abstract

Bibliographical note

Keywords

Access to Document

Other files and links

Fingerprint

Cite this