Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra

Zhengjun Yue; Erfan Loweimi; Zoran Cvetkovic

doi:10.21437/Interspeech.2023-222

Multimedia Computing

Research output: Contribution to journal › Conference article › Scientific › peer-review

Abstract

In this paper, we explore the effectiveness of deploying the raw phase and magnitude spectra for dysarthric speech recognition, detection and classification. In particular, we scrutinise the usefulness of various raw phase-based representations along with their combinations with the raw magnitude spectrum and filterbank features. We employ single- and multi-stream architectures consisting of a cascade of convolutional, recurrent and fully-connected layers for acoustic modelling. Furthermore, we investigate various configurations and fusion schemes as well as their training dynamics. In addition, the accuracies of the raw phase- and magnitude-based systems in the detection and classification tasks are studied and discussed. We report the performance on the UASpeech and TORGO dysarthric speech databases and for different severity levels. Our best system achieved WERs of 31.2% and 9.1% on TORGO for dysarthric and typical speech, respectively, and 30.2% on UASpeech.
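As a rough illustration of what "raw phase and magnitude spectra" refers to, the sketch below computes the magnitude and phase of the one-sided discrete Fourier transform of a single frame. It is not the paper's feature extraction: the function name `raw_spectra`, the toy cosine frame, and the absence of windowing and framing are all assumptions made for this example, and the naive DFT stands in for the FFT a real front end would likely use.

```python
import cmath
import math


def raw_spectra(frame):
    """Return (magnitude, phase) of the one-sided DFT of a real-valued frame.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    n = len(frame)
    magnitude, phase = [], []
    for k in range(n // 2 + 1):  # non-negative frequency bins only
        # Naive O(n^2) DFT sum for bin k.
        x_k = sum(
            sample * cmath.exp(-2j * math.pi * k * t / n)
            for t, sample in enumerate(frame)
        )
        magnitude.append(abs(x_k))       # magnitude spectrum |X[k]|
        phase.append(cmath.phase(x_k))   # phase spectrum, wrapped to [-pi, pi]
    return magnitude, phase


# Toy frame (assumption): a cosine completing exactly 2 cycles over 16 samples.
frame = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
magnitude, phase = raw_spectra(frame)

# Index of the bin with the largest magnitude.
peak_bin = max(range(len(magnitude)), key=magnitude.__getitem__)
```

For this toy frame the energy concentrates in the bin matching the cosine frequency. The magnitude and phase lists could then be fed to separate input streams or concatenated, which is one plausible reading of the single- and multi-stream setups mentioned in the abstract.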

Original language	English
Pages (from-to)	1533-1537
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2023-August
DOIs	https://doi.org/10.21437/Interspeech.2023-222
Publication status	Published - 2023
Event	24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration	20 Aug 2023 → 24 Aug 2023

Keywords

Dysarthric speech processing
raw phase and magnitude spectra
single- and multi-stream acoustic modelling

Cite this

@article{3dd58d3b4b5a4ecaaedd72bc523a2979,
  title = "Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra",
  abstract = "In this paper, we explore the effectiveness of deploying the raw phase and magnitude spectra for dysarthric speech recognition, detection and classification. In particular, we scrutinise the usefulness of various raw phase-based representations along with their combinations with the raw magnitude spectrum and filterbank features. We employed single and multi-stream architectures consisting of a cascade of convolutional, recurrent and fully-connected layers for acoustic modelling. Furthermore, we investigate various configurations and fusion schemes as well as their training dynamics. In addition, the accuracies of the raw phase and magnitude based systems in the detection and classification tasks are studied and discussed. We report the performance on the UASpeech and TORGO dysarthric speech databases and for different severity levels. Our best system achieved WERs of 31.2% and 9.1% for dysarthric and typical speech on TORGO and 30.2% on UASpeech, respectively.",
  keywords = "Dysarthric speech processing, raw phase and magnitude spectra, single- and multi-stream acoustic modelling",
  author = "Zhengjun Yue and Erfan Loweimi and Zoran Cvetkovic",
  year = "2023",
  doi = "10.21437/Interspeech.2023-222",
  language = "English",
  volume = "2023-August",
  pages = "1533--1537",
  journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  issn = "2308-457X",
  note = "24th International Speech Communication Association, Interspeech 2023 ; Conference date: 20-08-2023 Through 24-08-2023",
}

Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra. / Yue, Zhengjun; Loweimi, Erfan; Cvetkovic, Zoran.
In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, Vol. 2023-August, 2023, p. 1533-1537.

TY  - JOUR
T1  - Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra
AU  - Yue, Zhengjun
AU  - Loweimi, Erfan
AU  - Cvetkovic, Zoran
PY  - 2023
Y1  - 2023
AB  - In this paper, we explore the effectiveness of deploying the raw phase and magnitude spectra for dysarthric speech recognition, detection and classification. In particular, we scrutinise the usefulness of various raw phase-based representations along with their combinations with the raw magnitude spectrum and filterbank features. We employed single and multi-stream architectures consisting of a cascade of convolutional, recurrent and fully-connected layers for acoustic modelling. Furthermore, we investigate various configurations and fusion schemes as well as their training dynamics. In addition, the accuracies of the raw phase and magnitude based systems in the detection and classification tasks are studied and discussed. We report the performance on the UASpeech and TORGO dysarthric speech databases and for different severity levels. Our best system achieved WERs of 31.2% and 9.1% for dysarthric and typical speech on TORGO and 30.2% on UASpeech, respectively.
KW  - Dysarthric speech processing
KW  - raw phase and magnitude spectra
KW  - single- and multi-stream acoustic modelling
UR  - http://www.scopus.com/inward/record.url?scp=85171573949&partnerID=8YFLogxK
U2  - 10.21437/Interspeech.2023-222
DO  - 10.21437/Interspeech.2023-222
M3  - Conference article
AN  - SCOPUS:85171573949
SN  - 2308-457X
VL  - 2023-August
SP  - 1533
EP  - 1537
JO  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF  - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
T2  - 24th International Speech Communication Association, Interspeech 2023
Y2  - 20 August 2023 through 24 August 2023
ER  - 
