Dysarthric Speech Recognition, Detection and Classification using Raw Phase and Magnitude Spectra

Zhengjun Yue, Erfan Loweimi, Zoran Cvetkovic

Research output: Contribution to journal › Conference article › Scientific › peer-review


Abstract

In this paper, we explore the effectiveness of deploying the raw phase and magnitude spectra for dysarthric speech recognition, detection and classification. In particular, we scrutinise the usefulness of various raw phase-based representations along with their combinations with the raw magnitude spectrum and filterbank features. We employ single- and multi-stream architectures consisting of a cascade of convolutional, recurrent and fully-connected layers for acoustic modelling. Furthermore, we investigate various configurations and fusion schemes as well as their training dynamics. In addition, the accuracies of the raw phase- and magnitude-based systems in the detection and classification tasks are studied and discussed. We report the performance on the UASpeech and TORGO dysarthric speech databases and for different severity levels. Our best system achieved WERs of 31.2% and 9.1% for dysarthric and typical speech, respectively, on TORGO, and 30.2% on UASpeech.
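
As a rough illustration of what "raw phase and magnitude spectra" refers to, the sketch below computes both from a single frame of samples using only the Python standard library. It is not the authors' feature pipeline: the function name `frame_spectra`, the Hamming window, the 32-sample frame and the naive DFT are all assumptions made for this example, and the specific phase-based representations studied in the paper are not reproduced here.

```python
import cmath
import math


def frame_spectra(frame):
    """Return (magnitude, phase) of the one-sided DFT of a real-valued frame.

    Hypothetical helper for illustration only; expects at least 2 samples.
    """
    n = len(frame)
    # Hamming window (an assumed choice) to reduce spectral leakage.
    windowed = [
        x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    magnitude, phase = [], []
    for k in range(n // 2 + 1):
        # Naive O(n^2) DFT for clarity; a practical system would use an FFT.
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(windowed)
        )
        magnitude.append(abs(coeff))
        phase.append(cmath.phase(coeff))  # principal value in [-pi, pi]
    return magnitude, phase


# Toy frame (assumed data): a pure tone centred on bin 4 of a 32-sample frame.
n = 32
frame = [math.cos(2 * math.pi * 4 * i / n) for i in range(n)]
mag, ph = frame_spectra(frame)
peak_bin = max(range(len(mag)), key=mag.__getitem__)
```

For the toy tone, the magnitude spectrum peaks at the bin matching the tone frequency, while the phase values are wrapped to the principal range. That wrapping is one reason phase is harder to use directly than magnitude.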

Original language: English
Pages (from-to): 1533-1537
Number of pages: 5
Journal: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2023-August
Publication status: Published - 2023
Event: 24th International Speech Communication Association, Interspeech 2023 - Dublin, Ireland
Duration: 20 Aug 2023 → 24 Aug 2023

Keywords

  • Dysarthric speech processing
  • raw phase and magnitude spectra
  • single- and multi-stream acoustic modelling
