The First Multimodal Information Based Speech Processing (MISP) Challenge: Data, Tasks, Baselines and Results

Hang Chen, Hengshun Zhou, Jun Du, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Di-Yuan Liu, More Authors

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

In this paper we discuss the rationale of the Multimodal Information Based Speech Processing (MISP) Challenge and provide a detailed description of the recorded data, the two evaluation tasks, and the corresponding baselines, followed by a summary of the submitted systems and evaluation results. The MISP Challenge aims to tackle speech processing tasks in different scenarios by introducing information from an additional modality (e.g., video or text), which will hopefully lead to better environmental and speaker robustness in realistic applications. In the first MISP Challenge, two benchmark datasets recorded in a real home TV room, together with two reproducible open-source baseline systems, have been released to promote research in audio-visual wake word spotting (AVWWS) and audio-visual speech recognition (AVSR). To our knowledge, MISP is the first open evaluation challenge to tackle real-world issues of AVWWS and AVSR in the home TV scenario.
Original language: English
Title of host publication: Proceedings of the ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of Publication: Piscataway
Publisher: IEEE
Pages: 9266-9270
Number of pages: 5
ISBN (Electronic): 978-1-6654-0540-9
ISBN (Print): 978-1-6654-0541-6
DOIs
Publication status: Published - 2022
Event: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Singapore, Singapore
Duration: 23 May 2022 to 27 May 2022

Conference

Conference: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Country/Territory: Singapore
City: Singapore
Period: 23/05/22 to 27/05/22

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • MISP challenge
  • microphone array
  • audio-visual
  • automatic speech recognition
  • wake word spotting
