Abstract
In this paper we discuss the rationale of the Multimodal Information Based Speech Processing (MISP) Challenge, and provide a detailed description of the data recorded, the two evaluation tasks and the corresponding baselines, followed by a summary of submitted systems and evaluation results. The MISP Challenge aims at tackling speech processing tasks in different scenarios by introducing information about an additional modality (e.g., video or text), which will hopefully lead to better environmental and speaker robustness in realistic applications. In the first MISP challenge, two benchmark datasets recorded in a real-home TV room, together with two reproducible open-source baseline systems, have been released to promote research in audio-visual wake word spotting (AVWWS) and audio-visual speech recognition (AVSR). To our knowledge, MISP is the first open evaluation challenge to tackle real-world issues of AVWWS and AVSR in the home TV scenario.
Original language | English |
---|---|
Title of host publication | Proceedings of the ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Place of Publication | Piscataway |
Publisher | IEEE |
Pages | 9266-9270 |
Number of pages | 5 |
ISBN (Electronic) | 978-1-6654-0540-9 |
ISBN (Print) | 978-1-6654-0541-6 |
Publication status | Published - 2022 |
Event | ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - Singapore, Singapore. Duration: 23 May 2022 → 27 May 2022 |
Conference
Conference | ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
---|---|
Country/Territory | Singapore |
City | Singapore |
Period | 23/05/22 → 27/05/22 |
Bibliographical note
Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Keywords
- MISP challenge
- microphone array
- audio-visual
- automatic speech recognition
- wake word spotting