The Multimodal Information Based Speech Processing (MISP) 2022 Challenge: Audio-Visual Diarization and Recognition

Zhe Wang, Shilong Wu, Hang Chen, Mao-Kui He, Jun Du*, Chin-Hui Lee, Jingdong Chen, Shinji Watanabe, Sabato Marco Siniscalchi, Odette Scharenborg, Diyuan Liu, More Authors

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

The Multi-modal Information based Speech Processing (MISP) challenge aims to extend the application of signal processing technology in specific scenarios by promoting research into wake-up words, speaker diarization, speech recognition, and other technologies. The MISP2022 challenge has two tracks: 1) audio-visual speaker diarization (AVSD), which aims to solve "who spoke when" using both audio and visual data; 2) a novel audio-visual diarization and recognition (AVDR) task, which addresses "who spoke what when" using audio-visual speaker diarization results. Both tracks focus on the Chinese language and use far-field audio and video recorded in real home-TV scenarios: 2-6 people communicating with each other with TV noise in the background. This paper introduces the dataset, track settings, and baselines of the MISP2022 challenge. Our analyses of experiments and examples indicate the good performance of the AVDR baseline system and the potential difficulties in this challenge due to, e.g., the far-field video quality, the presence of TV noise in the background, and the indistinguishable speakers.
Original language: English
Title of host publication: Proceedings of the ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Place of Publication: Piscataway
Publisher: IEEE
Number of pages: 5
ISBN (Electronic): 978-1-7281-6327-7
ISBN (Print): 978-1-7281-6328-4
DOIs
Publication status: Published - 2023
Event: 48th IEEE International Conference on Acoustics, Speech and Signal Processing 2023 - Rhodes Island, Greece
Duration: 4 Jun 2023 to 10 Jun 2023

Publication series

Name: ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume: 2023-June
ISSN (Print): 1520-6149

Conference

Conference: 48th IEEE International Conference on Acoustics, Speech and Signal Processing 2023
Abbreviated title: ICASSP 2023
Country/Territory: Greece
City: Rhodes Island
Period: 4/06/23 to 10/06/23

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • MISP challenge
  • speaker diarization
  • speech recognition
  • multimodality
