Abstract
We propose two end-to-end neural configurations for language diarization on bilingual code-switching speech. The first, a BLSTM-E2E architecture, includes a set of stacked bidirectional LSTMs to compute embeddings and incorporates the deep clustering loss to enforce grouping of languages belonging to the same class. The second, an XSA-E2E architecture, is based on an x-vector model followed by a self-attention encoder. The former encodes frame-level features into segmentlevel embeddings while the latter considers all those embeddings to generate a sequence of segment-level language labels. We evaluated the proposed methods on the dataset obtained from the shared task B in WSTCSMC 2020 and our handcrafted simulated data from the SEAME dataset. Experimental results show that our proposed XSA-E2E architecture achieved a relative improvement of 12.1% in equal error rate and a 7.4% relative improvement on accuracy compared with the baseline algorithm in the WSTCSMC 2020 dataset. Our proposed XSA-E2E architecture achieved an accuracy of 89.84% with a baseline of 85.60% on the simulated data derived from the SEAME dataset.
Original language | English |
---|---|
Title of host publication | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 |
Publisher | International Speech Communication Association |
Pages | 866-870 |
Number of pages | 5 |
ISBN (Electronic) | 9781713836902 |
DOIs | |
Publication status | Published - 2021 |
Event | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic Duration: 30 Aug 2021 → 3 Sept 2021 |
Publication series
Name | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
---|---|
Volume | 2 |
ISSN (Print) | 2308-457X |
ISSN (Electronic) | 1990-9772 |
Conference
Conference | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 |
---|---|
Country/Territory | Czech Republic |
City | Brno |
Period | 30/08/21 → 3/09/21 |
Bibliographical note
Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-careOtherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
Keywords
- Code-switching
- End-to-end neural diarization
- Language diarization
- Language identification
- Self-attention