Using cross-model learnings for the Gram Vaani ASR Challenge 2022

Tanvina Patel; Odette Scharenborg

doi:10.21437/Interspeech.2022-10639

Using cross-model learnings for the Gram Vaani ASR Challenge 2022

Tanvina Patel, Odette Scharenborg

Multimedia Computing

Research output: Contribution to journal › Conference article › Scientific › peer-review

1 Citation (Scopus)

12 Downloads (Pure)

Abstract

In the diverse and multilingual land of India, Hindi is spoken as a first language by a majority of its population. Efforts are made to obtain data in terms of audio, transcriptions, dictionary, etc. to develop speech-technology applications in Hindi. Similarly, the Gram-Vaani ASR Challenge 2022 provides spontaneous telephone speech, with natural back-ground and regional variations in Hindi. The challenge provides: 100 hours of labeled train-set, 5 hours of labeled dev-set and 1000 hours of unlabeled data-set. For the 'Closed Challenge', we trained an End-to-End (E2E) Conformer model using speed perturbations, SpecAugment techniques and use VTLN to handle any unknown speaker groups in the blind evaluation set. On the dev-set, we achieved a 30.3% WER compared to the 34.8% WER by the Challenge E2E baseline. For the 'Self Supervised Closed Challenge', a semi-supervised learning approach is used. We generate pseudo-transcripts for the unlabeled data using a hybrid TDNN-3gram LM model and trained an E2E model. This is then used as a seed for retraining the E2E model with high confidence data. Cross-model learning and refining of the E2E model gave 25.3% WER on the dev-set compared to ∼33-35% WER by the Challenge baseline that use wav2vec models.

Original language	English
Pages (from-to)	4880-4884
Number of pages	5
Journal	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	2022-September
DOIs	https://doi.org/10.21437/Interspeech.2022-10639
Publication status	Published - 2022
Event	23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 - Incheon, Korea, Republic of Duration: 18 Sept 2022 → 22 Sept 2022

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

cross-architecture learning
end-to-end ASR
Gram-Vaani Challenge
hybrid ASR
semi-supervised learning

Access to Document

10.21437/Interspeech.2022-10639

patel22_interspeechFinal published version, 439 KB

Cite this

@article{c3fe23e3a02549a6953c30349cae003a,

title = "Using cross-model learnings for the Gram Vaani ASR Challenge 2022",

abstract = "In the diverse and multilingual land of India, Hindi is spoken as a first language by a majority of its population. Efforts are made to obtain data in terms of audio, transcriptions, dictionary, etc. to develop speech-technology applications in Hindi. Similarly, the Gram-Vaani ASR Challenge 2022 provides spontaneous telephone speech, with natural back-ground and regional variations in Hindi. The challenge provides: 100 hours of labeled train-set, 5 hours of labeled dev-set and 1000 hours of unlabeled data-set. For the 'Closed Challenge', we trained an End-to-End (E2E) Conformer model using speed perturbations, SpecAugment techniques and use VTLN to handle any unknown speaker groups in the blind evaluation set. On the dev-set, we achieved a 30.3% WER compared to the 34.8% WER by the Challenge E2E baseline. For the 'Self Supervised Closed Challenge', a semi-supervised learning approach is used. We generate pseudo-transcripts for the unlabeled data using a hybrid TDNN-3gram LM model and trained an E2E model. This is then used as a seed for retraining the E2E model with high confidence data. Cross-model learning and refining of the E2E model gave 25.3% WER on the dev-set compared to ∼33-35% WER by the Challenge baseline that use wav2vec models.",

keywords = "cross-architecture learning, end-to-end ASR, Gram-Vaani Challenge, hybrid ASR, semi-supervised learning",

author = "Tanvina Patel and Odette Scharenborg",

note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.; 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022 ; Conference date: 18-09-2022 Through 22-09-2022",

year = "2022",

doi = "10.21437/Interspeech.2022-10639",

language = "English",

volume = "2022-September",

pages = "4880--4884",

journal = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

issn = "2308-457X",

}

TY - JOUR

T1 - Using cross-model learnings for the Gram Vaani ASR Challenge 2022

AU - Patel, Tanvina

AU - Scharenborg, Odette

N1 - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2022

Y1 - 2022

N2 - In the diverse and multilingual land of India, Hindi is spoken as a first language by a majority of its population. Efforts are made to obtain data in terms of audio, transcriptions, dictionary, etc. to develop speech-technology applications in Hindi. Similarly, the Gram-Vaani ASR Challenge 2022 provides spontaneous telephone speech, with natural back-ground and regional variations in Hindi. The challenge provides: 100 hours of labeled train-set, 5 hours of labeled dev-set and 1000 hours of unlabeled data-set. For the 'Closed Challenge', we trained an End-to-End (E2E) Conformer model using speed perturbations, SpecAugment techniques and use VTLN to handle any unknown speaker groups in the blind evaluation set. On the dev-set, we achieved a 30.3% WER compared to the 34.8% WER by the Challenge E2E baseline. For the 'Self Supervised Closed Challenge', a semi-supervised learning approach is used. We generate pseudo-transcripts for the unlabeled data using a hybrid TDNN-3gram LM model and trained an E2E model. This is then used as a seed for retraining the E2E model with high confidence data. Cross-model learning and refining of the E2E model gave 25.3% WER on the dev-set compared to ∼33-35% WER by the Challenge baseline that use wav2vec models.

AB - In the diverse and multilingual land of India, Hindi is spoken as a first language by a majority of its population. Efforts are made to obtain data in terms of audio, transcriptions, dictionary, etc. to develop speech-technology applications in Hindi. Similarly, the Gram-Vaani ASR Challenge 2022 provides spontaneous telephone speech, with natural back-ground and regional variations in Hindi. The challenge provides: 100 hours of labeled train-set, 5 hours of labeled dev-set and 1000 hours of unlabeled data-set. For the 'Closed Challenge', we trained an End-to-End (E2E) Conformer model using speed perturbations, SpecAugment techniques and use VTLN to handle any unknown speaker groups in the blind evaluation set. On the dev-set, we achieved a 30.3% WER compared to the 34.8% WER by the Challenge E2E baseline. For the 'Self Supervised Closed Challenge', a semi-supervised learning approach is used. We generate pseudo-transcripts for the unlabeled data using a hybrid TDNN-3gram LM model and trained an E2E model. This is then used as a seed for retraining the E2E model with high confidence data. Cross-model learning and refining of the E2E model gave 25.3% WER on the dev-set compared to ∼33-35% WER by the Challenge baseline that use wav2vec models.

KW - cross-architecture learning

KW - end-to-end ASR

KW - Gram-Vaani Challenge

KW - hybrid ASR

KW - semi-supervised learning

UR - http://www.scopus.com/inward/record.url?scp=85140069590&partnerID=8YFLogxK

U2 - 10.21437/Interspeech.2022-10639

DO - 10.21437/Interspeech.2022-10639

M3 - Conference article

AN - SCOPUS:85140069590

SN - 2308-457X

VL - 2022-September

SP - 4880

EP - 4884

JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

T2 - 23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022

Y2 - 18 September 2022 through 22 September 2022

ER -

Using cross-model learnings for the Gram Vaani ASR Challenge 2022

Abstract

Bibliographical note

Keywords

Access to Document

Other files and links

Fingerprint

Cite this