Building an ASR System for Mboshi Using A Cross-language Definition of Acoustic Units Approach

Odette Scharenborg, Patrick Ebel, Francesco Ciannella, Mark Hasegawa-Johnson, Najim Dehak

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

For many languages in the world, not enough (annotated) speech data is available to train an ASR system. Recently, we proposed a cross-language method for training an ASR system using linguistic knowledge and semi-supervised training. Here, we apply this approach to the low-resource language Mboshi. Using an ASR system trained on Dutch, Mboshi acoustic units were first created using cross-language initialization of the phoneme vectors in the output layer. Subsequently, this adapted system was retrained using Mboshi self-labels. Two training methods were investigated: retraining of only the output layer and retraining the full deep neural network (DNN). The resulting Mboshi system was analyzed by investigating per phoneme accuracies, phoneme confusions, and by visualizing the hidden layers of the DNNs prior to and following retraining with the self-labels. Results showed a fairly similar performance for the two training methods but a better phoneme representation for the fully retrained DNN.
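
The first two steps described in the abstract (seeding target-language output vectors from a source-language system, then deriving self-labels) can be illustrated with a minimal sketch. It is not the authors' implementation: the weighted-average initialization, the dot-product scoring, the function names `init_target_vectors` and `self_label`, the placeholder unit `mb`, and all numbers are assumptions made purely for illustration.

```python
# Hypothetical sketch: names, mapping, and values are illustrative
# assumptions, not taken from the paper.

def init_target_vectors(source_vectors, mapping):
    """Build one output-layer vector per target unit as a weighted
    average of the source-language vectors it is mapped to."""
    target_vectors = {}
    for target, weighted_sources in mapping.items():
        total = sum(w for _, w in weighted_sources)
        dim = len(source_vectors[weighted_sources[0][0]])
        vec = [0.0] * dim
        for source, w in weighted_sources:
            for i, value in enumerate(source_vectors[source]):
                vec[i] += (w / total) * value
        target_vectors[target] = vec
    return target_vectors


def self_label(hidden, target_vectors):
    """Return the target unit whose vector has the highest dot product
    with a hidden-layer activation; this serves as the frame's self-label."""
    def score(vec):
        return sum(h * v for h, v in zip(hidden, vec))
    return max(target_vectors, key=lambda unit: score(target_vectors[unit]))


# Toy 3-dimensional output-layer vectors for three source (Dutch) phonemes.
dutch = {"b": [1.0, 0.0, 0.0], "m": [0.0, 1.0, 0.0], "a": [0.0, 0.0, 1.0]}

# Assumed mapping: the placeholder target unit "mb" has no single source
# counterpart, so it is seeded from "m" and "b" with equal weight.
mapping = {"mb": [("m", 0.5), ("b", 0.5)], "a": [("a", 1.0)]}

mboshi = init_target_vectors(dutch, mapping)
label = self_label([0.4, 0.5, 0.1], mboshi)  # self-label for one toy frame
```

The retraining step itself (updating only the output layer versus the full DNN on these self-labels) is not shown; in practice it would be done with a neural network toolkit rather than plain Python lists.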
Original language: English
Title of host publication: Proceedings of the 6th Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU)
Subtitle of host publication: 29-31 August 2018, Gurugram, India
Place of Publication: New Delhi, India
Publisher: ISCA
Pages: 167-171
Number of pages: 5
DOIs: https://doi.org/10.21437/SLTU.2018-35
Publication status: Published - 2018
Event: 6th Workshop on Spoken Language Technologies for Under-resourced Languages - New Delhi, India
Duration: 29 Aug 2018 to 31 Aug 2018

Workshop

Workshop: 6th Workshop on Spoken Language Technologies for Under-resourced Languages
Abbreviated title: SLTU
Country: India
City: New Delhi
Period: 29/08/18 to 31/08/18

Keywords

  • Low-resource automatic speech recognition
  • Cross-language adaptation
  • Semi-supervised training

  • Cite this

    Scharenborg, O., Ebel, P., Ciannella, F., Hasegawa-Johnson, M., & Dehak, N. (2018). Building an ASR System for Mboshi Using A Cross-language Definition of Acoustic Units Approach. In Proceedings of the 6th Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU): 29-31 August 2018, Gurugram, India (pp. 167-171). ISCA. https://doi.org/10.21437/SLTU.2018-35