Unsupervised acoustic unit discovery by leveraging a language-independent subword discriminative feature representation

Siyuan Feng, Piotr Zelasko, Laureano Moro-Velázquez, Odette Scharenborg

Research output: Chapter in Book/Conference proceedings/Edited volume > Conference contribution > Scientific > peer-review

Abstract

This paper tackles automatically discovering phone-like acoustic units (AUD) from unlabeled speech data. Past studies usually proposed single-step approaches. We propose a two-stage approach: the first stage learns a subword-discriminative feature representation, and the second stage applies clustering to the learned representation and obtains phone-like clusters as the discovered acoustic units. In the first stage, a recently proposed method in the task of unsupervised subword modeling is improved by replacing a monolingual out-of-domain (OOD) ASR system with a multilingual one to create a subword-discriminative representation that is more language-independent. In the second stage, segment-level k-means is adopted, and two methods to represent the variable-length speech segments as fixed-dimension feature vectors are compared. Experiments on a very low-resource Mboshi language corpus show that our approach outperforms state-of-the-art AUD in both normalized mutual information (NMI) and F-score. The multilingual ASR improved upon the monolingual ASR in providing OOD phone labels and in estimating the phone boundaries. A comparison of our systems with and without knowing the ground-truth phone boundaries showed a 16% NMI performance gap, suggesting that the current approach can significantly benefit from improved phone boundary estimation.
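
The second stage described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the two segment representations that were compared, so mean pooling over frames is assumed here purely for illustration, the farthest-point initialization is a simplification chosen to keep the example deterministic, and the NMI is normalized by the arithmetic mean of the two entropies, which may differ from the normalization used in the paper. All function names and the toy data are hypothetical.

```python
import math
from collections import Counter


def mean_pool(frames):
    """Average a variable-length list of frame vectors into one fixed-dimension vector.

    Assumption: mean pooling is only one possible segment representation.
    """
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def init_centroids(points, k):
    """Deterministic farthest-point initialization (a simplification for this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def segment_kmeans(segments, k, iters=20):
    """Cluster segments with plain k-means on their pooled vectors; returns one label per segment."""
    points = [mean_pool(seg) for seg in segments]
    centroids = init_centroids(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: recompute each centroid from its members (empty clusters are kept).
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = mean_pool(members)
    return labels


def nmi(truth, pred):
    """Normalized mutual information, normalized by the mean of the two entropies (assumption)."""
    n = len(truth)

    def entropy(labels):
        return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

    ct, cp = Counter(truth), Counter(pred)
    joint = Counter(zip(truth, pred))
    mi = sum(
        (c / n) * math.log(c * n / (ct[t] * cp[p])) for (t, p), c in joint.items()
    )
    denom = (entropy(truth) + entropy(pred)) / 2
    return 1.0 if denom == 0 else mi / denom


# Toy data: four segments of different lengths, two per hypothetical phone.
segments = [
    [[0.0, 0.1], [0.2, 0.0], [0.1, 0.1]],
    [[0.1, 0.0], [0.0, 0.2]],
    [[5.0, 5.1], [5.2, 4.9], [4.9, 5.0], [5.1, 5.0]],
    [[5.0, 4.8], [5.1, 5.2]],
]
truth = ["a", "a", "b", "b"]
labels = segment_kmeans(segments, k=2)
score = nmi(truth, labels)
```

In this sketch the segment boundaries are taken as given. In the setting the abstract describes, they would come either from the ground truth or from the boundaries estimated by the OOD ASR system, which is the source of the reported NMI gap.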

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Publisher: International Speech Communication Association
Pages: 1534-1538
Number of pages: 5
ISBN (Electronic): 9781713836902
Publication status: Published - 2021
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 to 3 Sep 2021

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2
ISSN (Print): 2308-457X
ISSN (Electronic): 1990-9772

Conference

Conference: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Country/Territory: Czech Republic
City: Brno
Period: 30/08/21 to 3/09/21

Keywords

  • Acoustic unit discovery
  • Unsupervised subword modeling
  • Zero-resource
