Abstract
For a language with no transcribed speech available (the zero-resource scenario), conventional acoustic modeling algorithms are not applicable. Recently, zero-resource acoustic modeling has gained much interest. One research problem is unsupervised subword modeling (USM), i.e., learning a feature representation that can distinguish subword units and is robust to speaker variation. Previous studies showed that self-supervised learning (SSL) has the potential to separate speaker and phonetic information in speech in an unsupervised manner, which is highly desired in USM. This paper compares two representative SSL algorithms, namely contrastive predictive coding (CPC) and autoregressive predictive coding (APC), as the front-end of a recently proposed, state-of-the-art two-stage approach, in which the front-end learns a representation that serves as input to a back-end cross-lingual DNN. Experiments show that the bottleneck features extracted by the back-end achieve state-of-the-art performance in a subword ABX task on the Libri-light and ZeroSpeech databases. In general, CPC is more effective than APC as the front-end in our approach, and this holds regardless of the out-of-domain language used in the back-end cross-lingual DNN and of the amount of training data. With very limited training data, APC is found to be similarly or more effective than CPC when the test data consists of long utterances.
Original language | English |
---|---|
Title of host publication | 55th Asilomar Conference on Signals, Systems and Computers, ACSSC 2021 |
Subtitle of host publication | Proceedings |
Editors | Michael B. Matthews |
Publisher | IEEE |
Pages | 1414-1418 |
Number of pages | 5 |
ISBN (Electronic) | 978-1-6654-5828-3 |
ISBN (Print) | 978-1-6654-5829-0 |
Publication status | Published - 2021 |
Event | 2021 55th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, United States; Duration: 31 Oct 2021 → 3 Nov 2021; Conference number: 55th |
Publication series
Name | Conference Record - Asilomar Conference on Signals, Systems and Computers |
---|---|
Volume | 2021-October |
ISSN (Print) | 1058-6393 |
Conference
Conference | 2021 55th Asilomar Conference on Signals, Systems, and Computers |
---|---|
Country/Territory | United States |
City | Pacific Grove |
Period | 31/10/21 → 3/11/21 |
Bibliographical note
Accepted author manuscript
Keywords
- zero-resource
- unsupervised subword learning
- contrastive predictive coding
- autoregressive predictive coding
- cross-lingual modeling