The effectiveness of self-supervised representation learning in zero-resource subword modeling

Siyuan Feng, Odette Scharenborg

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


Abstract

For a language with no transcribed speech available (the zero-resource scenario), conventional acoustic modeling algorithms are not applicable. Recently, zero-resource acoustic modeling has gained much interest. One research problem is unsupervised subword modeling (USM), i.e., learning a feature representation that can distinguish subword units and is robust to speaker variation. Previous studies showed that self-supervised learning (SSL) has the potential to separate speaker and phonetic information in speech in an unsupervised manner, which is highly desirable for USM. This paper compares two representative SSL algorithms, contrastive predictive coding (CPC) and autoregressive predictive coding (APC), as the front-end of a recently proposed, state-of-the-art two-stage approach, in which the front-end learns a representation that is fed to a back-end cross-lingual DNN. Experiments show that the bottleneck features extracted by the back-end achieve state-of-the-art performance on the subword ABX task on the Libri-light and ZeroSpeech databases. In general, CPC is more effective than APC as the front-end in our approach, independently of both the choice of out-of-domain language for the back-end cross-lingual DNN and the amount of training data. With very limited training data, APC is found to be as effective as or more effective than CPC when the test data consists of long utterances.
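To make the contrast between the two SSL front-ends concrete, the following PyTorch-style sketch shows the core training objectives typically used for APC (L1 regression on a future frame) and CPC (InfoNCE contrastive loss). It is a minimal illustration, not the authors' code: all tensor shapes, module choices, and the simplification of drawing CPC negatives from other time steps of the same utterance are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

def apc_loss(predictor, x, shift=3):
    # APC: autoregressively predict the frame `shift` steps ahead,
    # minimizing the L1 distance to the true future frame.
    pred, _ = predictor(x[:, :-shift])   # (B, T-shift, D)
    return F.l1_loss(pred, x[:, shift:])

def cpc_infonce_loss(z, c, proj, k=2):
    # CPC: from context c_t, score the true latent z_{t+k} against
    # latents at other time steps (InfoNCE contrastive objective).
    pred = proj(c[:, :-k])                               # (B, T-k, D)
    target = z[:, k:]                                    # (B, T-k, D)
    logits = torch.einsum('btd,bsd->bts', pred, target)  # (B, T-k, T-k)
    n = logits.size(1)
    labels = torch.arange(n, device=z.device).expand(logits.size(0), n).reshape(-1)
    return F.cross_entropy(logits.reshape(-1, n), labels)

# Toy usage on random data standing in for log-Mel frames.
B, T, D, H = 4, 100, 40, 64
x = torch.randn(B, T, D)
apc_rnn = nn.GRU(D, D, batch_first=True)   # toy autoregressive model
print('APC loss:', apc_loss(apc_rnn, x).item())

z = torch.randn(B, T, H)                   # stand-in encoder latents
ctx_rnn = nn.GRU(H, H, batch_first=True)   # toy context network
c, _ = ctx_rnn(z)
proj = nn.Linear(H, H)                     # toy step-k prediction head
print('CPC loss:', cpc_infonce_loss(z, c, proj).item())

In the two-stage approach described above, features produced by such a front-end would be passed to a cross-lingual DNN back-end, whose bottleneck-layer activations serve as the final subword representation.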
Original language: English
Title of host publication: 55th Asilomar Conference on Signals, Systems and Computers, ACSSC 2021
Subtitle of host publication: Proceedings
Editors: Michael B. Matthews
Publisher: IEEE
Pages: 1414-1418
Number of pages: 5
ISBN (Electronic): 978-1-6654-5828-3
ISBN (Print): 978-1-6654-5829-0
DOIs
Publication status: Published - 2021
Event: 2021 55th Asilomar Conference on Signals, Systems, and Computers - Pacific Grove, United States
Duration: 31 Oct 2021 - 3 Nov 2021
Conference number: 55th

Publication series

Name: Conference Record - Asilomar Conference on Signals, Systems and Computers
Volume: 2021-October
ISSN (Print): 1058-6393

Conference

Conference: 2021 55th Asilomar Conference on Signals, Systems, and Computers
Country/Territory: United States
City: Pacific Grove
Period: 31/10/21 - 3/11/21

Bibliographical note

Accepted author manuscript

Keywords

  • zero-resource
  • unsupervised subword learning
  • contrastive predictive coding
  • autoregressive predictive coding
  • cross-lingual modeling

