Unsupervised Subword Modeling Using Autoregressive Pretraining and Cross-Lingual Phone-Aware Modeling

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-reviewed



This study addresses unsupervised subword modeling, i.e.,
learning feature representations that can distinguish subword
units of a language. The proposed approach adopts a two-stage
bottleneck feature (BNF) learning framework, consisting of autoregressive
predictive coding (APC) as a front-end and a DNN-BNF
model as a back-end. APC-pretrained features are used as
input features to a DNN-BNF model. A language-mismatched
ASR system is used to provide cross-lingual phone labels for
DNN-BNF model training. Finally, BNFs are extracted as the
subword-discriminative feature representation. A second aim of
this work is to investigate how robust our approach is to the
amount of training data. The results on
Libri-light and the ZeroSpeech 2017 databases show that APC
is effective in front-end feature pretraining. Our whole system
outperforms the state of the art on both databases. Cross-lingual
phone labels for English data generated by a Dutch ASR outperform
those generated by a Mandarin ASR, possibly because Dutch is more
similar to English than Mandarin is. Our system is less sensitive
to the amount of training data once it exceeds 50 hours. APC
pretraining reduces the amount of training material needed from
over 5,000 hours to around 200 hours with little performance
degradation.
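The APC front-end described above trains a model to predict the acoustic feature frame n steps ahead of the current frame, minimizing an L1 loss; the learned representations are then fed to the DNN-BNF back-end. The sketch below illustrates only the APC training objective, under strong simplifying assumptions: a single linear predictor stands in for the paper's RNN, and synthetic AR(1) features stand in for real acoustic features. All variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, n = 400, 40, 3   # frames, feature dim, prediction shift (all illustrative)

# Synthetic AR(1) "acoustic" features, so future frames are partly predictable
X = np.zeros((T, D))
for t in range(1, T):
    X[t] = 0.95 * X[t - 1] + rng.standard_normal(D)

W = np.zeros((D, D))   # linear stand-in for the paper's RNN predictor
lr = 0.5

def l1_loss(W):
    # APC objective: L1 distance between predicted and actual frame t+n
    return np.abs(X[:-n] @ W - X[n:]).mean()

loss_before = l1_loss(W)
for _ in range(300):
    err = X[:-n] @ W - X[n:]
    # Subgradient of the mean L1 loss with respect to W
    grad = X[:-n].T @ np.sign(err) / err.size
    W -= lr * grad
loss_after = l1_loss(W)
```

In the actual system the predictor is a multi-layer RNN, and the hidden states (not the predictions) serve as the pretrained features passed to the DNN-BNF model.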
Original language: English
Title of host publication: Proceedings of Interspeech 2020
Pages: 2732-2736
Number of pages: 5
Publication status: Published - 2020
Event: INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 - 29 Oct 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN (Print): 2308-457X


Conference: INTERSPEECH 2020


  • Autoregressive predictive coding
  • Cross-lingual knowledge transfer
  • Unsupervised subword modeling


