That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages

Piotr Żelasko, Laureano Moro-Velázquez, Mark Hasegawa-Johnson, Odette Scharenborg, Najim Dehak

Research output: Chapter in Book/Conference proceedings/Edited volume, Conference contribution, Scientific, peer-review

Abstract

Only a handful of the world’s languages have the abundant resources that enable practical applications of speech processing technologies. One way to overcome this problem is to use the resources existing in other languages to train a multilingual automatic speech recognition (ASR) model, which, intuitively, should learn some universal phonetic representations. In this work, we focus on gaining a deeper understanding of how general these representations might be, and how the recognition of individual phones improves in a multilingual setting. To that end, we select a phonetically diverse set of languages and perform a series of monolingual, multilingual, and crosslingual (zero-shot) experiments. The ASR model is trained to recognize International Phonetic Alphabet (IPA) token sequences. We observe significant improvements across all languages in the multilingual setting, and stark degradation in the crosslingual setting, where the model, among other errors, treats Javanese as a tone language. Notably, as little as 10 hours of target-language training data tremendously reduces ASR error rates. Our analysis shows that even phones that are unique to a single language can benefit greatly from adding training data from other languages, an encouraging result for the low-resource speech community.
Original language: English
Title of host publication: Proceedings of Interspeech 2020
Publisher: ISCA
Pages: 3705-3709
Number of pages: 5
Publication status: Published - 2020
Event: INTERSPEECH 2020 - Shanghai, China
Duration: 25 Oct 2020 to 29 Oct 2020

Publication series

Name: Interspeech 2020
Publisher: ISCA
ISSN (Print): 1990-9772

Conference

Conference: INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 25/10/20 to 29/10/20

Keywords

  • Crosslingual
  • Multilingual
  • Phone recognition
  • Speech recognition
  • Transfer learning
  • Zero-shot
