Speech technology for unwritten languages

Odette Scharenborg; Laurent Besacier; Alan W. Black; Mark Hasegawa-Johnson; Florian Metze; Graham Neubig; Sebastian Stueker; Pierre Godard; M Mueller; More Authors

doi:10.1109/TASLP.2020.2973896

Multimedia Computing

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Speech technology plays an important role in our everyday life. Among other uses, speech serves human-computer interaction, for instance in information retrieval and online shopping. For an unwritten language, however, speech technology is difficult to create, because it cannot be built from the standard combination of pre-trained speech-to-text and text-to-speech subsystems. The research presented in this article takes the first steps towards speech technology for unwritten languages. Specifically, the aims of this work were 1) to learn speech-to-meaning representations without using text as an intermediate representation, and 2) to test whether the learned representations suffice to regenerate speech or translated text, or to retrieve images that depict the meaning of an utterance in an unwritten language. The results suggest that it is possible to build systems that go directly from speech to meaning and from meaning to speech, bypassing the need for text.

Original language	English
Article number	8998182
Pages (from-to)	964-975
Number of pages	12
Journal	IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume	28
DOIs	https://doi.org/10.1109/TASLP.2020.2973896
Publication status	Published - 2020

Keywords

Speech processing
automatic speech recognition
image retrieval
speech synthesis
unsupervised learning

Access to Document

10.1109/TASLP.2020.2973896

08998182: Accepted author manuscript, 2.96 MB

Cite this

@article{8eb3b72daff14e83aef1f7568b49426d,

title = "Speech technology for unwritten languages",

abstract = "Speech technology plays an important role in our everyday life. Among others, speech is used for human-computer interaction, for instance for information retrieval and on-line shopping. In the case of an unwritten language, however, speech technology is unfortunately difficult to create, because it cannot be created by the standard combination of pre-trained speech-to-text and text-to-speech subsystems. The research presented in this article takes the first steps towards speech technology for unwritten languages. Specifically, the aim of this work was 1) to learn speech-to-meaning representations without using text as an intermediate representation, and 2) to test the sufficiency of the learned representations to regenerate speech or translated text, or to retrieve images that depict the meaning of an utterance in an unwritten language. The results suggest that building systems that go directly from speech-to-meaning and from meaning-to-speech, bypassing the need for text, is possible.",

keywords = "Speech processing, automatic speech recognition, image retrieval, speech synthesis, unsupervised learning",

author = "Odette Scharenborg and Laurent Besacier and Black, {Alan W.} and Mark Hasegawa-Johnson and Florian Metze and Graham Neubig and Sebastian Stueker and Pierre Godard and M Mueller and others",

year = "2020",

doi = "10.1109/TASLP.2020.2973896",

language = "English",

volume = "28",

pages = "964--975",

journal = "IEEE/ACM Transactions on Audio, Speech, and Language Processing",

issn = "2329-9290",

publisher = "IEEE",

}

TY - JOUR

T1 - Speech technology for unwritten languages

AU - Scharenborg, Odette

AU - Besacier, Laurent

AU - Black, Alan W.

AU - Hasegawa-Johnson, Mark

AU - Metze, Florian

AU - Neubig, Graham

AU - Stueker, Sebastian

AU - Godard, Pierre

AU - Mueller, M

AU - More Authors

PY - 2020

Y1 - 2020

N2 - Speech technology plays an important role in our everyday life. Among others, speech is used for human-computer interaction, for instance for information retrieval and on-line shopping. In the case of an unwritten language, however, speech technology is unfortunately difficult to create, because it cannot be created by the standard combination of pre-trained speech-to-text and text-to-speech subsystems. The research presented in this article takes the first steps towards speech technology for unwritten languages. Specifically, the aim of this work was 1) to learn speech-to-meaning representations without using text as an intermediate representation, and 2) to test the sufficiency of the learned representations to regenerate speech or translated text, or to retrieve images that depict the meaning of an utterance in an unwritten language. The results suggest that building systems that go directly from speech-to-meaning and from meaning-to-speech, bypassing the need for text, is possible.

AB - Speech technology plays an important role in our everyday life. Among others, speech is used for human-computer interaction, for instance for information retrieval and on-line shopping. In the case of an unwritten language, however, speech technology is unfortunately difficult to create, because it cannot be created by the standard combination of pre-trained speech-to-text and text-to-speech subsystems. The research presented in this article takes the first steps towards speech technology for unwritten languages. Specifically, the aim of this work was 1) to learn speech-to-meaning representations without using text as an intermediate representation, and 2) to test the sufficiency of the learned representations to regenerate speech or translated text, or to retrieve images that depict the meaning of an utterance in an unwritten language. The results suggest that building systems that go directly from speech-to-meaning and from meaning-to-speech, bypassing the need for text, is possible.

KW - Speech processing

KW - automatic speech recognition

KW - image retrieval

KW - speech synthesis

KW - unsupervised learning

UR - http://www.scopus.com/inward/record.url?scp=85079642575&partnerID=8YFLogxK

U2 - 10.1109/TASLP.2020.2973896

DO - 10.1109/TASLP.2020.2973896

M3 - Article

AN - SCOPUS:85079642575

SN - 2329-9290

VL - 28

SP - 964

EP - 975

JO - IEEE/ACM Transactions on Audio, Speech, and Language Processing

JF - IEEE/ACM Transactions on Audio, Speech, and Language Processing

M1 - 8998182

ER -
