The Time-Course of Phoneme Category Adaptation in Deep Neural Networks

Junrui Ni; Mark Hasegawa-Johnson; Odette Scharenborg

doi:10.1007/978-3-030-31372-2_1


Multimedia Computing

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Both human listeners and machines need to adapt their sound categories whenever a new speaker is encountered. This perceptual learning is driven by lexical information. In previous work, we have shown that deep neural network-based (DNN) ASR systems can learn to adapt their phoneme category boundaries from a few labeled examples after exposure (i.e., training) to ambiguous sounds, as humans have been found to do. Here, we investigate the time-course of phoneme category adaptation in a DNN in more detail, with the ultimate aim to investigate the DNN’s ability to serve as a model of human perceptual learning. We do so by providing the DNN with an increasing number of ambiguous retraining tokens (in 10 bins of 4 ambiguous items), and comparing classification accuracy on the ambiguous items in a held-out test set for the different bins. Results showed that DNNs, similar to human listeners, show a step-like function: The DNNs show perceptual learning already after the first bin (only 4 tokens of the ambiguous phone), with little further adaptation for subsequent bins. In follow-up research, we plan to test specific predictions made by the DNN about human speech processing.
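The exposure design in the abstract (10 bins of 4 ambiguous retraining tokens, with held-out accuracy compared across bins) can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes that exposure is cumulative, so that after bin k the network has seen the first 4k ambiguous tokens. The names `cumulative_bins`, `time_course` and the `retrain_and_score` callback are hypothetical. The abstract does not say whether each bin's retraining starts from the baseline model or continues from the previous bin, so that choice is left inside the callback.

```python
from typing import Callable, List, Sequence, Tuple

BIN_SIZE = 4  # ambiguous tokens per bin (stated in the abstract)
N_BINS = 10   # number of bins (stated in the abstract)


def cumulative_bins(tokens: Sequence[str],
                    bin_size: int = BIN_SIZE,
                    n_bins: int = N_BINS) -> List[List[str]]:
    """Retraining set after each bin, ASSUMING cumulative exposure:
    after bin k (1-based) the network has seen the first k * bin_size tokens."""
    needed = bin_size * n_bins
    if len(tokens) < needed:
        raise ValueError(f"need {needed} ambiguous tokens, got {len(tokens)}")
    return [list(tokens[: bin_size * k]) for k in range(1, n_bins + 1)]


def time_course(tokens: Sequence[str],
                retrain_and_score: Callable[[List[str]], float],
                ) -> List[Tuple[int, float]]:
    """Return (number of ambiguous tokens seen, held-out accuracy) for every bin.

    `retrain_and_score` is a HYPOTHETICAL callback. It should adapt the DNN on
    the given tokens and return classification accuracy on the held-out
    ambiguous test items. The retraining details are not specified here."""
    return [(len(subset), retrain_and_score(subset))
            for subset in cumulative_bins(tokens)]


if __name__ == "__main__":
    # Placeholder token IDs, used only to show the size of each retraining set.
    toy_tokens = [f"ambig_{i:02d}" for i in range(BIN_SIZE * N_BINS)]
    print([len(b) for b in cumulative_bins(toy_tokens)])
    # [4, 8, 12, 16, 20, 24, 28, 32, 36, 40]
```

Under this assumption, a step-like time-course means the accuracy returned for the first entry (4 tokens) is already close to the accuracy for later entries.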

Original language	English
Title of host publication	Statistical Language and Speech Processing
Subtitle of host publication	7th International Conference, SLSP 2019
Editors	C. Martín-Vide, M. Purver, S. Pollak
Place of Publication	Cham
Publisher	Springer
Pages	3-15
Number of pages	13
ISBN (Electronic)	978-3-030-31372-2
ISBN (Print)	978-3-030-31371-5
DOI	https://doi.org/10.1007/978-3-030-31372-2_1
Publication status	Published - 2019
Event	SLSP 2019: Statistical Language and Speech Processing, Ljubljana, Slovenia
Duration	14 Oct 2019 → 16 Oct 2019
Conference number	7th

Publication series

Name	Lecture Notes in Computer Science (also part of the Lecture Notes in Artificial Intelligence sub-series)
Publisher	Springer
Volume	11816

Conference

Conference	SLSP 2019
Country/Territory	Slovenia
City	Ljubljana
Period	14/10/19 → 16/10/19

Keywords

Phoneme category adaptation
Human perceptual learning
Deep neural networks
Time-course

Access to Document

10.1007/978-3-030-31372-2_1

Cite this

Ni, J., Hasegawa-Johnson, M., & Scharenborg, O. (2019). The Time-Course of Phoneme Category Adaptation in Deep Neural Networks. In C. Martín-Vide, M. Purver, & S. Pollak (Eds.), Statistical Language and Speech Processing: 7th International Conference, SLSP 2019 (pp. 3-15). (Lecture Notes in Computer Science; Vol. 11816). Springer. https://doi.org/10.1007/978-3-030-31372-2_1

Ni, Junrui; Hasegawa-Johnson, Mark; Scharenborg, Odette. / The Time-Course of Phoneme Category Adaptation in Deep Neural Networks. Statistical Language and Speech Processing: 7th International Conference, SLSP 2019. editor / C. Martín-Vide; M. Purver; S. Pollak. Cham: Springer, 2019. pp. 3-15 (Lecture Notes in Computer Science; Vol. 11816).

@inproceedings{84550c23c8374816b8e8fa4ba51b5b07,

title = "The Time-Course of Phoneme Category Adaptation in Deep Neural Networks",

abstract = "Both human listeners and machines need to adapt their sound categories whenever a new speaker is encountered. This perceptual learning is driven by lexical information. In previous work, we have shown that deep neural network-based (DNN) ASR systems can learn to adapt their phoneme category boundaries from a few labeled examples after exposure (i.e., training) to ambiguous sounds, as humans have been found to do. Here, we investigate the time-course of phoneme category adaptation in a DNN in more detail, with the ultimate aim to investigate the DNN{\textquoteright}s ability to serve as a model of human perceptual learning. We do so by providing the DNN with an increasing number of ambiguous retraining tokens (in 10 bins of 4 ambiguous items), and comparing classification accuracy on the ambiguous items in a held-out test set for the different bins. Results showed that DNNs, similar to human listeners, show a step-like function: The DNNs show perceptual learning already after the first bin (only 4 tokens of the ambiguous phone), with little further adaptation for subsequent bins. In follow-up research, we plan to test specific predictions made by the DNN about human speech processing.",

keywords = "Phoneme category adaptation, Human perceptual learning, Deep neural networks, Time-course",

author = "Junrui Ni and Mark Hasegawa-Johnson and Odette Scharenborg",

year = "2019",

doi = "10.1007/978-3-030-31372-2_1",

language = "English",

isbn = "978-3-030-31371-5",

series = "Lecture Notes in Computer Science",

volume = "11816",

address = "Cham",

publisher = "Springer",

pages = "3--15",

editor = "C. Mart{\'i}n-Vide and M. Purver and S. Pollak",

booktitle = "Statistical Language and Speech Processing: 7th International Conference, SLSP 2019",

note = "SLSP 2019: Statistical Language and Speech Processing; Ljubljana, Slovenia; Conference date: 14-10-2019 through 16-10-2019",

}

Ni, J, Hasegawa-Johnson, M & Scharenborg, O 2019, The Time-Course of Phoneme Category Adaptation in Deep Neural Networks. in C Martín-Vide, M Purver & S Pollak (eds), Statistical Language and Speech Processing: 7th International Conference, SLSP 2019. Lecture Notes in Computer Science, vol. 11816, Springer, Cham, pp. 3-15, SLSP 2019, Ljubljana, Slovenia, 14/10/19. https://doi.org/10.1007/978-3-030-31372-2_1

The Time-Course of Phoneme Category Adaptation in Deep Neural Networks. / Ni, Junrui; Hasegawa-Johnson, Mark; Scharenborg, Odette.
Statistical Language and Speech Processing: 7th International Conference, SLSP 2019. ed. / C. Martín-Vide; M. Purver; S. Pollak. Cham: Springer, 2019. p. 3-15 (Lecture Notes in Computer Science; Vol. 11816).

TY - GEN

T1 - The Time-Course of Phoneme Category Adaptation in Deep Neural Networks

AU - Ni, Junrui

AU - Hasegawa-Johnson, Mark

AU - Scharenborg, Odette

N1 - Conference code: 7th

PY - 2019

Y1 - 2019

AB - Both human listeners and machines need to adapt their sound categories whenever a new speaker is encountered. This perceptual learning is driven by lexical information. In previous work, we have shown that deep neural network-based (DNN) ASR systems can learn to adapt their phoneme category boundaries from a few labeled examples after exposure (i.e., training) to ambiguous sounds, as humans have been found to do. Here, we investigate the time-course of phoneme category adaptation in a DNN in more detail, with the ultimate aim to investigate the DNN’s ability to serve as a model of human perceptual learning. We do so by providing the DNN with an increasing number of ambiguous retraining tokens (in 10 bins of 4 ambiguous items), and comparing classification accuracy on the ambiguous items in a held-out test set for the different bins. Results showed that DNNs, similar to human listeners, show a step-like function: The DNNs show perceptual learning already after the first bin (only 4 tokens of the ambiguous phone), with little further adaptation for subsequent bins. In follow-up research, we plan to test specific predictions made by the DNN about human speech processing.

KW - Phoneme category adaptation

KW - Human perceptual learning

KW - Deep neural networks

KW - Time-course

U2 - 10.1007/978-3-030-31372-2_1

DO - 10.1007/978-3-030-31372-2_1

M3 - Conference contribution

SN - 978-3-030-31371-5

T3 - Lecture Notes in Computer Science

VL - 11816

SP - 3

EP - 15

BT - Statistical Language and Speech Processing

A2 - Martín-Vide, C.

A2 - Purver, M.

A2 - Pollak, S.

PB - Springer

CY - Cham

T2 - SLSP 2019

Y2 - 14 October 2019 through 16 October 2019

ER -

Ni J, Hasegawa-Johnson M, Scharenborg O. The Time-Course of Phoneme Category Adaptation in Deep Neural Networks. In Martín-Vide C, Purver M, Pollak S, editors, Statistical Language and Speech Processing: 7th International Conference, SLSP 2019. Cham: Springer. 2019. p. 3-15. (Lecture Notes in Computer Science; Vol. 11816). doi: 10.1007/978-3-030-31372-2_1