Recurrent Knowledge Distillation

Silvia L. Pintea; Yue Liu; Jan van Gemert

doi:10.1109/ICIP.2018.8451253

Pattern Recognition and Bioinformatics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the student network even further by recasting multiple residual layers in the teacher network into a single recurrent student layer. We propose three variants of adding recurrent connections into the student network, and show experimentally on CIFAR-10, Scenes and MiniPlaces, that we can reduce the number of parameters at little loss in accuracy.
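The abstract combines two mechanisms: a small student trained on a large teacher's outputs, and a single recurrent layer that stands in for several residual layers by reusing one set of weights. The Python sketch below illustrates both, under stated assumptions. This record does not specify the paper's loss, layer shapes, or its three recurrent variants. The function names (`conv_params`, `residual_stack_params`, `recurrent_layer_params`, `recurrent_forward`, `distillation_loss`) are hypothetical, and so is the choice of 3×3 convolutions with equal input and output channels. The loss is the common temperature-scaled soft-target formulation of Hinton et al., swapped in for illustration and not necessarily the one the authors use.

```python
import math

# Hypothetical sketch: names, shapes and the loss are assumptions, not the paper's code.

def conv_params(in_ch, out_ch, k=3, bias=True):
    """Parameters of one k x k convolution: weights plus an optional bias per output channel."""
    return in_ch * out_ch * k * k + (out_ch if bias else 0)

def residual_stack_params(channels, depth, k=3):
    """Assumed teacher block: `depth` residual layers, each with its own convolution."""
    return depth * conv_params(channels, channels, k)

def recurrent_layer_params(channels, k=3):
    """Assumed student block: one convolution whose weights are reused at every step."""
    return conv_params(channels, channels, k)

def recurrent_forward(x, step, num_steps):
    """Unroll the shared residual update x <- x + step(x) for num_steps iterations."""
    for _ in range(num_steps):
        x = [xi + fi for xi, fi in zip(x, step(x))]
    return x

def softmax(logits, temperature=1.0):
    """Numerically stable softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    """Assumed soft-target loss: T^2 * KL(teacher || student), both softened by T."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

if __name__ == "__main__":
    teacher = residual_stack_params(channels=64, depth=4)
    student = recurrent_layer_params(channels=64)
    print(teacher, student)  # 147712 36928
    print(distillation_loss([2.0, 0.5, -1.0], [3.0, 0.0, -2.0]))
```

Under these assumptions, four 3×3 residual layers with 64 channels hold 147,712 convolution parameters, while the shared recurrent layer holds 36,928, a fourfold reduction. The savings reported in the paper depend on its actual architecture and variants, which this record does not describe.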

Original language	English
Title of host publication	2018 25th IEEE International Conference on Image Processing (ICIP)
Subtitle of host publication	Proceedings
Place of Publication	Piscataway
Publisher	IEEE
Pages	3393-3397
Number of pages	5
ISBN (Electronic)	978-1-4799-7061-2
ISBN (Print)	978-1-4799-7062-9
DOI	https://doi.org/10.1109/ICIP.2018.8451253
Publication status	Published - 2018
Event	25th IEEE International Conference on Image Processing, Athens, Greece, 7 Oct 2018 → 10 Oct 2018 (conference number: 25)

Conference

Conference	25th IEEE International Conference on Image Processing
Abbreviated title	ICIP 2018
Country/Territory	Greece
City	Athens
Period	7 Oct 2018 → 10 Oct 2018

Keywords

Knowledge distillation
compacting deep representations for image classification
recurrent layers

Access to Document

10.1109/ICIP.2018.8451253

1805.07170: Accepted author manuscript, 375 KB

Cite this

@inproceedings{f6e444abfccd4a138e96109d4027497f,
title = "Recurrent Knowledge Distillation",
abstract = "Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the student network even further by recasting multiple residual layers in the teacher network into a single recurrent student layer. We propose three variants of adding recurrent connections into the student network, and show experimentally on CIFAR-10, Scenes and MiniPlaces, that we can reduce the number of parameters at little loss in accuracy.",
keywords = "Knowledge distillation, compacting deep representations for image classification, recurrent layers",
author = "Pintea, {Silvia L.} and Yue Liu and {van Gemert}, Jan",
year = "2018",
doi = "10.1109/ICIP.2018.8451253",
language = "English",
isbn = "978-1-4799-7062-9",
pages = "3393--3397",
booktitle = "2018 25th IEEE International Conference on Image Processing (ICIP)",
publisher = "IEEE",
address = "Piscataway, NJ, USA",
note = "25th IEEE International Conference on Image Processing, ICIP 2018; Conference date: 07-10-2018 through 10-10-2018",
}

TY  - GEN
T1  - Recurrent Knowledge Distillation
AU  - Pintea, Silvia L.
AU  - Liu, Yue
AU  - van Gemert, Jan
N1  - Conference code: 25
PY  - 2018
Y1  - 2018
N2  - Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the student network even further by recasting multiple residual layers in the teacher network into a single recurrent student layer. We propose three variants of adding recurrent connections into the student network, and show experimentally on CIFAR-10, Scenes and MiniPlaces, that we can reduce the number of parameters at little loss in accuracy.
AB  - Knowledge distillation compacts deep networks by letting a small student network learn from a large teacher network. The accuracy of knowledge distillation recently benefited from adding residual layers. We propose to reduce the size of the student network even further by recasting multiple residual layers in the teacher network into a single recurrent student layer. We propose three variants of adding recurrent connections into the student network, and show experimentally on CIFAR-10, Scenes and MiniPlaces, that we can reduce the number of parameters at little loss in accuracy.
KW  - Knowledge distillation
KW  - compacting deep representations for image classification
KW  - recurrent layers
U2  - 10.1109/ICIP.2018.8451253
DO  - 10.1109/ICIP.2018.8451253
M3  - Conference contribution
SN  - 978-1-4799-7062-9
SP  - 3393
EP  - 3397
BT  - 2018 25th IEEE International Conference on Image Processing (ICIP)
PB  - IEEE
CY  - Piscataway
T2  - 25th IEEE International Conference on Image Processing
Y2  - 7 October 2018 through 10 October 2018
ER  -
