TY - GEN
T1 - On the Influence of Optimizers in Deep Learning-Based Side-Channel Analysis
AU - Perin, Guilherme
AU - Picek, Stjepan
PY - 2021
Y1 - 2021
N2 - Deep learning-based side-channel analysis represents a powerful and easy-to-deploy option for profiling side-channel attacks. A detailed tuning phase is often required to reach good performance, where one first needs to select the relevant hyperparameters and then tune them. A common choice for the tuning phase is the set of hyperparameters connected with the neural network architecture, while those influencing the training process are less explored. In this work, we concentrate on the optimizer hyperparameter and show that it plays a significant role in the attack performance. Our results show that common choices of optimizers (Adam and RMSprop) indeed work well, but they overfit easily, which means that we must use short training phases, small profiling models, and explicit regularization. On the other hand, SGD-type optimizers work well on average (slower convergence and less overfitting), but only if momentum is used. Finally, our results show that Adagrad represents a strong option in scenarios with longer training phases or larger profiling models.
KW - Neural networks
KW - Optimizers
KW - Profiling attacks
KW - Side-channel analysis
UR - http://www.scopus.com/inward/record.url?scp=85113487267&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-81652-0_24
DO - 10.1007/978-3-030-81652-0_24
M3 - Conference contribution
AN - SCOPUS:85113487267
SN - 978-3-030-81651-3
VL - 12804
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 615
EP - 636
BT - Selected Areas in Cryptography
A2 - Dunkelman, Orr
A2 - Jacobson Jr., Michael J.
A2 - O’Flynn, Colin
PB - Springer Science+Business Media
CY - Cham
T2 - 27th International Conference on Selected Areas in Cryptography, SAC 2020
Y2 - 21 October 2020 through 23 October 2020
ER -