The deep learning-based side-channel analysis represents a powerful and easy to deploy option for profiling side-channel attacks. A detailed tuning phase is often required to reach a good performance where one first needs to select relevant hyperparameters and then tune them. A common selection for the tuning phase are hyperparameters connected with the neural network architecture, while those influencing the training process are less explored. In this work, we concentrate on the optimizer hyperparameter, and we show that this hyperparameter has a significant role in the attack performance. Our results show that common choices of optimizers (Adam and RMSprop) indeed work well, but they easily overfit, which means that we must use short training phases, small profiling models, and explicit regularization. On the other hand, SGD type of optimizers works well on average (slower convergence and less overfit), but only if momentum is used. Finally, our results show that Adagrad represents a strong option to use in scenarios with longer training phases or larger profiling models.