TY - GEN
T1 - Heed the noise in performance evaluations in neural architecture search
AU - Dushatskiy, Arkadiy
AU - Alderliesten, Tanja
AU - Bosman, Peter A.N.
PY - 2022
Y1 - 2022
N2 - Neural Architecture Search (NAS) has recently become a topic of great interest. However, there is a potentially impactful issue within NAS that remains largely unrecognized: noise. Due to stochastic factors in neural network initialization, training, and the chosen train/validation dataset split, the performance evaluation of a neural network architecture, which is often based on a single learning run, is also stochastic. This may have a particularly large impact if a dataset is small. We therefore propose to reduce this noise by evaluating architectures based on average performance over multiple network training runs using different random seeds and cross-validation. We perform experiments for a combinatorial optimization formulation of NAS in which we vary noise reduction levels. We use the same computational budget for each noise level in terms of network training runs, i.e., we allow fewer architecture evaluations when averaging over more training runs. Multiple search algorithms are considered, including evolutionary algorithms, which generally perform well for NAS. We use two publicly available datasets from the medical image segmentation domain, where datasets are often limited and variability among samples is often high. Our results show that reducing noise in architecture evaluations enables all considered search algorithms to find better architectures.
KW - medical image segmentation
KW - neural architecture search
KW - noise
UR - http://www.scopus.com/inward/record.url?scp=85136337907&partnerID=8YFLogxK
U2 - 10.1145/3520304.3533995
DO - 10.1145/3520304.3533995
M3 - Conference contribution
AN - SCOPUS:85136337907
T3 - GECCO 2022 Companion - Proceedings of the 2022 Genetic and Evolutionary Computation Conference
SP - 2104
EP - 2112
BT - GECCO 2022 Companion - Proceedings of the 2022 Genetic and Evolutionary Computation Conference
PB - Association for Computing Machinery (ACM)
T2 - 2022 Genetic and Evolutionary Computation Conference, GECCO 2022
Y2 - 9 July 2022 through 13 July 2022
ER -