TY - GEN
T1 - Mini-Batching, Gradient-Clipping, First- versus Second-Order: What Works in Gradient-Based Coefficient Optimisation for Symbolic Regression?
T2 - 2023 Genetic and Evolutionary Computation Conference, GECCO 2023
AU - Harrison, Joe
AU - Virgolin, Marco
AU - Alderliesten, Tanja
AU - Bosman, Peter
PY - 2023
Y1 - 2023
AB - The aim of Symbolic Regression (SR) is to discover interpretable expressions that accurately describe data. The accuracy of an expression depends on both its structure and coefficients. To keep the structure simple enough to be interpretable, effective coefficient optimisation becomes key. Gradient-based optimisation is clearly effective at training neural networks in Deep Learning (DL), which can essentially be viewed as large, over-parameterised expressions: in this paper, we study how gradient-based optimisation techniques that are commonly used in DL transfer to SR. In particular, we first assess what techniques work well across random SR expressions, independent of any specific SR algorithm. We find that mini-batching and gradient-clipping can be helpful (similar to DL), while second-order optimisers outperform first-order ones (different from DL). Next, we consider whether including gradient-based optimisation in Genetic Programming (GP), a classic SR algorithm, is beneficial. On five real-world datasets, in a generation-based comparison, we find that second-order optimisation outperforms coefficient mutation (or no optimisation). However, in time-based comparisons, performance gaps shrink substantially because the computational expense of second-order optimisation causes GP to perform fewer generations. The interplay of computational costs between the optimisation of structure and coefficients is thus a critical aspect to consider.
KW - coefficient optimisation
KW - explainable AI
KW - genetic programming
KW - gradient descent
KW - symbolic regression
UR - http://www.scopus.com/inward/record.url?scp=85167713712&partnerID=8YFLogxK
DO - 10.1145/3583131.3590368
M3 - Conference contribution
AN - SCOPUS:85167713712
T3 - GECCO 2023 - Proceedings of the 2023 Genetic and Evolutionary Computation Conference
SP - 1127
EP - 1136
BT - GECCO 2023 - Proceedings of the 2023 Genetic and Evolutionary Computation Conference
PB - Association for Computing Machinery (ACM)
Y2 - 15 July 2023 through 19 July 2023
ER -