TY - JOUR
T1 - Improving Model-Based Genetic Programming for Symbolic Regression of Small Expressions
AU - Virgolin, M.
AU - Alderliesten, T.
AU - Witteveen, C.
AU - Bosman, P. A. N.
PY - 2021
Y1 - 2021
N2 - The Gene-pool Optimal Mixing Evolutionary Algorithm (GOMEA) is a model-based EA framework that has been shown to perform well in several domains, including Genetic Programming (GP). Differently from traditional EAs where variation acts blindly, GOMEA learns a model of interdependencies within the genotype, that is, the linkage, to estimate what patterns to propagate. In this article, we study the role of Linkage Learning (LL) performed by GOMEA in Symbolic Regression (SR). We show that the non-uniformity in the distribution of the genotype in GP populations negatively biases LL, and propose a method to correct for this. We also propose approaches to improve LL when ephemeral random constants are used. Furthermore, we adapt a scheme of interleaving runs to alleviate the burden of tuning the population size, a crucial parameter for LL, to SR. We run experiments on 10 real-world datasets, enforcing a strict limitation on solution size, to enable interpretability. We find that the new LL method outperforms the standard one, and that GOMEA outperforms both traditional and semantic GP. We also find that the small solutions evolved by GOMEA are competitive with tuned decision trees, making GOMEA a promising new approach to SR.
KW - Genetic programming
KW - GOMEA
KW - interpretability
KW - linkage
KW - machine learning
KW - symbolic regression
UR - http://www.scopus.com/inward/record.url?scp=85116173154&partnerID=8YFLogxK
U2 - 10.1162/evco_a_00278
DO - 10.1162/evco_a_00278
M3 - Article
C2 - 32574084
AN - SCOPUS:85116173154
VL - 29
SP - 211
EP - 237
JO - Evolutionary Computation
JF - Evolutionary Computation
SN - 1063-6560
IS - 2
ER -