On explaining machine learning models by evolving crucial and compact features

Marco Virgolin; Tanja Alderliesten; Peter A.N. Bosman

doi:10.1016/j.swevo.2019.100640

Corresponding author: Marco Virgolin

Algorithmics

Research output: Contribution to journal › Article › Scientific › peer-review

17 Citations (Scopus)

Abstract

Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been shown to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability, since explicit expressions are evolved. Yet, in most GP works the complexity of evolved features is not explicitly bounded or minimized, although this is arguably key for explainability. In this article, we assess to what extent GP still performs favorably at feature construction when constructing features that are (1) of small-enough number, to enable visualization of the behavior of the ML model; (2) of small-enough size, to enable interpretability of the features themselves; (3) of sufficient informative power, to retain or even improve the performance of the ML algorithm. We consider a simple feature construction scheme using three different GP algorithms, as well as random search, to evolve features for five ML algorithms, including support vector machines and random forest. Our results on 21 datasets pertaining to classification and regression problems show that constructing only two compact features can be sufficient to rival the use of the entire original feature set. We further find that a modern GP algorithm, GP-GOMEA, performs best overall. These results, combined with examples that we provide of readable constructed features and of 2D visualizations of ML behavior, lead us to positively conclude that GP-based feature construction still works well when explicitly searching for compact features, making it extremely helpful to explain ML models.
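The abstract describes building a small number of compact features and judging them by how well an ML algorithm performs when given only those features. The sketch below illustrates that idea with random search (the baseline the abstract mentions, not GP-GOMEA) under assumptions of our own: features are expression trees over +, -, * and protected division, capped by a hypothetical size limit `max_size`; features are added one at a time, each chosen to maximize fitness given the ones already constructed; and the ML algorithm is replaced by ordinary least-squares linear regression scored by training R². The names `random_tree`, `fitness`, and `construct_features` are hypothetical and do not come from the paper.

```python
import math
import random

# Hypothetical primitive set (an assumption, not the paper's exact set).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if abs(b) > 1e-9 else 1.0,  # protected division
}

def random_tree(n_vars, depth, rng):
    """Grow a random tree: ('op', left, right), ('x', index) or ('c', value)."""
    if depth == 0 or rng.random() < 0.3:
        if rng.random() < 0.8:
            return ("x", rng.randrange(n_vars))
        return ("c", round(rng.uniform(-1, 1), 2))
    op = rng.choice(list(OPS))
    return (op, random_tree(n_vars, depth - 1, rng), random_tree(n_vars, depth - 1, rng))

def size(tree):
    """Number of nodes: the compactness measure that is bounded."""
    return 1 if tree[0] in ("x", "c") else 1 + size(tree[1]) + size(tree[2])

def evaluate(tree, row):
    if tree[0] == "x":
        return row[tree[1]]
    if tree[0] == "c":
        return tree[1]
    return OPS[tree[0]](evaluate(tree[1], row), evaluate(tree[2], row))

def to_str(tree):
    if tree[0] == "x":
        return f"x{tree[1]}"
    if tree[0] == "c":
        return str(tree[1])
    return f"({to_str(tree[1])} {tree[0]} {to_str(tree[2])})"

def fitness(columns, y, ridge=1e-8):
    """Training R^2 of y ~ b0 + sum_j b_j * columns[j] (stand-in for the ML model).

    A tiny ridge term keeps the normal equations solvable when a candidate is
    constant or duplicates an earlier feature. Returns -inf on failure.
    """
    n = len(y)
    rows = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(rows[0])
    # Augmented system [A^T A + ridge*I | A^T y]; the intercept is not penalized.
    m = [[sum(r[a] * r[b] for r in rows) + (ridge if a == b and a > 0 else 0.0)
          for b in range(p)] + [sum(r[a] * yi for r, yi in zip(rows, y))]
         for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(m[k][c]))
        if not math.isfinite(m[piv][c]) or abs(m[piv][c]) < 1e-12:
            return float("-inf")
        m[c], m[piv] = m[piv], m[c]
        for k in range(p):
            if k != c:
                f = m[k][c] / m[c][c]
                m[k] = [u - f * v for u, v in zip(m[k], m[c])]
    beta = [m[k][p] / m[k][k] for k in range(p)]
    mean_y = sum(y) / n
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        return float("-inf")
    ss_res = sum((yi - sum(bk * xk for bk, xk in zip(beta, r))) ** 2
                 for r, yi in zip(rows, y))
    r2 = 1.0 - ss_res / ss_tot
    return r2 if math.isfinite(r2) else float("-inf")

def construct_features(X, y, n_features=2, max_size=7, n_samples=2000, seed=0):
    """Random-search feature construction, one compact feature at a time."""
    rng = random.Random(seed)
    chosen, columns, best_fit = [], [], float("-inf")
    for _ in range(n_features):
        best, best_fit, best_col = None, float("-inf"), None
        for _ in range(n_samples):
            tree = random_tree(len(X[0]), 3, rng)
            if size(tree) > max_size:
                continue  # reject features that are too large to interpret
            col = [evaluate(tree, row) for row in X]
            fit = fitness(columns + [col], y)
            if fit > best_fit:
                best, best_fit, best_col = tree, fit, col
        if best is None:
            raise RuntimeError("no valid candidate feature was sampled")
        chosen.append(best)
        columns.append(best_col)
    return chosen, best_fit
```

As a usage example, on data generated as y = x0*x1 + x2, calling `construct_features(X, y)` and printing `to_str(t)` for each returned tree gives two readable expressions that could serve as the axes of a 2D plot of model behavior. Training R² is used only to keep the sketch dependency-free; a held-out score of an actual ML algorithm (for instance an SVM or random forest) would be a more realistic fitness.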

Original language	English
Article number	100640
Pages (from-to)	1-13
Number of pages	13
Journal	Swarm and Evolutionary Computation
Volume	53
DOIs	https://doi.org/10.1016/j.swevo.2019.100640
Publication status	Published - 2020

Keywords

Feature construction
Genetic programming
GOMEA
Interpretable machine learning

Cite this

@article{6a354e4b51624da29f753ddc04035296,
  title = "On explaining machine learning models by evolving crucial and compact features",
  abstract = "Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been shown to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability, since explicit expressions are evolved. Yet, in most GP works the complexity of evolved features is not explicitly bounded or minimized, although this is arguably key for explainability. In this article, we assess to what extent GP still performs favorably at feature construction when constructing features that are (1) of small-enough number, to enable visualization of the behavior of the ML model; (2) of small-enough size, to enable interpretability of the features themselves; (3) of sufficient informative power, to retain or even improve the performance of the ML algorithm. We consider a simple feature construction scheme using three different GP algorithms, as well as random search, to evolve features for five ML algorithms, including support vector machines and random forest. Our results on 21 datasets pertaining to classification and regression problems show that constructing only two compact features can be sufficient to rival the use of the entire original feature set. We further find that a modern GP algorithm, GP-GOMEA, performs best overall. These results, combined with examples that we provide of readable constructed features and of 2D visualizations of ML behavior, lead us to positively conclude that GP-based feature construction still works well when explicitly searching for compact features, making it extremely helpful to explain ML models.",
  keywords = "Feature construction, Genetic programming, GOMEA, Interpretable machine learning",
  author = "Virgolin, Marco and Alderliesten, Tanja and Bosman, {Peter A.N.}",
  year = "2020",
  doi = "10.1016/j.swevo.2019.100640",
  language = "English",
  volume = "53",
  pages = "1--13",
  journal = "Swarm and Evolutionary Computation",
  issn = "2210-6502",
  publisher = "Elsevier",
}

TY  - JOUR
T1  - On explaining machine learning models by evolving crucial and compact features
AU  - Virgolin, Marco
AU  - Alderliesten, Tanja
AU  - Bosman, Peter A.N.
PY  - 2020
Y1  - 2020
N2  - Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been shown to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability, since explicit expressions are evolved. Yet, in most GP works the complexity of evolved features is not explicitly bounded or minimized, although this is arguably key for explainability. In this article, we assess to what extent GP still performs favorably at feature construction when constructing features that are (1) of small-enough number, to enable visualization of the behavior of the ML model; (2) of small-enough size, to enable interpretability of the features themselves; (3) of sufficient informative power, to retain or even improve the performance of the ML algorithm. We consider a simple feature construction scheme using three different GP algorithms, as well as random search, to evolve features for five ML algorithms, including support vector machines and random forest. Our results on 21 datasets pertaining to classification and regression problems show that constructing only two compact features can be sufficient to rival the use of the entire original feature set. We further find that a modern GP algorithm, GP-GOMEA, performs best overall. These results, combined with examples that we provide of readable constructed features and of 2D visualizations of ML behavior, lead us to positively conclude that GP-based feature construction still works well when explicitly searching for compact features, making it extremely helpful to explain ML models.
AB  - Feature construction can substantially improve the accuracy of Machine Learning (ML) algorithms. Genetic Programming (GP) has been shown to be effective at this task by evolving non-linear combinations of input features. GP additionally has the potential to improve ML explainability, since explicit expressions are evolved. Yet, in most GP works the complexity of evolved features is not explicitly bounded or minimized, although this is arguably key for explainability. In this article, we assess to what extent GP still performs favorably at feature construction when constructing features that are (1) of small-enough number, to enable visualization of the behavior of the ML model; (2) of small-enough size, to enable interpretability of the features themselves; (3) of sufficient informative power, to retain or even improve the performance of the ML algorithm. We consider a simple feature construction scheme using three different GP algorithms, as well as random search, to evolve features for five ML algorithms, including support vector machines and random forest. Our results on 21 datasets pertaining to classification and regression problems show that constructing only two compact features can be sufficient to rival the use of the entire original feature set. We further find that a modern GP algorithm, GP-GOMEA, performs best overall. These results, combined with examples that we provide of readable constructed features and of 2D visualizations of ML behavior, lead us to positively conclude that GP-based feature construction still works well when explicitly searching for compact features, making it extremely helpful to explain ML models.
KW  - Feature construction
KW  - Genetic programming
KW  - GOMEA
KW  - Interpretable machine learning
UR  - http://www.scopus.com/inward/record.url?scp=85077510080&partnerID=8YFLogxK
U2  - 10.1016/j.swevo.2019.100640
DO  - 10.1016/j.swevo.2019.100640
M3  - Article
AN  - SCOPUS:85077510080
SN  - 2210-6502
VL  - 53
SP  - 1
EP  - 13
JO  - Swarm and Evolutionary Computation
JF  - Swarm and Evolutionary Computation
M1  - 100640
ER  - 
