Obey validity limits of data-driven models through topological data analysis and one-class classification

Artur M. Schweidtmann; Jana M. Weber; Christian Wende; Linus Netze; Alexander Mitsos

doi:10.1007/s11081-021-09608-0

Artur M. Schweidtmann^*, Jana M. Weber, Christian Wende, Linus Netze, Alexander Mitsos

^*Corresponding author for this work

ChemE/Product and Process Engineering

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Data-driven models are becoming increasingly popular in engineering, on their own or in combination with mechanistic models. Commonly, the trained models are subsequently used in model-based optimization of design and/or operation of processes. Thus, it is critical to ensure that data-driven models are not evaluated outside their validity domain during process optimization. We propose a method to learn this validity domain and encode it as constraints in process optimization. We first perform a topological data analysis using persistent homology identifying potential holes or separated clusters in the training data. In case clusters or holes are identified, we train a one-class classifier, i.e., a one-class support vector machine, on the training data domain and encode it as constraints in the subsequent process optimization. Otherwise, we construct the convex hull of the data and encode it as constraints. We finally perform deterministic global process optimization with the data-driven models subject to their respective validity constraints. To ensure computational tractability, we develop a reduced-space formulation for trained one-class support vector machines and show that our formulation outperforms common full-space formulations by a factor of over 3000, making it a viable tool for engineering applications. The method is ready-to-use and available open-source as part of our MeLOn toolbox (https://git.rwth-aachen.de/avt.svt/public/MeLOn).
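As a rough illustration of the validity constraint described in the abstract, the sketch below evaluates the decision function of a trained one-class support vector machine as an explicit function of the input x, so that the only quantity an optimizer would need to handle is x itself. This is a minimal sketch, not the MeLOn implementation: the RBF kernel, the support vectors, the dual coefficients, and the values of `rho` and `gamma` are all hypothetical assumptions chosen for illustration, and the formulation in the paper may differ.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel exp(-gamma * ||x - z||^2) (kernel choice is an assumption)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)

def ocsvm_decision(x, support_vectors, dual_coefs, rho, gamma):
    """Decision value f(x) = sum_i alpha_i * k(x, x_i) - rho.

    f(x) >= 0 is read as "x lies inside the learned validity domain".
    """
    weighted = sum(
        alpha * rbf_kernel(x, sv, gamma)
        for alpha, sv in zip(dual_coefs, support_vectors)
    )
    return weighted - rho

# Hypothetical parameters standing in for a trained classifier.
support_vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
dual_coefs = [0.4, 0.3, 0.3]
rho = 0.5
gamma = 1.0

def is_valid(x):
    """Validity constraint: True if the model may be evaluated at x."""
    return ocsvm_decision(x, support_vectors, dual_coefs, rho, gamma) >= 0.0

inside = is_valid((0.3, 0.3))   # point close to the hypothetical training data
outside = is_valid((5.0, 5.0))  # point far away from it
```

In an optimization setting, the inequality `ocsvm_decision(x, ...) >= 0` would be added as a constraint alongside the data-driven model, so that candidate points far from the training data are rejected.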

Original language	English
Pages (from-to)	855-876
Number of pages	22
Journal	Optimization and Engineering
Volume	23
Issue number	2
DOIs	https://doi.org/10.1007/s11081-021-09608-0
Publication status	Published - 2021

Keywords

Deterministic global optimization
Machine-learning
One-class support vector machine
Persistent homology
Topological data analysis

Access to Document

10.1007/s11081-021-09608-0

Schweidtmann2021_Article_ObeyValidityLimitsOfData-drive, Final published version, 2.25 MB, Licence: CC BY

Cite this

@article{ad8a12b24d35473b91a1fbcb61269475,

title = "Obey validity limits of data-driven models through topological data analysis and one-class classification",

abstract = "Data-driven models are becoming increasingly popular in engineering, on their own or in combination with mechanistic models. Commonly, the trained models are subsequently used in model-based optimization of design and/or operation of processes. Thus, it is critical to ensure that data-driven models are not evaluated outside their validity domain during process optimization. We propose a method to learn this validity domain and encode it as constraints in process optimization. We first perform a topological data analysis using persistent homology identifying potential holes or separated clusters in the training data. In case clusters or holes are identified, we train a one-class classifier, i.e., a one-class support vector machine, on the training data domain and encode it as constraints in the subsequent process optimization. Otherwise, we construct the convex hull of the data and encode it as constraints. We finally perform deterministic global process optimization with the data-driven models subject to their respective validity constraints. To ensure computational tractability, we develop a reduced-space formulation for trained one-class support vector machines and show that our formulation outperforms common full-space formulations by a factor of over 3000, making it a viable tool for engineering applications. The method is ready-to-use and available open-source as part of our MeLOn toolbox (https://git.rwth-aachen.de/avt.svt/public/MeLOn).",

keywords = "Deterministic global optimization, Machine-learning, One-class support vector machine, Persistent homology, Topological data analysis",

author = "Schweidtmann, {Artur M.} and Weber, {Jana M.} and Wende, Christian and Netze, Linus and Mitsos, Alexander",

year = "2021",

doi = "10.1007/s11081-021-09608-0",

language = "English",

volume = "23",

pages = "855--876",

journal = "Optimization and Engineering",

issn = "1389-4420",

publisher = "Springer",

number = "2",

}

TY - JOUR

T1 - Obey validity limits of data-driven models through topological data analysis and one-class classification

AU - Schweidtmann, Artur M.

AU - Weber, Jana M.

AU - Wende, Christian

AU - Netze, Linus

AU - Mitsos, Alexander

PY - 2021

Y1 - 2021

N2 - Data-driven models are becoming increasingly popular in engineering, on their own or in combination with mechanistic models. Commonly, the trained models are subsequently used in model-based optimization of design and/or operation of processes. Thus, it is critical to ensure that data-driven models are not evaluated outside their validity domain during process optimization. We propose a method to learn this validity domain and encode it as constraints in process optimization. We first perform a topological data analysis using persistent homology identifying potential holes or separated clusters in the training data. In case clusters or holes are identified, we train a one-class classifier, i.e., a one-class support vector machine, on the training data domain and encode it as constraints in the subsequent process optimization. Otherwise, we construct the convex hull of the data and encode it as constraints. We finally perform deterministic global process optimization with the data-driven models subject to their respective validity constraints. To ensure computational tractability, we develop a reduced-space formulation for trained one-class support vector machines and show that our formulation outperforms common full-space formulations by a factor of over 3000, making it a viable tool for engineering applications. The method is ready-to-use and available open-source as part of our MeLOn toolbox (https://git.rwth-aachen.de/avt.svt/public/MeLOn).

AB - Data-driven models are becoming increasingly popular in engineering, on their own or in combination with mechanistic models. Commonly, the trained models are subsequently used in model-based optimization of design and/or operation of processes. Thus, it is critical to ensure that data-driven models are not evaluated outside their validity domain during process optimization. We propose a method to learn this validity domain and encode it as constraints in process optimization. We first perform a topological data analysis using persistent homology identifying potential holes or separated clusters in the training data. In case clusters or holes are identified, we train a one-class classifier, i.e., a one-class support vector machine, on the training data domain and encode it as constraints in the subsequent process optimization. Otherwise, we construct the convex hull of the data and encode it as constraints. We finally perform deterministic global process optimization with the data-driven models subject to their respective validity constraints. To ensure computational tractability, we develop a reduced-space formulation for trained one-class support vector machines and show that our formulation outperforms common full-space formulations by a factor of over 3000, making it a viable tool for engineering applications. The method is ready-to-use and available open-source as part of our MeLOn toolbox (https://git.rwth-aachen.de/avt.svt/public/MeLOn).

KW - Deterministic global optimization

KW - Machine-learning

KW - One-class support vector machine

KW - Persistent homology

KW - Topological data analysis

UR - http://www.scopus.com/inward/record.url?scp=85105868106&partnerID=8YFLogxK

U2 - 10.1007/s11081-021-09608-0

DO - 10.1007/s11081-021-09608-0

M3 - Article

AN - SCOPUS:85105868106

SN - 1389-4420

VL - 23

SP - 855

EP - 876

JO - Optimization and Engineering

JF - Optimization and Engineering

IS - 2

ER -
