Flowsheets are the most important building blocks for defining and communicating the structure of chemical processes. Access to large data sets of machine-readable chemical flowsheets could significantly enhance process synthesis through artificial intelligence. Many such flowsheets are publicly available in the scientific literature and in patents, but they are hidden among innumerable other figures. An automatic method is therefore needed to recognize flowsheets. In this paper, we present a deep convolutional neural network (CNN) that identifies flowsheets among images from the literature. We use a transfer learning approach to initialize the CNN's parameters. The CNN reaches an accuracy of 97.9% on an independent test set. The presented algorithm can be combined with publication mining algorithms to enable autonomous flowsheet mining, which could eventually yield large chemical process databases.
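To illustrate what an accuracy figure on an independent test set measures for this binary task (flowsheet versus any other figure), the following is a minimal sketch and not the authors' evaluation code. The function names `confusion_counts` and `accuracy`, the label encoding (1 = flowsheet, 0 = other figure), and the eight example labels are all assumptions made for illustration; they are not data from the paper.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels (assumed: 1 = flowsheet, 0 = other figure)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter(tp=0, tn=0, fp=0, fn=0)
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            counts["tp"] += 1  # flowsheet correctly recognized
        elif truth == 0 and pred == 0:
            counts["tn"] += 1  # other figure correctly rejected
        elif truth == 0 and pred == 1:
            counts["fp"] += 1  # other figure mistaken for a flowsheet
        else:
            counts["fn"] += 1  # flowsheet missed
    return counts


def accuracy(y_true, y_pred):
    """Fraction of test images whose predicted class matches the true class."""
    counts = confusion_counts(y_true, y_pred)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("cannot compute accuracy on an empty test set")
    return (counts["tp"] + counts["tn"]) / total


# Hypothetical held-out labels and classifier outputs (illustrative only).
y_true = [1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 0, 1, 1]
test_accuracy = accuracy(y_true, y_pred)
```

Reporting the four counts alongside the accuracy is a common design choice when flowsheets are rare among the mined figures, because a high accuracy alone does not show how many flowsheets were missed.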
|Title of host publication||Computer Aided Chemical Engineering|
|Publication status||Published - 2022|
|Name||Computer Aided Chemical Engineering|
Bibliographical note: Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
- Data Mining
- Deep Learning
- Image Classification
- Transfer Learning