Joint embedding predictive architecture for self-supervised pretraining on polymer molecular graphs

Francesco Piccoli, Gabriel Vogel, Jana M. Weber*

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Recent advances in machine learning (ML) have shown promise in accelerating the discovery of polymers with desired properties by aiding in tasks such as virtual screening via property prediction. However, progress in polymer ML is hampered by the scarcity of high-quality labeled datasets, which are necessary for training supervised ML models. In this work, we study the use of the very recent ‘Joint Embedding Predictive Architecture’ (JEPA), a type of architecture developed for self-supervised learning (SSL), on polymer molecular graphs to understand whether pretraining with the proposed SSL strategy improves downstream performance when labeled data is scarce. We first pretrain our polymer-JEPA model on a large dataset of conjugated copolymer photocatalysts. The pretrained model is then fine-tuned on two distinct downstream tasks: predicting electron affinity in the same chemical space and classifying phase behavior in diblock copolymers, a different chemical space. Our results indicate that JEPA-based self-supervised pretraining enhances downstream performance, particularly when labeled data is very scarce, achieving improvements across both tested datasets. The method provides performance gains in cross-domain fine-tuning, highlighting its potential to extract general knowledge across different classes of polymers. By leveraging large amounts of unlabeled polymer structures for pretraining, the proposed strategy can further reduce the dependence on extensive labeled datasets.
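
The abstract describes the approach only at a high level, so the following is a minimal sketch of a generic JEPA-style objective, not the authors' implementation. It assumes a context encoder, a separate target encoder, and a predictor that maps the context embedding to the target embedding, with the loss computed in embedding space. The toy mean-pool-plus-linear "encoder", the squared-error loss, the exponential-moving-average (EMA) update of the target encoder, and all function names are illustrative assumptions that the abstract does not confirm.

```python
# Hypothetical, dependency-free sketch of a JEPA-style objective.
# Nothing here is taken from the paper: encoders, loss, and EMA update
# are assumptions based on the generic JEPA recipe.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def mean_pool(node_feats):
    """Average node feature vectors into one subgraph-level vector."""
    n = len(node_feats)
    return [sum(col) / n for col in zip(*node_feats)]

def encode(W, node_feats):
    """Toy stand-in for a graph encoder: mean-pool nodes, then a linear map."""
    return matvec(W, mean_pool(node_feats))

def jepa_loss(W_ctx, W_tgt, W_pred, context_nodes, target_nodes):
    """Mean squared error between predicted and actual target embeddings."""
    z_ctx = encode(W_ctx, context_nodes)   # embedding of the visible subgraph
    z_tgt = encode(W_tgt, target_nodes)    # embedding of the masked subgraph
    z_hat = matvec(W_pred, z_ctx)          # prediction made in embedding space
    return sum((a - b) ** 2 for a, b in zip(z_hat, z_tgt)) / len(z_tgt)

def ema_update(W_tgt, W_ctx, momentum=0.99):
    """Move target-encoder weights toward the context-encoder weights."""
    return [
        [momentum * t + (1.0 - momentum) * c for t, c in zip(row_t, row_c)]
        for row_t, row_c in zip(W_tgt, W_ctx)
    ]

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    context = [[1.0, 2.0], [3.0, 4.0]]     # two context nodes, two features each
    target = [[0.0, 0.0]]                  # one target node
    print(jepa_loss(identity, identity, identity, context, target))
    print(ema_update([[1.0]], [[0.0]], momentum=0.5))
```

In a full pipeline under these assumptions, only the context encoder and predictor would be trained by gradient descent on this loss, and the pretrained encoder would then be fine-tuned with a small labeled set for the downstream property-prediction or classification task.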

Original language: English
Number of pages: 16
Journal: Digital Discovery
Publication status: Published - 2026
