TY - JOUR
T1 - Heterogeneous ensemble enables a universal uncertainty metric for atomistic foundation models
AU - Liu, Kai
AU - Wei, Zixiong
AU - Gao, Wei
AU - Dey, Poulumi
AU - Sluiter, Marcel H.F.
AU - Shuang, Fei
PY - 2026
AB - Universal machine-learning interatomic potentials (uMLIPs) are emerging as foundation models for atomistic simulation, offering near-ab initio accuracy at far lower cost. Their safe, broad deployment is limited by the absence of reliable, general uncertainty estimates. We present a unified, scalable uncertainty metric, U, built from a heterogeneous ensemble that reuses existing pretrained MLIPs. Across diverse chemistries and structures, U strongly tracks true prediction errors and robustly ranks configuration-level risk. Using U, we perform uncertainty-aware distillation to train system-specific potentials with far fewer labels: for tungsten, we match the accuracy of training on the full density-functional-theory (DFT) dataset while using only 4% of the DFT data; for MoNbTaW, a U-distilled dataset supports training a high-accuracy potential. By filtering out numerical label noise, the distilled models can in some cases exceed the accuracy of MLIPs trained directly on the DFT data. This framework provides a practical reliability monitor, guides data selection and fine-tuning, and enables cost-efficient, accurate, and safer deployment of foundation models.
UR - http://www.scopus.com/inward/record.url?scp=105027889609&partnerID=8YFLogxK
DO - 10.1038/s41524-025-01905-x
M3 - Article
AN - SCOPUS:105027889609
SN - 2057-3960
VL - 12
JF - npj Computational Materials
IS - 1
M1 - 34
ER -