TY - JOUR
T1 - TEREE: Transformer-based emotion recognition using EEG and eye movement data
AU - Esmi, Nima
AU - Shahbahrami, Asadollah
AU - Gaydadjiev, Georgi
AU - de Jonge, Peter
PY - 2025
Y1 - 2025
N2 - Multimodal AI systems increasingly rely on biomedical signals such as EEG and eye movement data for emotion recognition. However, these models face challenges including limited training data, inter-subject variability, session-specific spurious correlations, and incomplete modality representation, all of which reduce generalization and reliability. We propose TEREE, a multimodal transformer-based model that integrates temporal, spatial, and spectral EEG features with eye movement data. To mitigate session-specific artifacts, Bayesian Spurious Correlation Minimization (BSCM) is applied. In addition, a holistic multimodal processing strategy enables robust handling of incomplete data. The model was trained and evaluated using the SEED and SEED-FRA benchmark datasets under one-to-one and multi-to-one transfer paradigms. TEREE achieved state-of-the-art performance, with average multi-to-one transfer accuracies of 97.7% on SEED and 98.8% on SEED-FRA. Ablation studies confirmed that fusing EEG with eye movement features consistently improved accuracy compared to unimodal baselines. Standard deviations across repeated experiments were below 5%, indicating stability. By addressing inter-subject variability, spurious correlations, and incomplete modality issues, TEREE enhances the robustness and generalization of emotion recognition systems. These findings suggest that multimodal transformer-based models can substantially improve the reliability of affective computing applications such as human–computer interaction and mental health monitoring.
KW - Bayesian spurious correlation minimization (BSCM)
KW - Electroencephalogram (EEG)
KW - Emotion recognition
KW - Eye movement (EM)
KW - Multimodal transformer
UR - http://www.scopus.com/inward/record.url?scp=105020574870&partnerID=8YFLogxK
DO - 10.1016/j.ibmed.2025.100305
M3 - Article
AN - SCOPUS:105020574870
SN - 2666-5212
VL - 12
JO - Intelligence-Based Medicine
JF - Intelligence-Based Medicine
M1 - 100305
ER -