TY - GEN
T1 - LLM-Based Evaluation Methodology of Explanation Strategies
AU - Soyarar, Ege
AU - Aydogan, Reyhan
AU - Buzcu, Berk
AU - Calvaresi, Davide
N1 - Green Open Access added to TU Delft Institutional Repository as part of the Taverne amendment. More information about this copyright law amendment can be found at https://www.openaccess.nl. Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
PY - 2026
Y1 - 2026
N2 - As data privacy regulations, such as the EU AI Act and EU Data Act, become increasingly stringent, processing real user data for AI models like movie recommendation systems has grown more challenging. Moreover, human-centric data collection and evaluation of Explainable AI (XAI) systems are often costly and time-consuming, making them hard to sustain. Hence, this study adopts the Synthetic Behavior Generation (SBG) approach, leveraging large language models (LLMs) to evaluate AI explanations while ensuring compliance with regulations and providing cost-effective solutions for human feedback. To assess the quality of these explanations, we utilize three different LLMs, which are fed synthetically generated user behaviors to evaluate explanations of an AI system as if they were real users. The evaluation focuses on key criteria such as convincingness, clarity, accuracy, and the impact on decision-making, facilitating a thorough assessment of explanation effectiveness. The results indicate that LLMs can deliver structured and consistent evaluations based on the provided synthetic user behavior.
AB - As data privacy regulations, such as the EU AI Act and EU Data Act, become increasingly stringent, processing real user data for AI models like movie recommendation systems has grown more challenging. Moreover, human-centric data collection and evaluation of Explainable AI (XAI) systems are often costly and time-consuming, making them hard to sustain. Hence, this study adopts the Synthetic Behavior Generation (SBG) approach, leveraging large language models (LLMs) to evaluate AI explanations while ensuring compliance with regulations and providing cost-effective solutions for human feedback. To assess the quality of these explanations, we utilize three different LLMs, which are fed synthetically generated user behaviors to evaluate explanations of an AI system as if they were real users. The evaluation focuses on key criteria such as convincingness, clarity, accuracy, and the impact on decision-making, facilitating a thorough assessment of explanation effectiveness. The results indicate that LLMs can deliver structured and consistent evaluations based on the provided synthetic user behavior.
KW - Explainable AI (XAI)
KW - Explanation Evaluation
KW - Large Language Models (LLMs)
KW - Recommender Systems
KW - Synthetic Data Generation
UR - http://www.scopus.com/inward/record.url?scp=105020015162&partnerID=8YFLogxK
U2 - 10.1007/978-3-032-01399-6_6
DO - 10.1007/978-3-032-01399-6_6
M3 - Conference contribution
AN - SCOPUS:105020015162
SN - 9783032013989
T3 - Lecture Notes in Computer Science
SP - 85
EP - 103
BT - Explainable, Trustworthy, and Responsible AI and Multi-Agent Systems - 7th International Workshop, EXTRAAMAS 2025, Revised Selected Papers
A2 - Calvaresi, Davide
A2 - Najjar, Amro
A2 - Omicini, Andrea
A2 - Ciatto, Giovanni
A2 - Aydogan, Reyhan
A2 - Carli, Rachele
A2 - Främling, Kary
A2 - Tiribelli, Simona
PB - Springer
T2 - 7th International Workshop on Explainable, Transparent Autonomous Agents and Multi-Agent Systems, EXTRAAMAS 2025
Y2 - 19 May 2025 through 20 May 2025
ER -