TY - GEN
T1 - Multimodal Quantitative Measures for Multiparty Behavior Evaluation
AU - Shirekar, Ojas
AU - Pouw, Wim
AU - Hao, Chenxu
AU - Phadnis, Vrushank
AU - Beeler, Thabo
AU - Raman, Chirag
PY - 2025
Y1 - 2025
N2 - Digital humans are emerging as autonomous agents in multiparty interactions, yet existing evaluation metrics largely ignore contextual coordination dynamics. We introduce a unified, intervention-driven framework for objective assessment of multiparty social behaviour in skeletal motion data, spanning three complementary dimensions: (1) synchrony via Cross-Recurrence Quantification Analysis (CRQA), (2) temporal alignment via Multiscale Empirical Mode Decomposition-based Beat Consistency, and (3) structural similarity via Soft Dynamic Time Warping (Soft-DTW). We validate metric sensitivity through three theory-driven perturbations - gesture kinematic dampening, uniform speech-gesture delays, and prosodic pitch-variance reduction - applied to ≈145 thirty-second thin slices of group interactions from the DnD dataset. Mixed-effects analyses reveal predictable, joint-independent shifts: dampening increases CRQA determinism and reduces beat consistency, delays weaken cross-participant coupling, and pitch flattening elevates F0 Soft-DTW costs. A complementary perception study (N = 27) compares judgments of full-video and skeleton-only renderings to quantify representation effects. Our three measures deliver orthogonal insights into spatial structure, timing alignment, and behavioural variability, thereby forming a robust toolkit for evaluating and refining socially intelligent agents. Code is available on GitHub.
KW - interpersonal synchrony
KW - coordination
KW - RQA
KW - EMD
KW - SoftDTW
KW - human perception
KW - social computing
UR - http://www.scopus.com/inward/record.url?scp=105022280720&partnerID=8YFLogxK
U2 - 10.1145/3716553.3750752
DO - 10.1145/3716553.3750752
M3 - Conference contribution
AN - SCOPUS:105022280720
T3 - ICMI 2025 - Proceedings of the 27th International Conference on Multimodal Interaction
SP - 249
EP - 264
BT - ICMI 2025 - Proceedings of the 27th International Conference on Multimodal Interaction
A2 - Subramanian, Ram
A2 - Nakano, Yukiko I.
A2 - Gedeon, Tom
A2 - Kankanhalli, Mohan
A2 - Guha, Tanaya
A2 - Shukla, Jainendra
A2 - Mohammadi, Gelareh
A2 - Celiktutan, Oya
PB - Association for Computing Machinery (ACM)
T2 - 27th International Conference on Multimodal Interaction, ICMI 2025
Y2 - 13 October 2025 through 17 October 2025
ER -