TY - GEN
T1 - Multimodal fusion of body movement signals for no-audio speech detection
AU - Wang, Xinsheng
AU - Zhu, Jihua
AU - Scharenborg, Odette
PY - 2020
Y1 - 2020
N2 - No-audio Multimodal Speech Detection is one of the tasks in MediaEval 2020, with the goal of automatically detecting whether someone is speaking in social interaction on the basis of body movement signals. In this paper, a multimodal fusion method that combines signals obtained by an overhead camera and a wearable accelerometer is proposed to determine whether someone is speaking. The proposed system takes the accelerometer signals directly as input, while a pre-trained 3D convolutional network is used to extract video features that serve as input. Experiments on the No-audio Multimodal Speech Detection task show that our method outperforms all submissions of previous years.
AB - No-audio Multimodal Speech Detection is one of the tasks in MediaEval 2020, with the goal of automatically detecting whether someone is speaking in social interaction on the basis of body movement signals. In this paper, a multimodal fusion method that combines signals obtained by an overhead camera and a wearable accelerometer is proposed to determine whether someone is speaking. The proposed system takes the accelerometer signals directly as input, while a pre-trained 3D convolutional network is used to extract video features that serve as input. Experiments on the No-audio Multimodal Speech Detection task show that our method outperforms all submissions of previous years.
UR - http://www.scopus.com/inward/record.url?scp=85108080364&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85108080364
VL - 2882
T3 - CEUR Workshop Proceedings
BT - MediaEval 2020: Multimedia Benchmark Workshop 2020
A2 - Hicks, Steven
A2 - Jha, Debesh
A2 - Pogorelov, Konstantin
T2 - Multimedia Evaluation Benchmark Workshop 2020, MediaEval 2020
Y2 - 14 December 2020 through 15 December 2020
ER -