Impact of Annotation Modality on Label Quality and Model Performance in the Automatic Assessment of Laughter In-the-Wild

Jose Vargas-Quiros; Laura Cabrera-Quiros; Catharine Oertel; Hayley Hung

doi:10.1109/TAFFC.2023.3269003

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Although laughter is known to be a multimodal signal, it is primarily annotated from audio. It is unclear how laughter labels may differ when annotated from modalities like video, which capture body movements and are relevant in in-the-wild studies. In this work, we ask whether annotations of laughter are congruent across modalities, and compare the effect that labeling modality has on machine learning model performance. We compare annotations and models for laughter detection, intensity estimation, and segmentation, using a challenging in-the-wild conversational dataset with a variety of camera angles, noise conditions, and voices. Our study with 48 annotators revealed evidence for incongruity in the perception of laughter and its intensity between modalities, mainly due to lower recall in the video condition. Our machine learning experiments compared the performance of modern unimodal and multimodal models for different combinations of input modalities, training label modalities, and testing label modalities. In addition to the same input modalities rated by annotators (audio and video), we trained models with body acceleration inputs, which are robust to cross-contamination, occlusion, and perspective differences. Our results show that the performance of models with body movement inputs does not suffer when they are trained with video-acquired labels, despite the lower inter-rater agreement of those labels.

Original language	English
Pages (from-to)	1-17
Number of pages	17
Journal	IEEE Transactions on Affective Computing
DOIs	https://doi.org/10.1109/TAFFC.2023.3269003
Publication status	E-pub ahead of print - 2023

Keywords

Action recognition
annotation
Annotations
Cameras
continuous annotation
Face recognition
Labeling
laughter
laughter detection
laughter intensity
Machine learning
mingling datasets
Physiology
Task analysis

Cite this

@article{80de5c168bd24a54a823e1032f4496e1,

title = "Impact of Annotation Modality on Label Quality and Model Performance in the Automatic Assessment of Laughter In-the-Wild",

abstract = "Although laughter is known to be a multimodal signal, it is primarily annotated from audio. It is unclear how laughter labels may differ when annotated from modalities like video, which capture body movements and are relevant in in-the-wild studies. In this work we ask whether annotations of laughter are congruent across modalities, and compare the effect that labeling modality has on machine learning model performance. We compare annotations and models for laughter detection, intensity estimation, and segmentation, using a challenging in-the-wild conversational dataset with a variety of camera angles, noise conditions and voices. Our study with 48 annotators revealed evidence for incongruity in the perception of laughter and its intensity between modalities, mainly due to lower recall in the video condition. Our machine learning experiments compared the performance of modern unimodal and multi-modal models for different combinations of input modalities, training, and testing label modalities. In addition to the same input modalities rated by annotators (audio and video), we trained models with body acceleration inputs, robust to cross-contamination, occlusion and perspective differences. Our results show that performance of models with body movement inputs does not suffer when trained with video-acquired labels, despite their lower inter-rater agreement.",

keywords = "Action recognition, annotation, Annotations, Cameras, continuous annotation, Face recognition, Labeling, laughter, laughter detection, laughter intensity, Machine learning, mingling datasets, Physiology, Task analysis",

author = "Jose Vargas-Quiros and Laura Cabrera-Quiros and Catharine Oertel and Hayley Hung",

year = "2023",

doi = "10.1109/TAFFC.2023.3269003",

language = "English",

pages = "1--17",

journal = "IEEE Transactions on Affective Computing",

issn = "1949-3045",

publisher = "Institute of Electrical and Electronics Engineers (IEEE)",

}

TY - JOUR

T1 - Impact of Annotation Modality on Label Quality and Model Performance in the Automatic Assessment of Laughter In-the-Wild

AU - Vargas-Quiros, Jose

AU - Cabrera-Quiros, Laura

AU - Oertel, Catharine

AU - Hung, Hayley

PY - 2023

Y1 - 2023

AB - Although laughter is known to be a multimodal signal, it is primarily annotated from audio. It is unclear how laughter labels may differ when annotated from modalities like video, which capture body movements and are relevant in in-the-wild studies. In this work we ask whether annotations of laughter are congruent across modalities, and compare the effect that labeling modality has on machine learning model performance. We compare annotations and models for laughter detection, intensity estimation, and segmentation, using a challenging in-the-wild conversational dataset with a variety of camera angles, noise conditions and voices. Our study with 48 annotators revealed evidence for incongruity in the perception of laughter and its intensity between modalities, mainly due to lower recall in the video condition. Our machine learning experiments compared the performance of modern unimodal and multi-modal models for different combinations of input modalities, training, and testing label modalities. In addition to the same input modalities rated by annotators (audio and video), we trained models with body acceleration inputs, robust to cross-contamination, occlusion and perspective differences. Our results show that performance of models with body movement inputs does not suffer when trained with video-acquired labels, despite their lower inter-rater agreement.

KW - Action recognition

KW - annotation

KW - Annotations

KW - Cameras

KW - continuous annotation

KW - Face recognition

KW - Labeling

KW - laughter

KW - laughter detection

KW - laughter intensity

KW - Machine learning

KW - mingling datasets

KW - Physiology

KW - Task analysis

UR - http://www.scopus.com/inward/record.url?scp=85161034458&partnerID=8YFLogxK

U2 - 10.1109/TAFFC.2023.3269003

DO - 10.1109/TAFFC.2023.3269003

M3 - Article

AN - SCOPUS:85161034458

SN - 1949-3045

SP - 1

EP - 17

JO - IEEE Transactions on Affective Computing

JF - IEEE Transactions on Affective Computing

ER -
