Exploiting scene maps and spatial relationships in quasi-static scenes for video face clustering

Alessio Bazzica; Cynthia C.S. Liem; Alan Hanjalic

doi:10.1016/j.imavis.2016.11.005

Multimedia Computing

Research output: Contribution to journal › Article › Scientific › peer-review

2 Citations (Scopus)

Abstract

Video face clustering is a fundamental step in automatically annotating a video in terms of when and where (i.e., in which video shot and where in a video frame) a given person is visible. State-of-the-art face clustering solutions typically rely on the information derived from visual appearances of the face images. This is challenging because of a high degree of variation in these visual appearances due to factors like scale, viewpoint, head pose and facial expression. As a result, either the generated face clusters are not sufficiently pure, or their number is much higher than that of people appearing in the video. A possible way towards improved clustering performance is to analyze visual appearances of faces in specific contexts and take the contextual information into account when designing the clustering algorithm. In this paper, we focus on the context of quasi-static scenes, in which we can assume that the people's positions in a scene are (quasi-)stationary. We present a novel video clustering algorithm that exploits this property to match faces and efficiently propagate face labels across the scope of viewpoints, scale and level of zoom characterizing different frames and shots of a video. We also present a novel publicly available dataset of manually annotated quasi-static scene videos. Experimental assessment on the latter indicates that exploiting information derived by the scene and the spatial relationships between people can substantially improve the clustering performance compared to the state-of-the-art in the field.

Original language	English
Pages (from-to)	25-43
Number of pages	19
Journal	Image and Vision Computing
Volume	57
DOIs	https://doi.org/10.1016/j.imavis.2016.11.005
Publication status	Published - 1 Jan 2017

Keywords

Video face annotation
Face clustering
Re-identification

Cite this

@article{e9b9dc9d4c144f65b991869e1d5c0532,

title = "Exploiting scene maps and spatial relationships in quasi-static scenes for video face clustering",

abstract = "Video face clustering is a fundamental step in automatically annotating a video in terms of when and where (i.e., in which video shot and where in a video frame) a given person is visible. State-of-the-art face clustering solutions typically rely on the information derived from visual appearances of the face images. This is challenging because of a high degree of variation in these visual appearances due to factors like scale, viewpoint, head pose and facial expression. As a result, either the generated face clusters are not sufficiently pure, or their number is much higher than that of people appearing in the video. A possible way towards improved clustering performance is to analyze visual appearances of faces in specific contexts and take the contextual information into account when designing the clustering algorithm. In this paper, we focus on the context of quasi-static scenes, in which we can assume that the people's positions in a scene are (quasi-)stationary. We present a novel video clustering algorithm that exploits this property to match faces and efficiently propagate face labels across the scope of viewpoints, scale and level of zoom characterizing different frames and shots of a video. We also present a novel publicly available dataset of manually annotated quasi-static scene videos. Experimental assessment on the latter indicates that exploiting information derived by the scene and the spatial relationships between people can substantially improve the clustering performance compared to the state-of-the-art in the field.",

keywords = "Video face annotation, Face clustering, Re-identification",

author = "Bazzica, Alessio and Liem, {Cynthia C.S.} and Hanjalic, Alan",

year = "2017",

month = jan,

day = "1",

doi = "10.1016/j.imavis.2016.11.005",

language = "English",

volume = "57",

pages = "25--43",

journal = "Image and Vision Computing",

issn = "0262-8856",

publisher = "Elsevier",

}

TY - JOUR

T1 - Exploiting scene maps and spatial relationships in quasi-static scenes for video face clustering

AU - Bazzica, Alessio

AU - Liem, Cynthia C.S.

AU - Hanjalic, Alan

PY - 2017/1/1

Y1 - 2017/1/1

AB - Video face clustering is a fundamental step in automatically annotating a video in terms of when and where (i.e., in which video shot and where in a video frame) a given person is visible. State-of-the-art face clustering solutions typically rely on the information derived from visual appearances of the face images. This is challenging because of a high degree of variation in these visual appearances due to factors like scale, viewpoint, head pose and facial expression. As a result, either the generated face clusters are not sufficiently pure, or their number is much higher than that of people appearing in the video. A possible way towards improved clustering performance is to analyze visual appearances of faces in specific contexts and take the contextual information into account when designing the clustering algorithm. In this paper, we focus on the context of quasi-static scenes, in which we can assume that the people's positions in a scene are (quasi-)stationary. We present a novel video clustering algorithm that exploits this property to match faces and efficiently propagate face labels across the scope of viewpoints, scale and level of zoom characterizing different frames and shots of a video. We also present a novel publicly available dataset of manually annotated quasi-static scene videos. Experimental assessment on the latter indicates that exploiting information derived by the scene and the spatial relationships between people can substantially improve the clustering performance compared to the state-of-the-art in the field.

KW - Video face annotation

KW - Face clustering

KW - Re-identification

U2 - 10.1016/j.imavis.2016.11.005

DO - 10.1016/j.imavis.2016.11.005

M3 - Article

SN - 0262-8856

VL - 57

SP - 25

EP - 43

JO - Image and Vision Computing

JF - Image and Vision Computing

ER -
