Identifying multiple objects from their appearance in inaccurate detections

Julian F.P. Kooij, Gwenn Englebienne, Dariu M. Gavrila*

*Corresponding author for this work

doi:10.1016/j.cviu.2015.03.012

Research output: Contribution to journal › Article › Scientific › peer-review

5 Citations (Scopus)

Abstract

We propose a novel method for keeping track of multiple objects in provided regions of interest, i.e. object detections, specifically in cases where a single object results in multiple co-occurring detections (e.g. when objects exhibit unusual size or pose) or a single detection spans multiple objects (e.g. during occlusion). Our method identifies a minimal set of objects to explain the observed features, which are extracted from the regions of interest in a set of frames. Focusing on appearance rather than temporal cues, we treat video as an unordered collection of frames, and "unmix" object appearances from inaccurate detections within a Latent Dirichlet Allocation (LDA) framework, for which we propose an efficient Variational Bayes inference method. After the objects have been localized and their appearances have been learned, we can use the posterior distributions to "back-project" the assigned object features to the image and obtain segmentation at pixel level. In experiments on challenging datasets, we show that our batch method outperforms state-of-the-art batch and on-line multi-view trackers in terms of number of identity switches and proportion of correctly identified objects. We make our software and new dataset publicly available for non-commercial, benchmarking purposes.
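The abstract describes "unmixing" object appearances from detections with an LDA model fitted by Variational Bayes. As a rough illustration only, the sketch below implements textbook batch mean-field VB for standard LDA under these assumptions: each detection is a "document", each extracted appearance feature is quantized into an integer "word" id, and each object is a "topic". It is not the authors' released software or their proposed inference method. The function names `fit_lda_vb`, `back_project` and `digamma`, the priors `alpha` and `eta`, and the fixed number of objects are hypothetical choices made for this example. The per-token responsibilities `phi` stand in for the posterior used to back-project features to pixels.

```python
import math
import random


def digamma(x):
    """Digamma function for x > 0 via recurrence plus asymptotic series."""
    result = 0.0
    # Shift x upwards using psi(x) = psi(x + 1) - 1/x until the series is accurate.
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def fit_lda_vb(docs, n_objects, vocab_size, alpha=0.1, eta=0.01,
               n_iters=50, inner_iters=20, seed=0, init_lam=None):
    """Hypothetical sketch: batch mean-field VB for standard LDA.

    docs: list of detections, each a list of integer feature ids in [0, vocab_size).
    Returns (gamma, lam, phi): per-detection Dirichlet parameters over objects,
    per-object Dirichlet parameters over features, and per-token responsibilities.
    """
    rng = random.Random(seed)
    if init_lam is None:
        # Random positive pseudo-counts break the symmetry between objects.
        lam = [[eta + rng.uniform(0.5, 1.5) for _ in range(vocab_size)]
               for _ in range(n_objects)]
    else:
        lam = [list(row) for row in init_lam]
    gamma, phi = [], []
    for _ in range(n_iters):
        # E[log beta_kw] under q(beta_k) = Dirichlet(lam_k).
        e_log_beta = []
        for k in range(n_objects):
            psi_sum = digamma(sum(lam[k]))
            e_log_beta.append([digamma(v) - psi_sum for v in lam[k]])
        new_lam = [[eta] * vocab_size for _ in range(n_objects)]
        gamma, phi = [], []
        for doc in docs:
            # Local step: alternate phi and gamma updates for this detection.
            g = [alpha + len(doc) / n_objects] * n_objects
            doc_phi = []
            for _ in range(inner_iters):
                psi_sum = digamma(sum(g))
                e_log_theta = [digamma(gk) - psi_sum for gk in g]
                doc_phi = []
                for w in doc:
                    logits = [e_log_theta[k] + e_log_beta[k][w] for k in range(n_objects)]
                    m = max(logits)  # subtract max for a numerically stable softmax
                    p = [math.exp(v - m) for v in logits]
                    z = sum(p)
                    doc_phi.append([v / z for v in p])
                g = [alpha + sum(row[k] for row in doc_phi) for k in range(n_objects)]
            gamma.append(g)
            phi.append(doc_phi)
            # Accumulate expected feature counts per object for the global step.
            for w, row in zip(doc, doc_phi):
                for k in range(n_objects):
                    new_lam[k][w] += row[k]
        lam = new_lam
    return gamma, lam, phi


def back_project(phi):
    """Assign each token (e.g. a pixel feature) to its most responsible object."""
    return [[max(range(len(row)), key=row.__getitem__) for row in doc_phi]
            for doc_phi in phi]
```

Unlike the method described in the abstract, this sketch takes the number of objects as given rather than identifying a minimal set, and it ignores the spatial layout of features inside each detection.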

Original language	English
Pages (from-to)	103-116
Number of pages	14
Journal	Computer Vision and Image Understanding
Volume	136
DOIs	https://doi.org/10.1016/j.cviu.2015.03.012
Publication status	Published - 1 Jan 2015
Externally published	Yes

Keywords

Generative model
Latent Dirichlet Allocation
Object recognition
Segmentation
Unsupervised learning
Video surveillance


Cite this

@article{034e4191fca44946b10de52dc0cc4f1e,
  title = "Identifying multiple objects from their appearance in inaccurate detections",
  abstract = "We propose a novel method for keeping track of multiple objects in provided regions of interest, i.e. object detections, specifically in cases where a single object results in multiple co-occurring detections (e.g. when objects exhibit unusual size or pose) or a single detection spans multiple objects (e.g. during occlusion). Our method identifies a minimal set of objects to explain the observed features, which are extracted from the regions of interest in a set of frames. Focusing on appearance rather than temporal cues, we treat video as an unordered collection of frames, and {"}unmix{"} object appearances from inaccurate detections within a Latent Dirichlet Allocation (LDA) framework, for which we propose an efficient Variational Bayes inference method. After the objects have been localized and their appearances have been learned, we can use the posterior distributions to {"}back-project{"} the assigned object features to the image and obtain segmentation at pixel level. In experiments on challenging datasets, we show that our batch method outperforms state-of-the-art batch and on-line multi-view trackers in terms of number of identity switches and proportion of correctly identified objects. We make our software and new dataset publicly available for non-commercial, benchmarking purposes.",
  keywords = "Generative model, Latent Dirichlet Allocation, Object recognition, Segmentation, Unsupervised learning, Video surveillance",
  author = "Kooij, {Julian F.P.} and Englebienne, Gwenn and Gavrila, {Dariu M.}",
  year = "2015",
  month = jan,
  day = "1",
  doi = "10.1016/j.cviu.2015.03.012",
  language = "English",
  volume = "136",
  pages = "103--116",
  journal = "Computer Vision and Image Understanding",
  issn = "1077-3142",
  publisher = "Academic Press",
}

TY  - JOUR
T1  - Identifying multiple objects from their appearance in inaccurate detections
AU  - Kooij, Julian F.P.
AU  - Englebienne, Gwenn
AU  - Gavrila, Dariu M.
PY  - 2015/1/1
Y1  - 2015/1/1
N2  - We propose a novel method for keeping track of multiple objects in provided regions of interest, i.e. object detections, specifically in cases where a single object results in multiple co-occurring detections (e.g. when objects exhibit unusual size or pose) or a single detection spans multiple objects (e.g. during occlusion). Our method identifies a minimal set of objects to explain the observed features, which are extracted from the regions of interest in a set of frames. Focusing on appearance rather than temporal cues, we treat video as an unordered collection of frames, and "unmix" object appearances from inaccurate detections within a Latent Dirichlet Allocation (LDA) framework, for which we propose an efficient Variational Bayes inference method. After the objects have been localized and their appearances have been learned, we can use the posterior distributions to "back-project" the assigned object features to the image and obtain segmentation at pixel level. In experiments on challenging datasets, we show that our batch method outperforms state-of-the-art batch and on-line multi-view trackers in terms of number of identity switches and proportion of correctly identified objects. We make our software and new dataset publicly available for non-commercial, benchmarking purposes.
AB  - We propose a novel method for keeping track of multiple objects in provided regions of interest, i.e. object detections, specifically in cases where a single object results in multiple co-occurring detections (e.g. when objects exhibit unusual size or pose) or a single detection spans multiple objects (e.g. during occlusion). Our method identifies a minimal set of objects to explain the observed features, which are extracted from the regions of interest in a set of frames. Focusing on appearance rather than temporal cues, we treat video as an unordered collection of frames, and "unmix" object appearances from inaccurate detections within a Latent Dirichlet Allocation (LDA) framework, for which we propose an efficient Variational Bayes inference method. After the objects have been localized and their appearances have been learned, we can use the posterior distributions to "back-project" the assigned object features to the image and obtain segmentation at pixel level. In experiments on challenging datasets, we show that our batch method outperforms state-of-the-art batch and on-line multi-view trackers in terms of number of identity switches and proportion of correctly identified objects. We make our software and new dataset publicly available for non-commercial, benchmarking purposes.
KW  - Generative model
KW  - Latent Dirichlet Allocation
KW  - Object recognition
KW  - Segmentation
KW  - Unsupervised learning
KW  - Video surveillance
UR  - http://www.scopus.com/inward/record.url?scp=84942366062&partnerID=8YFLogxK
U2  - 10.1016/j.cviu.2015.03.012
DO  - 10.1016/j.cviu.2015.03.012
M3  - Article
AN  - SCOPUS:84942366062
SN  - 1077-3142
VL  - 136
SP  - 103
EP  - 116
JO  - Computer Vision and Image Understanding
JF  - Computer Vision and Image Understanding
ER  - 
