TY - JOUR
T1 - Tubelets
T2 - Unsupervised Action Proposals from Spatiotemporal Super-Voxels
AU - Jain, Mihir
AU - van Gemert, Jan
AU - Jégou, Hervé
AU - Bouthemy, Patrick
AU - Snoek, Cees G.M.
PY - 2017
Y1 - 2017
N2 - This paper considers the problem of localizing actions in videos as sequences of bounding boxes. The objective is to generate action proposals that are likely to include the action of interest, ideally achieving high recall with few proposals. Our contributions are threefold. First, inspired by selective search for object proposals, we introduce an approach to generate action proposals from spatiotemporal super-voxels in an unsupervised manner; we call these proposals Tubelets. Second, along with static features from individual frames, our approach advantageously exploits motion. We introduce independent motion evidence as a feature to characterize how the action deviates from the background, and we explicitly incorporate such motion information in various stages of the proposal generation. Finally, we introduce spatiotemporal refinement of Tubelets for more precise localization of actions, and pruning to keep the number of Tubelets limited. We demonstrate the suitability of our approach by extensive experiments on action proposal quality and action localization on three public datasets: UCF Sports, MSR-II and UCF101. For action proposal quality, our unsupervised proposals beat all other existing approaches on the three datasets. For action localization, we show top performance on the trimmed videos of UCF Sports and UCF101 as well as the untrimmed videos of MSR-II.
KW - Action classification
KW - Action localization
KW - Video representation
UR - http://www.scopus.com/inward/record.url?scp=85020422221&partnerID=8YFLogxK
UR - http://resolver.tudelft.nl/uuid:65788802-2a35-429b-887a-40e5bff26663
U2 - 10.1007/s11263-017-1023-9
DO - 10.1007/s11263-017-1023-9
M3 - Article
AN - SCOPUS:85020422221
SN - 0920-5691
VL - 124
SP - 287
EP - 311
JO - International Journal of Computer Vision
JF - International Journal of Computer Vision
IS - 3
ER -