Spot On: Action Localization from Pointly-Supervised Proposals

Pascal Mettes; Jan van Gemert; CGM Snoek

doi:10.1007/978-3-319-46454-1_27

Pattern Recognition and Bioinformatics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

48 Citations (Scopus)

Abstract

We strive for spatio-temporal localization of actions in videos. The state-of-the-art relies on action proposals at test time and selects the best one with a classifier trained on carefully annotated box annotations. Annotating action boxes in video is cumbersome, tedious, and error prone. Rather than annotating boxes, we propose to annotate actions in video with points on a sparse subset of frames only. We introduce an overlap measure between action proposals and points and incorporate them all into the objective of a non-convex Multiple Instance Learning optimization. Experimental evaluation on the UCF Sports and UCF 101 datasets shows that (i) spatio-temporal proposals can be used to train classifiers while retaining the localization performance, (ii) point annotations yield results comparable to box annotations while being significantly faster to annotate, (iii) with a minimum amount of supervision our approach is competitive to the state-of-the-art. Finally, we introduce spatio-temporal action annotations on the train and test videos of Hollywood2, resulting in Hollywood2Tubes, available at http://tinyurl.com/hollywood2tubes.
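The abstract mentions an overlap measure between action proposals and point annotations but does not define it on this page. As a minimal sketch only, assuming a proposal is a mapping from frame index to a box and an annotation is a mapping from a sparse set of frame indices to a point, one simple overlap could be the fraction of annotated points that fall inside the proposal's box on the same frame. The names `point_overlap` and `best_proposal` are hypothetical, and the measure used in the paper may differ (for instance by weighting points by their distance to the box center).

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Point = Tuple[float, float]              # (x, y)


def point_overlap(proposal: Dict[int, Box], points: Dict[int, Point]) -> float:
    """Fraction of annotated points lying inside the proposal's box on the same frame.

    Illustrative assumption, not the measure defined in the paper.
    """
    if not points:
        return 0.0
    hits = 0
    for frame, (x, y) in points.items():
        box = proposal.get(frame)
        if box is None:
            continue  # the proposal does not cover this annotated frame
        x0, y0, x1, y1 = box
        if x0 <= x <= x1 and y0 <= y <= y1:
            hits += 1
    return hits / len(points)


def best_proposal(proposals: List[Dict[int, Box]], points: Dict[int, Point]) -> int:
    """Index of the proposal that agrees best with the point annotations."""
    return max(range(len(proposals)), key=lambda i: point_overlap(proposals[i], points))
```

In a Multiple Instance Learning setting, a score of this kind could serve to pick or weight the proposal treated as the positive instance of each training video; how the paper combines it with the classifier objective is not specified here.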

Original language	English
Title of host publication	Computer Vision ECCV 2016
Subtitle of host publication	14th European Conference, proceedings
Editors	B. Leibe, J. Matas, N. Sebe, M. Welling
Place of Publication	Cham
Publisher	Springer
Pages	437-453
Number of pages	17
Volume	5
ISBN (Electronic)	978-3-319-46454-1
ISBN (Print)	978-3-319-46453-4
DOIs	https://doi.org/10.1007/978-3-319-46454-1_27
Publication status	Published - 2016
Event	ECCV 2016: 14th European Conference on Computer Vision - Amsterdam, Netherlands. Duration: 8 Oct 2016 → 16 Oct 2016

Publication series

Name	Lecture Notes in Computer Science
Publisher	Springer International Publishing AG
Volume	9909
ISSN (Print)	0302-9743

Conference

Conference	ECCV 2016
Country/Territory	Netherlands
City	Amsterdam
Period	8/10/16 → 16/10/16

Keywords

Action localization
Action proposals

Access to Document

10.1007/978-3-319-46454-1_27

Cite this

@inproceedings{26061bd453e044c6bf8ff070fbcff0a6,
  title = "Spot On: Action Localization from Pointly-Supervised Proposals",
  abstract = "We strive for spatio-temporal localization of actions in videos. The state-of-the-art relies on action proposals at test time and selects the best one with a classifier trained on carefully annotated box annotations. Annotating action boxes in video is cumbersome, tedious, and error prone. Rather than annotating boxes, we propose to annotate actions in video with points on a sparse subset of frames only. We introduce an overlap measure between action proposals and points and incorporate them all into the objective of a non-convex Multiple Instance Learning optimization. Experimental evaluation on the UCF Sports and UCF 101 datasets shows that (i) spatio-temporal proposals can be used to train classifiers while retaining the localization performance, (ii) point annotations yield results comparable to box annotations while being significantly faster to annotate, (iii) with a minimum amount of supervision our approach is competitive to the state-of-the-art. Finally, we introduce spatio-temporal action annotations on the train and test videos of Hollywood2, resulting in Hollywood2Tubes, available at http://tinyurl.com/hollywood2tubes.",
  keywords = "Action localization, Action proposals",
  author = "Pascal Mettes and {van Gemert}, Jan and CGM Snoek",
  year = "2016",
  doi = "10.1007/978-3-319-46454-1_27",
  language = "English",
  isbn = "978-3-319-46453-4",
  volume = "5",
  series = "Lecture Notes in Computer Science",
  publisher = "Springer",
  pages = "437--453",
  editor = "B. Leibe and J. Matas and N. Sebe and M. Welling",
  booktitle = "Computer Vision ECCV 2016",
  note = "ECCV 2016 : 14th European Conference on Computer Vision ; Conference date: 08-10-2016 Through 16-10-2016",
}

Mettes, P, van Gemert, J & Snoek, CGM 2016, Spot On: Action Localization from Pointly-Supervised Proposals. in B Leibe, J Matas, N Sebe & M Welling (eds), Computer Vision ECCV 2016: 14th European Conference, proceedings. vol. 5, Lecture Notes in Computer Science, vol. 9909, Springer, Cham, pp. 437-453, ECCV 2016, Amsterdam, Netherlands, 8/10/16. https://doi.org/10.1007/978-3-319-46454-1_27

Spot On: Action Localization from Pointly-Supervised Proposals. / Mettes, Pascal; van Gemert, Jan; Snoek, CGM.
Computer Vision ECCV 2016: 14th European Conference, proceedings. ed. / B. Leibe; J. Matas; N. Sebe; M. Welling. Vol. 5 Cham: Springer, 2016. p. 437-453 (Lecture Notes in Computer Science; Vol. 9909).

TY  - GEN
T1  - Spot On: Action Localization from Pointly-Supervised Proposals
T2  - ECCV 2016
AU  - Mettes, Pascal
AU  - van Gemert, Jan
AU  - Snoek, CGM
PY  - 2016
Y1  - 2016
N2  - We strive for spatio-temporal localization of actions in videos. The state-of-the-art relies on action proposals at test time and selects the best one with a classifier trained on carefully annotated box annotations. Annotating action boxes in video is cumbersome, tedious, and error prone. Rather than annotating boxes, we propose to annotate actions in video with points on a sparse subset of frames only. We introduce an overlap measure between action proposals and points and incorporate them all into the objective of a non-convex Multiple Instance Learning optimization. Experimental evaluation on the UCF Sports and UCF 101 datasets shows that (i) spatio-temporal proposals can be used to train classifiers while retaining the localization performance, (ii) point annotations yield results comparable to box annotations while being significantly faster to annotate, (iii) with a minimum amount of supervision our approach is competitive to the state-of-the-art. Finally, we introduce spatio-temporal action annotations on the train and test videos of Hollywood2, resulting in Hollywood2Tubes, available at http://tinyurl.com/hollywood2tubes.
AB  - We strive for spatio-temporal localization of actions in videos. The state-of-the-art relies on action proposals at test time and selects the best one with a classifier trained on carefully annotated box annotations. Annotating action boxes in video is cumbersome, tedious, and error prone. Rather than annotating boxes, we propose to annotate actions in video with points on a sparse subset of frames only. We introduce an overlap measure between action proposals and points and incorporate them all into the objective of a non-convex Multiple Instance Learning optimization. Experimental evaluation on the UCF Sports and UCF 101 datasets shows that (i) spatio-temporal proposals can be used to train classifiers while retaining the localization performance, (ii) point annotations yield results comparable to box annotations while being significantly faster to annotate, (iii) with a minimum amount of supervision our approach is competitive to the state-of-the-art. Finally, we introduce spatio-temporal action annotations on the train and test videos of Hollywood2, resulting in Hollywood2Tubes, available at http://tinyurl.com/hollywood2tubes.
KW  - Action localization
KW  - Action proposals
U2  - 10.1007/978-3-319-46454-1_27
DO  - 10.1007/978-3-319-46454-1_27
M3  - Conference contribution
SN  - 978-3-319-46453-4
VL  - 5
T3  - Lecture Notes in Computer Science
SP  - 437
EP  - 453
BT  - Computer Vision ECCV 2016
A2  - Leibe, B.
A2  - Matas, J.
A2  - Sebe, N.
A2  - Welling, M.
PB  - Springer
CY  - Cham
Y2  - 8 October 2016 through 16 October 2016
ER  - 
