PUNet: Temporal Action Proposal Generation With Positive Unlabeled Learning Using Key Frame Annotations

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Popular approaches to classifying action segments in long, realistic, untrimmed videos start with high-quality action proposals. Current action proposal methods based on deep learning are trained on labeled video segments. Obtaining annotated segments for untrimmed videos is time-consuming, expensive and error-prone, as annotated temporal action boundaries are imprecise, subjective and inconsistent. By embracing this uncertainty, we explore significantly speeding up temporal annotation by using just a single key frame label for each action instance instead of the inherently imprecise start and end frames. To tackle the class imbalance that arises from using only a single labeled frame, we evaluate an extremely simple Positive-Unlabeled (PU-learning) algorithm. We demonstrate on THUMOS'14 and ActivityNet that using a single key frame label gives good results while being significantly faster to annotate. In addition, we show that our simple method, PUNet, is data-efficient, which further reduces the need for expensive annotations.
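The abstract does not spell out which PU-learning objective PUNet uses. As an illustration only, one widely used formulation in PU learning is the non-negative PU risk estimator, which treats the few labeled key frames as positives and all remaining frames as unlabeled. The sketch below is a minimal NumPy version; the names `sigmoid_loss` and `nn_pu_risk` and the class prior `prior` are assumptions for this example, not details from the paper, and `prior` (the fraction of positive frames) would have to be estimated from the data.

```python
import numpy as np

def sigmoid_loss(z):
    # Smooth surrogate loss l(z) = 1 / (1 + exp(z)); small when z is large.
    return 1.0 / (1.0 + np.exp(z))

def nn_pu_risk(scores_pos, scores_unl, prior):
    """Non-negative PU risk for binary classifier scores.

    scores_pos: scores for labeled positive (key) frames.
    scores_unl: scores for unlabeled frames (mix of positives and negatives).
    prior:      assumed class prior P(y = +1); must be estimated externally.
    """
    # Risk of the labeled positives classified as positive.
    r_pos = prior * np.mean(sigmoid_loss(scores_pos))
    # Risk of the unlabeled set classified as negative, with the positive
    # contribution subtracted out using the class prior.
    r_neg = (np.mean(sigmoid_loss(-scores_unl))
             - prior * np.mean(sigmoid_loss(-scores_pos)))
    # Clamp the negative-risk term at zero (the "non-negative" correction,
    # which prevents the estimator from going negative and overfitting).
    return r_pos + max(r_neg, 0.0)

# Toy check: confident scores on a tiny batch yield a small, non-negative risk.
risk = nn_pu_risk(np.array([5.0, 4.0]), np.array([-5.0, -4.0, 5.0]), prior=0.3)
```

In training, this scalar would replace the usual supervised loss, letting the network learn from one labeled frame per action instance plus the surrounding unlabeled frames.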

Original language: English
Title of host publication: 2021 IEEE International Conference on Image Processing (ICIP)
Subtitle of host publication: Proceedings
Place of Publication: Piscataway
Publisher: IEEE
Pages: 2598-2602
Number of pages: 5
ISBN (Electronic): 978-1-6654-4115-5
ISBN (Print): 978-1-6654-3102-6
DOIs
Publication status: Published - 2021
Event: 2021 IEEE International Conference on Image Processing (ICIP) - Virtual at Anchorage, United States
Duration: 19 Sept 2021 – 22 Sept 2021

Conference

Conference: 2021 IEEE International Conference on Image Processing (ICIP)
Country/Territory: United States
City: Virtual at Anchorage
Period: 19/09/21 – 22/09/21

Keywords

  • Proposal Generation
  • Action Localization
  • Positive-Unlabeled Learning

