Benchmarking Data Efficiency and Computational Efficiency of Temporal Action Localization Models

J. Warchocki, T. Oprescu, Y. Wang, A. Dămăcuș, P.M. Misterka, R. Bruintjes, A. Lengyel, O. Strafforello, J.C. van Gemert

Research output: Chapter in Book/Conference proceedings/Edited volume > Conference contribution > Scientific > peer-review

Abstract

In temporal action localization, given an input video, the goal is to predict which actions it contains, where they begin, and where they end. Training and testing current state-of-the-art deep learning models requires access to large amounts of data and computational power. However, gathering such data is challenging and computational resources might be limited. This work explores and measures how current deep temporal action localization models perform in settings constrained by the amount of data or computational power. We measure data efficiency by training each model on a subset of the training set. We find that TemporalMaxer outperforms other models in data-limited settings. Furthermore, we recommend TriDet when training time is limited. To test the efficiency of the models during inference, we pass videos of different lengths through each model. We find that TemporalMaxer requires the least computational resources, likely due to its simple architecture.
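
The two measurements described in the abstract (training on a subset of the training set, and running inference on inputs of different lengths) can be illustrated with a minimal sketch. This is not the authors' benchmark code: the helper names `subsample` and `time_inference`, the placeholder video IDs, the stand-in model, and the use of wall-clock time as the resource measure are all assumptions made for illustration.

```python
import random
import time


def subsample(items, fraction, seed=0):
    """Return a reproducible random subset holding `fraction` of `items`.

    Hypothetical helper: the record does not say how subsets were drawn.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so every model sees the same subset
    k = max(1, round(len(items) * fraction))
    return rng.sample(list(items), k)


def time_inference(model_fn, lengths, make_input, repeats=3):
    """Map each input length to the best wall-clock time of `model_fn`.

    Wall-clock time is an assumed proxy for computational cost.
    """
    timings = {}
    for n in lengths:
        x = make_input(n)  # build the input outside the timed region
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            model_fn(x)
            best = min(best, time.perf_counter() - start)
        timings[n] = best
    return timings


if __name__ == "__main__":
    videos = [f"video_{i:03d}" for i in range(200)]  # placeholder IDs
    for frac in (0.1, 0.25, 0.5, 1.0):
        print(frac, len(subsample(videos, frac)))

    # Stand-in for a real model: sums a list of per-frame values.
    timings = time_inference(sum, [1_000, 10_000], lambda n: [0.0] * n)
    for n, seconds in timings.items():
        print(n, f"{seconds:.6f} s")
```

Fixing the seed keeps the subset identical across models, so differences in accuracy reflect the model and not the sample. In a real setup, `model_fn` would wrap a forward pass of the localization model and `make_input` would produce video features of the requested length.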
Original language: English
Title of host publication: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops
Pages: 3008-3016
Number of pages: 9
Publication status: Published - 2023
Event: ICCV 2023: International Conference on Computer Vision - Paris, France
Duration: 2 Oct 2023 to 6 Oct 2023

Conference

Conference: ICCV 2023: International Conference on Computer Vision
Country/Territory: France
City: Paris
Period: 2/10/23 to 6/10/23
