Video BagNet: Short temporal receptive fields increase robustness in long-term action recognition

Ombretta Strafforello; Xin Liu; Klamer Schutte; Jan van Gemert

doi:10.1109/ICCVW60793.2023.00023

Pattern Recognition and Bioinformatics

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Previous work on long-term video action recognition relies on deep 3D-convolutional models that have a large temporal receptive field (RF). We argue that these models are not always the best choice for temporal modeling in videos. A large temporal receptive field allows the model to encode the exact sub-action order of a video, which causes a performance decrease when testing videos have a different sub-action order. In this work, we investigate whether we can improve the model robustness to the sub-action order by shrinking the temporal receptive field of action recognition models. For this, we design Video BagNet, a variant of the 3D ResNet-50 model with the temporal receptive field size limited to 1, 9, 17 or 33 frames. We analyze Video BagNet on synthetic and real-world video datasets and experimentally compare models with varying temporal receptive fields. We find that short receptive fields are robust to sub-action order changes, while larger temporal receptive fields are sensitive to the sub-action order.
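
To make the notion of a limited temporal receptive field concrete, here is a minimal sketch of the standard receptive-field recurrence for stacked convolutions. The function name `temporal_receptive_field` and the example layer stacks are illustrative assumptions; they are not the layer configuration used in the paper, which this record does not describe.

```python
from typing import Iterable, Tuple


def temporal_receptive_field(layers: Iterable[Tuple[int, int]]) -> int:
    """Return the temporal receptive field, in frames, of stacked layers.

    Each layer is a (temporal_kernel_size, temporal_stride) pair.
    """
    rf = 1    # before any layer, one unit covers exactly one frame
    jump = 1  # spacing, in input frames, between adjacent units at this depth
    for kernel, stride in layers:
        rf += (kernel - 1) * jump  # each layer widens the view by (k - 1) jumps
        jump *= stride             # striding spreads later units further apart
    return rf


# Hypothetical stacks (assumptions for illustration only).
frame_wise = [(1, 1)] * 16                    # temporal kernel size 1 everywhere
four_temporal = [(3, 1)] * 4 + [(1, 1)] * 12  # four temporal kernels of size 3

print(temporal_receptive_field(frame_wise))
print(temporal_receptive_field(four_temporal))
```

Under these assumptions, a stack with no temporal kernels larger than 1 sees a single frame, and every additional stride-1 temporal kernel of size 3 adds two frames to the receptive field.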

Original language	English
Title of host publication	Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Editors	Cristina Ceballos
Place of Publication	Piscataway
Publisher	IEEE
Pages	159-166
Number of pages	8
ISBN (Electronic)	979-8-3503-0744-3
ISBN (Print)	979-8-3503-0745-0
DOIs	https://doi.org/10.1109/ICCVW60793.2023.00023
Publication status	Published - 2023
Event	2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) - Paris, France
Duration	2 Oct 2023 → 6 Oct 2023

Conference

Conference	2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Country/Territory	France
City	Paris
Period	2/10/23 → 6/10/23

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Access to Document

10.1109/ICCVW60793.2023.00023

Cite this

@inproceedings{3ae2da81d8ac47cc9ad0e6b03724c81c,
  title = "Video BagNet: Short temporal receptive fields increase robustness in long-term action recognition",
  abstract = "Previous work on long-term video action recognition relies on deep 3D-convolutional models that have a large temporal receptive field (RF). We argue that these models are not always the best choice for temporal modeling in videos. A large temporal receptive field allows the model to encode the exact sub-action order of a video, which causes a performance decrease when testing videos have a different sub-action order. In this work, we investigate whether we can improve the model robustness to the sub-action order by shrinking the temporal receptive field of action recognition models. For this, we design Video BagNet, a variant of the 3D ResNet-50 model with the temporal receptive field size limited to 1, 9, 17 or 33 frames. We analyze Video BagNet on synthetic and real-world video datasets and experimentally compare models with varying temporal receptive fields. We find that short receptive fields are robust to sub-action order changes, while larger temporal receptive fields are sensitive to the sub-action order.",
  author = "Ombretta Strafforello and Xin Liu and Klamer Schutte and {van Gemert}, Jan",
  note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.; 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) ; Conference date: 02-10-2023 Through 06-10-2023",
  year = "2023",
  doi = "10.1109/ICCVW60793.2023.00023",
  language = "English",
  isbn = "979-8-3503-0745-0",
  pages = "159--166",
  editor = "Ceballos, Cristina",
  booktitle = "Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)",
  publisher = "IEEE",
  address = "Piscataway",
}

Strafforello, O, Liu, X, Schutte, K & van Gemert, J 2023, Video BagNet: Short temporal receptive fields increase robustness in long-term action recognition. in C Ceballos (ed.), Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). IEEE, Piscataway, pp. 159-166, 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Paris, France, 2/10/23. https://doi.org/10.1109/ICCVW60793.2023.00023

Video BagNet: Short temporal receptive fields increase robustness in long-term action recognition. / Strafforello, Ombretta; Liu, Xin; Schutte, Klamer et al.
Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW). ed. / Cristina Ceballos. Piscataway: IEEE, 2023. p. 159-166.

TY  - GEN
T1  - Video BagNet: Short temporal receptive fields increase robustness in long-term action recognition
T2  - 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
AU  - Strafforello, Ombretta
AU  - Liu, Xin
AU  - Schutte, Klamer
AU  - van Gemert, Jan
N1  - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.
PY  - 2023
Y1  - 2023
AB  - Previous work on long-term video action recognition relies on deep 3D-convolutional models that have a large temporal receptive field (RF). We argue that these models are not always the best choice for temporal modeling in videos. A large temporal receptive field allows the model to encode the exact sub-action order of a video, which causes a performance decrease when testing videos have a different sub-action order. In this work, we investigate whether we can improve the model robustness to the sub-action order by shrinking the temporal receptive field of action recognition models. For this, we design Video BagNet, a variant of the 3D ResNet-50 model with the temporal receptive field size limited to 1, 9, 17 or 33 frames. We analyze Video BagNet on synthetic and real-world video datasets and experimentally compare models with varying temporal receptive fields. We find that short receptive fields are robust to sub-action order changes, while larger temporal receptive fields are sensitive to the sub-action order.
UR  - http://www.scopus.com/inward/record.url?scp=85170272140&partnerID=8YFLogxK
U2  - 10.1109/ICCVW60793.2023.00023
DO  - 10.1109/ICCVW60793.2023.00023
M3  - Conference contribution
SN  - 979-8-3503-0745-0
SP  - 159
EP  - 166
BT  - Proceedings of the 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
A2  - Ceballos, Cristina
PB  - IEEE
CY  - Piscataway
Y2  - 2 October 2023 through 6 October 2023
ER  - 
