Investigating transformers in the decomposition of polygonal shapes as point collections

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

2 Citations (Scopus)

Abstract

Transformers can generate predictions in two ways: (1) auto-regressively, by conditioning each sequence element on the previous ones, or (2) by producing the output sequence directly in parallel. While research has mostly explored this difference on sequential tasks in NLP, we study the difference between auto-regressive and parallel prediction on visual set prediction tasks, and in particular on polygonal shapes in images, because polygons are representative of numerous types of objects, such as buildings or obstacles for aerial vehicles. This is challenging for deep learning architectures, as a polygon can consist of a varying cardinality of points. We provide evidence on the importance of natural orders for Transformers, and show the benefit of decomposing complex polygons into collections of points in an auto-regressive manner.
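
The two decoding modes contrasted in the abstract can be sketched concretely. The following minimal PyTorch sketch is not the authors' implementation; all names, layer sizes, and the toy image features are illustrative assumptions. It predicts the vertices of a single polygon either one point at a time under a causal mask (auto-regressive) or all at once from a fixed set of learned queries (parallel, DETR-style set prediction).

import torch
import torch.nn as nn

d_model, n_vertices = 64, 8
layer = nn.TransformerDecoderLayer(d_model, nhead=4, batch_first=True)
decoder = nn.TransformerDecoder(layer, num_layers=2)
to_embed = nn.Linear(2, d_model)      # embed a 2D vertex into the model dimension
vertex_head = nn.Linear(d_model, 2)   # map a decoder state back to a 2D vertex

memory = torch.randn(1, 49, d_model)  # stand-in for flattened image features

# (1) Auto-regressive: each vertex is conditioned on the previously produced ones.
start = torch.zeros(1, 1, 2)          # in practice a learned start token
points = start
for _ in range(n_vertices):
    tgt = to_embed(points)
    mask = nn.Transformer.generate_square_subsequent_mask(tgt.size(1))
    out = decoder(tgt, memory, tgt_mask=mask)
    next_pt = vertex_head(out[:, -1:])            # keep only the newest position
    points = torch.cat([points, next_pt], dim=1)
ar_polygon = points[:, 1:]                        # drop the start token

# (2) Parallel: all vertices are produced at once from fixed queries, no causal mask.
queries = nn.Parameter(torch.randn(1, n_vertices, d_model))
par_polygon = vertex_head(decoder(queries, memory))

print(ar_polygon.shape, par_polygon.shape)        # both (1, n_vertices, 2)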
Original language: English
Title of host publication: Proceedings - 2021 IEEE/CVF International Conference on Computer Vision Workshops, ICCVW 2021
Subtitle of host publication: Proceedings
Editors: L. O'Conner
Place of Publication: Piscataway
Publisher: IEEE
Pages: 2076-2085
Number of pages: 10
ISBN (Electronic): 978-1-6654-0191-3
ISBN (Print): 978-1-6654-0192-0
DOIs
Publication status: Published - 2021
Event: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) - Virtual at Montreal, Canada
Duration: 11 Oct 2021 – 17 Oct 2021

Publication series

Name: Proceedings of the IEEE International Conference on Computer Vision
Volume: 2021-October
ISSN (Print): 1550-5499

Conference

Conference: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW)
Country/Territory: Canada
City: Virtual at Montreal
Period: 11/10/21 – 17/10/21
