Recurrent Affine Transform Encoder for Image Representation

Letao Liu*, Xudong Jiang, Martin Saerbeck, Justin Dauwels

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

This paper proposes a Recurrent Affine Transform Encoder (RATE) for image representation learning. The proposed learning architecture enables a CNN encoder to learn the affine transform parameters of images: it decomposes an affine transform matrix into two transform matrices and learns them jointly in a self-supervised manner. RATE is trained on unlabeled image data without any ground truth and recurrently infers the affine transform parameters of input images. The inferred parameters can be used to represent images in a canonical form, greatly reducing image variations due to affine transforms such as rotation, scaling, and translation. Unlike the spatial transformer network, RATE does not need to be embedded into another network and trained with the aid of other learning objectives. We show that RATE learns the affine transform parameters of images and achieves strong image representation results in terms of invariance to translation, scaling, and rotation. We also show that incorporating RATE into an existing classification model enhances classification performance and improves robustness against distortion.
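The abstract does not spell out the paper's exact decomposition or network details, but the recurrent idea — accumulating per-step affine estimates into one canonicalizing transform — can be sketched in a few lines of numpy. The rotation/scale-plus-translation split and the parameter names below are hypothetical choices for illustration, not the paper's definitions; a real encoder would predict each step's parameters from the current image.

```python
import numpy as np

def affine_matrix(theta, s, tx, ty):
    """Compose a 3x3 affine matrix from a rotation/scale matrix RS and a
    translation matrix T (a hypothetical two-matrix decomposition)."""
    RS = np.array([[s * np.cos(theta), -s * np.sin(theta), 0.0],
                   [s * np.sin(theta),  s * np.cos(theta), 0.0],
                   [0.0,                0.0,               1.0]])
    T = np.array([[1.0, 0.0, tx],
                  [0.0, 1.0, ty],
                  [0.0, 0.0, 1.0]])
    return T @ RS

def recurrent_canonicalize(params_per_step):
    """Accumulate per-step affine estimates into a single transform,
    mimicking the recurrent refinement loop (encoder stubbed out)."""
    A = np.eye(3)
    for theta, s, tx, ty in params_per_step:
        # Each step's small correction is composed onto the running estimate.
        A = affine_matrix(theta, s, tx, ty) @ A
    return A

# Toy usage: two -15 degree refinement steps compose into a -30 degree rotation.
steps = [(-np.pi / 12, 1.0, 0.0, 0.0),
         (-np.pi / 12, 1.0, 0.0, 0.0)]
A = recurrent_canonicalize(steps)
```

Applying `A` (e.g. via a warping function such as `scipy.ndimage.affine_transform`) would then map the input toward its canonical pose; the recurrence lets small corrections accumulate instead of requiring one large, hard-to-regress transform.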
Original language: English
Article number: 9709266
Pages (from-to): 18653-18666
Number of pages: 14
Journal: IEEE Access
Volume: 10
Publication status: Published - 2022

Keywords

  • Canonical image base
  • self-supervised learning
  • representation learning
