Convolutional Cross-View Pose Estimation

Research output: Contribution to journal, Article, Scientific, peer-review


We propose a novel end-to-end method for cross-view pose estimation. Given a ground-level query image and an aerial image that covers the query's local neighborhood, the 3 Degrees-of-Freedom camera pose of the query is estimated by matching its image descriptor to descriptors of local regions within the aerial image. The orientation-aware descriptors are obtained using a translationally equivariant convolutional ground image encoder and contrastive learning. The Localization Decoder produces a dense probability distribution in a coarse-to-fine manner with a novel Localization Matching Upsampling module. A smaller Orientation Decoder produces a vector field to condition the orientation estimate on the localization. Our method is validated on the VIGOR and KITTI datasets, where it surpasses the state-of-the-art baseline by 72% and 36%, respectively, in median localization error for comparable orientation estimation accuracy. The predicted probability distribution can represent localization ambiguity and enables rejecting possibly erroneous predictions. Without re-training, the model can run inference on ground images with different fields of view and utilize orientation priors if available. On the Oxford RobotCar dataset, our method can reliably estimate the ego-vehicle's pose over time, achieving a median localization error under 1 meter and a median orientation error of around 1 degree at 14 FPS.
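The descriptor-matching idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper uses learned encoders, a coarse-to-fine Localization Decoder, and an Orientation Decoder, none of which appear here. The sketch only assumes that a ground descriptor and a grid of aerial region descriptors already exist, scores them by cosine similarity, turns the scores into a dense probability map with a softmax, and rejects a prediction when the peak probability is low. All function names, the `temperature` value, and the `min_confidence` threshold are hypothetical choices made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length descriptor vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def localization_distribution(query, aerial_grid, temperature=0.1):
    """Softmax over similarity scores of every aerial grid cell.

    `aerial_grid` is a 2D list of descriptors (rows x cols x dims).
    `temperature` is an assumed hyperparameter, not taken from the paper.
    """
    scores = [[cosine(query, d) / temperature for d in row] for row in aerial_grid]
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def predict_location(prob, min_confidence=0.5):
    """Return ((row, col), p) for the most likely cell, or (None, p) if rejected."""
    p, cell = max(
        (p, (r, c)) for r, row in enumerate(prob) for c, p in enumerate(row)
    )
    return (cell, p) if p >= min_confidence else (None, p)


# Toy example: a 2x2 aerial grid with 2-dimensional descriptors.
query = [1.0, 0.0]
distinct_grid = [[[0.0, 1.0], [1.0, 0.0]], [[0.5, 0.5], [-1.0, 0.0]]]
ambiguous_grid = [[[1.0, 0.0], [1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]]]

distinct_prob = localization_distribution(query, distinct_grid)
ambiguous_prob = localization_distribution(query, ambiguous_grid)
distinct_cell, distinct_p = predict_location(distinct_prob)
ambiguous_cell, ambiguous_p = predict_location(ambiguous_prob)
```

In the first grid one cell matches the query far better than the others, so the distribution is peaked and a location is returned. In the second grid every cell looks identical, the distribution is uniform, and the prediction is rejected, which mirrors in a very simplified way how a dense distribution can expose localization ambiguity.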

Original language: English
Article number: 10373898
Pages (from-to): 3813-3831
Number of pages: 19
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence
Issue number: 5
Publication status: Published - 2024

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.


  • aerial imagery
  • camera pose estimation
  • Cameras
  • Cross-view matching
  • Decoding
  • Feature extraction
  • Image retrieval
  • localization
  • Location awareness
  • orientation estimation
  • Pose estimation
  • Task analysis

