CUAHN-VIO: Content-and-uncertainty-aware homography network for visual-inertial odometry

Yingfu Xu*, Guido C.H.E. de Croon

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review


Abstract

Learning-based visual ego-motion estimation is promising yet not ready for navigating agile mobile robots in the real world. In this article, we propose CUAHN-VIO, a robust and efficient monocular visual-inertial odometry (VIO) designed for micro aerial vehicles (MAVs) equipped with a downward-facing camera. The vision frontend is a content-and-uncertainty-aware homography network (CUAHN). Content awareness measures the robustness of the network toward non-homography image content, e.g., 3-dimensional objects lying on a planar surface. Uncertainty awareness means that the network not only predicts the homography transformation but also estimates the uncertainty of its prediction. Training requires no ground truth, which is often difficult to obtain. The network generalizes well, enabling “plug-and-play” deployment in new environments without fine-tuning. A lightweight extended Kalman filter (EKF) serves as the VIO backend and uses the mean prediction and variance estimate from the network for visual measurement updates. CUAHN-VIO is evaluated on a high-speed public dataset and shows accuracy rivaling state-of-the-art (SOTA) VIO approaches. Thanks to its robustness to motion blur, low network inference time (∼23 ms), and stable processing latency (∼26 ms), CUAHN-VIO runs onboard an Nvidia Jetson TX2 embedded processor and successfully navigates a fast autonomous MAV.
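
The backend described above treats the network's mean prediction as the visual measurement and its variance estimate as the corresponding measurement noise in the EKF update. As an illustration only, the sketch below shows a generic, linearized EKF measurement update wired this way; the state layout, measurement model, and function name are assumptions for demonstration, not the paper's exact formulation.

```python
import numpy as np

def ekf_visual_update(x, P, z_mean, z_var, H):
    """Generic EKF measurement update driven by a network-predicted mean and variance.

    x      : (n,)   state mean (hypothetical layout, e.g. position/velocity)
    P      : (n, n) state covariance
    z_mean : (m,)   network's mean prediction, used as the visual measurement
    z_var  : (m,)   network's per-element variance estimate
    H      : (m, n) Jacobian of the (linearized) measurement model
    """
    R = np.diag(z_var)                    # measurement noise built from the network's variance
    y = z_mean - H @ x                    # innovation
    S = H @ P @ H.T + R                   # innovation covariance
    K = np.linalg.solve(S, H @ P).T       # Kalman gain K = P H^T S^{-1} (S is symmetric)
    x_upd = x + K @ y                     # updated state mean
    P_upd = (np.eye(len(x)) - K @ H) @ P  # updated state covariance
    return x_upd, P_upd

# Toy usage: a 2-state example with a direct observation of the first state.
x0 = np.array([0.0, 1.0])
P0 = np.eye(2) * 0.5
H = np.array([[1.0, 0.0]])
x1, P1 = ekf_visual_update(x0, P0, z_mean=np.array([0.2]), z_var=np.array([0.05]), H=H)
```

A larger predicted variance inflates R, so less confident network outputs pull the state estimate less strongly, which is the essence of feeding the network's uncertainty into the filter.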

Original language: English
Article number: 104866
Number of pages: 18
Journal: Robotics and Autonomous Systems
Volume: 185
DOIs
Publication status: Published - 2025

Keywords

  • Deep homography
  • Micro air vehicle
  • Self-supervised learning
  • Uncertainty estimation
  • Visual-inertial odometry
