CrossTracker: Robust Multi-Modal 3D Multi-Object Tracking via Cross Correction

Lipeng Gu, Xuefeng Yan*, Weiming Wang*, Honghua Chen, Dingkun Zhu, Liangliang Nan, Mingqiang Wei

*Corresponding author for this work

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Inaccurate detections remain a critical bottleneck in 3D multi-object tracking (MOT). Recent detection fusion-based methods incorporate camera detections as a supplementary cue to reduce false detections and compensate for missed ones in LiDAR. However, their unidirectional camera-to-LiDAR correction lacks a feedback mechanism, which precludes iterative mutual refinement between the modalities and limits the robustness of LiDAR-based tracking. Inspired by the coarse-to-fine strategy in two-stage object detection, we introduce CrossTracker, a novel two-stage framework for online multi-modal 3D MOT. CrossTracker first constructs coarse camera and LiDAR trajectories independently, then performs trajectory fusion using both current and historical frames, without requiring future data. This enables more robust mutual refinement between modalities. Specifically, CrossTracker comprises three core modules: i) the multi-modal modeling (M3) module, which fuses data from images, point clouds, and planar geometry derived from images to establish a robust tracking constraint; ii) the coarse trajectory generation (C-TG) module, which independently generates coarse trajectories for both modalities using the M3 constraint; and iii) the trajectory fusion (TF) module, which applies mutual refinement between coarse LiDAR and camera trajectories through cross correction to produce robust LiDAR trajectories. Extensive experiments show that CrossTracker outperforms 19 state-of-the-art methods, highlighting its effectiveness in leveraging the complementary strengths of camera and LiDAR sensors for robust multi-modal 3D MOT. The code is available at https://github.com/lipeng-gu/CrossTracker.
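
The two-stage idea in the abstract (independent coarse trajectories per modality, then fusion over current and past frames) can be illustrated with a toy sketch. This is not the authors' implementation, which is in the linked repository. Everything here is an assumption made for illustration: the `CoarseTrack` type, the image-plane center representation, the greedy matching rule, the `max_dist` threshold, and the three output labels. The sketch only shows how matching coarse tracks across modalities can flag LiDAR tracks with no camera support and camera tracks with no LiDAR counterpart.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class CoarseTrack:
    """Hypothetical coarse trajectory from one modality."""
    track_id: str
    # frame index -> (u, v) image-plane center; current and past frames only
    centers: dict


def mean_center_distance(a, b):
    """Mean pixel distance over frames shared by both tracks, or None if none."""
    shared = sorted(a.centers.keys() & b.centers.keys())
    if not shared:
        return None
    total = sum(
        hypot(a.centers[f][0] - b.centers[f][0],
              a.centers[f][1] - b.centers[f][1])
        for f in shared
    )
    return total / len(shared)


def cross_correct(lidar_tracks, camera_tracks, max_dist=20.0):
    """Greedy one-to-one matching of coarse tracks across modalities.

    max_dist is an assumed pixel threshold, not a value from the paper.
    """
    candidates = []
    for lt in lidar_tracks:
        for ct in camera_tracks:
            d = mean_center_distance(lt, ct)
            if d is not None and d <= max_dist:
                candidates.append((d, lt.track_id, ct.track_id))
    candidates.sort()  # closest pairs are matched first

    confirmed, used_lidar, used_camera = [], set(), set()
    for _, l_id, c_id in candidates:
        if l_id not in used_lidar and c_id not in used_camera:
            confirmed.append((l_id, c_id))
            used_lidar.add(l_id)
            used_camera.add(c_id)

    return {
        # LiDAR tracks supported by a camera track
        "confirmed": confirmed,
        # LiDAR tracks with no camera support: possible false detections
        "suspect_lidar": [t.track_id for t in lidar_tracks
                          if t.track_id not in used_lidar],
        # camera tracks with no LiDAR match: possible missed LiDAR objects
        "recover_from_camera": [t.track_id for t in camera_tracks
                                if t.track_id not in used_camera],
    }


lidar = [
    CoarseTrack("L1", {0: (100, 100), 1: (104, 100)}),
    CoarseTrack("L2", {0: (400, 50)}),
]
camera = [
    CoarseTrack("C1", {0: (102, 101), 1: (105, 99)}),
    CoarseTrack("C2", {1: (250, 300)}),
]
result = cross_correct(lidar, camera)
print(result)
```

In this toy input, L1 and C1 follow nearly the same image-plane path, so they are paired. L2 has no camera counterpart and C2 has no LiDAR counterpart, so each is flagged for the other modality to review. A real system would compare 3D boxes, appearance, and motion rather than 2D centers alone.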

Original language: English
Pages (from-to): 2191-2206
Number of pages: 16
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Volume: 36
Issue number: 2
Publication status: Published - 2026

Keywords

  • cross correction
  • CrossTracker
  • multi-modal 3D MOT
  • trajectory fusion
  • two-stage solution
