This paper presents a method for multiple object tracking (MOT) in video streams. The method incorporates the prediction of people's physical locations into the tracking-by-detection paradigm. We predict the trajectories of people on an estimated ground plane and apply a learning-based network to extract appearance features across frames. Detected object locations are transformed from image space to the estimated ground space to refine the tracking trajectories. This transform space allows objects detected in multi-view images to be associated under one coordinate system. In addition, pedestrians that occlude one another in image space can be well separated on the rectified ground plane, where their motion models are estimated. The effectiveness of the method is evaluated on different datasets through extensive comparisons with state-of-the-art techniques. Experimental results show that the proposed method improves MOT performance in terms of the number of identity switches (IDSW) and fragmentations (Frag).
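The image-to-ground transform described above can be illustrated with a minimal sketch. It assumes the ground plane is related to the image by a known 3x3 planar homography, which may differ from how the paper estimates its ground space. The function name `image_to_ground`, the matrix `H`, and the pixel coordinates are hypothetical and chosen only for illustration.

```python
import numpy as np


def image_to_ground(points_px, H):
    """Map N x 2 pixel coordinates to ground-plane coordinates.

    H is a 3x3 homography from image space to the ground plane
    (assumed known here; this is an illustrative assumption).
    """
    pts = np.asarray(points_px, dtype=float)
    # Append a 1 to each point to obtain homogeneous coordinates.
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = homog @ np.asarray(H, dtype=float).T
    # Divide by the third component to return to 2-D coordinates.
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical homography, not taken from the paper.
H = np.array([
    [0.02, 0.0, -5.0],
    [0.0, 0.05, -10.0],
    [0.0, 0.001, 1.0],
])

# Hypothetical bottom-center points of two detection boxes, in pixels.
feet_px = np.array([[320.0, 400.0], [330.0, 380.0]])
ground_xy = image_to_ground(feet_px, H)
print(ground_xy)
```

Under this assumption, detections from several cameras could each be mapped with their own homography into the same ground coordinates before association, and motion models would be fitted to `ground_xy` rather than to pixel positions.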
|Number of pages||7|
|Journal||ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences|
|Publication status||Published - 2022|
|Event||2022 24th ISPRS Congress on Imaging Today, Foreseeing Tomorrow, Commission IV - Nice, France|
|Duration||6 Jun 2022 → 11 Jun 2022|
- Data Association
- Deep Features
- Multiple Object Tracking
- Transform Space