Deep end-to-end 3D person detection from Camera and Lidar

Markus Roth, Dominik Jargot, Dariu Gavrila

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review



We present a method for 3D person detection from camera images and lidar point clouds in automotive scenes. The method comprises a deep neural network which estimates the 3D location and extent of persons present in the scene. 3D anchor proposals are refined in two stages: a region proposal network and a subsequent detection network. For both input modalities, high-level feature representations are learned from raw sensor data instead of being manually designed. To that end, we use Voxel Feature Encoders [1] to obtain point cloud features instead of widely used projection-based point cloud representations, thus allowing the network to learn to predict the location and extent of persons in an end-to-end manner. Experiments on the validation set of the KITTI 3D object detection benchmark [2] show that the proposed method outperforms state-of-the-art methods with an average precision (AP) of 47.06% on moderate difficulty.
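
As a rough illustration of the input side of a Voxel Feature Encoder, the sketch below groups raw lidar points into voxels and augments each point with its offset from the voxel centroid. It is a minimal sketch and not the authors' implementation: the function name `voxelize`, the voxel size, and the per-voxel point cap are hypothetical placeholders, and the learned encoder layers that would consume these features are omitted.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size=(0.2, 0.2, 0.4), max_points_per_voxel=35):
    """Group lidar points (x, y, z, reflectance) into voxels.

    Returns a dict mapping an integer voxel index (ix, iy, iz) to a list of
    7-tuples (x, y, z, r, dx, dy, dz), where (dx, dy, dz) is the offset of
    the point from the centroid of the points kept in that voxel.
    The default sizes are illustrative assumptions, not values from the paper.
    """
    groups = defaultdict(list)
    for x, y, z, r in points:
        # Integer voxel index along each axis.
        key = tuple(math.floor(c / s) for c, s in zip((x, y, z), voxel_size))
        # Simple truncation; a real pipeline might sample randomly instead.
        if len(groups[key]) < max_points_per_voxel:
            groups[key].append((x, y, z, r))

    voxels = {}
    for key, members in groups.items():
        n = len(members)
        cx = sum(p[0] for p in members) / n
        cy = sum(p[1] for p in members) / n
        cz = sum(p[2] for p in members) / n
        # Augment each point with its offset from the voxel centroid.
        voxels[key] = [(x, y, z, r, x - cx, y - cy, z - cz)
                       for x, y, z, r in members]
    return voxels


if __name__ == "__main__":
    demo = [(0.05, 0.05, 0.1, 0.5), (0.15, 0.1, 0.3, 0.2), (1.1, 1.1, 1.0, 0.9)]
    for index, feats in voxelize(demo).items():
        print(index, len(feats))
```

Unlike a projection-based representation, this grouping keeps the raw per-point values, so a learned encoder can derive the voxel features itself.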
Original language: English
Title of host publication: Proceedings IEEE Intelligent Transportation Systems Conference (ITSC 2019)
Place of Publication: Piscataway, NJ, USA
ISBN (Print): 978-1-5386-7024-8
Publication status: Published - 2019
Event: IEEE Intelligent Transportation Systems Conference - Auckland, New Zealand
Duration: 27 Oct 2019 to 30 Oct 2019


Conference: IEEE Intelligent Transportation Systems Conference
Abbreviated title: ITSC 2019
Country/Territory: New Zealand

Bibliographical note

Accepted Author Manuscript

