Deep end-to-end 3D person detection from Camera and Lidar

Markus Roth; Dominik Jargot; Dariu Gavrila

doi:10.1109/ITSC.2019.8917366

Intelligent Vehicles

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

13 Citations (Scopus)

278 Downloads (Pure)

Abstract

We present a method for 3D person detection from camera images and lidar point clouds in automotive scenes. The method comprises a deep neural network which estimates the 3D location and extent of persons present in the scene. 3D anchor proposals are refined in two stages: a region proposal network and a subsequent detection network. For both input modalities, high-level feature representations are learned from raw sensor data instead of being manually designed. To that end, we use Voxel Feature Encoders [1] to obtain point cloud features instead of widely used projection-based point cloud representations, thus allowing the network to learn to predict the location and extent of persons in an end-to-end manner. Experiments on the validation set of the KITTI 3D object detection benchmark [2] show that the proposed method outperforms state-of-the-art methods with an average precision (AP) of 47.06% on moderate difficulty.
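
The abstract says point cloud features come from Voxel Feature Encoders rather than from a projection of the point cloud. As a rough illustration of the preprocessing such an encoder typically consumes, the sketch below groups lidar points into a regular voxel grid and appends each point's offset from its voxel centroid. This is not the authors' implementation: the function names, the 0.5 m voxel size, and the omission of reflectance are assumptions made for illustration only.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
VoxelIndex = Tuple[int, int, int]


def group_points_into_voxels(
    points: List[Point],
    voxel_size: Tuple[float, float, float],
    origin: Point = (0.0, 0.0, 0.0),
) -> Dict[VoxelIndex, List[Point]]:
    """Assign each point to the voxel that contains it (hypothetical helper)."""
    voxels: Dict[VoxelIndex, List[Point]] = defaultdict(list)
    for p in points:
        # Integer grid index along each axis, relative to the grid origin.
        idx = tuple(
            int(math.floor((c - o) / s)) for c, o, s in zip(p, origin, voxel_size)
        )
        voxels[idx].append(p)
    return dict(voxels)


def augment_with_centroid_offsets(
    voxels: Dict[VoxelIndex, List[Point]]
) -> Dict[VoxelIndex, List[Tuple[float, ...]]]:
    """Append (dx, dy, dz) offsets from the voxel's point centroid to each point."""
    features: Dict[VoxelIndex, List[Tuple[float, ...]]] = {}
    for idx, pts in voxels.items():
        n = len(pts)
        centroid = tuple(sum(p[i] for p in pts) / n for i in range(3))
        features[idx] = [
            p + tuple(p[i] - centroid[i] for i in range(3)) for p in pts
        ]
    return features


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.1), (0.3, 0.1, 0.1), (1.5, 0.2, 0.0)]
    grouped = group_points_into_voxels(cloud, voxel_size=(0.5, 0.5, 0.5))
    for index, feats in augment_with_centroid_offsets(grouped).items():
        print(index, feats)
```

In a learned encoder, the per-point vectors of each voxel would then pass through shared fully connected layers and be pooled into a single feature per voxel; that step is left out here because the record does not describe the network's layers.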

Original language	English
Title of host publication	Proceedings IEEE Intelligent Transportation Systems Conference (ITSC 2019)
Place of Publication	Piscataway, NJ, USA
Publisher	IEEE
Pages	521-527
ISBN (Print)	978-1-5386-7024-8
DOIs	https://doi.org/10.1109/ITSC.2019.8917366
Publication status	Published - 2019
Event	IEEE Intelligent Transportation Systems Conference - Auckland, New Zealand; Duration: 27 Oct 2019 → 30 Oct 2019

Conference

Conference	IEEE Intelligent Transportation Systems Conference
Abbreviated title	ITSC 2019
Country/Territory	New Zealand
City	Auckland
Period	27/10/19 → 30/10/19

Bibliographical note

Accepted Author Manuscript

Access to Document

10.1109/ITSC.2019.8917366

roth2019itsc_lidar_person_detection: Accepted author manuscript, 2.95 MB

Cite this

@inproceedings{83f7a017a71340099505f758f58c07e1,

title = "Deep end-to-end 3D person detection from Camera and Lidar",

abstract = "We present a method for 3D person detection from camera images and lidar point clouds in automotive scenes. The method comprises a deep neural network which estimates the 3D location and extent of persons present in the scene. 3D anchor proposals are refined in two stages: a region proposal network and a subsequent detection network. For both input modalities, high-level feature representations are learned from raw sensor data instead of being manually designed. To that end, we use Voxel Feature Encoders [1] to obtain point cloud features instead of widely used projection-based point cloud representations, thus allowing the network to learn to predict the location and extent of persons in an end-to-end manner. Experiments on the validation set of the KITTI 3D object detection benchmark [2] show that the proposed method outperforms state-of-the-art methods with an average precision (AP) of 47.06% on moderate difficulty.",

author = "Markus Roth and Dominik Jargot and Dariu Gavrila",

note = "Accepted Author Manuscript; IEEE Intelligent Transportation Systems Conference, ITSC 2019; Conference date: 27-10-2019 through 30-10-2019",

year = "2019",

doi = "10.1109/ITSC.2019.8917366",

language = "English",

isbn = "978-1-5386-7024-8",

pages = "521--527",

booktitle = "Proceedings IEEE Intelligent Transportation Systems Conference (ITSC 2019)",

publisher = "IEEE",

address = "Piscataway, NJ, USA",

}

TY - GEN

T1 - Deep end-to-end 3D person detection from Camera and Lidar

AU - Roth, Markus

AU - Jargot, Dominik

AU - Gavrila, Dariu

N1 - Accepted Author Manuscript

PY - 2019

Y1 - 2019

N2 - We present a method for 3D person detection from camera images and lidar point clouds in automotive scenes. The method comprises a deep neural network which estimates the 3D location and extent of persons present in the scene. 3D anchor proposals are refined in two stages: a region proposal network and a subsequent detection network. For both input modalities, high-level feature representations are learned from raw sensor data instead of being manually designed. To that end, we use Voxel Feature Encoders [1] to obtain point cloud features instead of widely used projection-based point cloud representations, thus allowing the network to learn to predict the location and extent of persons in an end-to-end manner. Experiments on the validation set of the KITTI 3D object detection benchmark [2] show that the proposed method outperforms state-of-the-art methods with an average precision (AP) of 47.06% on moderate difficulty.

AB - We present a method for 3D person detection from camera images and lidar point clouds in automotive scenes. The method comprises a deep neural network which estimates the 3D location and extent of persons present in the scene. 3D anchor proposals are refined in two stages: a region proposal network and a subsequent detection network. For both input modalities, high-level feature representations are learned from raw sensor data instead of being manually designed. To that end, we use Voxel Feature Encoders [1] to obtain point cloud features instead of widely used projection-based point cloud representations, thus allowing the network to learn to predict the location and extent of persons in an end-to-end manner. Experiments on the validation set of the KITTI 3D object detection benchmark [2] show that the proposed method outperforms state-of-the-art methods with an average precision (AP) of 47.06% on moderate difficulty.

UR - http://www.scopus.com/inward/record.url?scp=85076801929&partnerID=8YFLogxK

U2 - 10.1109/ITSC.2019.8917366

DO - 10.1109/ITSC.2019.8917366

M3 - Conference contribution

SN - 978-1-5386-7024-8

SP - 521

EP - 527

BT - Proceedings IEEE Intelligent Transportation Systems Conference (ITSC 2019)

PB - IEEE

CY - Piscataway, NJ, USA

T2 - IEEE Intelligent Transportation Systems Conference

Y2 - 27 October 2019 through 30 October 2019

ER -
