TY - JOUR
T1 - EuroCity Persons
T2 - A novel benchmark for person detection in traffic scenes
AU - Braun, Markus
AU - Krebs, Sebastian
AU - Flohr, Fabian
AU - Gavrila, Dariu
N1 - Accepted Author Manuscript
PY - 2019
Y1 - 2019
AB - Big data has had a great share in the success of deep learning in computer vision. Recent works suggest that there is significant further potential to increase object detection performance by utilizing even bigger datasets. In this paper, we introduce the EuroCity Persons dataset, which provides a large number of highly diverse, accurate and detailed annotations of pedestrians, cyclists and other riders in urban traffic scenes. The images for this dataset were collected on-board a moving vehicle in 31 cities of 12 European countries. With over 238,200 person instances manually labeled in over 47,300 images, EuroCity Persons is nearly one order of magnitude larger than datasets used previously for person detection in traffic scenes. The dataset furthermore contains a large number of person orientation annotations (over 211,200). We optimize four state-of-the-art deep learning approaches (Faster R-CNN, R-FCN, SSD and YOLOv3) to serve as baselines for the new object detection benchmark. In experiments with previous datasets we analyze the generalization capabilities of these detectors when trained with the new dataset. We furthermore study the effect of the training set size, the dataset diversity (day- versus night-time, geographical region), the dataset detail (i.e., availability of object orientation information) and the annotation quality on the detector performance. Finally, we analyze error sources and discuss the road ahead.
KW - Object detection
KW - Benchmarking
UR - http://www.scopus.com/inward/record.url?scp=85068468112&partnerID=8YFLogxK
U2 - 10.1109/TPAMI.2019.2897684
DO - 10.1109/TPAMI.2019.2897684
M3 - Article
VL - 41
SP - 1844
EP - 1861
JO - IEEE Transactions on Pattern Analysis and Machine Intelligence
JF - IEEE Transactions on Pattern Analysis and Machine Intelligence
SN - 0162-8828
IS - 8
ER -