Driver and Pedestrian Mutual Awareness for Path Prediction in Intelligent Vehicles

Research output: Thesis › Dissertation (TU Delft)

This thesis addresses sensor-based perception of the driver and the pedestrian to improve joint path prediction of the ego-vehicle and the pedestrian, based on their mutual awareness, in the domain of intelligent vehicles.

According to the World Health Organization (WHO), more than half of global traffic deaths are among Vulnerable Road Users (VRUs), such as pedestrians and riders, and human error is still a major cause of accidents. This motivates paying special attention to pedestrians and drivers while they interact in traffic. For the foreseeable future, the situation on the road (and the accident numbers) will largely be determined by Advanced Driver-Assistance Systems (ADAS), where the driver is still required to keep their eyes on the road. The scope of this thesis is therefore ADAS and driving automation up to and including Level 3 as defined by the Society of Automotive Engineers (SAE). While current ADAS consider pedestrians and the driver individually, their mutual awareness has not been leveraged to improve path prediction and thereby road safety.

This thesis presents a framework that estimates the driver's head pose from driver camera images, estimates pedestrian location and orientation from exterior camera images and lidar point clouds, uses this information over time to reason about the mutual awareness of driver and pedestrian, and performs joint probabilistic path prediction of the ego-vehicle and the pedestrian to assess collision risk.
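The step of reasoning about awareness over time can be illustrated with a small sketch. It assumes, hypothetically, that the driver counts as aware of a pedestrian once the estimated head yaw has pointed within a fixed angular window around the pedestrian's bearing in enough recent frames. The window width, the frame counts, the vehicle frame convention (x forward, y left), and placing the driver's head at the frame origin are all illustrative assumptions, not the awareness model used in the thesis.

```python
import math
from collections import deque

# Hypothetical parameters, chosen for illustration only.
FOV_HALF_ANGLE_DEG = 30.0  # assumed half-width of the driver's attended field of view
WINDOW = 10                # number of recent frames considered
MIN_HITS = 3               # frames within the window needed to call the driver "aware"


def bearing_deg(ped_x: float, ped_y: float) -> float:
    """Bearing of the pedestrian in the vehicle frame (x forward, y left), in degrees.

    Simplification: the driver's head is assumed to sit at the frame origin.
    """
    return math.degrees(math.atan2(ped_y, ped_x))


def angular_diff_deg(a: float, b: float) -> float:
    """Smallest signed difference a - b, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


class DriverAwareness:
    """Tracks whether the driver's head was recently oriented towards a pedestrian."""

    def __init__(self) -> None:
        self.hits = deque(maxlen=WINDOW)  # one boolean per recent frame

    def update(self, head_yaw_deg: float, ped_x: float, ped_y: float) -> bool:
        offset = angular_diff_deg(bearing_deg(ped_x, ped_y), head_yaw_deg)
        self.hits.append(abs(offset) <= FOV_HALF_ANGLE_DEG)
        return sum(self.hits) >= MIN_HITS


# Hypothetical example: a pedestrian ahead and to the right, the driver first looks left.
tracker = DriverAwareness()
for yaw in [40.0, 40.0, -20.0, -22.0, -18.0]:
    aware = tracker.update(yaw, 20.0, -8.0)
print(aware)
```

A sliding window rather than a single frame makes the cue robust to brief glances and to noise in the head pose estimate; the pedestrian-side cue (e.g., whether the pedestrian's orientation points towards the vehicle) could be tracked the same way.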

Deep neural networks require large training sets to tune their vast number of parameters. This thesis introduces DD-Pose, the Daimler TU Delft Driver Head Pose Benchmark, a large-scale and diverse benchmark for image-based head pose estimation and driver analysis. It contains 330k measurements from multiple cameras, acquired by an in-car setup during naturalistic drives. Complex driving scenarios induce large out-of-plane head rotations and occlusions. Precise head pose annotations are obtained with a motion capture sensor and a novel calibration device. The dataset offers a broad distribution of head poses, with an order of magnitude more samples of rare poses than a comparable dataset.
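Obtaining head pose annotations from motion capture amounts to chaining rigid transforms. The following sketch assumes, hypothetically, that the motion capture system tracks a head-worn marker, that an extrinsic calibration relates the motion capture frame to a camera, and that the per-subject calibration device yields the fixed offset between marker and head. The frame names and the `pose` helper are illustrative; the actual frames and calibration steps of DD-Pose may differ.

```python
import numpy as np


def pose(yaw_rad: float, t_xyz) -> np.ndarray:
    """4x4 homogeneous transform with a rotation about z by yaw_rad and translation t_xyz."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = t_xyz
    return T


def head_in_camera(T_cam_mocap: np.ndarray, T_mocap_marker: np.ndarray,
                   T_marker_head: np.ndarray) -> np.ndarray:
    """Chain rigid transforms camera <- mocap <- head-worn marker <- head.

    Convention: T_a_b maps homogeneous points from frame b into frame a.
    T_cam_mocap:    extrinsic calibration, fixed for the setup (assumed)
    T_mocap_marker: marker pose measured by motion capture, per frame (assumed)
    T_marker_head:  marker-to-head offset from a per-subject calibration (assumed)
    """
    return T_cam_mocap @ T_mocap_marker @ T_marker_head


# Hypothetical example: camera and mocap frames coincide, marker yawed by 90 degrees.
T_cam_head = head_in_camera(pose(0.0, [0.0, 0.0, 0.0]),
                            pose(np.pi / 2, [1.0, 0.0, 0.0]),
                            pose(0.0, [0.1, 0.0, 0.0]))
print(T_cam_head[:3, 3])  # head origin in the camera frame, approximately (1.0, 0.1, 0.0)
```

Because the marker-to-head offset is measured once per subject, every subsequent motion capture frame yields a head pose annotation without manual labeling, which is what makes collecting a broad pose distribution practical.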

Using this dataset, the thesis presents intrApose, a novel method for continuous six-degrees-of-freedom (6DOF) head pose estimation from a single camera image, without prior detection or landmark localization. intrApose uses camera intrinsics consistently within the deep neural network and is crop-aware and scale-aware: poses estimated from bounding boxes within the overall image are converted to a consistent pose in the camera frame. It employs a continuous, differentiable rotation representation, which simplifies the overall architecture compared to existing methods. Experiments show that leveraging camera intrinsics and a continuous rotation representation (SVDO+) improves pose estimation compared to intrinsics-agnostic variants and to variants with discontinuous rotation representations. Driver head poses in naturalistic driving are biased towards near-frontal orientations. Training with an unbiased data distribution, i.e., a more uniform distribution of head poses, further reduces the rotation error, specifically for extreme orientations and under occlusion.
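The two ingredients named above can be sketched in NumPy. `svdo_plus` implements SVD-based special orthogonalization, which maps an unconstrained 3x3 network output to the nearest rotation matrix. `ray_rotation` and `crop_to_camera` show one common way to make a crop-based estimate intrinsics-aware: the network is assumed to predict the rotation relative to the viewing ray through the crop center, which is then rotated into the camera frame, and the head is placed at a given depth along that ray. This conversion, the function names, the depth input, and the intrinsics `K` are assumptions for illustration, not intrApose's exact formulation.

```python
import numpy as np


def svdo_plus(m: np.ndarray) -> np.ndarray:
    """Special orthogonalization: the rotation in SO(3) closest to m (Frobenius norm)."""
    u, _, vt = np.linalg.svd(m)
    # Flip the last singular direction if needed so that det(R) = +1.
    d = np.sign(np.linalg.det(u @ vt))
    return u @ np.diag([1.0, 1.0, d]) @ vt


def ray_rotation(K: np.ndarray, u: float, v: float) -> np.ndarray:
    """Rotation that maps the optical axis (0, 0, 1) onto the viewing ray through pixel (u, v)."""
    ray = np.linalg.solve(K, np.array([u, v, 1.0]))
    ray /= np.linalg.norm(ray)
    z = np.array([0.0, 0.0, 1.0])
    a = np.cross(z, ray)  # rotation axis scaled by sin(angle)
    c = float(z @ ray)    # cos(angle); positive for pixels in front of the camera
    ax = np.array([[0.0, -a[2], a[1]],
                   [a[2], 0.0, -a[0]],
                   [-a[1], a[0], 0.0]])
    # Rodrigues' formula for the rotation aligning z with the ray.
    return np.eye(3) + ax + ax @ ax / (1.0 + c)


def crop_to_camera(R_crop: np.ndarray, depth: float, K: np.ndarray, u: float, v: float):
    """Express a rotation predicted relative to the crop's viewing ray in the camera frame,
    and place the head at the given depth (z) along the ray through the crop center."""
    R_cam = ray_rotation(K, u, v) @ R_crop
    t_cam = depth * np.linalg.solve(K, np.array([u, v, 1.0]))
    return R_cam, t_cam


# Hypothetical example: intrinsics, crop center, depth, and a noisy stand-in network output.
K = np.array([[1000.0, 0.0, 640.0], [0.0, 1000.0, 360.0], [0.0, 0.0, 1.0]])
raw = np.eye(3) + 0.1 * np.random.default_rng(0).normal(size=(3, 3))
R_cam, t_cam = crop_to_camera(svdo_plus(raw), depth=0.8, K=K, u=900.0, v=300.0)
```

Since `svdo_plus` is differentiable almost everywhere, a loss can be placed directly on the resulting rotation matrix, which avoids the discontinuities of Euler angles or quaternions with a sign ambiguity.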

In addition to the vehicle interior, this thesis also addresses the outside environment and presents a method for 3D person detection from a pair of a camera image and a lidar point cloud in automotive scenes. The method comprises a deep neural network that estimates the 3D location, spatial extent, and yaw orientation of the persons present in the scene. 3D anchor proposals are refined in two stages: a region proposal network and a subsequent detection network. For both input modalities, high-level feature representations are learned from raw sensor data instead of being manually designed. To that end, the method uses Voxel Feature Encoders to obtain point cloud features, instead of the widely used projection-based point cloud representations. Experiments are conducted on the KITTI 3D object detection benchmark, a commonly used dataset in the automotive domain.
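To show what learning point cloud features from raw data means in contrast to hand-designed projections, here is a minimal NumPy sketch of a VoxelNet-style Voxel Feature Encoder: points are grouped into voxels, augmented with their offset to the voxel centroid, passed through a shared per-point layer, and max-pooled per voxel. The weights are untrained random placeholders, batch normalization and the per-voxel point cap are omitted, and the voxel size is hypothetical; it does not reproduce the network in the thesis.

```python
import numpy as np


def voxelize(points: np.ndarray, voxel_size: np.ndarray) -> dict:
    """Group lidar points (N x 4: x, y, z, reflectance) by integer voxel index."""
    idx = np.floor(points[:, :3] / voxel_size).astype(int)
    voxels = {}
    for key, p in zip(map(tuple, idx), points):
        voxels.setdefault(key, []).append(p)
    return {k: np.stack(v) for k, v in voxels.items()}


def vfe_layer(feats: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One Voxel Feature Encoding layer (batch normalization omitted).

    feats: (T, C_in) features of the T points in one voxel; returns (T, 2 * C_out).
    """
    pointwise = np.maximum(feats @ W + b, 0.0)         # shared linear layer + ReLU per point
    aggregated = pointwise.max(axis=0, keepdims=True)  # element-wise max-pool over the voxel
    # Concatenate each point's feature with the voxel-level aggregate.
    return np.concatenate([pointwise, np.repeat(aggregated, len(feats), axis=0)], axis=1)


def encode_voxel(raw: np.ndarray, layers) -> np.ndarray:
    """Augment points with offsets to the voxel centroid, apply stacked VFE layers, max-pool."""
    centroid = raw[:, :3].mean(axis=0)
    feats = np.hstack([raw, raw[:, :3] - centroid])    # (T, 7)
    for W, b in layers:
        feats = vfe_layer(feats, W, b)
    return feats.max(axis=0)                           # one feature vector per voxel


# Hypothetical example with random points and untrained random weights.
rng = np.random.default_rng(0)
points = rng.uniform([0.0, -2.0, -1.0, 0.0], [4.0, 2.0, 1.0, 1.0], size=(200, 4))
layers = [(rng.normal(size=(7, 16)), np.zeros(16)),
          (rng.normal(size=(32, 64)), np.zeros(64))]
voxels = voxelize(points, np.array([0.5, 0.5, 0.5]))
features = {key: encode_voxel(pts, layers) for key, pts in voxels.items()}
```

In a full detector, the per-voxel vectors would be scattered back into a dense grid and processed by convolutional layers before the region proposal network scores and refines the 3D anchors.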

Finally, the outputs of the methods from the preceding chapters, namely driver head pose and 3D person locations, are leveraged by a novel method for vehicle-pedestrian path prediction that takes into account the driver's and the pedestrian's awareness of each other's presence. The method jointly models the paths of the ego-vehicle and a pedestrian within a single Dynamic Bayesian Network (DBN). In this DBN, subgraphs model the environment and the entity-specific context cues of the vehicle and the pedestrian (including awareness), which affect their future motion. These subgraphs share a latent state that models whether the vehicle and the pedestrian are on a collision course. The method is validated with real-world data obtained by on-board vehicle sensing, spanning various awareness conditions and dynamic characteristics of the participants. Results show that at a prediction horizon of 1.5 s, context-aware models outperform context-agnostic models in path prediction for scenarios with a change in dynamics, while performing similarly otherwise. Results further indicate that driver-attention-aware models improve collision risk estimation compared to driver-agnostic models. This illustrates that driver contextual cues can support a more anticipatory collision warning and vehicle control strategy.
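The role of the shared latent state can be illustrated by forward (Monte Carlo) sampling from a strongly simplified generative model. In the sketch below, a latent "collision course" flag is sampled once per future and shared by the vehicle and pedestrian models; only an aware participant on a collision course may switch to braking or stopping. The scene geometry, the switching probabilities, the braking deceleration, the prior, and the distance threshold are all hypothetical, and the sketch does not reproduce the thesis's DBN, its learned parameters, or its inference procedure.

```python
import math
import random

# All values below are hypothetical and chosen for illustration only.
DT = 0.1                        # time step [s]
HORIZON = 1.5                   # prediction horizon [s]
COLLISION_DIST = 1.5            # distance below which a sample counts as a collision [m]
A_BRAKE = 8.0                   # vehicle deceleration once braking [m/s^2]
P_REACT = 0.3                   # per-step reaction probability when aware and on collision course
P_CC_NEAR, P_CC_FAR = 0.9, 0.1  # prior of the latent "collision course" state


def collision_risk(driver_aware: bool, ped_aware: bool,
                   n_samples: int = 2000, seed: int = 0) -> float:
    """Fraction of sampled joint futures in which vehicle and pedestrian come too close."""
    rng = random.Random(seed)
    steps = round(HORIZON / DT)
    # Hypothetical scene: vehicle drives along +x on y = 0, pedestrian crosses along +y at x = 15.
    veh_x0, veh_v0 = 0.0, 10.0
    ped_x, ped_y0, ped_v0 = 15.0, -3.0, 2.0

    # Prior on the shared latent state from a constant-velocity extrapolation of both paths.
    d_min = min(math.hypot(veh_x0 + veh_v0 * k * DT - ped_x, ped_y0 + ped_v0 * k * DT)
                for k in range(steps + 1))
    p_cc = P_CC_NEAR if d_min < COLLISION_DIST else P_CC_FAR

    hits = 0
    for _ in range(n_samples):
        cc = rng.random() < p_cc  # one latent value, shared by the vehicle and pedestrian models
        veh_x, veh_v, braking = veh_x0, veh_v0, False
        ped_y, ped_v, stopping = ped_y0, ped_v0, False
        for _ in range(steps):
            # Mode switches: only an aware participant on collision course may react.
            if cc and driver_aware and not braking:
                braking = rng.random() < P_REACT
            if cc and ped_aware and not stopping:
                stopping = rng.random() < P_REACT
            if braking:
                veh_v = max(0.0, veh_v - A_BRAKE * DT)
            if stopping:
                ped_v = 0.0
            veh_x += veh_v * DT
            ped_y += ped_v * DT
            if math.hypot(veh_x - ped_x, ped_y) < COLLISION_DIST:
                hits += 1
                break
    return hits / n_samples


for driver_aware, ped_aware in [(False, False), (True, False), (False, True), (True, True)]:
    print(driver_aware, ped_aware, collision_risk(driver_aware, ped_aware))
```

In this toy setting, the awareness cues change the predicted collision risk only through the mode switches they enable, which mirrors, in a very reduced form, how context cues in the subgraphs modulate the predicted motion.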

The main conclusions and findings of this thesis are as follows. Using a measurement device with a per-subject calibration procedure simplifies data acquisition for a broad distribution of head poses. Using an intrinsics-aware head pose estimation method with a continuous rotation representation allows for a simple architecture that yields robust head pose estimates across a broad spectrum of head poses. Modeling both driver and pedestrian mutual awareness in a unified DBN improves joint probabilistic path prediction compared to driver-agnostic models. Additionally, it provides explainability of the model parameters and interpretability of the internal decision-making process. Further research can be conducted to understand the behavior of humans inside and outside an intelligent vehicle. Two major trends point towards integrating uncertainties into the individual components and combining the components into a system that can be trained end-to-end, from raw sensor data to predicted paths. Future work would greatly benefit from representative, worldwide, naturalistic, multi-sensor, temporal data covering both the outside environment and the vehicle interior, ideally shared across research institutions and companies.

Original language: English
Qualification: Doctor of Philosophy
Awarding Institution: Delft University of Technology
Supervisor: Gavrila, D.
Advisor: Kooij, J.F.P.
Award date: 20 Dec 2023
Electronic ISBN: 978-94-6384-502-1
Publication status: Published - 2023


Keywords
  • Head pose estimation
  • Head pose dataset
  • Person detection
  • Ego-vehicle path prediction
  • Pedestrian path prediction
  • Intelligent vehicles
  • Automated driving

