Context-based Cyclist Path Prediction: Crafted and Learned Models for Intelligent Vehicles

E.A.I. Pool

doi:10.4233/uuid:a5689f32-6eed-4949-9527-60723e16c8b5

Intelligent Vehicles

Research output: Thesis › Dissertation (TU Delft)

Abstract

This thesis addresses the problem of path prediction for cyclists.
Instead of focusing solely on how to predict the future trajectory from previous position measurements, this thesis investigates how to leverage additional contextual information that can indicate the future intent of cyclists.
It does so with the application to intelligent vehicles in mind.
That means all measurements come from the point of view of a vehicle on the road.
Additionally, the resulting predictions must be usable by a motion planner.
In practice, this means the predictions are a probability distribution over the future position rather than a single point in space.
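
As an illustration of what a probabilistic prediction can look like, the sketch below propagates a Gaussian belief over a cyclist's state with a constant-velocity linear model. This is a minimal sketch under assumptions, not the thesis's implementation: the state layout `[x, y, vx, vy]`, the function name `predict_gaussian`, and the noise parameter `accel_std` are all hypothetical choices.

```python
import numpy as np


def predict_gaussian(mean, cov, dt, steps, accel_std=1.0):
    """Propagate a Gaussian over [x, y, vx, vy] with a constant-velocity model.

    Returns one (position mean, 2x2 position covariance) pair per future step.
    accel_std is an assumed standard deviation of unmodeled acceleration.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    # State transition: position advances by velocity * dt.
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    # Process noise induced by random acceleration on each axis.
    G = np.array([[0.5 * dt**2, 0.0],
                  [0.0, 0.5 * dt**2],
                  [dt, 0.0],
                  [0.0, dt]])
    Q = (accel_std**2) * (G @ G.T)

    predictions = []
    for _ in range(steps):
        mean = F @ mean
        cov = F @ cov @ F.T + Q
        predictions.append((mean[:2].copy(), cov[:2, :2].copy()))
    return predictions


# Cyclist at the origin moving at 5 m/s along x, predicted 1 s ahead.
preds = predict_gaussian([0.0, 0.0, 5.0, 0.0], 0.1 * np.eye(4), dt=0.1, steps=10)
final_mean, final_cov = preds[-1]
```

A motion planner can use each returned covariance as an uncertainty region around the predicted position, which grows with the prediction horizon.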

This thesis starts with an investigation of one of the modules that allow path prediction in the first place: 3D object detection.
Two existing state-of-the-art 3D object detectors that exploit Lidar data are evaluated beyond the standard metrics of 3D object detection.
3D object detectors predict an oriented 3D bounding box. The standard metric determines a correct detection based on the accuracy of the position, extent, and orientation of the bounding box all at once.
By loosening the requirements for when a detection is considered correct, the accuracy of the estimated position, extent, and orientation can be evaluated separately.
The results show that a large number of detections are considered incorrect largely because of inaccurate bounding box extent rather than bounding box position, which is arguably a more important aspect for path prediction.
As a result, the performance of these 3D object detectors when used for path prediction can be considered to be higher than what the common metrics suggest.
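
The idea of judging position, extent, and orientation separately can be sketched as follows. This is only an illustrative sketch: the `Box3D` type, the error definitions, and the threshold values are assumptions made here, not the criteria used in the thesis.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float
    yaw: float  # heading in radians


def position_error(pred: Box3D, gt: Box3D) -> float:
    """Euclidean distance between the box centers, in meters."""
    return math.dist((pred.x, pred.y, pred.z), (gt.x, gt.y, gt.z))


def extent_error(pred: Box3D, gt: Box3D) -> float:
    """Largest relative size error over length, width, and height."""
    pairs = zip((pred.length, pred.width, pred.height),
                (gt.length, gt.width, gt.height))
    return max(abs(p - g) / g for p, g in pairs)


def orientation_error(pred: Box3D, gt: Box3D) -> float:
    """Absolute yaw difference wrapped to [0, pi]."""
    diff = (pred.yaw - gt.yaw + math.pi) % (2 * math.pi) - math.pi
    return abs(diff)


def evaluate(pred: Box3D, gt: Box3D,
             max_pos: float = 0.5,                # assumed threshold, meters
             max_ext: float = 0.25,               # assumed threshold, relative
             max_yaw: float = math.radians(15)):  # assumed threshold
    """Report each aspect separately instead of one combined verdict."""
    return {
        "position_ok": position_error(pred, gt) <= max_pos,
        "extent_ok": extent_error(pred, gt) <= max_ext,
        "orientation_ok": orientation_error(pred, gt) <= max_yaw,
    }


# A detection that is well localized but whose box is far too long.
gt = Box3D(10.0, 2.0, 0.0, 1.8, 0.6, 1.7, 0.0)
pred = Box3D(10.1, 2.0, 0.0, 2.6, 0.6, 1.7, 0.05)
result = evaluate(pred, gt)
```

In this example the detection fails only the extent check, so a combined metric would reject it even though its position, the quantity path prediction depends on most, is accurate.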

After this, this thesis investigates how knowledge of the road topology can be used to improve the accuracy of cyclist path prediction.
The trajectories of cyclists near an intersection are extracted from a naturalistic cyclist detection dataset.
These are categorized and grouped based on the action taken by each cyclist (hard left/right, slight left/right, or straight).
A Linear Dynamical System (LDS) is fitted on each group. These LDSs are used together to create a Mixture of Linear Dynamical Systems (MoLDS).
During online inference, the relative probability of each underlying LDS allows the MoLDS to evaluate which direction the cyclist is most likely to take.
This chapter demonstrates that the highest prediction accuracy is obtained when this model is additionally given prior knowledge on which directions are available for the cyclist to take.
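
One way to picture how a topology prior enters such a mixture is the sketch below. It assumes each LDS has already produced a log-likelihood for the observed track, and it models the prior as a uniform distribution over the directions that the intersection actually offers. The names `MODES` and `mode_posterior` and the example numbers are hypothetical, not taken from the thesis.

```python
import numpy as np

MODES = ("hard_left", "slight_left", "straight", "slight_right", "hard_right")


def mode_posterior(log_likelihoods, available):
    """Combine per-LDS track log-likelihoods with a road-topology prior.

    available[i] is True when direction MODES[i] exists at the intersection.
    Unavailable directions receive zero posterior probability.
    """
    log_lik = np.asarray(log_likelihoods, dtype=float)
    prior = np.asarray(available, dtype=float)
    if prior.sum() == 0:
        raise ValueError("at least one direction must be available")
    prior = prior / prior.sum()

    # Shift by the best available log-likelihood for numerical stability.
    shift = log_lik[prior > 0].max()
    scaled = np.exp(np.minimum(log_lik - shift, 0.0))
    weights = np.where(prior > 0, prior * scaled, 0.0)
    return weights / weights.sum()


# Hypothetical log-likelihoods of one observed track under each LDS.
log_lik = [-5.0, -3.0, -4.0, -3.2, -9.0]

# Without topology knowledge, every direction is a candidate.
no_map = mode_posterior(log_lik, [True] * 5)

# At a junction that only offers "straight" and "slight_right".
with_map = mode_posterior(log_lik, [False, False, True, True, False])
```

Here the track alone slightly favors `slight_left`, but once the map rules that direction out, the probability mass moves to the directions that exist.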

Next, context cues related to a specific scenario are considered.
In the scenario, a cyclist in front of the ego-vehicle approaches an intersection and has the option to either continue straight or turn left.
The three context cues considered are the distance of the cyclist to the intersection, whether the cyclist is raising their arm, and the criticality of the situation.
This last context cue is based on the time it will take the ego-vehicle to overtake the cyclist: the lower this is, the more risk a left turn brings.
This scenario is first modeled with a Switching Linear Dynamical System (SLDS) with two motion models that represent "cycling straight" and "turning left", respectively.
This model does not yet use any context cues.
Still, the SLDS is shown to outperform a baseline model that represents the scenario with a single motion model.
By letting the context cues inform the SLDS whether a switch from one motion model to the other is likely to happen, the performance is increased even further. The resulting model is referred to as a Dynamic Bayesian Network (DBN).
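
The effect of context cues on the switching behavior can be sketched with a two-mode filter step. Everything numeric here is an assumption for illustration: the probabilities in `turn_probability`, the choice to halve the turn probability in critical situations, and the simplification that "turning left" is never left once entered are not values or design decisions from the thesis.

```python
import numpy as np


def turn_probability(near_intersection, arm_raised, critical):
    """Hypothetical probability of switching from 'straight' to 'turning left'."""
    if not near_intersection:
        return 0.01
    p = 0.60 if arm_raised else 0.10
    if critical:
        # Assumption: a cyclist is less inclined to turn in a critical situation.
        p *= 0.5
    return p


def transition_matrix(p_turn):
    """Rows: current mode, columns: next mode, ordered [straight, turning left]."""
    return np.array([[1.0 - p_turn, p_turn],
                     [0.0, 1.0]])  # assumed: a turn, once started, continues


def filter_step(belief, p_turn, obs_likelihood):
    """One forward-filter step over the two motion modes."""
    predicted = np.asarray(belief, dtype=float) @ transition_matrix(p_turn)
    posterior = predicted * np.asarray(obs_likelihood, dtype=float)
    return posterior / posterior.sum()


belief = [1.0, 0.0]        # cyclist is currently going straight
uninformative = [1.0, 1.0]  # position measurement fits both modes equally

far_away = filter_step(belief, turn_probability(False, False, False), uninformative)
arm_up = filter_step(belief, turn_probability(True, True, False), uninformative)
arm_up_critical = filter_step(belief, turn_probability(True, True, True), uninformative)
```

With a fixed transition matrix this reduces to a plain SLDS mode filter; making `p_turn` a function of the observed cues is what lets context shift the predicted mode before the trajectory itself starts to bend.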

The context-based path prediction methods described so far have been designed with specific motion models and interplay of context cues in mind: the overall state representation has been hand-crafted.
The advantage of this approach is that the state representation is then interpretable, making it easy to understand why a model predicts what it does, even when it fails to predict something correctly.
However, methods with a learned state representation often attain higher performance.
The next point of investigation of this thesis is then to compare a model with a crafted state representation to a model with a learned one. Specifically, the DBN is compared to a Recurrent Neural Network (RNN), using the cyclist scenario from before.
To level the playing field as much as possible, two actions are taken. First, the contextual cues are supplied to the RNN as well, and experiments confirm that the performance of the RNN does in fact improve when it incorporates these cues.
Secondly, the optimization method used in the RNN is applied to the DBN as well, but in such a way that the interpretation of its crafted state representation remains the same.
Of the two methods, the RNN attains the higher performance. Still, optimizing the DBN largely closes the performance gap between the two.

Finally, this thesis determines whether the DBN is not only performant but also useful in practice: it is integrated in an intelligent vehicle.
The cyclist scenario is performed live, in which the intelligent vehicle extracts the relevant context cues directly from sensor data.
The resulting predictions are used to create an early warning system for the driver, to warn them if the cyclist intends to turn left.
The model is also used for predictions in an autonomously driving intelligent vehicle, but for safety reasons this is done in a different scenario that contains comparable contextual cues.
An automated dummy plays the role of a pedestrian on the sidewalk who walks towards the curbside in order to cross the road.
The intelligent vehicle is driving on this road towards the pedestrian and has right of way.
In this scenario, a pedestrian is only expected to cross the road if they are unaware of the approaching vehicle.
Furthermore, if they stop, they are expected to do so only at the curbside.
The intelligent vehicle determines whether the pedestrian is aware of it by estimating the head orientation of the pedestrian.
Additionally, it measures the distance between the pedestrian and the curbside, and predicts the future trajectory of the pedestrian accordingly.
With the model in place, the vehicle can autonomously follow a planned trajectory and evade the pedestrian if the pedestrian does indeed cross the road.
The real-world experiments confirm the feasibility of the system. By evaluating the entire pipeline at once, from detections to motion planning, this chapter is able to propose future work that bridges these various disciplines and shows what intelligent vehicles can already realistically achieve.

Original language	English
Qualification	Doctor of Philosophy
Awarding Institution	Delft University of Technology
Supervisors/Advisors	Gavrila, D., Supervisor; Kooij, J.F.P., Advisor
Award date	7 Jun 2021
Print ISBNs	978-94-6416-489-3
DOIs	https://doi.org/10.4233/uuid:a5689f32-6eed-4949-9527-60723e16c8b5
Publication status	Published - 2021

Keywords

Context modeling
Predictive models
Intelligent Vehicles

Access to Document

10.4233/uuid:a5689f32-6eed-4949-9527-60723e16c8b5

Ewoud_Pool_Dissertation: Final published version, 22.8 MB

Cite this

@phdthesis{a5689f326eed4949952760723e16c8b5,

title = "Context-based Cyclist Path Prediction: Crafted and Learned Models for Intelligent Vehicles",

keywords = "Context modeling, Predictive models, Intelligent Vehicles",

author = "E.A.I. Pool",

year = "2021",

doi = "10.4233/uuid:a5689f32-6eed-4949-9527-60723e16c8b5",

language = "English",

isbn = "978-94-6416-489-3",

type = "Dissertation (TU Delft)",

school = "Delft University of Technology",

}

TY - THES

T1 - Context-based Cyclist Path Prediction

T2 - Crafted and Learned Models for Intelligent Vehicles

AU - Pool, E.A.I.

PY - 2021

Y1 - 2021

KW - Context modeling

KW - Predictive models

KW - Intelligent Vehicles

U2 - 10.4233/uuid:a5689f32-6eed-4949-9527-60723e16c8b5

DO - 10.4233/uuid:a5689f32-6eed-4949-9527-60723e16c8b5

M3 - Dissertation (TU Delft)

SN - 978-94-6416-489-3

ER -
