## Abstract

In the safety literature, motor vehicle crashes are modelled predominantly using single-equation regression models, albeit with a variety of distributional assumptions and econometric enhancements. These models rely on a single linear additive predictive equation (which becomes multiplicative under a log transform) to specify the expected crash count conditioned on predictors. The models also specify the distribution of observations around the conditional mean, with common examples including the Poisson, Negative Binomial, and Conway-Maxwell-Poisson distributions, among others. This mainstream probabilistic conceptualisation (i.e. model) of motor vehicle crash causation assumes that crashes are well approximated by a single source of risk, wherein several contributing factors exert their collective, non-independent influences on the occurrence of crashes via a linear predictor. This study first postulates, and then demonstrates empirically, that crash occurrence may be more complex than can be adequately captured by a single-equation regression model. The total crash count recorded at a transport network location (e.g. a road segment) may arise from multiple simultaneous and inter-dependent sources of risk, rather than one. Each of these sources may uniquely contribute to the total observed crash count. For instance, one site's crash occurrence may be dominated by contributions from driver behaviour issues (e.g. speeding, impaired driving), while another site's crashes might arise predominantly from design and operational deficiencies such as deteriorating pavements and worn lane markings. Stated succinctly, this research hypothesises that the unobserved heterogeneity in the accumulation of motor vehicle crashes at transport network locations arises because multiple sources of risk, not one, better capture the complexity of the crash occurrence process. A stochastic multiple risk source methodological approach is developed to correspond with and empirically test this hypothesis.
A joint econometric model with random parameters and instrumental variables demonstrates the applicability of the proposed theory and the corresponding methodological approach. The proposed model assumes that the complexity of crash occurrence is well approximated using three sources of risk: engineering, unobserved spatial, and driver behavioural factors. It is empirically tested using crash data from state-controlled roads in Queensland, Australia. Finally, the multiple risk source model is compared with the traditional single risk source model to assess the viability of the proposed approach based on the sample data. The multiple risk source model significantly outperformed the single risk source model in terms of predictive ability and goodness-of-fit measures. In addition, while the single risk source model predicts only total crash counts for individual sites, the multiple risk source model also predicts the proportion of crashes contributed by each source of risk, and hence crashes by risk source. The improvement in fit, combined with the theoretical appeal of a multiple risk source model for explaining unobserved heterogeneity in crashes, suggests (at least for the sample used in the study) that the complexity in crash occurrence is better explained using multiple-equation linear predictors. Further research should examine other datasets for repeatability and should further explore and test risk sources.
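The contrast between a single log-linear predictor and several source-specific predictors can be illustrated with a minimal sketch. It is not the paper's joint model: random parameters, instrumental variables, and the distributional layer are omitted, and the additive combination of source means is an assumption made only for illustration. All function names, covariates, and coefficient values below are hypothetical.

```python
import math


def log_linear_mean(x, beta):
    """Expected count exp(x . beta): additive on the log scale,
    multiplicative on the count scale."""
    return math.exp(sum(xi * bi for xi, bi in zip(x, beta)))


def source_means(x_by_source, beta_by_source):
    """One log-linear predictor per risk source (assumed form)."""
    return {
        source: log_linear_mean(x_by_source[source], beta_by_source[source])
        for source in x_by_source
    }


def source_proportions(means):
    """Share of the total expected count attributed to each source."""
    total = sum(means.values())
    return {source: mean / total for source, mean in means.items()}


# Hypothetical site: covariates and coefficients are made up, not estimates.
x_by_source = {
    "engineering": [1.0, 0.8],  # intercept, illustrative design covariate
    "spatial": [1.0],           # intercept only (unobserved spatial effect)
    "behavioural": [1.0, 0.3],  # intercept, illustrative behaviour covariate
}
beta_by_source = {
    "engineering": [-0.5, 0.9],
    "spatial": [-1.2],
    "behavioural": [0.1, 1.5],
}

means = source_means(x_by_source, beta_by_source)
total_mean = sum(means.values())         # assumed: source means add up
proportions = source_proportions(means)  # crash share per risk source
```

Under these assumptions, `total_mean` plays the role of the single total that a one-equation model would predict, while `proportions` shows the kind of per-source breakdown that only the multiple risk source formulation provides.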

Original language | English |
---|---|
Pages (from-to) | 1-14 |
Number of pages | 14 |
Journal | Analytic Methods in Accident Research |
Volume | 18 |
Publication status | Published - Jun 2018 |
Externally published | Yes |

## Keywords

- Crash causation mechanism
- Data generating process
- Instrumental variable
- Joint model
- Random parameters model
- Structural equation model