Artificial Neural Networks as a means to accommodate decision rules in choice models

Ahmad Alwosheel; Caspar Chorus; Sander van Cranenburgh

Abstract

In the past few decades, Artificial Neural Networks (ANNs) have been used to identify and model choice behavior in a wide variety of fields (e.g., Bishop, 1995). In the field of travel behavior, for example, ANNs have been applied to model commuter mode choice and car ownership (e.g., Hensher & Ton, 2000; Mohammadian & Miller, 2002). ANNs aim to efficiently recognize patterns in the data without being explicitly programmed where to look. A key feature of ANNs is their capability to approximate any Data Generating Process (DGP), provided that sufficient processing units are available; this property is known as the Universal Approximation Theorem (Hornik et al., 1989). However, despite their strong pragmatic appeal, ANNs have been criticized for being too 'data driven' and 'theory poor', in effect presenting the analyst with a black-box model of the DGP. This limitation has hampered their use by discrete choice modelers and travel behavior researchers. In several ways, Discrete Choice Theory (DCT), the dominant approach to modeling choice behavior in the travel behavior research community, is the mirror image of ANN. In contrast to ANN, DCT presupposes a particular decision rule (DGP) and estimates a model based on that rule from choice data. In addition to the classical linear-in-parameters utility maximization rule, several alternative decision rules have been proposed more recently; see Leong & Hensher (2012) and Chorus (2014) for overviews. Clear advantages of the DCT approach are that it allows for the extraction of deep behavioral insights from choice data (McFadden, 2001) and for rigorous conclusions concerning the welfare effects of policies (Small & Rosen, 1981). However, despite recent work that allows for a more flexible treatment of decision rules in discrete choice models (e.g., Hess et al., 2012; Van Cranenburgh et al., 2015), DCT can still be considered a relatively rigid approach to modeling choice data, compared to ANN.
Our paper sets out to explore in more depth the advantages and disadvantages (relative to DCT) of using ANN as a framework to analyze choice data. We focus in particular on ANN's ability to learn which decision rule best represents the DGP in a discrete choice context. To this end, we perform three types of analyses:
1. We analytically explore to what extent ANN's Universal Approximation Theorem applies to a discrete choice context, with particular emphasis on the role of the size of the training data set and the number of nodes of the ANN (Vapnik & Chervonenkis, 2015).
2. Using synthetic datasets, we explore to what extent ANNs are able to explain choice behavior when the decision rule underlying the DGP is known to the analyst. We focus on the standard Random Utility Maximization (RUM) and Random Regret Minimization (RRM; Chorus, 2010) rules. Particular attention is paid to the role of error term variance and the size of the training data set.
3. Using real datasets, we explore to what extent ANNs are able to explain choice behavior when the decision rule underlying the DGP is unknown to the analyst (as is usually the case). A range of comparisons is made between different types of ANNs and DCT-based choice models.
With reference to the above-mentioned types of analyses, our preliminary results can be summarized as follows. The Universal Approximation Theorem applies to a discrete choice context. Out-of-sample analysis shows that the performance of the ANN improves as the size of the training data set increases, and approaches the performance of a discrete choice model whose decision rule matches that of the DGP when the training data set is sufficiently large (see Figures 1 and 2 in the supplementary files). More specifically, the fitted power functions (of the form y = a·x^b + c) reveal the asymptotic behavior of ANN performance, which is in line with theoretical expectations. Furthermore, the parameters of the fitted power functions show that ANNs converge faster when the DGP is RUM than when it is RRM. These conclusions pave the way for a more informed debate about the potential and limitations of using ANNs to explain and predict choice data; they also provide clues as to how ANN and DCT may be combined to capitalize on their respective strengths.
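To illustrate the kind of power function fit mentioned in the abstract (y = a·x^b + c, with x the training-set size and y an out-of-sample performance measure), the following is a minimal sketch and not the authors' implementation. It assumes Python with NumPy and SciPy; the training-set sizes, the loss values, the "true" parameters and all variable names are hypothetical and chosen purely for illustration. With b < 0, the fitted constant c is the asymptote that performance approaches as the training set grows.

```python
import numpy as np
from scipy.optimize import curve_fit


def power_curve(n, a, b, c):
    """Learning curve y = a * n**b + c; with b < 0, y tends to c as n grows."""
    return a * np.power(n, b) + c


# Hypothetical training-set sizes and out-of-sample losses. These are made-up
# values drawn from a known curve plus small noise, not results from the paper.
rng = np.random.default_rng(0)
sizes = np.array([100, 200, 500, 1000, 2000, 5000, 10000, 20000], dtype=float)
true_a, true_b, true_c = 2.0, -0.5, 0.9
losses = power_curve(sizes, true_a, true_b, true_c) + rng.normal(0.0, 0.002, sizes.size)

# Fit a, b and c by nonlinear least squares; p0 is a rough starting guess.
params, _ = curve_fit(power_curve, sizes, losses, p0=(1.0, -0.3, 1.0), maxfev=10000)
a_hat, b_hat, c_hat = params

print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, c = {c_hat:.3f}")
```

Under these assumptions, fitting one such curve per decision rule (for example, one for RUM-generated data and one for RRM-generated data) would allow the exponents b to be compared as a rough indicator of how quickly performance approaches its asymptote.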

Original language	English
Number of pages	1
Publication status	Published - 2017
Event	International Choice Modelling Conference 2017 - Vineyard Hotel, Cape Town, South Africa
Duration	3 Apr 2017 → 5 Apr 2017

Conference

Conference	International Choice Modelling Conference 2017
Abbreviated title	ICMC 2017
Country/Territory	South Africa
City	Cape Town
Period	3/04/17 → 5/04/17
Internet address	http://www.icmconference.org.uk/index.php/icmc/ICMC2017

Cite this

@conference{85c263d6f63344d3b3c8365bc387673a,

title = "Artificial Neural Networks as a means to accommodate decision rules in choice models",

author = "Ahmad Alwosheel and Caspar Chorus and {van Cranenburgh}, Sander",

year = "2017",

language = "English",

note = "International Choice Modelling Conference 2017, ICMC 2017 ; Conference date: 03-04-2017 Through 05-04-2017",

url = "http://www.icmconference.org.uk/index.php/icmc/ICMC2017",

}

TY - CONF

T1 - Artificial Neural Networks as a means to accommodate decision rules in choice models

AU - Alwosheel, Ahmad

AU - Chorus, Caspar

AU - van Cranenburgh, Sander

PY - 2017

Y1 - 2017

UR - http://www.icmconference.org.uk/index.php/icmc/ICMC2017/paper/view/1150

M3 - Abstract

T2 - International Choice Modelling Conference 2017

Y2 - 3 April 2017 through 5 April 2017

ER -
