Information theoretic-based sampling of observations

Sander van Cranenburgh (corresponding author); Michiel C.J. Bliemer

doi:10.1016/j.jocm.2018.02.003

Transport and Logistics

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Due to the surge in the amount of data being collected, analysts are increasingly faced with very large data sets. Estimating sophisticated discrete choice models (such as Mixed Logit models) on such large data sets can be computationally burdensome, or even infeasible. Hitherto, analysts have tried to overcome these computational burdens by resorting to less computationally demanding choice models or by taking advantage of increased computational resources. In this paper we take a different approach: we develop a new method called Sampling of Observations (SoO), which scales down the size of the choice data set prior to estimation. More specifically, based on information-theoretic principles, this method extracts a subset of observations that is much smaller in volume than the original data set, yet produces statistically nearly identical results. We show that this method can be used to estimate sophisticated discrete choice models on data sets that were originally too large for sophisticated choice analysis.

Original language	English
Number of pages	34
Journal	Journal of Choice Modelling
DOI	https://doi.org/10.1016/j.jocm.2018.02.003
Publication status	Published, 2018

Bibliographical note

Green Open Access added to TU Delft Institutional Repository as part of the Taverne project ‘You share, we take care!’ (https://www.openaccess.nl/en/you-share-we-take-care). Otherwise, as indicated in the copyright section, the publisher is the copyright holder of this work, and the author uses Dutch legislation to make this work public.

Cite this

@article{ea90ebb2a5d44a1eacd0e7550144c6a4,

title = "Information theoretic-based sampling of observations",

abstract = "Due to the surge in the amount of data being collected, analysts are increasingly faced with very large data sets. Estimating sophisticated discrete choice models (such as Mixed Logit models) on such large data sets can be computationally burdensome, or even infeasible. Hitherto, analysts have tried to overcome these computational burdens by resorting to less computationally demanding choice models or by taking advantage of increased computational resources. In this paper we take a different approach: we develop a new method called Sampling of Observations (SoO), which scales down the size of the choice data set prior to estimation. More specifically, based on information-theoretic principles, this method extracts a subset of observations that is much smaller in volume than the original data set, yet produces statistically nearly identical results. We show that this method can be used to estimate sophisticated discrete choice models on data sets that were originally too large for sophisticated choice analysis.",

author = "{van Cranenburgh}, Sander and Bliemer, {Michiel C.J.}",

note = "Green Open Access added to TU Delft Institutional Repository as part of the Taverne project {\textquoteleft}You share, we take care!{\textquoteright} (https://www.openaccess.nl/en/you-share-we-take-care). Otherwise, as indicated in the copyright section, the publisher is the copyright holder of this work, and the author uses Dutch legislation to make this work public.",

year = "2018",

doi = "10.1016/j.jocm.2018.02.003",

language = "English",

journal = "Journal of Choice Modelling",

issn = "1755-5345",

publisher = "Elsevier",

}

TY  - JOUR
T1  - Information theoretic-based sampling of observations
AU  - van Cranenburgh, Sander
AU  - Bliemer, Michiel C.J.
N1  - Green Open Access added to TU Delft Institutional Repository as part of the Taverne project ‘You share, we take care!’ (https://www.openaccess.nl/en/you-share-we-take-care). Otherwise, as indicated in the copyright section, the publisher is the copyright holder of this work, and the author uses Dutch legislation to make this work public.
PY  - 2018
Y1  - 2018
AB  - Due to the surge in the amount of data being collected, analysts are increasingly faced with very large data sets. Estimating sophisticated discrete choice models (such as Mixed Logit models) on such large data sets can be computationally burdensome, or even infeasible. Hitherto, analysts have tried to overcome these computational burdens by resorting to less computationally demanding choice models or by taking advantage of increased computational resources. In this paper we take a different approach: we develop a new method called Sampling of Observations (SoO), which scales down the size of the choice data set prior to estimation. More specifically, based on information-theoretic principles, this method extracts a subset of observations that is much smaller in volume than the original data set, yet produces statistically nearly identical results. We show that this method can be used to estimate sophisticated discrete choice models on data sets that were originally too large for sophisticated choice analysis.
UR  - http://resolver.tudelft.nl/uuid:ea90ebb2-a5d4-4a1e-acd0-e7550144c6a4
UR  - http://www.scopus.com/inward/record.url?scp=85044953042&partnerID=8YFLogxK
U2  - 10.1016/j.jocm.2018.02.003
DO  - 10.1016/j.jocm.2018.02.003
M3  - Article
AN  - SCOPUS:85044953042
SN  - 1755-5345
JO  - Journal of Choice Modelling
JF  - Journal of Choice Modelling
ER  - 
