Feature extraction and clustering analysis of highway congestion

Tin T. Nguyen; Panchamy Krishnakumari; Simeon C. Calvert; Hai L. Vu; Hans van Lint

doi:10.1016/j.trc.2019.01.017

Tin T. Nguyen^*, Panchamy Krishnakumari, Simeon C. Calvert, Hai L. Vu, Hans van Lint

^*Corresponding author for this work

Transport and Planning

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Classification of congestion patterns is important in many areas of traffic planning and management, ranging from policy appraisal and database design to prediction and real-time control. One of the key constraints in applying machine learning techniques for classification is the availability of sufficient data (traffic patterns) with clear and undisputed labels, e.g. traffic pattern X or Y. The challenge is that labelling traffic patterns (e.g. combinations of congested and free-flowing areas over time and space) is highly subjective. In our view this means that the assessment of how well algorithms label the data should also include a qualitative component that focuses on what the found patterns really mean for traffic flow operations and applications. In this study, we investigate the application of clustering analysis to obtain labels automatically from the data, where we first qualitatively assess how meaningful the found labels are, and subsequently test quantitatively how well the labels separate the resulting feature space. By transforming traffic measurements (speeds) into (colored) images, two different approaches are proposed to extract the features of a large number of traffic patterns for clustering: point-based and area-based. The point-based approach is widely applied in the image processing literature and explores local interest points in images (i.e. where large changes occur in color intensity), whereas a new area-based approach combines domain knowledge with Watershed segmentation to partition the images into different spatio-temporal segments from which domain-specific features, such as wide moving jam patterns, are extracted. The results show that the Watershed segmentation separates the traffic (congestion) patterns into more meaningful and separable classes, comparable to those that have been proposed in the literature.
Since there is no ground-truth set of labels, the quantitative assessment tests how well both methods are able to separate the respective feature spaces they construct for the (large) database of traffic patterns. We argue that the crisper this separation is, the better the labelling has turned out. For this quantitative comparison we train a multinomial classifier that maps unseen patterns to the labels discovered by each of the two labelling approaches. The most important result is that the classifier using the area-based feature vector achieves the highest average levels of confidence in its decisions to classify patterns, implying a highly separable feature vector space. We argue this is good news! Not only does the combination of image processing (Watershed) and domain knowledge (traffic flow characteristics) lead to meaningful labels that can be automatically retrieved from large databases; this method also leads to a more efficient separation of the resulting feature space. Our next endeavor is to further refine and use this method to develop a search engine for the (rapidly growing) 200 TB historical database of traffic data hosted by the Dutch National Datawarehouse (NDW).
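
The quantitative comparison described in the abstract can be illustrated with a minimal sketch. It assumes that "confidence" means the highest class probability a multinomial (softmax) classifier assigns to a pattern, and that separability is summarised by averaging this value over all patterns. The abstract does not specify the exact measure, so this is an assumption rather than the authors' definition. The function names and the score values are hypothetical and serve only as an illustration.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def average_confidence(score_rows):
    """Mean of the highest class probability over all patterns (assumed measure)."""
    confidences = [max(softmax(row)) for row in score_rows]
    return sum(confidences) / len(confidences)


# Hypothetical classifier scores for three unseen patterns and three labels.
# "crisp" mimics a well-separated feature space, "fuzzy" an overlapping one.
crisp_scores = [[4.0, 0.0, 0.0], [0.0, 5.0, 0.5], [0.2, 0.0, 6.0]]
fuzzy_scores = [[1.0, 0.8, 0.9], [0.5, 0.6, 0.4], [0.3, 0.3, 0.4]]

crisp = average_confidence(crisp_scores)
fuzzy = average_confidence(fuzzy_scores)
print(f"crisp: {crisp:.3f}, fuzzy: {fuzzy:.3f}")
```

Under this reading, a labelling whose classifier yields a higher average confidence corresponds to a more separable feature space, which is the sense in which the abstract compares the point-based and area-based approaches.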

Original language	English
Pages (from-to)	238-258
Number of pages	21
Journal	Transportation Research Part C: Emerging Technologies
Volume	100
DOIs	https://doi.org/10.1016/j.trc.2019.01.017
Publication status	Published - 1 Mar 2019

Keywords

Clustering analysis
Congestion classification
Image segmentation
Traffic congestion
Watershed

Access to Document

10.1016/j.trc.2019.01.017

1-s2.0-S0968090X1830891X-main: Final published version, 8.81 MB, Licence: CC BY-NC-ND

Cite this

@article{cfc0b0e5a7ce4153ac9e875d80709c3c,

title = "Feature extraction and clustering analysis of highway congestion",

abstract = "Classification of congestion patterns is important in many areas of traffic planning and management, ranging from policy appraisal and database design to prediction and real-time control. One of the key constraints in applying machine learning techniques for classification is the availability of sufficient data (traffic patterns) with clear and undisputed labels, e.g. traffic pattern X or Y. The challenge is that labelling traffic patterns (e.g. combinations of congested and free-flowing areas over time and space) is highly subjective. In our view this means that the assessment of how well algorithms label the data should also include a qualitative component that focuses on what the found patterns really mean for traffic flow operations and applications. In this study, we investigate the application of clustering analysis to obtain labels automatically from the data, where we first qualitatively assess how meaningful the found labels are, and subsequently test quantitatively how well the labels separate the resulting feature space. By transforming traffic measurements (speeds) into (colored) images, two different approaches are proposed to extract the features of a large number of traffic patterns for clustering: point-based and area-based. The point-based approach is widely applied in the image processing literature and explores local interest points in images (i.e. where large changes occur in color intensity), whereas a new area-based approach combines domain knowledge with Watershed segmentation to partition the images into different spatio-temporal segments from which domain-specific features, such as wide moving jam patterns, are extracted. The results show that the Watershed segmentation separates the traffic (congestion) patterns into more meaningful and separable classes, comparable to those that have been proposed in the literature.
Since there is no ground-truth set of labels, the quantitative assessment tests how well both methods are able to separate the respective feature spaces they construct for the (large) database of traffic patterns. We argue that the crisper this separation is, the better the labelling has turned out. For this quantitative comparison we train a multinomial classifier that maps unseen patterns to the labels discovered by each of the two labelling approaches. The most important result is that the classifier using the area-based feature vector achieves the highest average levels of confidence in its decisions to classify patterns, implying a highly separable feature vector space. We argue this is good news! Not only does the combination of image processing (Watershed) and domain knowledge (traffic flow characteristics) lead to meaningful labels that can be automatically retrieved from large databases; this method also leads to a more efficient separation of the resulting feature space. Our next endeavor is to further refine and use this method to develop a search engine for the (rapidly growing) 200 TB historical database of traffic data hosted by the Dutch National Datawarehouse (NDW).",

keywords = "Clustering analysis, Congestion classification, Image segmentation, Traffic congestion, Watershed",

author = "Nguyen, {Tin T.} and Krishnakumari, Panchamy and Calvert, {Simeon C.} and Vu, {Hai L.} and {van Lint}, Hans",

year = "2019",

month = mar,

day = "1",

doi = "10.1016/j.trc.2019.01.017",

language = "English",

volume = "100",

pages = "238--258",

journal = "Transportation Research Part C: Emerging Technologies",

issn = "0968-090X",

publisher = "Elsevier",

}

TY - JOUR

T1 - Feature extraction and clustering analysis of highway congestion

AU - Nguyen, Tin T.

AU - Krishnakumari, Panchamy

AU - Calvert, Simeon C.

AU - Vu, Hai L.

AU - van Lint, Hans

PY - 2019/3/1

Y1 - 2019/3/1

N2 - Classification of congestion patterns is important in many areas of traffic planning and management, ranging from policy appraisal and database design to prediction and real-time control. One of the key constraints in applying machine learning techniques for classification is the availability of sufficient data (traffic patterns) with clear and undisputed labels, e.g. traffic pattern X or Y. The challenge is that labelling traffic patterns (e.g. combinations of congested and free-flowing areas over time and space) is highly subjective. In our view this means that the assessment of how well algorithms label the data should also include a qualitative component that focuses on what the found patterns really mean for traffic flow operations and applications. In this study, we investigate the application of clustering analysis to obtain labels automatically from the data, where we first qualitatively assess how meaningful the found labels are, and subsequently test quantitatively how well the labels separate the resulting feature space. By transforming traffic measurements (speeds) into (colored) images, two different approaches are proposed to extract the features of a large number of traffic patterns for clustering: point-based and area-based. The point-based approach is widely applied in the image processing literature and explores local interest points in images (i.e. where large changes occur in color intensity), whereas a new area-based approach combines domain knowledge with Watershed segmentation to partition the images into different spatio-temporal segments from which domain-specific features, such as wide moving jam patterns, are extracted. The results show that the Watershed segmentation separates the traffic (congestion) patterns into more meaningful and separable classes, comparable to those that have been proposed in the literature.
Since there is no ground-truth set of labels, the quantitative assessment tests how well both methods are able to separate the respective feature spaces they construct for the (large) database of traffic patterns. We argue that the crisper this separation is, the better the labelling has turned out. For this quantitative comparison we train a multinomial classifier that maps unseen patterns to the labels discovered by each of the two labelling approaches. The most important result is that the classifier using the area-based feature vector achieves the highest average levels of confidence in its decisions to classify patterns, implying a highly separable feature vector space. We argue this is good news! Not only does the combination of image processing (Watershed) and domain knowledge (traffic flow characteristics) lead to meaningful labels that can be automatically retrieved from large databases; this method also leads to a more efficient separation of the resulting feature space. Our next endeavor is to further refine and use this method to develop a search engine for the (rapidly growing) 200 TB historical database of traffic data hosted by the Dutch National Datawarehouse (NDW).

KW - Clustering analysis

KW - Congestion classification

KW - Image segmentation

KW - Traffic congestion

KW - Watershed

UR - http://www.scopus.com/inward/record.url?scp=85060856987&partnerID=8YFLogxK

U2 - 10.1016/j.trc.2019.01.017

DO - 10.1016/j.trc.2019.01.017

M3 - Article

AN - SCOPUS:85060856987

SN - 0968-090X

VL - 100

SP - 238

EP - 258

JO - Transportation Research Part C: Emerging Technologies

JF - Transportation Research Part C: Emerging Technologies

ER -
