Points of interest (POIs) digitally represent real-world amenities as point locations. POI categories (e.g., restaurant, hotel, museum) play a prominent role in several location-based applications such as social media, navigation, recommender systems, geographic information retrieval tools, and travel-related services. The majority of user queries in these applications center around POI categories. For instance, people often search for the closest pub or the best value-for-money hotel in an area. To provide valid answers to such queries, accurate and consistent information on POI categories is essential. Nevertheless, category-based annotations of POIs are often missing. The task of annotating unlabeled POIs with their categories, known as POI classification, is commonly performed with machine learning (ML) models, often referred to as classifiers. Central to this task is extracting known features from pre-labeled POIs to train the classifiers, and then using the trained models to categorize unlabeled POIs. However, the set of features used in this process can heavily influence the classification results. Research that determines the influence of different features on the categorization of POIs is currently lacking. This paper contributes a study of feature importance for the classification of unlabeled POIs into categories. We define five feature sets that address operation-based, review-based, topic-based, neighborhood-based, and visual attributes of POIs. Contrary to existing studies that predominantly use multi-class classification approaches, and in order to assess and rank the influence of POI features on the categorization task, we propose both a multi-class and a binary classification approach. These, respectively, predict the place category among a specified set of POI categories, or indicate whether a POI belongs to a certain category.
Using POI data from Amsterdam and Athens to implement and evaluate our study approach, we show that operation-based features, such as opening or visiting hours throughout the day, are the most important place category predictors. Moreover, we demonstrate that using feature combinations, as opposed to individual features, improves classification performance by an average of 15% in terms of F1-score.
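To make the two evaluation settings concrete, the following minimal Python sketch shows how a multi-class labeling can be recast as a binary "belongs to category X or not" problem, and how an F1-score can be computed in each case. It is only an illustration: the function names (`to_binary`, `f1_binary`, `f1_macro`), the toy labels, and the choice of macro averaging for the multi-class score are assumptions, not details taken from the paper.

```python
def to_binary(labels, target):
    """Recast multi-class labels as binary: 1 if the POI has the target category, else 0."""
    return [1 if label == target else 0 for label in labels]


def f1_binary(y_true, y_pred):
    """F1-score for binary labels (1 = positive class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def f1_macro(y_true, y_pred):
    """Macro-averaged F1 (an assumed averaging choice): mean of per-category binary F1."""
    categories = sorted(set(y_true) | set(y_pred))
    scores = [
        f1_binary(to_binary(y_true, c), to_binary(y_pred, c)) for c in categories
    ]
    return sum(scores) / len(scores)


# Hypothetical toy data, invented for illustration only.
true_labels = ["restaurant", "hotel", "museum", "restaurant", "hotel", "restaurant"]
pred_labels = ["restaurant", "hotel", "restaurant", "restaurant", "museum", "restaurant"]

# Binary setting: does a POI belong to the category "hotel"?
hotel_f1 = f1_binary(to_binary(true_labels, "hotel"), to_binary(pred_labels, "hotel"))

# Multi-class setting: one score over all categories.
overall_f1 = f1_macro(true_labels, pred_labels)
```

In this sketch the binary score isolates how well a single category is recognized, while the macro average summarizes performance across all categories, which mirrors the distinction between the two classification approaches described above.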
|Number of pages|12|
|Journal|Computers, Environment and Urban Systems|
|Publication status|Published - 2021|
- Feature extraction
- Feature importance
- POI categories
- Point of interest