A Systematic Review of Artificial Intelligence Public Datasets for Railway Applications

Mauro José Pappaterra, Francesco Flammini, Valeria Vittorini, Nikola Bešinović

Research output: Contribution to journal › Review article › peer-review

20 Citations (Scopus)
111 Downloads (Pure)

Abstract

This paper reviews publicly available, open artificial intelligence (AI) oriented datasets across the domains and subdomains of the railway sector. Its contribution is an overview of AI-oriented railway data published under Creative Commons (CC) or any other license that entails public availability and freedom of use. These data are of great value for open research and publications on the application of AI in the railway sector. We classify the public railway data by subdomain (including maintenance and inspection, traffic planning and management, and safety and security) and by type of data (numerical, string, image and other). The datasets reviewed cover the last three decades, from January 1990 to January 2021. The study reveals that the number of open datasets is very small in comparison with the available literature on AI applications in the railway industry. Another shortcoming is the lack of documentation and metadata on public datasets, including information on missing data, collection schemes and other limitations. The study also presents quantitative data, such as the number of available open datasets broken down by railway application, type of data and year of publication. The review further reveals that there are openly available APIs, maintained by government organizations and train operating companies (TOCs), that can be of great use for data harvesting and can facilitate the creation of large public datasets. These data are usually well-curated real-time data that can greatly contribute to the accuracy of AI models. Finally, we conclude that the extension of AI applications in the railway sector merits a centralized hub for publicly available datasets and open APIs.
Original language: English
Article number: 136
Number of pages: 28
Journal: Infrastructures
Volume: 6
Issue number: 10
DOIs
Publication status: Published - 2021

Keywords

  • Intelligent transportation
  • Machine learning
  • Predictive maintenance
  • Public datasets
  • Railways
