LogChunks: A Data Set for Build Log Analysis

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Build logs are textual by-products that a software build process creates, often as part of its Continuous Integration (CI) pipeline. Build logs are a paramount source of information for developers when debugging and understanding a build failure. Recently, attempts to partly automate this time-consuming, purely manual activity have emerged, such as rule-based or information-retrieval-based techniques. We believe that a common data set for comparing different build log analysis techniques will advance the research area and ultimately increase our understanding of CI build failures. In this paper, we present LogChunks, a collection of 797 annotated Travis CI build logs from 80 GitHub repositories in 29 programming languages. For each build log, LogChunks contains a manually labeled log part (chunk) describing why the build failed. We externally validated the data set with the developers who caused the original build failure. The breadth and depth of the LogChunks data set are intended to make it the default benchmark for automated build log analysis techniques.
Original language: English
Title of host publication: Proceedings - 2020 IEEE/ACM 17th International Conference on Mining Software Repositories, MSR 2020
Pages: 583-587
Number of pages: 5
ISBN (Electronic): 9781450379571
Publication status: Published - 2020
Event: 17th International Conference on Mining Software Repositories - Seoul, Korea, Republic of
Duration: 5 Oct 2020 to 6 Oct 2020
Conference number: 17

Publication series

Name: Proceedings - 2020 IEEE/ACM 17th International Conference on Mining Software Repositories, MSR 2020

Conference

Conference: 17th International Conference on Mining Software Repositories
Abbreviated title: MSR 20
Country: Korea, Republic of
City: Seoul
Period: 5/10/20 to 6/10/20
Other: Virtual/online event due to COVID-19

Keywords

  • Continuous Integration
  • Build Log Analysis
  • Build Failure
  • Chunk Retrieval
  • CI
