Automated Recovery of Issue-Commit Links Leveraging Both Textual and Non-textual Data

Pooya Rostami Mazrae, Maliheh Izadi, Abbas Heydarnoori

Research output: Chapter in Book/Conference proceedings/Edited volumeConference contributionScientificpeer-review

4 Citations (Scopus)

Abstract

An issue report documents the discussions around required changes in issue-tracking systems, while a commit contains the change itself in the version control systems. Recovering links between issues and commits can facilitate many software evolution tasks such as bug localization, defect prediction, software quality measurement, and software documentation. A previous study on over half a million issues from GitHub reports only about 42.2% of issues are manually linked by developers to their pertinent commits. Automating the linking of commit-issue pairs can contribute to the improvement of the said tasks. By far, current state-of-the-art approaches for automated commit-issue linking suffer from low precision, leading to unreliable results, sometimes to the point that imposes human supervision on the predicted links. The low performance gets even more severe when there is a lack of textual information in either commits or issues. Current approaches are also proven computationally expensive. We propose Hybrid-Linker, an enhanced approach that overcomes such limitations by exploiting two information channels; (1) a non-textual-based component that operates on non-textual, automatically recorded information of the commit-issue pairs to predict a link, and (2) a textual-based one which does the same using textual information of the commit-issue pairs. Then, combining the results from the two classifiers, Hybrid-Linker makes the final prediction. Thus, every time one component falls short in predicting a link, the other component fills the gap and improves the results. We evaluate Hybrid-Linker against competing approaches, namely FRLink and DeepLink on a dataset of 12 projects. Hybrid-Linker achieves 90.1%, 87.8%, and 88.9% based on recall, precision, and F-measure, respectively. It also outperforms FRLink and DeepLink by 31.3%, and 41.3%, regarding the F-measure. Moreover, the proposed approach exhibits extensive improvements in terms of performance as well. Finally, our source code and data are publicly available.

Original languageEnglish
Title of host publicationProceedings - 2021 IEEE International Conference on Software Maintenance and Evolution, ICSME 2021
PublisherInstitute of Electrical and Electronics Engineers (IEEE)
Pages263-273
Number of pages11
ISBN (Electronic)9781665428828
DOIs
Publication statusPublished - 2021
Externally publishedYes
Event37th IEEE International Conference on Software Maintenance and Evolution, ICSME 2021 - Luxembourg City, Luxembourg
Duration: 27 Sept 20211 Oct 2021

Publication series

NameProceedings - 2021 IEEE International Conference on Software Maintenance and Evolution, ICSME 2021

Conference

Conference37th IEEE International Conference on Software Maintenance and Evolution, ICSME 2021
Country/TerritoryLuxembourg
CityLuxembourg City
Period27/09/211/10/21

Keywords

  • Commit
  • Ensemble Methods
  • Issue Report
  • Link Recovery
  • Machine Learning
  • Software Maintenance

Fingerprint

Dive into the research topics of 'Automated Recovery of Issue-Commit Links Leveraging Both Textual and Non-textual Data'. Together they form a unique fingerprint.

Cite this