Learning Off-By-One Mistakes: An Empirical Study

Hendrig Sellik, Onno van Paridon, Georgios Gousios, Maurício Aniche

Research output: Chapter in Book/Conference proceedings/Edited volume, Conference contribution, Scientific, peer-review

Abstract

Mistakes in binary conditions are a source of error in many software systems. They happen when developers use, e.g., < or > instead of <= or >=. These boundary mistakes are hard to find, and locating them imposes manual, labor-intensive work on software developers. Although previous research has proposed solutions for identifying errors in boundary conditions, the problem remains open. In this paper, we explore the effectiveness of deep learning models in learning and predicting mistakes in boundary conditions. We train different models on approximately 1.6M examples with faults in different boundary conditions. We achieve a precision of 85% and a recall of 84% on a balanced dataset, but lower numbers on an imbalanced dataset. We also test the model on 41 real-world boundary condition bugs found on GitHub, where it shows only modest performance. Finally, we test the model on a large-scale Java code base from Adyen, our industrial partner. The model reported 36 buggy methods, but none of them were confirmed by developers.
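
As a minimal illustration of the kind of boundary mistake the abstract describes, the sketch below contrasts a comparison that uses < where <= was intended. It is a hypothetical example, not taken from the paper's dataset, and it is written in Python for brevity even though the study itself targets Java code. The function names and the sample data are assumptions made only for this illustration.

```python
def count_in_range_buggy(values, low, high):
    """Intended to count values in the closed interval [low, high].

    Off-by-one mistake: `<` is used for the upper bound, so a value
    equal to `high` is silently excluded.
    """
    return sum(1 for v in values if low <= v < high)


def count_in_range_fixed(values, low, high):
    """Counts values in the closed interval [low, high], boundary included."""
    return sum(1 for v in values if low <= v <= high)


if __name__ == "__main__":
    sample = [1, 2, 3, 4, 5]  # hypothetical input
    print(count_in_range_buggy(sample, 2, 4))
    print(count_in_range_fixed(sample, 2, 4))
```

The two functions differ by a single character, and they agree on every input that does not contain the boundary value itself, which is one reason such mistakes tend to escape both review and tests.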
Original language: English
Title of host publication: Proceedings of the Mining Software Repositories Conference (MSR'21)
Publication status: Published - 2021
Event: Mining Software Repositories conference (MSR'21)
Duration: 17 May 2021 to 19 May 2021

Conference

Conference: Mining Software Repositories conference (MSR'21)
Abbreviated title: MSR21
Period: 17/05/21 to 19/05/21
