Learning Off-By-One Mistakes: An Empirical Study

Hendrig Sellik; Onno van Paridon; Georgios Gousios; Maurício Aniche

doi:10.1109/MSR52588.2021.00019

Software Engineering

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Mistakes in binary conditions are a source of error in many software systems. They happen when developers use, e.g., < or > instead of <= or >=. These boundary mistakes are hard to find and impose manual, labor-intensive work for software developers. While previous research has been proposing solutions to identify errors in boundary conditions, the problem remains open. In this paper, we explore the effectiveness of deep learning models in learning and predicting mistakes in boundary conditions. We train different models on approximately 1.6M examples with faults in different boundary conditions. We achieve a precision of 85% and a recall of 84% on a balanced dataset, but lower numbers in an imbalanced dataset. We also perform tests on 41 real-world boundary condition bugs found from GitHub, where the model shows only a modest performance. Finally, we test the model on a large-scale Java code base from Adyen, our industrial partner. The model reported 36 buggy methods, but none of them were confirmed by developers.
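The kind of boundary mistake the abstract describes can be illustrated with a small sketch. The study itself targets Java code; the example below uses Python for brevity, and the function names and data are hypothetical, not taken from the paper. It shows a comparison written with `>` where `>=` was intended, and that the two versions disagree only when an input sits exactly on the boundary.

```python
def count_at_least_buggy(scores, threshold):
    """Intended behaviour: count scores that meet or exceed the threshold."""
    # Off-by-one mistake: '>' silently excludes a score equal to the threshold.
    return sum(1 for s in scores if s > threshold)


def count_at_least_fixed(scores, threshold):
    """Count scores that meet or exceed the threshold."""
    # '>=' includes the boundary value itself.
    return sum(1 for s in scores if s >= threshold)


scores = [49, 50, 51]
print(count_at_least_buggy(scores, 50))  # 1: the score of exactly 50 is missed
print(count_at_least_fixed(scores, 50))  # 2
```

Both functions return the same result for any input that contains no value equal to the threshold, which is one reason such mistakes can survive tests that never exercise the boundary.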

Original language	English
Title of host publication	2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)
Editors	L. O'Conner
Place of Publication	Piscataway
Publisher	IEEE
Pages	58-67
Number of pages	10
ISBN (Electronic)	978-1-7281-8710-5
ISBN (Print)	978-1-6654-2985-6
DOIs	https://doi.org/10.1109/MSR52588.2021.00019
Publication status	Published - 2021
Event	2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) - Virtual at Madrid, Spain Duration: 17 May 2021 → 19 May 2021 Conference number: 18th

Conference

Conference	2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)
Abbreviated title	MSR21
Country/Territory	Spain
City	Virtual at Madrid
Period	17/05/21 → 19/05/21

Bibliographical note

Accepted author manuscript

Keywords

Boundary testing
Deep learning for software engineering
Machine learning for software engineering
Software testing

Access to Document

10.1109/MSR52588.2021.00019

msr2021_learning_off_by_one (Accepted author manuscript, 314 KB)

Cite this

@inproceedings{fb773461fa1b41e387da028b5bff9d8a,
title = "Learning Off-By-One Mistakes: An Empirical Study",
abstract = "Mistakes in binary conditions are a source of error in many software systems. They happen when developers use, e.g., < or > instead of <= or >=. These boundary mistakes are hard to find and impose manual, labor-intensive work for software developers. While previous research has been proposing solutions to identify errors in boundary conditions, the problem remains open. In this paper, we explore the effectiveness of deep learning models in learning and predicting mistakes in boundary conditions. We train different models on approximately 1.6M examples with faults in different boundary conditions. We achieve a precision of 85% and a recall of 84% on a balanced dataset, but lower numbers in an imbalanced dataset. We also perform tests on 41 real-world boundary condition bugs found from GitHub, where the model shows only a modest performance. Finally, we test the model on a large-scale Java code base from Adyen, our industrial partner. The model reported 36 buggy methods, but none of them were confirmed by developers.",
keywords = "Boundary testing, Deep learning for software engineering, Machine learning for software engineering, Software testing",
author = "Hendrig Sellik and {van Paridon}, Onno and Georgios Gousios and Maur{\'i}cio Aniche",
note = "Accepted author manuscript; 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), MSR21 ; Conference date: 17-05-2021 Through 19-05-2021",
year = "2021",
doi = "10.1109/MSR52588.2021.00019",
language = "English",
isbn = "978-1-6654-2985-6",
pages = "58--67",
editor = "L. O'Conner",
booktitle = "2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)",
publisher = "IEEE",
address = "Piscataway",
}

Sellik, H, van Paridon, O, Gousios, G & Aniche, M 2021, Learning Off-By-One Mistakes: An Empirical Study. in L O'Conner (ed.), 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), 9463090, IEEE, Piscataway, pp. 58-67, 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR), Virtual at Madrid, Spain, 17/05/21. https://doi.org/10.1109/MSR52588.2021.00019

Learning Off-By-One Mistakes: An Empirical Study. / Sellik, Hendrig; van Paridon, Onno; Gousios, Georgios et al.
2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR). ed. / L. O'Conner. Piscataway: IEEE, 2021. p. 58-67 9463090.

TY  - GEN
T1  - Learning Off-By-One Mistakes: An Empirical Study
T2  - 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)
AU  - Sellik, Hendrig
AU  - van Paridon, Onno
AU  - Gousios, Georgios
AU  - Aniche, Maurício
N1  - Conference code: 18th
PY  - 2021
Y1  - 2021
AB  - Mistakes in binary conditions are a source of error in many software systems. They happen when developers use, e.g., < or > instead of <= or >=. These boundary mistakes are hard to find and impose manual, labor-intensive work for software developers. While previous research has been proposing solutions to identify errors in boundary conditions, the problem remains open. In this paper, we explore the effectiveness of deep learning models in learning and predicting mistakes in boundary conditions. We train different models on approximately 1.6M examples with faults in different boundary conditions. We achieve a precision of 85% and a recall of 84% on a balanced dataset, but lower numbers in an imbalanced dataset. We also perform tests on 41 real-world boundary condition bugs found from GitHub, where the model shows only a modest performance. Finally, we test the model on a large-scale Java code base from Adyen, our industrial partner. The model reported 36 buggy methods, but none of them were confirmed by developers.
KW  - Boundary testing
KW  - Deep learning for software engineering
KW  - Machine learning for software engineering
KW  - Software testing
UR  - http://www.scopus.com/inward/record.url?scp=85113643219&partnerID=8YFLogxK
U2  - 10.1109/MSR52588.2021.00019
DO  - 10.1109/MSR52588.2021.00019
M3  - Conference contribution
SN  - 978-1-6654-2985-6
SP  - 58
EP  - 67
BT  - 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR)
A2  - O'Conner, L.
PB  - IEEE
CY  - Piscataway
Y2  - 17 May 2021 through 19 May 2021
ER  -
