Revisiting Test Smells in Automatically Generated Tests: Limitations, Pitfalls, and Opportunities

A. Panichella; Sebastiano Panichella; Gordon Fraser; Anand Ashok Sawant; Vincent J. Hellendoorn

doi:10.1109/ICSME46990.2020.00056

Software Engineering

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

19 Citations (Scopus)

153 Downloads (Pure)

Abstract

Test smells attempt to capture design issues in test code that reduce its maintainability. Previous work found such smells to be highly common in automatically generated test cases, but based this result on specific static detection rules; although these are based on the original definition of “test smells”, a recent empirical study showed that developers perceive these as overly strict and non-representative of the maintainability and quality of test suites. This leads us to investigate how effective such test smell detection tools are on automatically generated test suites. In this paper, we build a dataset of 2,340 test cases automatically generated by EVOSUITE for 100 Java classes. We performed a multi-stage, cross-validated manual analysis to identify six types of test smells and label their instances. We benchmark the performance of two test smell detection tools: one widely used in prior work, and one recently introduced with the express goal to match developer perceptions of test smells. Our results show that these test smell detection strategies poorly characterized the issues in automatically generated test suites; the older tool’s detection strategies, especially, misclassified over 70% of test smells, both missing real instances (false negatives) and marking many smell-free tests as smelly (false positives). We identify common patterns in these tests that can be used to improve the tools, refine and update the definition of certain test smells, and highlight as-yet-uncharacterized issues. Our findings suggest the need for (i) more appropriate metrics to match development practice; and (ii) more accurate detection strategies, to be evaluated primarily in industrial contexts.
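
To make the benchmarking idea in the abstract concrete, the following minimal sketch compares a detector's output against manual labels for one smell type and counts false positives and false negatives. It is only an illustration under stated assumptions: the test identifiers, the labels, and the names `Confusion` and `compare` are hypothetical and are not taken from the paper, its dataset, or either detection tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Confusion counts for one smell type (hypothetical helper)."""
    tp: int  # smelly per manual label, flagged by the tool
    fp: int  # smell-free per manual label, flagged by the tool
    fn: int  # smelly per manual label, missed by the tool
    tn: int  # smell-free per manual label, not flagged

    @property
    def precision(self) -> float:
        flagged = self.tp + self.fp
        return self.tp / flagged if flagged else 0.0

    @property
    def recall(self) -> float:
        smelly = self.tp + self.fn
        return self.tp / smelly if smelly else 0.0

    @property
    def misclassified(self) -> float:
        """Share of all tests on which tool and manual label disagree."""
        total = self.tp + self.fp + self.fn + self.tn
        return (self.fp + self.fn) / total if total else 0.0


def compare(manual: dict[str, bool], detected: dict[str, bool]) -> Confusion:
    """Compare tool output with manual labels, keyed by test identifier."""
    tp = fp = fn = tn = 0
    for test_id, is_smelly in manual.items():
        flagged = detected.get(test_id, False)  # unreported tests count as not flagged
        if is_smelly and flagged:
            tp += 1
        elif flagged:
            fp += 1
        elif is_smelly:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


# Made-up labels for four generated tests (assumption, for illustration only).
manual_labels = {"test0": True, "test1": False, "test2": True, "test3": False}
tool_flags = {"test0": False, "test1": True, "test2": True, "test3": False}
result = compare(manual_labels, tool_flags)
print(result, result.precision, result.recall, result.misclassified)
```

In this toy input the tool misses one smelly test and flags one smell-free test, so both error types named in the abstract appear. The `misclassified` ratio here is computed over all tests and is not necessarily the measure behind the paper's reported figures.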

Original language	English
Title of host publication	Proceedings - 2020 IEEE International Conference on Software Maintenance and Evolution, ICSME 2020
Place of Publication	Adelaide, Australia
Publisher	IEEE
Pages	523-533
Number of pages	11
ISBN (Electronic)	978-1-7281-5619-4
ISBN (Print)	978-1-7281-5620-0
DOIs	https://doi.org/10.1109/ICSME46990.2020.00056
Publication status	Published - 2020
Event	ICSME 2020: International Conference on Software Maintenance and Evolution (virtual/online event due to COVID-19), Adelaide, Australia. Duration: 28 Sept 2020 → 2 Oct 2020

Publication series

Name	Proceedings - 2020 IEEE International Conference on Software Maintenance and Evolution, ICSME 2020

Conference

Conference	ICSME 2020: International Conference on Software Maintenance and Evolution
Abbreviated title	ICSME 2020
Country/Territory	Australia
City	Adelaide
Period	28/09/20 → 2/10/20

Bibliographical note

Virtual/online event due to COVID-19

Keywords

Software Quality
Test Generation
Test Smells

Access to Document

10.1109/ICSME46990.2020.00056

ICSME2020: Accepted author manuscript, 364 KB, Licence: GNU LGPL

Cite this

Panichella, A., Panichella, S., Fraser, G., Sawant, A. A., & Hellendoorn, V. J. (2020). Revisiting Test Smells in Automatically Generated Tests: Limitations, Pitfalls, and Opportunities. In Proceedings - 2020 IEEE International Conference on Software Maintenance and Evolution, ICSME 2020 (pp. 523-533). Article 9240691 (Proceedings - 2020 IEEE International Conference on Software Maintenance and Evolution, ICSME 2020). IEEE. https://doi.org/10.1109/ICSME46990.2020.00056
