Evaluating classifiers in SE research: the ECSER pipeline and two replication studies

Davide Dell’Anna*, Fatma Başak Aydemir, Fabiano Dalpiaz

*Corresponding author for this work

Research output: Contribution to journal, Article, Scientific, peer-review


Context: Automated classifiers, often based on machine learning (ML), are increasingly used in software engineering (SE) for labelling previously unseen SE data. Researchers have proposed automated classifiers that predict if a code chunk is a clone, if a requirement is functional or non-functional, if the outcome of a test case is non-deterministic, etc.

Objective: The lack of guidelines for applying and reporting classification techniques for SE research leads to studies in which important research steps may be skipped, key findings might not be identified and shared, and the readers may find reported results (e.g., precision or recall above 90%) that are not a credible representation of the performance in operational contexts. The goal of this paper is to advance ML4SE research by proposing rigorous ways of conducting and reporting research.

Results: We introduce the ECSER (Evaluating Classifiers in Software Engineering Research) pipeline, which includes a series of steps for conducting and evaluating automated classification research in SE. Then, we conduct two replication studies where we apply ECSER to recent research in requirements engineering and in software testing.

Conclusions: In addition to demonstrating the applicability of the pipeline, the replication studies demonstrate ECSER’s usefulness: not only do we confirm and strengthen some findings identified by the original authors, but we also discover additional ones. Some of these findings contradict the original ones.

Original language: English
Article number: 3
Number of pages: 40
Journal: Empirical Software Engineering
Issue number: 1
Publication status: Published - 2023


Keywords

  • Automated classification
  • Machine learning
  • Replication study
  • Software engineering
