A Systematic Comparison of Search Algorithms for Topic Modelling—A Study on Duplicate Bug Report Identification

Annibale Panichella*

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume; Conference contribution; Scientific; peer-review

13 Citations (Scopus)
195 Downloads (Pure)

Abstract

Latent Dirichlet Allocation (LDA) has been used to support many software engineering tasks. Previous studies showed that default settings lead to sub-optimal topic modeling, with a dramatic impact on the performance of such approaches in terms of precision and recall. For this reason, researchers have used search algorithms (e.g., genetic algorithms) to automatically configure topic models in an unsupervised fashion. While previous work showed the ability of individual search algorithms to find near-optimal configurations, it is not clear to what extent the choice of the meta-heuristic matters for software engineering (SE) tasks. In this paper, we present a systematic comparison of five different meta-heuristics to configure LDA in the context of duplicate bug report identification. The results show that (1) no master algorithm outperforms the others for all software projects, and (2) random search and particle swarm optimization (PSO) are the least effective meta-heuristics. Finally, the running time strongly depends on the computational complexity of LDA, while the internal complexity of the search algorithms plays a negligible role.

Original language: English
Title of host publication: Search-Based Software Engineering - 11th International Symposium, SSBSE 2019, Proceedings
Editors: Shiva Nejati, Gregory Gay
Publisher: Springer
Pages: 11-26
Number of pages: 16
ISBN (Print): 9783030274542
Publication status: Published - 1 Jan 2019
Event: 11th International Symposium on Search-Based Software Engineering, SSBSE 2019 - Tallinn, Estonia
Duration: 31 Aug 2019 to 1 Sept 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11664 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 11th International Symposium on Search-Based Software Engineering, SSBSE 2019
Country/Territory: Estonia
City: Tallinn
Period: 31/08/19 to 1/09/19

Keywords

  • Duplicate Bug Report
  • Evolutionary Algorithms
  • Latent Dirichlet Allocation
  • Search-based Software Engineering
  • Topic modeling

Prizes

  • SSBSE 2019 Best Paper Award

    Panichella, A. (Recipient), 1 Sept 2019

    Prize: Prize (including medals and awards)
