CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers

Maliheh Izadi

doi:10.1145/3528588.3528662

Software Engineering

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Users rely on Issue Tracking Systems to track and manage issue reports in their repositories. An issue is a rich source of software information and may report a problem, request a new feature, or merely ask a question about the software product. As the number of issues grows, managing them manually becomes harder. Thus, automatic approaches have been proposed to facilitate the management of issue reports. This paper describes CatIss, an automatic Categorizer of Issue reports built upon the Transformer-based pre-trained RoBERTa model. CatIss classifies issue reports into three main categories: Bug report, Enhancement/feature request, and Question. First, the datasets provided for the NLBSE tool competition are cleaned and preprocessed. Then, the pre-trained RoBERTa model is fine-tuned on the preprocessed dataset. An evaluation on about 80 thousand issue reports from GitHub indicates that CatIss surpasses the competition baseline, TicketTagger, and achieves an 87.2% F1-score (micro average). Additionally, because CatIss is trained on a wide set of repositories, it is a generic prediction model and is therefore applicable to unseen software projects and to projects with little historical data. Scripts for cleaning the datasets, training CatIss, and evaluating the model are publicly available.
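The abstract reports a micro-averaged F1-score. As a rough illustration of what that metric measures, the sketch below pools true positives, false positives, and false negatives over all three categories before computing a single F1. The label strings, the function name `micro_f1`, and the toy predictions are assumptions made for this example; they are not taken from the paper's scripts or data.

```python
# Hypothetical label names for the three categories named in the abstract.
LABELS = ("bug", "enhancement", "question")


def micro_f1(y_true, y_pred, labels=LABELS):
    """Micro-averaged F1: pool TP/FP/FN across labels, then compute one F1."""
    tp = fp = fn = 0
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            if pred == label and truth == label:
                tp += 1  # correctly predicted this label
            elif pred == label:
                fp += 1  # predicted this label, but the truth differs
            elif truth == label:
                fn += 1  # missed an issue that carries this label
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
y_true = ["bug", "bug", "enhancement", "question", "question"]
y_pred = ["bug", "enhancement", "enhancement", "question", "bug"]
print(round(micro_f1(y_true, y_pred), 3))
```

When every issue carries exactly one label, as in this toy data, the micro-averaged F1 coincides with plain accuracy (here, 3 of 5 predictions are correct).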

Original language	English
Title of host publication	Proceedings of the 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)
Publisher	IEEE
Pages	44-47
Number of pages	4
ISBN (Electronic)	978-1-4503-9343-0
ISBN (Print)	978-1-6654-6231-0
DOIs	https://doi.org/10.1145/3528588.3528662
Publication status	Published - 2022
Event	2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) - Pittsburgh, United States
Duration	8 May 2022 → 8 May 2022

Publication series

Name	Proceedings - 1st International Workshop on Natural Language-Based Software Engineering, NLBSE 2022

Workshop

Workshop	2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)
Country/Territory	United States
City	Pittsburgh
Period	8/05/22 → 8/05/22

Keywords

Issue report Management
Classification, Repositories
Transformers
Machine Learning
Natural Language Processing

Access to Document

10.1145/3528588.3528662

3528588.3528662: Final published version, 244 KB, Licence: CC BY-NC-SA

Cite this

Izadi, M. (2022). CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers. In Proceedings of the 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) (pp. 44-47). Article 9808639 (Proceedings - 1st International Workshop on Natural Language-Based Software Engineering, NLBSE 2022). IEEE. https://doi.org/10.1145/3528588.3528662

@inproceedings{a2d52acae1c54fd2beee10a2fd7656e9,

title = "CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers",

keywords = "Issue report Management, Classification, Repositories, Transformers, Machine Learning, Natural Language Processing",

author = "Maliheh Izadi",

year = "2022",

doi = "10.1145/3528588.3528662",

language = "English",

isbn = "978-1-6654-6231-0",

series = "Proceedings - 1st International Workshop on Natural Language-Based Software Engineering, NLBSE 2022",

publisher = "IEEE",

pages = "44--47",

booktitle = "Proceedings of the 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)",

address = "United States",

note = "2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE) ; Conference date: 08-05-2022 Through 08-05-2022",

}

Izadi, M 2022, CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers. in Proceedings of the 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)., 9808639, Proceedings - 1st International Workshop on Natural Language-Based Software Engineering, NLBSE 2022, IEEE, pp. 44-47, 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE), Pittsburgh, United States, 8/05/22. https://doi.org/10.1145/3528588.3528662

CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers. / Izadi, Maliheh.
Proceedings of the 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE). IEEE, 2022. p. 44-47 9808639 (Proceedings - 1st International Workshop on Natural Language-Based Software Engineering, NLBSE 2022).

TY - GEN

T1 - CatIss

T2 - 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)

AU - Izadi, Maliheh

PY - 2022

Y1 - 2022

KW - Issue report Management

KW - Classification, Repositories

KW - Transformers

KW - Machine Learning

KW - Natural Language Processing

UR - http://www.scopus.com/inward/record.url?scp=85135186906&partnerID=8YFLogxK

U2 - 10.1145/3528588.3528662

DO - 10.1145/3528588.3528662

M3 - Conference contribution

SN - 978-1-6654-6231-0

T3 - Proceedings - 1st International Workshop on Natural Language-Based Software Engineering, NLBSE 2022

SP - 44

EP - 47

BT - Proceedings of the 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE)

PB - IEEE

Y2 - 8 May 2022 through 8 May 2022

ER -

Izadi M. CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers. In Proceedings of the 2022 IEEE/ACM 1st International Workshop on Natural Language-Based Software Engineering (NLBSE). IEEE. 2022. p. 44-47. 9808639. (Proceedings - 1st International Workshop on Natural Language-Based Software Engineering, NLBSE 2022). doi: 10.1145/3528588.3528662
