TY - GEN
T1 - Document performance prediction for automatic text classification
AU - Penha, Gustavo
AU - Campos, Raphael
AU - Canuto, Sérgio
AU - Gonçalves, Marcos André
AU - Santos, Rodrygo L.T.
PY - 2019
Y1 - 2019
AB - Query performance prediction (QPP) is a fundamental task in information retrieval, which concerns predicting the effectiveness of a ranking model for a given query in the absence of relevance information. Despite being an active research area, this task has not yet been explored in the context of automatic text classification. In this paper, we study the task of predicting the effectiveness of a classifier for a given document, which we refer to as document performance prediction (DPP). Our experiments on several text classification datasets for both categorization and sentiment analysis attest to the effectiveness and complementarity of several DPP predictors inspired by related QPP approaches. Finally, we also explore the usefulness of DPP predictors for improving the classification itself, by using them as additional features in a classification ensemble.
KW - Automatic text classification
KW - Performance prediction
UR - http://www.scopus.com/inward/record.url?scp=85064858139&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-15719-7_17
DO - 10.1007/978-3-030-15719-7_17
M3 - Conference contribution
AN - SCOPUS:85064858139
SN - 978-3-030-15718-0
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 132
EP - 139
BT - Advances in Information Retrieval
A2 - Azzopardi, Leif
A2 - Stein, Benno
A2 - Fuhr, Norbert
A2 - Hauff, Claudia
A2 - Mayr, Philipp
A2 - Hiemstra, Djoerd
PB - Springer
CY - Cham
T2 - 41st European Conference on Information Retrieval, ECIR 2019
Y2 - 14 April 2019 through 18 April 2019
ER -