Document performance prediction for automatic text classification

Gustavo Penha, Raphael Campos, Sérgio Canuto, Marcos André Gonçalves, Rodrygo L.T. Santos

Research output: Chapter in Book/Conference proceedings/Edited volume > Conference contribution > Scientific > peer-review



Query performance prediction (QPP) is a fundamental task in information retrieval, which concerns predicting the effectiveness of a ranking model for a given query in the absence of relevance information. Despite being an active research area, this task has not yet been explored in the context of automatic text classification. In this paper, we study the task of predicting the effectiveness of a classifier for a given document, which we refer to as document performance prediction (DPP). Our experiments on several text classification datasets for both categorization and sentiment analysis attest to the effectiveness and complementarity of several DPP predictors inspired by related QPP approaches. Finally, we also explore the usefulness of DPP predictors for improving the classification itself, by using them as additional features in a classification ensemble.

Original language: English
Title of host publication: Advances in Information Retrieval
Subtitle of host publication: 41st European Conference on IR Research, ECIR 2019
Editors: Leif Azzopardi, Benno Stein, Norbert Fuhr, Claudia Hauff, Philipp Mayr, Djoerd Hiemstra
Place of Publication: Cham
Number of pages: 8
ISBN (Electronic): 978-3-030-15719-7
ISBN (Print): 978-3-030-15718-0
Publication status: Published - 2019
Event: 41st European Conference on Information Retrieval, ECIR 2019 - Cologne, Germany
Duration: 14 Apr 2019 - 18 Apr 2019

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11438 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349


Conference: 41st European Conference on Information Retrieval, ECIR 2019


Keywords

  • Automatic text classification
  • Performance prediction
