Abstract
The area of conversational search has gained significant traction in the IR research community, motivated by the widespread use of personal assistants. A frequently studied task in this setting is conversation response ranking, that is, retrieving the best response for a given ongoing conversation from a corpus of historic conversations. While this is intuitively an important step towards (retrieval-based) conversational search, the empirical evaluation currently applied to trained rankers is very far from this setup: typically, the ranker is presented with a single relevant response and an extremely small number (e.g., 10) of non-relevant responses. In a real-world scenario, a retrieval-based system has to retrieve responses from a large pool (e.g., several million responses) or determine that no appropriate response can be found. In this paper, we highlight these critical issues in the offline evaluation schemes for tasks related to conversational search. We argue that the evaluation schemes currently in use have critical limitations and simplify conversational search tasks to a degree that makes it questionable whether we can trust the findings they deliver.
Original language | English |
---|---|
Title of host publication | KDD 2020 Workshop on Conversational Systems Towards Mainstream Adoption, KDD-Converse 2020 |
Editors | G. Di Fabbrizio, S. Kallumadi, U. Porwal, T. Taula |
Number of pages | 5 |
Volume | 2666 |
Publication status | Published - 2020 |
Event | KDD 2020 Workshop on Conversational Systems Towards Mainstream Adoption, KDD-Converse 2020. Virtual, Online, United States. Duration: 24 Aug 2020 → 24 Aug 2020. http://ceur-ws.org/Vol-2666/ |
Publication series
Name | CEUR Workshop Proceedings |
---|---|
Publisher | CEUR-WS |
ISSN (Print) | 1613-0073 |
Conference
Conference | KDD 2020 Workshop on Conversational Systems Towards Mainstream Adoption, KDD-Converse 2020 |
---|---|
Abbreviated title | KDD-Converse 2020 |
Country/Territory | United States |
Period | 24/08/20 → 24/08/20 |