Evaluating the Robustness of Retrieval Pipelines with Query Variation Generators

Gustavo Penha*, Arthur Câmara, Claudia Hauff

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

16 Citations (Scopus)
29 Downloads (Pure)

Abstract

Heavily pre-trained transformers for language modeling, such as BERT, have been shown to be remarkably effective for Information Retrieval (IR) tasks, typically applied to re-rank the results of a first-stage retrieval model. IR benchmarks evaluate the effectiveness of retrieval pipelines on the premise that a single query is used to instantiate the underlying information need. However, previous research has shown that (I) queries generated by users for a fixed information need are extremely variable and, in particular, (II) neural models are brittle and often make mistakes when tested with modified inputs. Motivated by these observations, we aim to answer the following question: how robust are retrieval pipelines with respect to variations in queries that do not change the queries' semantics? In order to obtain queries that are representative of users' querying variability, we first created a taxonomy based on the manual annotation of transformations occurring in a dataset (UQV100) of user-created query variations. For each syntax-changing category of our taxonomy, we employed different automatic methods that, when applied to a query, generate a query variation. Our experimental results across two datasets for two IR tasks reveal that retrieval pipelines are not robust to these query variations, with effectiveness drops of ≈20% on average. The code and datasets are available at https://github.com/Guzpenha/query_variation_generators.
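The paper's actual generators live in the linked repository; purely as a minimal illustration of the idea, a semantics-preserving but syntax-changing variation (a misspelling or a word reordering, two plausible taxonomy categories) might be sketched as follows. The function names and the example query are hypothetical, not taken from the paper's code:

```python
import random


def swap_adjacent_chars(query: str, index: int) -> str:
    """Introduce a misspelling by swapping two adjacent characters."""
    chars = list(query)
    chars[index], chars[index + 1] = chars[index + 1], chars[index]
    return "".join(chars)


def shuffle_word_order(query: str, seed: int = 0) -> str:
    """Reorder the query's words; a bag-of-words model sees the same terms."""
    words = query.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


# Generate two variations of the same underlying information need.
query = "effects of climate change on coral reefs"
variations = [swap_adjacent_chars(query, 12), shuffle_word_order(query)]
```

Evaluating a pipeline would then mean retrieving with each variation and comparing effectiveness (e.g. nDCG) against the original query, which is how a robustness drop such as the reported ≈20% could be measured.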

Original language: English
Title of host publication: Advances in Information Retrieval - 44th European Conference on IR Research, ECIR 2022, Proceedings
Editors: Matthias Hagen, Suzan Verberne, Craig Macdonald, Christin Seifert, Krisztian Balog, Kjetil Nørvåg, Vinay Setty
Publisher: Springer
Pages: 397-412
Number of pages: 16
ISBN (Print): 9783030997359
DOIs
Publication status: Published - 2022
Event: 44th European Conference on Information Retrieval, ECIR 2022 - Stavanger, Norway
Duration: 10 Apr 2022 – 14 Apr 2022

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 13185 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 44th European Conference on Information Retrieval, ECIR 2022
Country/Territory: Norway
City: Stavanger
Period: 10/04/22 – 14/04/22

Bibliographical note

Green Open Access added to TU Delft Institutional Repository under the 'You share, we take care!' Taverne project: https://www.openaccess.nl/en/you-share-we-take-care
Otherwise, as indicated in the copyright section: the publisher is the copyright holder of this work, and the author has used Dutch legislation to make this work publicly available.
