Abstract
We study Label Smoothing (LS), a widely used regularization technique, in the context of neural learning to rank (L2R) models. LS combines the ground-truth labels with a uniform distribution, encouraging the model to be less confident in its predictions. We analyze the relationship between the non-relevant documents (specifically, how they are sampled) and the effectiveness of LS, and discuss how LS may capture “hidden similarity knowledge” between the relevant and non-relevant document classes. We further analyze LS by testing whether a curriculum-learning approach, i.e., starting with LS and switching to only ground-truth labels after a number of iterations, is beneficial. Inspired by this investigation, we propose a novel technique called Weakly Supervised Label Smoothing (WSLS), which uses the retrieval scores of the negatively sampled documents as a weak supervision signal when modifying the ground-truth labels. WSLS is simple to implement and requires no modification to the neural ranker architecture. Our experiments across three retrieval tasks (passage retrieval, similar question retrieval, and conversation response ranking) show that WSLS for pointwise BERT-based rankers leads to consistent effectiveness gains. The source code is available at https://github.com/Guzpenha/transformer_rankers/tree/wsls.
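To make the ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows standard LS for a two-class (relevant / non-relevant) target, one possible reading of WSLS in which only the labels of non-relevant documents are shifted toward a min-max normalized retrieval score, and a simple curriculum switch. The function names, the min-max normalization, the choice to leave relevant labels unchanged, and the `epsilon` and `switch_step` parameters are all assumptions made for illustration.

```python
def smooth_labels(labels, epsilon=0.1, num_classes=2):
    """Standard label smoothing: mix each one-hot target with a uniform distribution."""
    smoothed = []
    for y in labels:
        one_hot = [1.0 if c == y else 0.0 for c in range(num_classes)]
        smoothed.append(
            [(1.0 - epsilon) * p + epsilon / num_classes for p in one_hot]
        )
    return smoothed


def min_max_normalize(scores):
    """Scale scores to [0, 1]; an assumed normalization, chosen for simplicity."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def weakly_supervised_targets(labels, retrieval_scores, epsilon=0.1):
    """Assumed WSLS-style targets for a pointwise ranker.

    Relevant documents (label 1) keep their ground-truth label.
    Non-relevant documents (label 0) receive
    (1 - epsilon) * 0 + epsilon * normalized_score, so negatives that the
    first-stage retriever scored highly get a slightly larger target.
    """
    normalized = min_max_normalize(retrieval_scores)
    targets = []
    for y, s in zip(labels, normalized):
        if y == 1:
            targets.append(1.0)
        else:
            targets.append((1.0 - epsilon) * y + epsilon * s)
    return targets


def curriculum_epsilon(step, switch_step, epsilon=0.1):
    """Curriculum variant: smooth until `switch_step`, then use ground-truth labels."""
    return epsilon if step < switch_step else 0.0


if __name__ == "__main__":
    labels = [1, 0, 0]               # one relevant, two sampled negatives
    scores = [10.0, 8.0, 2.0]        # hypothetical first-stage retrieval scores
    print(smooth_labels(labels, epsilon=0.2))
    print(weakly_supervised_targets(labels, scores, epsilon=0.2))
    print([curriculum_epsilon(step, switch_step=2, epsilon=0.2) for step in range(4)])
```

In this sketch the resulting soft targets would be fed to the usual pointwise loss (for example, a cross-entropy over the relevance score), which is why no change to the ranker architecture is needed.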
Original language | English |
---|---|
Title of host publication | Advances in Information Retrieval - 43rd European Conference on IR Research, ECIR 2021, Proceedings |
Subtitle of host publication | 43rd European Conference on IR Research, ECIR 2021, Virtual Event, March 28 – April 1, 2021, Proceedings, Part II |
Editors | Djoerd Hiemstra, Marie-Francine Moens, Josiane Mothe, Raffaele Perego, Martin Potthast, Fabrizio Sebastiani |
Place of Publication | Cham |
Publisher | Springer |
Pages | 334-341 |
Number of pages | 8 |
ISBN (Electronic) | 978-3-030-72240-1 |
ISBN (Print) | 978-3-030-72239-5 |
Publication status | Published - 2021 |
Event | ECIR 2021: 43rd European Conference on Information Retrieval - Virtual/online event due to COVID-19, Online at Lucca, Italy. Duration: 28 Mar 2021 → 1 Apr 2021. Conference number: 43rd |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 12657 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | ECIR 2021 |
---|---|
Country/Territory | Italy |
City | Online at Lucca |
Period | 28/03/21 → 1/04/21 |