Abstract
Large language models (LLMs) have recently gained significant attention due to their unparalleled zero-shot performance on various natural language processing tasks. However, the pre-training data utilized in LLMs is often confined to a specific corpus, resulting in inherent freshness and temporal scope limitations. Consequently, this raises concerns regarding the effectiveness of LLMs for tasks involving temporal intents. In this study, we aim to investigate the underlying limitations of general-purpose LLMs when deployed for tasks that require temporal understanding. We pay particular attention to handling factual temporal knowledge through three popular temporal QA datasets. Specifically, we observe low performance on detailed questions about the past and, surprisingly, for rather new information. In manual and automatic testing, we find multiple temporal errors and characterize the conditions under which QA performance deteriorates. Our analysis contributes to understanding LLM limitations and offers valuable insights into developing future models that can better cater to the demands of temporally oriented tasks. The code is available at https://github.com/jwallat/temporalblindspots.
Original language | English |
---|---|
Title of host publication | WSDM 2024 - Proceedings of the 17th ACM International Conference on Web Search and Data Mining |
Place of Publication | New York |
Publisher | ACM |
Pages | 683-692 |
Number of pages | 10 |
ISBN (Print) | 979-8-4007-0371-3 |
DOIs | |
Publication status | Published - 2024 |
Event | 17th ACM International Conference on Web Search and Data Mining, WSDM 2024 - Merida, Mexico. Duration: 4 Mar 2024 → 8 Mar 2024 |
Conference
Conference | 17th ACM International Conference on Web Search and Data Mining, WSDM 2024 |
---|---|
Country/Territory | Mexico |
City | Merida |
Period | 4/03/24 → 8/03/24 |
Bibliographical note
Green Open Access added to TU Delft Institutional Repository: ‘You share, we take care!’ – Taverne project, https://www.openaccess.nl/en/you-share-we-take-care. Otherwise, as indicated in the copyright section: the publisher is the copyright holder of this work, and the author uses Dutch legislation to make this work public.
Keywords
- large language models
- question answering
- temporal information retrieval
- temporal query intents