
Publication details

2024, Proceedings of the 14th Italian Information Retrieval Workshop (IIR 2024), Pages 95-98 (volume: 3802)

Rethinking Relevance: How Noise and Distractors Impact Retrieval-Augmented Generation (04b Conference paper in volume)

Cuconasu Florin, Trappolini Giovanni, Siciliano Federico, Filice Simone, Campagnano Cesare, Maarek Yoelle, Tonellotto Nicola, Silvestri Fabrizio

Retrieval-Augmented Generation (RAG) systems enhance the performance of Large Language Models (LLMs) by incorporating external information fetched by a retriever component. While traditional approaches prioritize retrieving “relevant” documents, our research reveals that these documents can be a double-edged sword. We explore the counterintuitive benefits of integrating noisy, non-relevant documents into the retrieval process. In particular, we analyze how three types of retrieved documents (relevant, distracting, and random) affect the overall effectiveness of RAG systems. Our findings reveal that the inclusion of random documents, often perceived as noise, can significantly improve LLM accuracy, with gains of up to 35%. Conversely, documents that the retriever scores highly but that are not relevant negatively impact performance. These insights challenge conventional retrieval strategies and suggest a paradigm shift towards rethinking information retrieval for neural models.
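The three document types named in the abstract can be made concrete with a small sketch of how a mixed context might be assembled before it is handed to an LLM. This is an illustration only, not the authors' experimental code: the names `Doc`, `build_context`, and `build_prompt`, the prompt layout, and the ordering (random documents first, relevant documents closest to the question) are all assumptions introduced here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    text: str
    kind: str  # "relevant", "distracting", or "random" (labels assumed for this sketch)


def build_context(relevant, distracting, random_pool, n_distracting, n_random, seed=0):
    """Assemble a context: sampled random docs, then distractors, then relevant docs.

    The ordering is a hypothetical choice, not taken from the paper.
    """
    rng = random.Random(seed)  # seeded so the sampled noise is reproducible
    docs = [Doc(t, "random") for t in rng.sample(random_pool, n_random)]
    # Distractors are assumed to arrive sorted by retriever score, best first.
    docs += [Doc(t, "distracting") for t in distracting[:n_distracting]]
    docs += [Doc(t, "relevant") for t in relevant]
    return docs


def build_prompt(question, docs):
    """Render the documents and the question into one plain-text prompt."""
    lines = [f"Document [{i}]: {d.text}" for i, d in enumerate(docs, start=1)]
    lines.append(f"Question: {question}")
    lines.append("Answer:")
    return "\n".join(lines)


if __name__ == "__main__":
    context = build_context(
        relevant=["Paris is the capital of France."],
        distracting=["Lyon is a large French city."],
        random_pool=["Bananas are yellow.", "Go is a board game."],
        n_distracting=1,
        n_random=1,
    )
    print(build_prompt("What is the capital of France?", context))
```

Varying `n_distracting` and `n_random` while holding the relevant documents fixed is one way to set up the kind of comparison the abstract describes; measuring accuracy would additionally require an LLM call and an answer-matching step, which are left out here.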