Multi-Source Knowledge Pruning for Retrieval-Augmented Generation: A Benchmark and Empirical Study
September 03, 2024 ยท Declared Dead ยท ๐ International Conference on Information and Knowledge Management
Authors
Shuo Yu, Mingyue Cheng, Qi Liu, Daoyu Wang, Jiqian Yang, Jie Ouyang, Yucong Luo, Chenyi Lei, Enhong Chen
arXiv ID
2409.13694
Category
cs.CL: Computation & Language
Cross-listed
cs.AI,
cs.IR
Citations
5
Venue
International Conference on Information and Knowledge Management
Repository
https://github.com/USTCAGI/PruningRAG}
Last Checked
1 month ago
Abstract
Retrieval-augmented generation (RAG) is increasingly recognized as an effective approach to mitigating the hallucination of large language models (LLMs) through the integration of external knowledge. While numerous efforts, most studies focus on a single type of external knowledge source. However, in real-world applications, most situations involve diverse knowledge from various sources, yet this area has been less explored. The main dilemma is the lack of a suitable dataset containing multiple knowledge sources and pre-exploration of the associated issues. To address these challenges, we standardize a benchmark dataset that combines structured and unstructured knowledge across diverse and complementary domains. Based on this dataset, we further develop a plug-and-play RAG framework, \textbf{PruningRAG}, whose main characteristic is the use of multi-granularity pruning strategies to optimize the integration of relevant information while minimizing misleading context. It consistently improves performance across various existing RAG variants, demonstrating its robustness and broad applicability. Building upon the standardized dataset and PruningRAG, we also report a series of experimental results, as well as insightful findings. Our dataset and code are publicly available\footnote{https://github.com/USTCAGI/PruningRAG}, with the aim of advancing future research in the RAG community.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
๐ Similar Papers
In the same crypt โ Computation & Language
๐
๐
Old Age
๐
๐
Old Age
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
R.I.P.
๐ป
Ghosted
Language Models are Few-Shot Learners
R.I.P.
๐ป
Ghosted
RoBERTa: A Robustly Optimized BERT Pretraining Approach
R.I.P.
๐ป
Ghosted
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
R.I.P.
๐ป
Ghosted
Deep contextualized word representations
Died the same way โ ๐ 404 Not Found
R.I.P.
๐
404 Not Found
Deep High-Resolution Representation Learning for Visual Recognition
R.I.P.
๐
404 Not Found
HuggingFace's Transformers: State-of-the-art Natural Language Processing
R.I.P.
๐
404 Not Found
CCNet: Criss-Cross Attention for Semantic Segmentation
R.I.P.
๐
404 Not Found