Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval

Stanford University, Databricks, Physical Intelligence, Google DeepMind
Figure: LeReT significantly improves retrieval and generation.

Abstract

Mitigating hallucinations is a prerequisite for trusting answers generated by large language models (LLMs), which are prone to making convincing but inaccurate claims. Grounding answers in data generated and verified by humans provides a natural avenue for improving the reliability of LLMs. However, it can be hard to capture the facts relevant to a user question based on semantic similarity alone, especially as questions become more complex and the relevant facts become more indirect. What if LLMs could query for relevant facts based on the user question? While this can enable retrieving relevant but indirect facts, the zero-shot performance of instruction-tuned LLMs leaves much to be desired, and generating supervision on how to retrieve relevant facts can be expensive and retriever dependent. Our key insight is that LLMs can learn to retrieve relevant facts by trying different queries and learning to upweight the queries that result in relevant facts. This leads to our reinforcement learning based framework, Learning to Retrieve by Trying (LeReT), in which the LLM generates queries for multi-hop retrieval and uses preference-based reinforcement learning to improve those queries. Our experimental results demonstrate that LeReT can improve absolute retrieval accuracy by up to 29% and downstream generator evaluations by 17%. The simplicity and flexibility of LeReT allow it to be applied to arbitrary retrievers and make it a promising technique for improving general LLM pipelines.

Method Overview

We focus mainly on multi-hop retrieval pipelines, where an LLM is given a question and asked to generate a query that is used to retrieve documents. The retrieved documents are then used to generate the next query. Finally, all of the retrieved documents are fed into another LLM to answer the original question. LeReT improves the LLM's ability to generate effective queries by sampling a diverse set of queries and training the LLM on this dataset using preference-based optimization.
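For concreteness, here is a minimal Python sketch of this multi-hop loop. The helpers `generate_query`, `retrieve`, and `generate_answer` are hypothetical stand-ins for the query LLM, the retriever, and the answer LLM; the paper's actual pipeline may differ in its interfaces.

```python
# A minimal sketch of the multi-hop retrieval pipeline described above.
# `generate_query`, `retrieve`, and `generate_answer` are hypothetical helpers
# standing in for the query LLM, the retriever, and the answer LLM.

def multi_hop_pipeline(question: str, num_hops: int = 2, top_k: int = 5) -> str:
    """Run `num_hops` rounds of query generation and retrieval, then answer."""
    retrieved_docs: list[str] = []
    for _ in range(num_hops):
        # The query LLM conditions on the question and all documents retrieved so far.
        query = generate_query(question, retrieved_docs)
        # Any retriever can be plugged in here; LeReT is retriever-agnostic.
        retrieved_docs.extend(retrieve(query, top_k=top_k))
    # A separate (typically larger) LLM answers from the accumulated documents.
    return generate_answer(question, retrieved_docs)
```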
Figure: LeReT uses few-shot prompting to sample diverse and effective search queries, then uses the retrieval reward to rank them. The collected preference pairs are used to fine-tune the model.

Prompt-Driven Diverse Query Generation

We introduce a new method to sample a diverse set of retrieval queries in multi-hop pipelines. Specifically, for a given question we sample a diverse set of queries by prompting the model with different few-shot prompts created by DSPy. For each query, we retrieve a set of relevant facts and apply a reward metric to the retrieved facts to create preference pairs. Additionally, for multi-hop retrieval, query generation is conditioned on the facts retrieved in previous hops: at each hop, we randomly sample a set of facts from the previous hop with probability proportional to its reward.
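The sketch below illustrates what this data collection could look like for a single hop. The helpers `sample_query`, `retrieve`, and `reward`, as well as the exact pair format, are assumptions for illustration rather than the paper's implementation; the reward could be, for example, recall of the gold supporting facts.

```python
import itertools
import random

# A minimal sketch, under assumed interfaces, of preference-pair collection for
# one hop. `sample_query(question, context, fewshot_prompt)` stands in for the
# LLM prompted with one of several DSPy-bootstrapped few-shot prompts,
# `retrieve(query)` for the retriever, and `reward(docs, gold_facts)` for the
# retrieval reward (e.g., recall of the gold supporting facts).

def collect_preference_pairs(question, context, fewshot_prompts, gold_facts):
    # Sample one query per few-shot prompt to obtain a diverse candidate set.
    candidates = []
    for prompt in fewshot_prompts:
        query = sample_query(question, context, prompt)
        docs = retrieve(query)
        candidates.append((query, docs, reward(docs, gold_facts)))

    # Any two candidates with different rewards yield a (chosen, rejected) pair.
    pairs = []
    for (q1, _, r1), (q2, _, r2) in itertools.combinations(candidates, 2):
        if r1 != r2:
            chosen, rejected = (q1, q2) if r1 > r2 else (q2, q1)
            pairs.append({"context": (question, context),
                          "chosen": chosen, "rejected": rejected})
    return candidates, pairs

def sample_next_hop_facts(candidates):
    # For the next hop, pick one candidate's retrieved facts with probability
    # proportional to its reward (small floor so zero-reward samples stay possible).
    weights = [max(r, 1e-6) for _, _, r in candidates]
    _, docs, _ = random.choices(candidates, weights=weights, k=1)[0]
    return docs
```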

Model training

We first perform a standard supervised fine-tuning (SFT) step on the model; because the queries were collected with few-shot prompts, this step can also be seen as context distillation. We then train the model on the collected preference pairs using IPO.
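For reference, here is a minimal PyTorch sketch of the IPO objective applied in this stage. It operates on summed sequence log-probabilities of the chosen and rejected queries under the policy being trained and under the frozen SFT reference model; the function signature and the hyperparameter name `beta` are our own choices for illustration, not taken from the paper.

```python
import torch

# A minimal sketch of the IPO objective (Azar et al., 2023). Inputs are summed
# token log-probabilities of the chosen and rejected queries under the current
# policy and under the frozen SFT reference model; `beta` controls the
# regularization strength (an assumed hyperparameter name).

def ipo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    # Log-ratio of policy to reference for the preferred and dispreferred queries.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps
    margin = chosen_logratio - rejected_logratio
    # IPO regresses this margin toward 1 / (2 * beta) with a squared loss.
    return ((margin - 1.0 / (2.0 * beta)) ** 2).mean()
```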

Results

We test LeReT on two multi-hop datasets, HotpotQA and HoVer, with two different base models, Llama 3 8B and Gemma 2 9B. We evaluate LeReT's ability to improve downstream generation by feeding the retrieved documents into Llama 3.1 70B.
Figure: LeReT significantly improves retrieval accuracy and generation across both datasets and base models.

BibTeX

@misc{hsu2024groundingtryingllmsreinforcement,
      title={Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval}, 
      author={Sheryl Hsu and Omar Khattab and Chelsea Finn and Archit Sharma},
      year={2024}
}