Retrieval Augmented Generation (RAG) is a great pattern with many potential uses and an effective way to "ground" LLMs in a specific dataset. However, there are several disadvantages that prevented us from using it in a recent project.
The challenge we had to solve
For a domain-specific search engine (currently in private Beta; I’ll add the link when it goes public), we needed to provide our users with highly accurate results that:
A: Match user queries to complex objects with many conditions and edge cases.
B: Display exactly the information we want to show in an attractive design.
C: Give users the ability to interact with the provided results (view additional details, save the results, perform further actions).
Why we decided against RAG
Conversational interfaces do not always offer the best UX: Many people try to opt out of the newly launched Google AI search for a reason. Especially in productivity-focused contexts, a list of well-designed, feature-rich, and interactive results is often more effective.
Prompt hacking and PR nightmares: Whenever you launch a chatbot under your name or brand, you are responsible for the answers it generates. There are already many examples of poorly constrained LLMs causing problems.
The same texts get generated repeatedly: When creating answers based on a specific data source, LLMs often duplicate existing content. However, in many cases, the texts within a dataset are already great the way they are. Additionally, there are situations (such as with medical information) where content must be returned without any changes.
Garbage in, garbage out: RAG is only as good as the content fed into the chosen LLM's context. However, since the devil is in the details (think legal texts), retrieving content based on semantic similarity (a common RAG pattern) may not always be sufficient.
Enter Generation Augmented Retrieval (GAR)
Generation Augmented Retrieval turns LLMs into smart, context-aware query engines that tell the application what objects to load from the database in order to display them as rich, interactive UI elements.
The main difference from RAG is that in GAR, LLMs are primarily used for their reasoning capabilities rather than text generation.
The following approach provided us with exactly what we needed:
The user creates a query (plain text plus two drop-downs in our case).
The LLM receives the plain text query together with the most essential properties (ID, summary, detailed edge cases and conditions) of up to 100 database entries (pre-filtered by the drop-downs).
The LLM uses its reasoning capabilities to select which entries best match the provided information. We’ve made good progress with Claude Sonnet and a Chain-of-Thought (CoT) prompting approach (GPT-4o is faster, but the reasoning does not appear to be as strong).
The LLM returns the IDs of the selected entries as JSON. We also opted to include a short quote per entry that explains why the entry is a good fit - CoT actually produces this as a side effect.
The application resolves the IDs to a well-designed, feature-rich, and interactive list of results.
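The steps above can be sketched in a few lines of Python. Everything here is illustrative: `call_llm` is a stub standing in for whatever model API you use (e.g. Claude Sonnet with a CoT prompt), and the entry fields (`id`, `summary`, `conditions`) mirror the "most essential properties" mentioned above - the names and the JSON schema are our assumptions, not a fixed format.

```python
import json

# Toy candidate entries, pre-filtered by the drop-downs (step 2).
ENTRIES = [
    {"id": "a1", "summary": "Standard contract template",
     "conditions": "Not valid for cross-border deals"},
    {"id": "b2", "summary": "Cross-border contract template",
     "conditions": "Requires notarization in some regions"},
]

def build_prompt(query, entries):
    """Pack the plain-text query plus each entry's essential properties into one prompt."""
    lines = [f"User query: {query}", "Candidate entries:"]
    for e in entries:
        lines.append(f"- id={e['id']} | summary={e['summary']} | conditions={e['conditions']}")
    lines.append('Select the best-matching IDs and reply as JSON: '
                 '{"results": [{"id": "...", "quote": "..."}]}')
    return "\n".join(lines)

def call_llm(prompt):
    """Placeholder for a real model call; returns a canned JSON answer here."""
    return '{"results": [{"id": "b2", "quote": "Cross-border contract template"}]}'

def gar_search(query, entries):
    # Steps 3-4: the LLM reasons over the candidates and returns selected IDs as JSON.
    selected = json.loads(call_llm(build_prompt(query, entries)))["results"]
    by_id = {e["id"]: e for e in entries}
    # Step 5: resolve IDs back to full database objects for the rich result list.
    return [{**by_id[s["id"]], "quote": s["quote"]}
            for s in selected if s["id"] in by_id]

results = gar_search("contract for an international deal", ENTRIES)
```

The key design point is the last step: the LLM never writes user-facing text beyond the short quote; the application looks the IDs up in its own database and renders the objects itself.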
GAR disadvantages
GAR has its downsides and is definitely not a fit for every use case:
Speed: Especially when using more complex Chain-of-Thought prompting patterns, getting an answer might take a while. This can be mitigated by streaming the generated JSON and by using faster models (see GPT-4o, Gemini Flash, Groq, etc.).
Limited context size: While contexts are getting larger, you still might need to filter the content you pass to the LLM (either by using vector-based similarity search or some other retrieval method).
Less chattiness: In some situations, a bubbly chatbot is exactly what you want. However, you can also show rich, interactive objects inside conversational interfaces (and therefore use GAR in combination with RAG).
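The pre-filtering mentioned under "Limited context size" can be as simple as a similarity cut before the LLM call. A minimal sketch, assuming you already have embedding vectors for the query and the entries (the toy 2-D vectors below stand in for real embeddings from an embedding model):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def prefilter(query_vec, entries, top_k=100):
    """Keep only the top_k entries most similar to the query,
    so the prompt stays within the LLM's context size."""
    ranked = sorted(entries, key=lambda e: cosine(query_vec, e["vec"]), reverse=True)
    return ranked[:top_k]

# Toy vectors standing in for real embeddings.
entries = [
    {"id": "a1", "vec": [0.9, 0.1]},
    {"id": "b2", "vec": [0.2, 0.8]},
    {"id": "c3", "vec": [0.7, 0.3]},
]
top = prefilter([1.0, 0.0], entries, top_k=2)
```

In production you would swap the linear scan for a vector store, but the shape of the pipeline - filter first, then let the LLM reason over the survivors - stays the same.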
Final thoughts
With increasing inference speeds and larger context sizes, GAR will become more practical in the future. It is not a replacement for RAG but could be a good alternative when you need tight control over the outputs of an LLM without missing out on its reasoning abilities.