
๐€๐ ๐ž๐ง๐ญ๐ข๐œ ๐‘๐€๐† โœจ new cookbook

Agentic RAG Cookbook: Improving RAG with Agent Systems

Curiosity: What if RAG systems could think more like humans: questioning their own retrievals, reformulating queries, and iterating until they find the right answer? What happens when we give RAG the ability to retrieve, critique, and retrieve again?

A new cookbook demonstrates how to easily improve RAG with an agent system using Transformers Agents. This approach addresses key limitations of vanilla RAG by making systems more intelligent and self-correcting.

Vanilla RAG Limitations

Retrieve: Vanilla RAG systems have fundamental limitations that impact performance.

Key Limitations:

| Limitation | Description | Impact |
| --- | --- | --- |
| Single Retrieval | Retrieves documents only once | ⚠️ Poor quality if initial retrieval fails |
| Suboptimal Similarity | Uses user query as reference | ⚠️ Questions vs. statements mismatch |
| No Self-Correction | Cannot refine or re-retrieve | ❌ No improvement mechanism |

Problem Details:

  • User queries are typically phrased as questions
  • The documents that actually contain the answer are phrased as affirmative statements
  • This question-vs-statement mismatch downgrades similarity scores, so relevant documents risk being missed (see the sketch after this list)
  • With a single retrieval pass, there is no opportunity to recover
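
To make the mismatch concrete, here is a minimal sketch using sentence-transformers; the model name and the example sentences are illustrative assumptions, not taken from the cookbook. It compares how a question-phrased query and an affirmative reformulation score against the same source passage.

```python
# Minimal sketch of the question-vs-statement similarity gap.
# Model name and sentences are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

passage = "The Transformer architecture relies entirely on attention mechanisms."
question = "What does the Transformer architecture rely on?"      # typical user query
statement = "The Transformer architecture relies on attention."   # affirmative reformulation

embeddings = model.encode([passage, question, statement])
print("question  vs passage:", util.cos_sim(embeddings[0], embeddings[1]).item())
print("statement vs passage:", util.cos_sim(embeddings[0], embeddings[2]).item())
```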

Vanilla RAG vs. Agentic RAG

| Aspect | Vanilla RAG | Agentic RAG |
| --- | --- | --- |
| Retrieval Strategy | Single retrieval pass | Iterative retrieval with critique |
| Query Handling | Direct user query | Query reformulation & optimization |
| Self-Correction | ❌ No | ✅ Yes, can re-retrieve if needed |
| Performance | Baseline (70.0%) | Improved (78.5%, +8.5 points) |
| Latency | Lower (1 LLM call) | Higher (multiple LLM calls) |
| Quality | ⚠️ Limited | ⬆️ Better |

Agentic RAG Solution

Innovate: Making a RAG agent (simply, an agent armed with a retriever tool) alleviates both problems!

Key Capabilities:

  • ✅ Query Reformulation: the agent formulates an optimized search query instead of using the raw user question
  • ✅ Self-Query: the agent critiques the retrieved content and re-retrieves if needed

Architecture:

```mermaid
graph TD
    A[User Query] --> B[Agent: Query Reformulation]
    B --> C[Retrieve Documents]
    C --> D[Agent: Critique Retrieved Content]
    D --> E{Content<br/>Relevant?}
    E -->|No| B
    E -->|Yes| F[Generate Answer]
    F --> G[Final Response]

    style B fill:#e1f5ff
    style D fill:#fff3cd
    style F fill:#d4edda
    style E fill:#f8d7da
```

Performance Comparison

Retrieve: Evaluation with LLM-as-a-judge (Llama-3-70B) shows significant improvement.

| Metric | Vanilla RAG | Agentic RAG | Notes |
| --- | --- | --- | --- |
| Accuracy Score | 70.0% | 78.5% | +8.5 points 💪 |
| LLM Calls | 1 | 3-5 | Higher latency |
| Self-Correction | ❌ | ✅ | Better quality |
| Query Optimization | ❌ | ✅ | Better retrieval |
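
For reference, here is a minimal sketch of how such an LLM-as-a-judge evaluation can be wired up with huggingface_hub; the prompt wording and the 0-100 scale are assumptions for illustration, not the cookbook's actual judge prompt.

```python
# Minimal LLM-as-a-judge sketch (prompt and scoring scale are illustrative).
from huggingface_hub import InferenceClient

judge_client = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")

def judge(question: str, reference: str, answer: str) -> str:
    prompt = (
        "You are a fair evaluator. Score the answer from 0 to 100 for "
        "correctness against the reference answer.\n"
        f"Question: {question}\nReference: {reference}\nAnswer: {answer}\n"
        "Reply with the score only."
    )
    response = judge_client.chat_completion(
        messages=[{"role": "user", "content": prompt}], max_tokens=10
    )
    return response.choices[0].message.content
```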

Trade-offs:

  • ⬆️ Better quality (+8.5 points)
  • ⚠️ Higher latency (multiple LLM calls)
  • ⚖️ Quality vs. speed must be balanced per use case

Sample Agentic RAG Implementation

```python
from langchain.agents import AgentExecutor, create_react_agent
from langchain.tools import Tool

# Assumes `vector_store`, `llm`, and `agent_prompt` are already defined.

# Create a retrieval tool wrapping the vector store's similarity search
retrieval_tool = Tool(
    name="retrieve_documents",
    func=vector_store.similarity_search,
    description="Retrieves relevant documents for a query",
)

# Create a ReAct agent armed with the retrieval tool
agent = create_react_agent(llm=llm, tools=[retrieval_tool], prompt=agent_prompt)
agent_executor = AgentExecutor(agent=agent, tools=[retrieval_tool])

# Agent workflow
def agentic_rag(query):
    # Step 1: Query reformulation
    reformulated_query = agent_executor.invoke(
        {"input": f"Reformulate this query for better retrieval: {query}"}
    )["output"]

    # Step 2: Retrieve documents with the optimized query
    docs = retrieval_tool.run(reformulated_query)

    # Step 3: Critique the retrieved content and re-retrieve if needed
    critique = agent_executor.invoke(
        {"input": f"Critique these documents for relevance to: {query}\n{docs}"}
    )["output"]

    if "not relevant" in critique.lower():
        # Re-retrieve with a broader search (fetch more documents)
        docs = vector_store.similarity_search(query, k=10)

    # Step 4: Generate the final answer from the retrieved context
    answer = llm.invoke(
        f"Using the context below, answer the question.\n\n"
        f"Context:\n{docs}\n\nQuestion: {query}"
    )
    return answer
```
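
The snippet above leaves `vector_store`, `llm`, and `agent_prompt` undefined. Here is a minimal setup sketch, assuming a FAISS index, Hugging Face models, and a stock ReAct prompt; all of these choices are illustrative, not the cookbook's exact stack.

```python
# Illustrative setup for the pieces assumed above (not the cookbook's exact stack).
from langchain import hub
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings, HuggingFaceEndpoint

embeddings = HuggingFaceEmbeddings(model_name="thenlper/gte-small")
vector_store = FAISS.from_texts(chunks, embeddings)  # `chunks`: your split documents
llm = HuggingFaceEndpoint(repo_id="meta-llama/Meta-Llama-3-8B-Instruct")
agent_prompt = hub.pull("hwchase17/react")  # a standard ReAct prompt template

print(agentic_rag("How do I push a model to the Hub?"))
```

Note that the hard-coded substring check on the critique is a crude stand-in; in a full agent loop, the agent itself decides when to retrieve again.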

๐——๐—ถ๐˜€๐—ฐ๐—ผ๐˜ƒ๐—ฒ๐—ฟ ๐˜๐—ต๐—ฒ ๐—ฐ๐—ผ๐—ผ๐—ธ๐—ฏ๐—ผ๐—ผ๐—ธ ๐Ÿ‘‡

Agentic Data analyst: drop your data file, let the LLM do the analysis 📊⚙️

Need to run a quick exploratory data analysis? ➡️ Get help from an agent.

I was impressed by Llama-3.1's capacity to derive insights from data. Given a CSV file, it makes quick work of exploratory data analysis and can derive interesting insights.

On data from the Kaggle Titanic challenge, which records which passengers survived the Titanic wreck, it was able to derive interesting trends on its own, such as "passengers who paid higher fares were more likely to survive" or "the survival rate was much higher for women than for men".
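
These trends are easy to sanity-check directly with pandas on the Kaggle training file (the `train.csv` path and column names come from the Kaggle dataset; the check itself is mine, not the cookbook's):

```python
# Quick pandas sanity check of the trends the agent found.
import pandas as pd

df = pd.read_csv("train.csv")  # Kaggle Titanic training data

# Survival rate by sex: much higher for women than for men
print(df.groupby("Sex")["Survived"].mean())

# Fare vs. survival: survivors paid higher fares on average
print(df.groupby("Survived")["Fare"].mean())
```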

The cookbook even lets the agent build its own submission to the challenge, and that submission ranks under 3,000 out of 17,000 entries: 👏 not bad at all!

  • Try it for yourself in this Space demo 👉 https://lnkd.in/gzaqQ3rT
  • Read the cookbook to dive deeper 👉 https://lnkd.in/gXx3-AyH

This post is licensed under CC BY 4.0 by the author.