
๐Ÿ•น๏ธ NVIDIA introduces RankRAG 8B & 70B

RankRAG: NVIDIA's Dual-Purpose Re-Ranker/Generation Models

Curiosity: How can a single model handle both context re-ranking and answer generation? What happens when we instruction-tune for both tasks?

NVIDIA introduces RankRAG 8B & 70B: dual-purpose re-ranker/generation models that outperform GPT-4 across 9 RAG benchmarks.


Paper: https://arxiv.org/pdf/2407.02485

The Challenge

Retrieve: Traditional RAG limitations.

| Problem | Description | Impact |
| --- | --- | --- |
| Too many contexts | Retrieved contexts exceed the generator's context window | ⚠️ Truncation |
| Too few contexts | Poor recall when k is small | ⚠️ Missing information |
| Separate models | Re-ranker and generator are separate models | ⚠️ Pipeline complexity |

Result: Suboptimal RAG performance.
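
To make the dilemma concrete, here is a minimal sketch of a conventional top-k prompt builder under a fixed token budget. The stand-in corpus, the whitespace token estimate, and the function name are illustrative assumptions, not the paper's code.

```python
# Illustrative sketch of the conventional top-k RAG tradeoff; the corpus and
# token counting below are stand-ins, not the paper's implementation.

def build_prompt(query: str, contexts: list[str], budget_tokens: int = 512) -> str:
    """Pack retrieved contexts into the prompt until the token budget runs out."""
    used, kept = 0, []
    for ctx in contexts:
        n = len(ctx.split())  # crude whitespace token estimate, for the sketch only
        if used + n > budget_tokens:
            break  # with a large k, later passages are silently dropped here
        kept.append(ctx)
        used += n
    return "\n\n".join(kept) + f"\n\nQuestion: {query}\nAnswer:"

# Small k: the gold passage may never be retrieved (recall problem).
# Large k: it may be retrieved but truncated above (context-window problem).
passages = [f"retrieved passage {i} ..." for i in range(100)]  # stand-in hits
print(build_prompt("Who introduced RankRAG?", passages))
```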

RankRAG Solution

Innovate: Single model for both tasks.

Key Innovation: Instruction-tune a single LLM for both:

  • Context re-ranking
  • Answer generation

Benefits:

  • ✅ Identifies relevant contexts from a larger retrieved k
  • ✅ Delivers high-quality answers
  • ✅ Simplifies the architecture (one model instead of two)
  • ✅ Improves end-to-end RAG performance
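
Because ranking is itself cast as instruction following, the same checkpoint can be prompted either to judge a passage or to answer a question. Here is a minimal sketch of the two prompt shapes; the template wording is my own assumption, not the paper's verbatim training format.

```python
# One model checkpoint, two instruction formats. The template wording is an
# illustrative assumption, not RankRAG's verbatim training prompts.

def ranking_prompt(question: str, passage: str) -> str:
    """Re-ranking mode: ask the model to judge a single passage's relevance."""
    return (
        f"Passage: {passage}\n"
        f"Question: {question}\n"
        "Is the passage relevant to answering the question? Answer True or False."
    )

def generation_prompt(question: str, passages: list[str]) -> str:
    """Generation mode: ask the same model to answer from the kept passages."""
    context = "\n\n".join(passages)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

One common way to turn the first prompt into a relevance score is to read off the model's probability of emitting "True"; re-ranking then reduces to sorting passages by that probability before switching to the generation prompt.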

RankRAG Architecture

Retrieve: How RankRAG works.

```mermaid
graph TB
    A[Query] --> B[Dragon Retriever]
    B --> C[Top-K Contexts]
    C --> D[RankRAG Model]
    D --> E[Re-Ranking]
    D --> F[Answer Generation]
    E --> G[Ranked Contexts]
    G --> F
    F --> H[Final Answer]

    style A fill:#e1f5ff
    style D fill:#fff3cd
    style H fill:#d4edda
```
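
The control flow in the diagram could look like the following sketch, where `dragon_retrieve`, `llm_score`, and `llm_generate` are hypothetical stubs standing in for the Dragon retriever and the dual-purpose RankRAG checkpoint.

```python
# Retrieve wide, re-rank with the same LLM, generate from the survivors.
# All three callables below are hypothetical stubs; swap in the real
# Dragon retriever and a RankRAG checkpoint for the actual system.

def dragon_retrieve(query: str, n: int) -> list[str]:
    return [f"passage {i} about {query}" for i in range(n)]  # stub hits

def llm_score(query: str, passage: str) -> float:
    return float(-len(passage))  # stub relevance score (e.g., P("True"))

def llm_generate(query: str, contexts: list[str]) -> str:
    return f"answer to {query!r} grounded in {len(contexts)} contexts"  # stub

def rank_rag_answer(question: str, retrieve_n: int = 100, keep_k: int = 5) -> str:
    candidates = dragon_retrieve(question, n=retrieve_n)   # Top-K contexts
    ranked = sorted(candidates,
                    key=lambda p: llm_score(question, p),
                    reverse=True)                          # re-ranking pass
    return llm_generate(question, ranked[:keep_k])         # answer generation

print(rank_rag_answer("Who introduced RankRAG?"))
```

Retrieving a wide top-N and letting the model itself keep only the best k is what allows RankRAG to raise recall without overflowing the context window.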

Training Method

Retrieve: RankRAG's training process.

| Step | Process | Purpose |
| --- | --- | --- |
| 1. Instruction tuning | Multiple datasets (Flan, Dolly) | ⬆️ Base capabilities |
| 2. Data merging | Combine instruction, QA, RAG QA, and ranking data | ⬆️ Specialized training |
| 3. Fine-tuning | Retrain on the combined specialized datasets | ⬆️ Dual-purpose optimization |
| 4. Evaluation | Open QA, fact verification, conversational QA | ⬆️ Performance assessment |
| 5. Deployment | Dragon retriever + RankRAG | ⬆️ Production system |
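
A sketch of the step-2 data merging, assuming each example is a prompt/response pair; the sampling scheme and the 1% default echo the data-efficiency result reported below, but the exact mixture recipe here is an illustrative assumption, not the paper's.

```python
import random

# Sketch of the stage-2 data blend: instruction, QA, and RAG QA examples plus
# a small slice of ranking examples. The sampling scheme is an illustrative
# assumption; the document's point is that even a ~1% ranking share helps.

def merge_training_data(instruction: list, qa: list, rag_qa: list,
                        ranking: list, ranking_frac: float = 0.01,
                        seed: int = 0) -> list:
    rng = random.Random(seed)
    base = instruction + qa + rag_qa
    # Solve n / (len(base) + n) = ranking_frac for the ranking sample size n.
    n_rank = round(len(base) * ranking_frac / (1.0 - ranking_frac))
    mixture = base + rng.sample(ranking, min(n_rank, len(ranking)))
    rng.shuffle(mixture)
    return mixture

# Toy usage with dict examples of the form {"prompt": ..., "response": ...}.
mix = merge_training_data(
    instruction=[{"prompt": f"inst {i}", "response": "..."} for i in range(5000)],
    qa=[{"prompt": f"qa {i}", "response": "..."} for i in range(3000)],
    rag_qa=[{"prompt": f"rag {i}", "response": "..."} for i in range(2000)],
    ranking=[{"prompt": f"rank {i}", "response": "True"} for i in range(50000)],
)
print(len(mix))  # ~10101 examples, of which ~1% are ranking examples
```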

Performance Results

Innovate: RankRAG's impressive achievements.

Benchmark Performance:

| Model | Average Score | vs. GPT-4 |
| --- | --- | --- |
| GPT-4 | 43.5 | Baseline |
| RankRAG 8B | 52.6 | +9.1 points |
| RankRAG 70B | 56.1 | +12.6 points |

Key Achievements:

  • ✅ Surpasses GPT-4 across 9 RAG benchmarks
  • ✅ Notable gains over ChatQA 1.5, especially on benchmarks where initial retrieval is difficult
  • ✅ Strong generalization: matches GPT-4 on 5 biomedical RAG benchmarks
  • ✅ Exceeds specialized re-ranking models trained on larger datasets
  • ✅ Significant improvements from integrating just 1% ranking data

Key Insights

Retrieve: What makes RankRAG effective.

| Insight | Description | Impact |
| --- | --- | --- |
| Dual-purpose | A single model handles both tasks | ⬆️ Efficiency |
| Instruction tuning | Specialized training data | ⬆️ Performance |
| Generalization | Works across domains | ⬆️ Versatility |
| Data efficiency | Just 1% ranking data helps | ⬆️ Practicality |

Key Takeaways

Retrieve: RankRAG demonstrates that a single instruction-tuned LLM can handle both context re-ranking and answer generation, outperforming GPT-4 across 9 RAG benchmarks.

Innovate: By training a dual-purpose model with specialized datasets combining instruction, QA, and ranking data, RankRAG achieves superior performance while simplifying the RAG architecture.

Curiosity → Retrieve → Innovation: Start with curiosity about improving RAG performance, retrieve insights from RankRAG's dual-purpose approach, and innovate by implementing unified re-ranking and generation models in your RAG systems.

Next Steps:

  • Read the full paper
  • Understand RankRAG architecture
  • Experiment with dual-purpose training
  • Deploy RankRAG in your systems

8๊ฐœ์˜ RAG ๋ฒค์น˜๋งˆํฌ์—์„œ GPT-4๋ฅผ ๋Šฅ๊ฐ€ํ•˜๋Š” ์ด์ค‘ ๋ชฉ์  ์žฌ๋žญ์ปค/์ƒ์„ฑ ๋ชจ๋ธ ๐Ÿ‘‡๐Ÿ‘‡๐Ÿ‘‡

๊ธฐ์กด์˜ RAG ๋ฐฉ๋ฒ•์€ LLM์„ ์‚ฌ์šฉํ•˜์—ฌ ๋‹ต๋ณ€์„ ์ƒ์„ฑํ•˜๊ธฐ ์œ„ํ•ด ๋ฐ์ดํ„ฐ๋ฒ ์ด์Šค์—์„œ top-k ์ปจํ…์ŠคํŠธ๋ฅผ ๊ฒ€์ƒ‰ํ•˜์ง€๋งŒ, ์ƒ์„ฑ ์ปจํ…์ŠคํŠธ ์ฐฝ์„ ์ดˆ๊ณผํ•˜๋Š” ์ปจํ…์ŠคํŠธ๊ฐ€ ๋„ˆ๋ฌด ๋งŽ๊ฑฐ๋‚˜ k๊ฐ€ ๋„ˆ๋ฌด ์ž‘์„ ๋•Œ ์žฌํ˜„์œจ์ด ๋‚ฎ์„ ๋•Œ ๋ฌธ์ œ๊ฐ€ ๋ฐœ์ƒํ•ฉ๋‹ˆ๋‹ค.

RankRAG ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ์ปจํ…์ŠคํŠธ ์žฌ์ˆœ์œ„ ์ง€์ •๊ณผ ๋‹ต๋ณ€ ์ƒ์„ฑ ๋ชจ๋‘๋ฅผ ์œ„ํ•ด ๋‹จ์ผ LLM์„ ๋ช…๋ น์–ด ํŠœ๋‹ํ•˜์—ฌ ์ด๋Ÿฌํ•œ ๋ฌธ์ œ๋ฅผ ๊ทน๋ณตํ•˜๊ณ , ๋” ํฐ ๊ฒ€์ƒ‰๋œ k์—์„œ ๊ด€๋ จ ์ปจํ…์ŠคํŠธ๋ฅผ ์‹๋ณ„ํ•˜๊ณ  ๊ณ ํ’ˆ์งˆ ๋‹ต๋ณ€์„ ์ œ๊ณตํ•˜๋Š” ๋Šฅ๋ ฅ์„ ํ–ฅ์ƒ์‹œํ‚ต๋‹ˆ๋‹ค.

๋ฉ”์„œ๋“œ:

  • 1๏ธโƒฃ ์—ฌ๋Ÿฌ ๋ฐ์ดํ„ฐ ์„ธํŠธ(์˜ˆ: Flan, Dolly ๋“ฑ)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ช…๋ น์–ด ํŠœ๋‹์„ ์ˆ˜ํ–‰ํ•ฉ๋‹ˆ๋‹ค.
  • 2๏ธโƒฃ ์›๋ณธ ์ง€์นจ ๋ฐ์ดํ„ฐ๋ฅผ QA ๋ฐ์ดํ„ฐ, RAG QA ๋ฐ์ดํ„ฐ, ์ปจํ…์ŠคํŠธ ์ˆœ์œ„ ๋ฐ์ดํ„ฐ ๋ฐ RAG ์ˆœ์œ„ ๋ฐ์ดํ„ฐ์™€ ๋ณ‘ํ•ฉํ•ฉ๋‹ˆ๋‹ค.
  • 3๏ธโƒฃ ์ด๋Ÿฌํ•œ ๊ฒฐํ•ฉ๋œ ํŠน์ˆ˜ ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ ๋ชจ๋ธ์„ ๋‹ค์‹œ ๋ฏธ์„ธ ์กฐ์ •ํ•ฉ๋‹ˆ๋‹ค.
  • 4๏ธโƒฃ ๊ฐœ๋ฐฉํ˜• QA, ์‚ฌ์‹ค ํ™•์ธ ๋ฐ ๋Œ€ํ™”ํ˜• QA ๋ฐ์ดํ„ฐ ์„ธํŠธ์— ๋Œ€ํ•ด ํ‰๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
  • 5๏ธโƒฃ ์ปจํ…์ŠคํŠธ ๊ฒ€์ƒ‰์—๋Š” ๋“œ๋ž˜๊ณค ๋ฆฌํŠธ๋ฆฌ๋ฒ„๋ฅผ ์‚ฌ์šฉํ•˜๊ณ  ์ˆœ์œ„ ๋ฐ ๋‹ต๋ณ€ ์ƒ์„ฑ์—๋Š” RankRAG๋ฅผ ์‚ฌ์šฉํ•ฉ๋‹ˆ๋‹ค.

ํ†ต์ฐฐ:

  • ๐Ÿ”ธ RankRAG 8B ๋ฐ 70B ๋ชจ๋ธ์€ 9๊ฐœ์˜ RAG ๋ฒค์น˜๋งˆํฌ์—์„œ GPT-4๋ฅผ ๋Šฅ๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ”ธ ํ‰๊ท  ์ ์ˆ˜: GPT-4 = 43.5, RankRAG 8B = 52.6, RankRAG 70B = 56.1.
  • ๐Ÿ”ธ RankRAG๋Š” ChatQA 1.5์— ๋น„ํ•ด ํŠนํžˆ ์ดˆ๊ธฐ ๊ฒ€์ƒ‰์˜ ์–ด๋ ค์›€์œผ๋กœ ์ธํ•ด ๊นŒ๋‹ค๋กœ์šด ๋ฒค์น˜๋งˆํฌ์—์„œ ๋ˆˆ์— ๋„๋Š” ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.
  • ๐Ÿ”ธ RankRAG๋Š” 5๊ฐœ์˜ ์ƒ๋ฌผ์˜ํ•™ RAG ๋ฒค์น˜๋งˆํฌ์—์„œ GPT-4์˜ ์„ฑ๋Šฅ๊ณผ ์ผ์น˜ํ•˜๋Š” ๊ฐ•๋ ฅํ•œ ์ผ๋ฐ˜ํ™”๋ฅผ ๋ณด์—ฌ์ค๋‹ˆ๋‹ค.
  • ๐Ÿ”ธ RankRAG๋Š” ๋˜ํ•œ ๋” ํฐ ๋ฐ์ดํ„ฐ ์„ธํŠธ์—์„œ ํ›ˆ๋ จ๋œ ํŠน์ˆ˜ ์ˆœ์œ„ ์žฌ์ง€์ • ๋ชจ๋ธ์˜ ์„ฑ๋Šฅ์„ ๋Šฅ๊ฐ€ํ•ฉ๋‹ˆ๋‹ค.
  • ๐Ÿ”ธ 1%์˜ ์ˆœ์œ„ ๋ฐ์ดํ„ฐ๋งŒ ์ง€์นจ ๋ฐ์ดํ„ฐ์™€ ํ†ตํ•ฉํ•˜๋ฉด ์ƒ๋‹นํ•œ ๊ฐœ์„ ์ด ์ด๋ฃจ์–ด์ง‘๋‹ˆ๋‹ค.
This post is licensed under CC BY 4.0 by the author.