
๐Ÿ•น๏ธ NVIDIA introduces RankRAG 8B & 70B


👉 Paper: https://arxiv.org/pdf/2407.02485

Dual-purpose re-ranking/generation models that outperform GPT-4 across 9 RAG benchmarks 👇👇👇

Traditional RAG methods retrieve the top-k contexts from a database and generate an answer with an LLM, but they face a trade-off: if k is large, the concatenated contexts can exceed the generation context window, while if k is too small, recall of relevant passages suffers.
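
To make that trade-off concrete, here is a minimal sketch of a conventional top-k RAG pipeline; `retriever.search` and `llm.generate` are hypothetical stand-ins for illustration, not any specific library's API.

```python
# Minimal sketch of a conventional top-k RAG pipeline.
# `retriever` and `llm` are hypothetical objects, assumed for illustration.

def answer_with_rag(question: str, retriever, llm, k: int = 5) -> str:
    # Retrieve the k highest-scoring contexts for the question.
    # Small k -> relevant passages may be missed (poor recall);
    # large k -> the concatenated prompt may overflow the context window.
    contexts = retriever.search(question, top_k=k)

    prompt = "\n\n".join(c.text for c in contexts)
    prompt += f"\n\nQuestion: {question}\nAnswer:"
    return llm.generate(prompt)
```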

The RankRAG framework addresses both issues by instruction-tuning a single LLM for both context re-ranking and answer generation: the model first identifies the relevant contexts within a larger retrieved pool, then generates a high-quality answer from them.
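
At inference time this becomes a retrieve-rerank-generate loop in which one model handles both of the last two stages. Below is a minimal sketch under the assumption that the instruction-tuned model exposes a relevance-scoring call; `retriever`, `model.relevance_score`, and `model.generate` are illustrative names, not the paper's actual interface.

```python
# Sketch of RankRAG-style inference: one instruction-tuned LLM both
# re-ranks a large retrieved pool and generates the final answer.

def rankrag_answer(question: str, retriever, model,
                   retrieve_n: int = 100, keep_k: int = 5) -> str:
    # 1) Retrieve a large pool so recall stays high.
    pool = retriever.search(question, top_k=retrieve_n)

    # 2) The same LLM scores each (question, context) pair for relevance.
    scored = [(model.relevance_score(question, c.text), c) for c in pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = [c for _, c in scored[:keep_k]]

    # 3) Generate the answer from only the top-ranked contexts,
    #    which now fit comfortably in the context window.
    prompt = "\n\n".join(c.text for c in top)
    prompt += f"\n\nQuestion: {question}\nAnswer:"
    return model.generate(prompt)
```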

Method:

  • 1๏ธโƒฃ Perform instruction tuning using multiple datasets (e.g., Flan, Dolly, etc.).
  • 2๏ธโƒฃ Merge original instruction data with QA data, RAG QA data, context ranking data, and RAG ranking data.
  • 3๏ธโƒฃ Fine-tune model again on these combined specialized datasets.
  • 4๏ธโƒฃ Evaluate on open QA, fact verification, and conversational QA datasets.
  • 5๏ธโƒฃ Use the Dragon retriever for context retrieval and RankRAG for ranking and answer generation.

Insights:

  • 🔸 RankRAG 8B and 70B models surpass GPT-4 across 9 RAG benchmarks.
  • 🔸 Average scores: GPT-4 = 43.5, RankRAG 8B = 52.6, RankRAG 70B = 56.1.
  • 🔸 RankRAG shows notable gains over ChatQA 1.5, particularly on benchmarks where the initial retrieval is difficult.
  • 🔸 RankRAG generalizes strongly, matching GPT-4's performance on 5 biomedical RAG benchmarks.
  • 🔸 RankRAG also exceeds the performance of specialized re-ranking models trained on larger datasets.
  • 🔸 Blending just 1% ranking data into the instruction data already yields significant improvements (an illustrative example format follows this list).
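
One plausible reading of that last point is that ranking examples can be cast in the same instruction format as QA, so the model reuses its reading-comprehension ability rather than learning ranking from scratch. The example format below is an illustrative guess, not the paper's actual template.

```python
# Illustrative only: a ranking example cast in the same instruction
# format as a QA example, so one instruction-tuned model can learn both.
# The prompt wording is a guess, not the paper's template.

qa_example = {
    "instruction": "Answer the question using the passage.",
    "input": "Passage: {context}\nQuestion: {question}",
    "output": "{answer}",
}

ranking_example = {
    "instruction": "Does the passage contain the answer to the question? "
                   "Reply True or False.",
    "input": "Passage: {context}\nQuestion: {question}",
    "output": "True",  # or "False" for an irrelevant passage
}
```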