๐€๐ ๐ž๐ง๐ญ๐ข๐œ ๐‘๐€๐† โœจ new cookbook

I just published a new cookbook showing how to easily improve Retrieval Augmented Generation (RAG) with an agent system using Transformers Agents.

Vanilla RAG has the following limitations:

  • ➤ **It retrieves source documents only once**: if the retrieved documents are not relevant enough, the generation will in turn be bad.
  • ➤ **Semantic similarity is computed with the user query as a reference**, which is often suboptimal: the user query will mostly be a question, while the document containing the true answer is written in the affirmative, so its similarity score is downgraded compared to less relevant source documents in the interrogative form, with the risk of not selecting the relevant document.

**Making a RAG agent - very simply, an agent armed with a retriever tool - alleviates both these problems!**

  • ✅ Formulate the query itself (query reformulation)
  • ✅ Critique the content to re-retrieve if needed (self-query) - see the sketch below
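To make this concrete, here is a minimal sketch of the kind of setup the cookbook builds: the retriever is wrapped as a tool and handed to a ReAct-style agent, which can then rephrase the query and call the retriever again if its critique of the results warrants it. The class names (`Tool`, `HfEngine`, `ReactJsonAgent`) follow the Transformers Agents API at the time of writing and may differ in later versions; `vectordb` stands for any pre-built vector store exposing a `similarity_search` method and is assumed to already exist, and the model id is only an example.

```python
from transformers.agents import Tool, HfEngine, ReactJsonAgent

class RetrieverTool(Tool):
    # Wraps semantic search over the knowledge base so the agent can call it as many times as needed.
    name = "retriever"
    description = (
        "Retrieves documents from the knowledge base that are semantically close to the query. "
        "Phrase the query as an affirmative statement rather than a question."
    )
    inputs = {"query": {"type": "text", "description": "The search query, in affirmative form."}}
    output_type = "text"

    def __init__(self, vectordb, **kwargs):
        super().__init__(**kwargs)
        self.vectordb = vectordb  # assumed: a pre-built vector store (e.g. a FAISS index)

    def forward(self, query: str) -> str:
        docs = self.vectordb.similarity_search(query, k=7)
        return "\n===Document===\n".join(doc.page_content for doc in docs)

# The agent decides on its own when, how often, and with which reformulated query to call the tool.
llm_engine = HfEngine("meta-llama/Meta-Llama-3-8B-Instruct")
agent = ReactJsonAgent(tools=[RetrieverTool(vectordb)], llm_engine=llm_engine)  # vectordb assumed defined above

print(agent.run("How can I push a model to the Hub?"))
```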

**How much does this agentic setup improve results?** I've added an evaluation section to the cookbook, using LLM-as-a-judge with Llama-3-70B. When switching from vanilla to agentic RAG, the **score increases by 8.5 points**! 💪 (from 70.0% to 78.5%)
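For reference, the evaluation follows the usual LLM-as-a-judge recipe: for each test question, the judge model sees the question, the generated answer, and a reference answer, and returns a grade on a fixed scale, which is then averaged. Below is a minimal sketch, assuming `huggingface_hub`'s `InferenceClient` and a hypothetical `eval_dataset` list of dicts with `question`, `reference_answer`, and `generated_answer` keys; the cookbook's exact prompt and grading scale may differ.

```python
import re
from huggingface_hub import InferenceClient

judge = InferenceClient("meta-llama/Meta-Llama-3-70B-Instruct")

JUDGE_PROMPT = """You are grading an answer to a question against a reference answer.
Give a grade from 1 (completely wrong) to 5 (fully correct and complete).
Question: {question}
Reference answer: {reference_answer}
Answer to grade: {generated_answer}
Reply with 'Grade: <number>' followed by a short justification."""

def judge_answer(example: dict) -> int:
    # One judge call per example; parse the integer grade out of the reply.
    reply = judge.chat_completion(
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(**example)}],
        max_tokens=200,
    ).choices[0].message.content
    match = re.search(r"Grade:\s*([1-5])", reply)
    return int(match.group(1)) if match else 1

# eval_dataset: hypothetical list of dicts with question / reference_answer / generated_answer
grades = [judge_answer(ex) for ex in eval_dataset]
print(f"Average score: {100 * (sum(grades) / len(grades)) / 5:.1f}%")  # normalize the 1-5 grades to a percentage
```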

One important drawback, though: since the system now makes several LLM calls instead of one, the runtime of the RAG system also increases. You have to find the right trade-off!

**Discover the cookbook** 👇

**Agentic Data analyst: drop your data file, let the LLM do the analysis** 📊⚙️

Need to do some quick exploratory data analysis? ➡️ Get help from an agent.

I was impressed by Llama-3.1's capacity to derive insights from data. Given a CSV file, it makes quick work of exploratory data analysis and surfaces interesting findings.
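The data-analyst part relies on a code agent: the LLM writes Python (pandas, matplotlib, ...) that is executed in a restricted interpreter, reads the printed output, and iterates. Here is a minimal sketch, again using the Transformers Agents class names current at the time of writing (`ReactCodeAgent`, `HfEngine`), which may have changed since; the model id, file path, and prompt are illustrative placeholders.

```python
from transformers.agents import HfEngine, ReactCodeAgent

llm_engine = HfEngine("meta-llama/Meta-Llama-3.1-70B-Instruct")

# A code agent writes and runs Python itself; we only whitelist the libraries it may import.
agent = ReactCodeAgent(
    tools=[],
    llm_engine=llm_engine,
    additional_authorized_imports=["numpy", "pandas", "matplotlib.pyplot", "sklearn"],
)

report = agent.run(
    "Load the file 'titanic/train.csv' with pandas, run an exploratory data analysis, "
    "and report the three most interesting trends you find about passenger survival."
)
print(report)
```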

On the data from the Kaggle Titanic challenge, which records which passengers survived the wreck, it was able on its own to surface trends like "passengers who paid higher fares were more likely to survive" or "the survival rate was much higher for women than for men".
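Both of those trends are easy to verify by hand, and this is roughly the kind of code the agent ends up writing for itself; a quick check on the Kaggle `train.csv` (standard Titanic column names assumed):

```python
import pandas as pd

df = pd.read_csv("titanic/train.csv")  # Kaggle Titanic training split

# Survival rate by sex, and mean fare paid by survivors vs. non-survivors
print(df.groupby("Sex")["Survived"].mean())   # women survive far more often than men
print(df.groupby("Survived")["Fare"].mean())  # survivors paid higher fares on average
```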

The cookbook even lets the agent build its own submission to the challenge, and it ranks in the top 3,000 of 17,000 submissions: 👍 not bad at all!

  • Try it for yourself in this Space demo 👉 https://lnkd.in/gzaqQ3rT
  • Read the cookbook to dive deeper 👉 https://lnkd.in/gXx3-AyH

This post is licensed under CC BY 4.0 by the author.