
RAG or Fine Tuning? Fine-tune Embedding models for Retrieval Augmented Generation (RAG)

RAG or Fine Tuning? A simple feature comparison to decide which technique you should use!

When customizing LLMs, RAG is one option; fine-tuning is another optimization technique.

RAG is akin to providing a textbook to the model, allowing it to retrieve information based on specific queries. This approach suits scenarios where the model needs to address particular information retrieval tasks. However, RAG is not suitable for teaching the model to understand broad domains or to learn new languages, formats, or styles.

Fine-tuning is similar to enabling students to internalize knowledge through extensive learning. It can improve on the performance of the base model and make interactions more efficient. It is particularly suitable for emphasizing existing knowledge in the base model, modifying or customizing the model's output, and providing complex directives to the model.

It is not always straightforward to choose one approach over the other, which is why this guide helps you work out which technique better fits your use case!

Fine-tuning or RAG?

RAG in Production: The Importance of a Solid Data Strategy 💥

Retrieval-Augmented Generation (RAG) has become one of the hottest topics in Generative AI, providing powerful ways to enhance model responses with real-world data. But let's be honest, without a solid data strategy, you're setting yourself up for a meme-worthy fail. 😂

📈 Why RAG Needs a Data Strategy:

  1. Data Quality: Garbage in, garbage out. Your model is only as good as the data it retrieves.
  2. Relevance: Ensure your data is relevant to your use case.
  3. Scalability: Manage and scale your data efficiently to keep up with growing demands.

Remember, a well-thought-out data strategy is the backbone of any successful RAG implementation.

🚀 Conclusion: Don't let your RAG use case fall flat. Invest in your data strategy and watch your AI soar! 🌟

Fine-tuning can significantly boost retrieval. 👀

Embedding models are crucial for Retrieval-Augmented Generation (RAG) applications, but general-purpose models often fall short on domain-specific tasks.

Excited to share a new blog post on how to fine-tune embedding models for financial RAG applications on NVIDIA's 2023 SEC Filing dataset, using recent research such as Matryoshka Representation Learning:

  • 🚀 Fine-tuning boosts performance by 7.4% to 22.55% with just 6.3k samples
  • ✅ Baseline creation + evaluation during training
  • 🧬 Synthetic data generated and used for fine-tuning
  • ⏱️ Training on ~10,000 only 5 minutes on consumer-grade GPUs
  • 🪆 Matryoshka keeps 99% performance at 6x smaller size
  • 📈 Fine-tuned 128-dim model outperforms baseline 768-dim by 6.51%
  • 🆕 Uses the new Sentence Transformers v3

👉 Original article: https://www.philschmid.de/fine-tune-embedding-model-for-rag

👉 Code: https://github.com/philschmid/deep-learning-pytorch-huggingface/blob/main/training/fine-tune-embedding-model-for-rag.ipynb

Go build! 🤗

Hugging Face RAG: https://huggingface.co/docs/transformers/model_doc/rag

How to Select an Embedding Model for Your RAG Application?

Embeddings form the foundation for achieving precise and contextually relevant LLM outputs across different tasks.

Which encoder you select to generate embeddings is a critical decision that strongly affects the overall success of the RAG system. Low-quality embeddings lead to poor retrieval.
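To make the link between embeddings and retrieval concrete, here is a minimal sketch of the retrieval step: rank documents by cosine similarity to the query vector and keep the top k. The 3-dimensional vectors, the document names, and the helper names `cosine` and `top_k` are all made up for illustration; a real system would get its vectors from an embedding model and would usually use a vector database rather than a Python loop.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [doc_id for _, doc_id in scored[:k]]

# Hypothetical toy embeddings (assumption: not produced by any real model).
doc_vecs = {
    "revenue_note": [0.9, 0.1, 0.0],
    "risk_factors": [0.1, 0.9, 0.1],
    "cafeteria_menu": [0.0, 0.1, 0.9],
}
query_vec = [0.8, 0.3, 0.0]

print(top_k(query_vec, doc_vecs, k=2))
```

If the encoder places unrelated documents close to the query, this ranking surfaces the wrong context, and the LLM downstream has nothing good to work with.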

When selecting an embedding model, consider the vector dimension, average retrieval performance, and model size.
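One reason vector dimension matters is storage. As a rough back-of-the-envelope sketch, the raw size of an index is the number of vectors times the dimension times the bytes per value. The function name `index_size_mb`, the one-million-vector corpus, and the float32 assumption (4 bytes per value) are illustrative choices, and the figure ignores any overhead a real vector index adds.

```python
def index_size_mb(num_vectors, dim, bytes_per_value=4):
    """Raw storage for the vectors alone, in megabytes (assumes float32 by default)."""
    return num_vectors * dim * bytes_per_value / 1_000_000

# Hypothetical corpus of one million chunks.
for dim in (768, 128):
    print(f"{dim}-dim: {index_size_mb(1_000_000, dim)} MB")
```

Under these assumptions, going from 768 to 128 dimensions cuts raw vector storage by a factor of six, which is worth weighing against any loss in retrieval quality.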

Companies such as OpenAI, Cohere, and Voyage consistently release enhanced embedding models.

Different types of embeddings are designed to address unique challenges and requirements in different domains.

 Types of Embedding Models For RAG

⮕ Dense embeddings are continuous, real-valued vectors that represent information in a high-dimensional space.

In the context of RAG applications, dense embeddings, such as those generated by models like OpenAI's Ada or sentence transformers, contain non-zero values for every element.

⮕ Sparse embeddings, on the other hand, are representations where most values are zero, emphasizing only relevant information.

In RAG applications, sparse vectors are essential for scenarios with many rare keywords or specialized terms.
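A small sketch can show why sparse vectors handle rare keywords well. Here a sparse vector is stored as a term-to-count mapping, so every term that does not appear is an implicit zero, and a rare term like "ebitda" only scores when it literally occurs in the document. Raw term counts are an assumption used to keep the example short; real sparse retrievers use weighting schemes such as BM25 or learned sparse models. The helper names and sample texts are hypothetical.

```python
from collections import Counter

def sparse_vector(text):
    """Term-count vector stored as {term: count}; absent terms are implicit zeros."""
    return Counter(text.lower().split())

def sparse_dot(a, b):
    """Dot product that only touches the terms present in both vectors."""
    return sum(count * b[term] for term, count in a.items() if term in b)

query = sparse_vector("ebitda guidance")
doc_match = sparse_vector("q3 ebitda guidance raised")
doc_other = sparse_vector("board approves dividend")

print(sparse_dot(query, doc_match))
print(sparse_dot(query, doc_other))

# For contrast, a dense vector (made-up values) has a value in every position.
dense_example = [0.12, -0.48, 0.33, 0.07]
print(all(value != 0 for value in dense_example))
```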

⮕ Multi-vector embedding models like ColBERT use late interaction: the query and the document are first encoded independently, each into one vector per token, and the interaction between the two sets of representations happens only afterwards, at scoring time.
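Late interaction can be sketched in a few lines. In ColBERT-style scoring, each query token vector is matched against its best document token vector, and those maxima are summed (often called MaxSim). The 2-dimensional token vectors below are made up and already unit length; the function names are hypothetical, and real models produce one vector per token from a trained encoder.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim_score(query_tokens, doc_tokens):
    """Sum, over query tokens, of the best-matching document token similarity."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Toy per-token embeddings (assumption: unit-length, not from a real encoder).
query_tokens = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
doc_b = [[0.6, 0.8], [0.8, 0.6]]

print(maxsim_score(query_tokens, doc_a))
print(maxsim_score(query_tokens, doc_b))
```

Because documents are encoded without seeing the query, their token vectors can be computed and stored ahead of time; only this cheap scoring step runs per query.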

⮕ Long documents have always posed a particular challenge for embedding models.

The limitation on maximum sequence lengths, often rooted in architectures like BERT, leads to practitioners segmenting documents into smaller chunks. Unfortunately, this segmentation can result in fragmented semantic meanings and misrepresentation of entire paragraphs.
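As a rough illustration of the chunking step, the sketch below splits text into fixed-size word windows. The overlap between neighbouring chunks is one common mitigation for fragmented context, not something the articles above prescribe; the function name `chunk_words` and the window sizes are assumptions, and production systems usually count model tokens rather than whitespace-separated words.

```python
def chunk_words(text, max_words=8, overlap=2):
    """Split text into windows of up to max_words words, sharing `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be >= 0 and smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks

sample = "w0 w1 w2 w3 w4 w5 w6 w7 w8 w9"
for chunk in chunk_words(sample, max_words=4, overlap=1):
    print(chunk)
```

Each chunk repeats the last word of the previous one, so a sentence cut at a boundary still appears with some of its surrounding context in at least one chunk.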

⮕ Variable dimension embeddings are a unique concept built on Matryoshka Representation Learning (MRL).

MRL learns lower-dimensional embeddings that are nested into the original embedding, akin to a series of Matryoshka Dolls.
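In practice, "nested" means the smaller embedding is simply the first few dimensions of the full one. A minimal sketch of using such an embedding at a reduced size is to slice off the leading dimensions and re-normalize so cosine similarity still behaves as expected. The function name and the toy 4-dimensional vector are hypothetical, and this only preserves quality for models that were actually trained with MRL; truncating an ordinary embedding this way generally hurts retrieval.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` values of an embedding and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Made-up "full" embedding; a real MRL model might use 768 dimensions.
full = [3.0, 4.0, 1.0, 2.0]
small = truncate_and_normalize(full, 2)

print(small)
print(len(small))
```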

⮕ Code embeddings are a recent development used to integrate AI-powered capabilities into Integrated Development Environments (IDEs), fundamentally transforming how developers interact with codebases.

There are several factors that need to be considered while selecting an embedding model.

Learn more about embeddings and models in this article: https://www.rungalileo.io/blog/mastering-rag-how-to-select-an-embedding-model


This post is licensed under CC BY 4.0 by the author.