
LLM RAG Best Practices

Generative large language models are prone to producing outdated information or fabricating facts.

Retrieval-augmented generation (RAG) techniques address these limitations by integrating up-to-date information, mitigating hallucinations, and enhancing response quality, particularly in specialized domains.

😅 The pace of RAG research is super impressive, but not all of it is practical for real-world use cases.

Many recent research works seek to improve performance over basic RAG architectures; however, they often suffer from complex implementations and long response times.

This new paper investigates existing RAG approaches and their potential combinations to identify optimal practices. Through extensive experiments, the authors suggest several strategies for deploying RAG that balance both performance and efficiency.

The infographic itself (below) is extremely useful too. It breaks down the pipeline into smaller stages and lists the methods available at each stage.

๐‘๐€๐† ๐–๐จ๐ซ๐ค๐Ÿ๐ฅ๐จ๐ฐ

A typical RAG workflow usually contains multiple intermediate processing steps:

  • ๐’’๐’–๐’†๐’“๐’š ๐’„๐’๐’‚๐’”๐’”๐’Š๐’‡๐’Š๐’„๐’‚๐’•๐’Š๐’๐’ (determining whether retrieval is necessary for a given input query),
  • ๐’“๐’†๐’•๐’“๐’Š๐’†๐’—๐’‚๐’ (efficiently obtaining relevant documents for the query),
  • ๐’“๐’†๐’“๐’‚๐’๐’Œ๐’Š๐’๐’ˆ (refining theorder of retrieved documents based on their relevance to the query),
  • ๐’“๐’†๐’‘๐’‚๐’„๐’Œ๐’Š๐’๐’ˆ (organizing the retrieved documents into a structured one for better generation),
  • ๐’”๐’–๐’Ž๐’Ž๐’‚๐’“๐’Š๐’›๐’‚๐’•๐’Š๐’๐’ (extracting key information for response generation from the repacked document and eliminating redundancies)

๐‘๐€๐† ๐›๐ž๐ฌ๐ญ ๐ฉ๐ซ๐š๐œ๐ญ๐ข๐œ๐ž๐ฌ

  • Best Performance Practice: to achieve the highest performance, it is recommended to incorporate a query classification module, use the "Hybrid with HyDE" method for retrieval, employ monoT5 for reranking, opt for Reverse for repacking, and leverage Recomp for summarization.

  • Balanced Efficiency Practice: to achieve a balance between performance and efficiency, it is recommended to incorporate the query classification module, implement the Hybrid method for retrieval, use TILDEv2 for reranking, opt for Reverse for repacking, and employ Recomp for summarization.
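
The two recommended configurations can be put side by side as plain settings dictionaries. This is just a restatement of the recommendations above; the key names are our own, not the paper's:

```python
# The paper's two recommended RAG configurations, restated as settings
# dictionaries (key names are illustrative, not from the paper).

BEST_PERFORMANCE = {
    "query_classification": True,
    "retrieval": "Hybrid with HyDE",
    "reranking": "monoT5",
    "repacking": "Reverse",
    "summarization": "Recomp",
}

BALANCED_EFFICIENCY = {
    "query_classification": True,
    "retrieval": "Hybrid",    # drops HyDE's extra LLM generation step
    "reranking": "TILDEv2",   # cheaper than cross-encoder reranking
    "repacking": "Reverse",
    "summarization": "Recomp",
}

# The two setups differ only in the retrieval and reranking stages.
diff = {k for k in BEST_PERFORMANCE if BEST_PERFORMANCE[k] != BALANCED_EFFICIENCY[k]}
```

Seen this way, the efficiency trade-off is confined to two stages: cheaper retrieval (no HyDE) and cheaper reranking (TILDEv2 instead of monoT5), while classification, repacking, and summarization stay the same.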

RAG Best Practices paper details (refer to the comments)


Let's understand RAG with a simple workflow.

RAG can help prevent hallucinations by providing LLMs with the most recent proprietary and contextual data, allowing them to generate a response based on both their inherent internal knowledge and up-to-date external data.

This approach can improve accuracy and reduce hallucinations.


The vector database space


🔊 The vector database space is populated with numerous players! How do you choose the best one for your use case?

🚀 In the last year, there has been a huge surge in the variety of vector database options. I've compiled the most popular ones in the image below, although it may not encompass the entire list.

😵 With such a large number of options, how do you navigate and discover the ideal one for your needs? 💡 Keep in mind that there isn't a one-size-fits-all "best" vector database. Selecting the right one depends on your unique requirements.

Here are some factors to consider:

📈 Scalability

Scalability is crucial for determining a vector database's ability to effectively handle rapidly expanding data volumes.

Evaluating scalability involves considering factors such as load balancing, multiple replications, and the database's ability to handle high-dimensional data and growing query loads over time.

๐Ÿ† Performance

Performance is crucial in assessing vector databases, using metrics like QPS, recall and latency. Benchmark tools like ANN-Benchmark and VectorDBBench offer comprehensive evaluations.
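
When comparing benchmark numbers, it helps to know exactly what the accuracy metric measures. Here is a minimal Python sketch of recall@k as used by ANN benchmarks (the function name is ours):

```python
# Recall@k: given the ids returned by an approximate index and the true
# nearest-neighbour ids from exact (brute-force) search, it measures the
# fraction of the top-k true neighbours the index actually found.

def recall_at_k(approx_ids: list[int], exact_ids: list[int], k: int) -> float:
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k
```

For example, if exact search says the neighbours are [7, 2, 9, 4] and the index returns [7, 9, 5, 2], recall@4 is 0.75: the index found three of the four true neighbours. Benchmarks typically plot this recall against QPS, since approximate indexes trade accuracy for speed.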

💰 Cost

Factor in the total cost of ownership, encompassing licensing fees, cloud hosting charges, and associated infrastructure costs. A cost-effective system should deliver satisfactory speed and accuracy at a reasonable price.

✍ Developer Experience

Evaluate the ease of setup, documentation clarity, and availability of SDKs for smooth development. Ensure compatibility with preferred cloud providers, LLMs, and seamless integration with existing infrastructure.

📲 Support and Ops

Ensure your provider meets security and compliance standards while offering expertise tailored to your needs. Confirm their availability and technical support, and assess their monitoring capabilities for efficient database management.

💫 Additional Features

Vector databases differ in their feature offerings, which can influence your decision depending on your application's long-term objectives. For example, while most vector databases support features like multi-tenancy and disk-based indexes, only a few support ephemeral indexing. You might require only a specific subset of these features for your application.

Even after factoring in these considerations, it may still be necessary to conduct individual research on each option.

📖 For example, some commonly known information:

  • ⛳ Pinecone is well known for efficiently handling extensive collections of vectors, particularly in NLP and computer vision applications, but is a bit on the pricier side.
  • ⛳ Qdrant is pretty lightweight and works well for geospatial data.
  • ⛳ Milvus is optimized for large-scale ML applications and excels in building search systems.
  • ⛳ pgvector is the most straightforward choice if you already have a Postgres database.

and so on!



📚 "A Comprehensive Study of RAG (Retrieval-Augmented Generation) Systems" - a complete overview of RAG

🔍 Key points of this paper:

  • Detailed module-by-module analysis of the entire RAG workflow
  • Optimal implementation methods for each module (retrieval, reranking, summarization, etc.)
  • RAG performance evaluation results across a variety of NLP tasks
  • Recommended implementation strategies that account for both performance and efficiency
  • Exploration of extending RAG to multimodal settings
  • Optimal approaches to fine-tuning the generator

💡 Recommended for:

  • AI developers: anyone looking for practical guidelines when implementing a RAG system
  • Researchers: anyone wanting to keep up with the latest RAG trends and performance improvement methods
  • Enterprise decision-makers: anyone considering adopting RAG

🤔 It is a well-organized summary that anyone interested in RAG should appreciate. Given how much interest there is from companies, I am sharing it as a worthwhile resource.

This post is licensed under CC BY 4.0 by the author.