
This section of the course provides an introduction to LangChain and Agents.

LLMs Course GitHub 👉 https://github.com/peremartra/Large-Language-Model-Notebooks-Course/tree/main/3-LangChain

🔷 The RAG system from the second part is modified to use LangChain.
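For context, a minimal sketch of what such a chain can look like, assuming the documents from part two are already indexed in a Chroma vector store (model name, path, and query are illustrative, not the course's exact code):

```python
# Minimal RAG chain with LangChain over an existing Chroma collection (illustrative sketch).
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

# Assume the documents were already embedded into Chroma in part two of the course.
vectorstore = Chroma(
    persist_directory="./chroma_db",          # hypothetical path
    embedding_function=OpenAIEmbeddings(),
)

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model="gpt-3.5-turbo"),
    retriever=vectorstore.as_retriever(search_kwargs={"k": 3}),
)

print(qa_chain.invoke({"query": "What does the dataset say about X?"}))
```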

🔷 A moderation system is created in which two models are linked: the second one moderates the first model's output and delivers the response to the user. There are three examples: one with OpenAI models and two with open-source models, Llama and GPT-J.
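A minimal sketch of the OpenAI variant, where a first model drafts the answer and a second one moderates it before it reaches the user (prompts, model names, and the input comment are illustrative, not the course's exact code):

```python
# Two-model moderation pipeline with LangChain Expression Language (illustrative sketch).
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

assistant_prompt = ChatPromptTemplate.from_template(
    "Answer the customer's comment: {comment}"
)
moderator_prompt = ChatPromptTemplate.from_template(
    "Rewrite the following answer so it is polite and free of harmful content:\n{draft}"
)

assistant_llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.8)
moderator_llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.0)

# The first model drafts the answer; the second one moderates it and produces the final reply.
chain = (
    assistant_prompt
    | assistant_llm
    | StrOutputParser()
    | (lambda draft: {"draft": draft})
    | moderator_prompt
    | moderator_llm
    | StrOutputParser()
)

print(chain.invoke({"comment": "Your product is terrible and I want a refund!"}))
```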


🔷 The course continues by creating a Data Analyst: an agent from the langchain_experimental library capable of interpreting tabular data stored in a .csv file.
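A sketch of how such an agent can be created, assuming a hypothetical sales.csv file and an OpenAI model (not the course's exact code):

```python
# Data-analyst agent over a CSV file using langchain_experimental (illustrative sketch).
import pandas as pd
from langchain_experimental.agents import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI

df = pd.read_csv("sales.csv")  # hypothetical tabular data file

agent = create_pandas_dataframe_agent(
    llm=ChatOpenAI(model="gpt-3.5-turbo", temperature=0),
    df=df,
    verbose=True,
    allow_dangerous_code=True,  # the agent writes and runs Python to answer questions
)

agent.invoke("Which column has the highest average value?")
```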

🔷 Finally, a chatbot is created to act as a medical assistant, using LangChain and ChromaDB with a medical dataset.
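A rough sketch of that kind of chatbot: a conversational retrieval chain over a Chroma collection, with memory so the conversation keeps its context (collection name, paths, and the question are illustrative, not the course's exact code):

```python
# Retrieval-based chatbot over a medical dataset stored in ChromaDB (illustrative sketch).
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_community.vectorstores import Chroma
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

vectorstore = Chroma(
    collection_name="medical_docs",           # hypothetical collection
    embedding_function=OpenAIEmbeddings(),
    persist_directory="./chroma_medical",
)

# Keeps the chat history so follow-up questions have context.
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

chatbot = ConversationalRetrievalChain.from_llm(
    llm=ChatOpenAI(temperature=0),
    retriever=vectorstore.as_retriever(),
    memory=memory,
)

print(chatbot.invoke({"question": "What are common symptoms of anemia?"})["answer"])
```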

As you can see, this part of the course uses a few examples to introduce you to LangChain and reinforce the knowledge you gained in the second lesson on vector databases.


I just finished reviewing the Large Language Models Evaluation section of the free course available on GitHub.

🔷 It starts with a brief introduction to n-grams and classic evaluation metrics: BLEU for translations and ROUGE for summaries (a short example follows the list below).

โ–ช๏ธEvaluating translations with BLEU.

โ–ช๏ธEvaluating Summarisations with ROUGE.

🔷 Once introduced to the world of metrics, the course moves on to LangSmith, first to monitor the internal calls of an agent created with LangChain, and then to measure the quality of summaries using the distance between embeddings.

This second example is used to introduce LangSmith's evaluators and shows how to measure more than one metric at a time and detect harmful content in summaries; a sketch of the underlying embedding-distance idea follows below.

โ–ช๏ธEvaluating the quality of summaries using Embedding distance with LangSmith.

🔷 Finally, the course presents Giskard, a very powerful tool that serves, among other things, to evaluate RAG solutions. Like LangSmith, Giskard uses Large Language Models to evaluate other Large Language Models (a rough sketch follows below).

This is one of the evaluation trends that seems to be gaining the most traction.

โ–ช๏ธEvaluating a RAG solution with Giskard.

The evaluation of tools built with Language Models is one of the fastest-evolving fields. Because it is so hard to judge whether a result is correct, the trend is to rely on one of the most advanced Large Language Models to evaluate the results of other, more specialized ones.

In these examples, you see everything from the most classic metrics to the latest tools, which evaluate not only the quality of the text produced by the Large Language Model but also all the layers that make up a RAG solution.

This is just an introduction because several books could be written about this field. But if you go through all the examples, you will have a fairly broad overview and will have learned about different tools.
