Introduction to RAG with MLflow and LangChain
Tutorial Overview
Welcome to this tutorial, where we explore the integration of Retrieval Augmented Generation (RAG) with MLflow and LangChain. Our focus is on demonstrating how to create advanced RAG systems and showcasing the unique capabilities enabled by MLflow in these applications.
Understanding RAG and how to develop one with MLflow
Retrieval Augmented Generation (RAG) combines the power of language model generation with information retrieval, allowing language models to access and incorporate external data. This approach significantly enriches the model's responses with detailed and context-specific information.
MLflow is instrumental in this process. As an open-source platform, it facilitates the logging, tracking, and deployment of complex models, including RAG chains. With MLflow, integrating LangChain becomes more streamlined, enhancing the development, evaluation, and deployment processes of RAG models.
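To make this concrete, below is a simplified preview of the pattern this tutorial builds toward: a FAISS-backed RetrievalQA chain logged to MLflow. The tiny in-memory corpus, the `faiss_index` path, and the `load_retriever` helper here are illustrative stand-ins; the real chain is assembled step by step in the sections that follow.

```python
# A simplified preview (not the full tutorial code): build a tiny FAISS-backed
# RetrievalQA chain and log it to MLflow so it can be tracked and reloaded.
import mlflow
from langchain.chains import RetrievalQA
from langchain.vectorstores import FAISS
from langchain_openai import OpenAI, OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_texts(["MLflow tracks and deploys models."], embeddings)
vectorstore.save_local("faiss_index")  # persist so the logged model can reload it


def load_retriever(persist_directory):
    # Called by MLflow at load time to restore the retriever.
    vectorstore = FAISS.load_local(
        persist_directory,
        OpenAIEmbeddings(),
        allow_dangerous_deserialization=True,  # required by recent langchain versions
    )
    return vectorstore.as_retriever()


qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)

with mlflow.start_run():
    model_info = mlflow.langchain.log_model(
        qa_chain,
        artifact_path="retrieval_qa",
        loader_fn=load_retriever,
        persist_dir="faiss_index",
    )

# The logged chain can later be reloaded for inference via the pyfunc interface.
loaded_model = mlflow.pyfunc.load_model(model_info.model_uri)
```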
NOTE: In this tutorial, we'll be using GPT-3.5 as our base language model. Keep in mind that the results obtained from a RAG system will differ from those obtained by interfacing directly with GPT models. RAG's unique approach of combining external data retrieval with language model generation creates more nuanced and contextually rich responses.

Learning Outcomes
By the end of this tutorial, you will learn:
- How to establish a RAG chain using LangChain and MLflow.
- Techniques for scraping and processing documents to feed into a RAG system.
- Best practices for deploying and using RAG models to answer complex queries.
- Understanding the practical implications and differences in responses when using RAG in comparison to direct language model interactions.
Setting up our Retriever Dependencies
To have a place to store our vetted data (the information that we're going to be retrieving), we're going to use a vector database. The framework we've chosen, due to its simplicity, capabilities, and free-to-use license, is FAISS, from Meta.
FAISS Installation for the Tutorial
Understanding FAISS
For this tutorial, we will be utilizing FAISS (Facebook AI Similarity Search, developed and maintained by the Meta AI research group), an efficient similarity search and clustering library. It's a highly useful library that easily handles large datasets and is capable of performing operations such as nearest neighbor search, which are critical in Retrieval Augmented Generation (RAG) systems. There are numerous other vector database solutions that can perform similar functionality; we are using FAISS in this tutorial due to its simplicity, ease of use, and fantastic performance.
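To make the nearest neighbor search idea concrete, here is a small self-contained sketch of FAISS used directly, outside of any LangChain wrapper. The random vectors stand in for the document embeddings a real RAG system would store:

```python
# A minimal FAISS sketch: index some vectors, then find the nearest neighbors
# of a query vector. The random data here stands in for real embeddings.
import faiss
import numpy as np

dimension = 64  # embedding size (illustrative)
rng = np.random.default_rng(42)
document_vectors = rng.random((1000, dimension), dtype=np.float32)

index = faiss.IndexFlatL2(dimension)  # exact L2 (Euclidean) search
index.add(document_vectors)           # store the "document" vectors

query_vector = rng.random((1, dimension), dtype=np.float32)
distances, indices = index.search(query_vector, 5)  # 5 nearest neighbors
print(indices)  # positions of the 5 most similar stored vectors
```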
Notebook compatibility
With rapidly changing libraries such as langchain, examples can become outdated quickly and stop working. For demonstration purposes, here are the critical dependencies recommended for running this notebook effectively:
| Package | Version |
|---|---|
| langchain | 0.1.16 |
| langchain-community | 0.0.33 |
| langchain-openai | 0.0.8 |
| openai | 1.12.0 |
| tiktoken | 0.6.0 |
| mlflow | 2.12.1 |
| faiss-cpu | 1.7.4 |
If you attempt to execute this notebook with different versions, it may function correctly, but it is recommended to use the precise versions above to ensure that your code executes properly.
Installing Requirements
Before proceeding with the tutorial, ensure that you have FAISS and Beautiful Soup installed via pip. The pinned versions below are known to work with this notebook; other versions of these packages may not function correctly due to breaking changes in their APIs.
```bash
pip install beautifulsoup4 faiss-cpu==1.7.4 langchain==0.1.16 langchain-community==0.0.33 langchain-openai==0.0.8 openai==1.12.0 tiktoken==0.6.0
```
NOTE: If you'd like to run this using your GPU, you can install `faiss-gpu` instead.
```python
import os
import shutil
import tempfile

import requests
from bs4 import BeautifulSoup
from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain_openai import OpenAI, OpenAIEmbeddings

import mlflow

assert "OPENAI_API_KEY" in os.environ, "Please set the OPENAI_API_KEY environment variable."
```
NOTE: If you'd like to use Azure OpenAI with LangChain, you need to install `openai>=1.10.0` and `langchain-openai>=0.0.6`, and specify the following credentials and parameters:
```python
from langchain_openai import AzureOpenAI, AzureOpenAIEmbeddings

# Set this to `azure`
os.environ["OPENAI_API_TYPE"] = "azure"

# The API version you want to use: set this to `2023-05-15` for the released version.
os.environ["OPENAI_API_VERSION"] = "2023-05-15"

assert "AZURE_OPENAI_ENDPOINT" in os.environ, (
    "Please set the AZURE_OPENAI_ENDPOINT environment variable. It is the base URL for your "
    "Azure OpenAI resource. You can find this in the Azure portal under your Azure OpenAI resource."
)
assert "OPENAI_API_KEY" in os.environ, (
    "Please set the OPENAI_API_KEY environment variable. It is the API key for your Azure "
    "OpenAI resource. You can find this in the Azure portal under your Azure OpenAI resource."
)

azure_openai_llm = AzureOpenAI(
    deployment_name="<your-deployment-name>",
    model_name="gpt-4o-mini",
)
azure_openai_embeddings = AzureOpenAIEmbeddings(
    azure_deployment="<your-deployment-name>",
)
```
Scraping Federal Documents for RAG Processing
In this section of the tutorial, we will demonstrate how to scrape content from federal document webpages for use in our RAG system. We'll be focusing on extracting transcripts from specific sections of webpages, which will then be used to feed our Retrieval Augmented Generation (RAG) model. This process is crucial for providing the RAG system with relevant external data.
Function Overview
- The function `fetch_federal_document` is designed to scrape and return the transcript of specific federal documents.
- It takes two arguments: `url` (the webpage URL) and `div_class` (the class of the div element containing the transcript).
- The function handles web requests, parses HTML content, and extracts the desired transcript text.
This step is integral to building a RAG system that relies on external, context-specific data. By effectively fetching and processing this data, we can enrich our model's responses with accurate information directly sourced from authoritative documents.
NOTE: In a real-world scenario, you would have your specific text data located on disk somewhere (either locally or on your cloud provider) and the process of loading the embedded data into a vector search database would be entirely external to this active fetching displayed below. We're simply showing the entire process here for demonstration purposes to show the entire end-to-end workflow for interfacing with a RAG model.
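As a rough sketch of that more typical setup (the file path and chunking parameters below are hypothetical), the data would be loaded from disk and embedded into the vector database like this, using the loaders imported above:

```python
# A sketch of the real-world path described above: the document text already
# lives on disk, and we embed it into a FAISS index from there.
loader = TextLoader("/path/to/federal_documents.txt")  # hypothetical local file
documents = loader.load()

# Split the raw text into chunks small enough to embed and retrieve usefully.
splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = splitter.split_documents(documents)

# Embed the chunks and store them in a FAISS vector database.
vector_db = FAISS.from_documents(docs, OpenAIEmbeddings())
```

For this tutorial, however, we fetch the documents live so the entire workflow is visible: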
```python
def fetch_federal_document(url, div_class):
    """
    Scrapes the transcript of a federal document from the given URL.

    Args:
        url (str): URL of the webpage to scrape.
        div_class (str): Class of the div element containing the transcript.

    Returns:
        str: The transcript text of the document.
    """
    # Sending a request to the URL
    response = requests.get(url)
    if response.status_code == 200:
        # Parsing the HTML content of the page
        soup = BeautifulSoup(response.text, "html.parser")

        # Finding the transcript section by its HTML structure
        transcript_section = soup.find("div", class_=div_class)
        if transcript_section:
            transcript_text = transcript_section.get_text(separator="\n", strip=True)
            return transcript_text
        else:
            return "Transcript section not found."
    else:
        return f"Failed to retrieve the webpage. Status code: {response.status_code}"
```