Tracing LlamaIndex🦙

LlamaIndex is an open-source framework for building agentic generative AI applications that allow large language models to work with your data in any format.

MLflow Tracing provides automatic tracing capability for LlamaIndex. You can enable tracing for LlamaIndex by calling the mlflow.llama_index.autolog() function, and nested traces are automatically logged to the active MLflow Experiment upon invocation of LlamaIndex engines and workflows.

python
import mlflow

mlflow.llama_index.autolog()
tip

The MLflow LlamaIndex integration is not only about tracing. MLflow offers a full tracking experience for LlamaIndex, including model tracking, index management, and evaluation. Please check out the MLflow LlamaIndex Flavor documentation to learn more!
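
As a quick taste of the broader flavor, logging an index as an MLflow model might look like the sketch below. This is a hedged example: it assumes an index object like the one built in the next section, and the engine_type parameter and exact log_model signature should be verified against the flavor documentation for your MLflow version.

python
import mlflow

# A minimal sketch, assuming `index` is a LlamaIndex VectorStoreIndex
# (see the example below) and that this signature matches your MLflow version
with mlflow.start_run():
    model_info = mlflow.llama_index.log_model(
        index,
        artifact_path="llama_index",
        engine_type="query",  # serve the index as a query engine
    )
    print(f"Model logged at: {model_info.model_uri}")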

Example Usage

First, let's download some test data to create a toy index:

text
!mkdir -p data
!curl -L https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt -o ./data/paul_graham_essay.txt

Load the documents into a simple in-memory vector index:

python
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

documents = SimpleDirectoryReader("data").load_data()
index = VectorStoreIndex.from_documents(documents)

Now you can enable LlamaIndex auto tracing and start querying the index:

python
import mlflow

# Enabling tracing for LlamaIndex
mlflow.llama_index.autolog()

# Optional: Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("LlamaIndex")

# Query the index
query_engine = index.as_query_engine()
response = query_engine.query("What was the first program the author wrote?")
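
The trace for this query is logged to the active experiment and can be viewed in the MLflow UI. You can also fetch traces programmatically; the sketch below assumes that mlflow.search_traces returns a pandas DataFrame, as it does in recent MLflow versions, and the exact column names may vary across releases.

python
# Print the generated answer
print(response)

# Fetch recently logged traces from the active experiment
# as a pandas DataFrame (column names may vary by MLflow version)
traces = mlflow.search_traces(max_results=5)
print(traces.head())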

Tracking Token Usage and Cost

MLflow automatically tracks token usage and cost for LlamaIndex. The token usage for each LLM call is logged on each trace and span, and aggregated cost trends over time are displayed in the built-in dashboard. See the Token Usage and Cost Tracking documentation for details on accessing this information programmatically.
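
As a hedged sketch, reading the aggregated token usage off the most recent trace might look like the following; get_last_active_trace_id and the token_usage attribute are assumptions based on recent MLflow releases, so consult the documentation linked above for your version.

python
import mlflow

# Assumed APIs from recent MLflow releases; verify against your version
trace_id = mlflow.get_last_active_trace_id()
trace = mlflow.get_trace(trace_id)

# Aggregated token counts for the whole trace (assumed schema)
usage = trace.info.token_usage
if usage:
    print(f"Input tokens:  {usage['input_tokens']}")
    print(f"Output tokens: {usage['output_tokens']}")
    print(f"Total tokens:  {usage['total_tokens']}")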

LlamaIndex Workflow

Workflow is LlamaIndex's next-generation GenAI orchestration framework. It is designed as a flexible and interpretable framework for building arbitrary LLM applications such as agents, RAG flows, and data extraction pipelines. MLflow supports tracking, evaluating, and tracing Workflow objects, which makes them more observable and maintainable.

Automatic tracing for LlamaIndex Workflow works out of the box: simply call the same mlflow.llama_index.autolog() function.
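
For example, a toy Workflow like the one below is traced automatically once autolog is enabled. The workflow itself is an illustrative sketch (the EchoWorkflow class and its message parameter are made up for this example), but the import paths follow the LlamaIndex Workflow API.

python
import asyncio

import mlflow
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step

mlflow.llama_index.autolog()


class EchoWorkflow(Workflow):
    """A single-step toy workflow that echoes its input."""

    @step
    async def echo(self, ev: StartEvent) -> StopEvent:
        # Keyword arguments passed to run() are available on the StartEvent
        return StopEvent(result=f"Echo: {ev.message}")


# Running the workflow emits a trace to the active MLflow experiment
result = asyncio.run(EchoWorkflow(timeout=10).run(message="Hello, MLflow!"))
print(result)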

To learn more about MLflow's integration with LlamaIndex Workflow, see the LlamaIndex Workflow tutorials in the MLflow documentation.

Disable auto-tracing

Auto tracing for LlamaIndex can be disabled globally by calling mlflow.llama_index.autolog(disable=True) or mlflow.autolog(disable=True).