Tracing Ollama
Ollama is an open-source platform that lets you run large language models (LLMs) such as Llama 3.2, Gemma 2, Mistral, and Code Llama locally on your own device.
Since the local LLM endpoint served by Ollama is compatible with the OpenAI API, you can query it through the OpenAI SDK and enable tracing for Ollama with mlflow.openai.autolog(). Any LLM interaction via Ollama will then be recorded to the active MLflow Experiment.
import mlflow
mlflow.openai.autolog()
Example Usage
- Run the Ollama server with the desired LLM model.
ollama run llama3.2:1b
- Enable auto-tracing for the OpenAI SDK.
import mlflow
# Enable auto-tracing for OpenAI
mlflow.openai.autolog()
# Optional: Set a tracking URI and an experiment
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("Ollama")
- Query the LLM and see the traces in the MLflow UI.
from openai import OpenAI
client = OpenAI(
    base_url="http://localhost:11434/v1",  # The local Ollama REST endpoint
    api_key="dummy",  # Required to instantiate the OpenAI client; Ollama ignores it, so any string works
)

response = client.chat.completions.create(
    model="llama3.2:1b",
    messages=[
        {"role": "system", "content": "You are a science teacher."},
        {"role": "user", "content": "Why is the sky blue?"},
    ],
)
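The response is a standard OpenAI chat completion object, so you can, for example, print the generated answer directly; the call itself is recorded as a trace in the active experiment ("Ollama" in this example) and can be inspected in the MLflow UI:
# The request/response pair is already captured by autolog; this only displays the answer
print(response.choices[0].message.content)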
Disable auto-tracing
Auto-tracing for Ollama can be disabled globally by calling mlflow.openai.autolog(disable=True) or mlflow.autolog(disable=True).
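A minimal sketch of toggling tracing off and back on, using the calls above:
import mlflow

# Disable tracing for the OpenAI integration (covers Ollama's OpenAI-compatible endpoint)
mlflow.openai.autolog(disable=True)

# Alternatively, disable all MLflow autologging integrations at once
# mlflow.autolog(disable=True)

# Re-enable tracing later if needed
mlflow.openai.autolog()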