DSPy Quickstart

DSPy simplifies building language model (LM) pipelines by replacing manual prompt engineering with structured "text transformation graphs." These graphs use flexible, learning modules that automate and optimize LM tasks like reasoning, retrieval, and answering complex questions.

How does it work?

At a high level, DSPy optimizes prompts, selects the best language model, and can even fine-tune the model using training data.

The process follows these steps, common to most DSPy optimizers:

Candidate Generation: DSPy finds all Predict modules in the program and generates variations of instructions and demonstrations (e.g., examples for prompts). This step creates a set of possible candidates for the next stage.
Parameter Optimization: DSPy then uses methods like random search, TPE, or Optuna to select the best candidate. Fine-tuning models can also be done at this stage.
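
As a rough illustration of the two steps above, the sketch below generates candidate demonstration sets by random sampling and keeps the one that scores best on a metric. It is plain Python and does not call DSPy: the toy `trainset`, the word-overlap `predict` function (a stand-in for an LM call), and the helper names are all hypothetical.

```python
import random

# Hypothetical toy training data: (text, label) pairs.
trainset = [
    ("oil prices rise", "crude"),
    ("bank lifts prime rate", "interest"),
    ("wheat harvest falls", "grain"),
    ("tanker fleet expands", "ship"),
    ("crude output cut", "crude"),
    ("grain exports climb", "grain"),
]


def predict(text, demos):
    """Stand-in for an LM call: return the label of the demo sharing the most words."""
    words = set(text.split())
    best = max(demos, key=lambda d: len(words & set(d[0].split())))
    return best[1]


def score_candidate(demos, data):
    """Fraction of examples that the demo set classifies correctly."""
    return sum(predict(t, demos) == y for t, y in data) / len(data)


def random_search(data, num_candidates=5, demos_per_candidate=3, seed=0):
    rng = random.Random(seed)
    # Step 1: candidate generation (random subsets of the training data as demos).
    candidates = [rng.sample(data, demos_per_candidate) for _ in range(num_candidates)]
    # Step 2: parameter optimization (keep the best-scoring candidate).
    scored = [(score_candidate(c, data), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])


best_score, best_demos = random_search(trainset)
print(f"best score: {best_score:.2f} with {len(best_demos)} demos")
```

Real optimizers score candidates by running the program against an LM, so each evaluation costs API calls; that is why the number of candidates is usually kept small.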

This Demo

Below we create a simple program that demonstrates the power of DSPy. We will build a text classifier leveraging OpenAI. By the end of this tutorial, we will...

Define a dspy.Signature and dspy.Module to perform text classification.
Leverage dspy.teleprompt.BootstrapFewShotWithRandomSearch to compile our module so it's better at classifying our text.
Analyze internal steps with MLflow Tracing.
Log the compiled model with MLflow.
Load the logged model and perform inference.

%pip install -U openai "dspy>=2.5.17" "mlflow>=2.18.0"

Requirement already satisfied: openai in /Users/jacobdanner/Repos/open-source/mlflow/venv/lib/python3.10/site-packages (1.102.0)
Requirement already satisfied: dspy>=2.5.17 in /Users/jacobdanner/Repos/open-source/mlflow/venv/lib/python3.10/site-packages (3.0.2)
Requirement already satisfied: mlflow>=2.18.0 in /Users/jacobdanner/Repos/open-source/mlflow/venv/lib/python3.10/site-packages (3.3.2)

Setup

Set Up LLM

After installing the relevant dependencies, let's set up access to an OpenAI LLM. Here, we will leverage OpenAI's gpt-4o-mini model.

# Set the OpenAI API key as an environment variable. You can also pass the key to dspy.LM()
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI Key:")

import dspy

# Define your model. We will use OpenAI for simplicity
model_name = "gpt-4o-mini"

# Note that the OPENAI_API_KEY environment variable must be set (see above)
lm = dspy.LM(
  model=f"openai/{model_name}",
  max_tokens=500,
  temperature=0.1,
)
dspy.settings.configure(lm=lm)

Create MLflow Experiment

Create a new MLflow Experiment to track your DSPy models, metrics, parameters, and traces in one place. Although a "default" experiment already exists in your workspace, we highly recommend creating a separate experiment for each task to keep experiment artifacts organized.

import mlflow

mlflow.set_experiment("DSPy Quickstart")

Turn on Auto Tracing with MLflow

MLflow Tracing is a powerful observability tool for monitoring and debugging what happens inside your DSPy modules, helping you identify potential bottlenecks or issues quickly. To enable DSPy tracing, you just need to call mlflow.dspy.autolog and that's it!

mlflow.dspy.autolog()

Set Up Data

Next, we will download the Reuters-21578 dataset from Hugging Face. We also write a utility to ensure that our train and test splits contain the same labels.

import numpy as np
import pandas as pd
from datasets import load_dataset
from dspy.datasets.dataset import Dataset


def read_data_and_subset_to_categories() -> tuple[pd.DataFrame, pd.DataFrame]:
  """
  Read the reuters-21578 dataset. Docs can be found in the url below:
  https://huggingface.co/datasets/yangwang825/reuters-21578
  """

  # Read train/test split
  dataset = load_dataset("yangwang825/reuters-21578")
  train = pd.DataFrame(dataset["train"])
  test = pd.DataFrame(dataset["test"])

  # Clean the labels
  label_map = {
      0: "acq",
      1: "crude",
      2: "earn",
      3: "grain",
      4: "interest",
      5: "money-fx",
      6: "ship",
      7: "trade",
  }

  train["label"] = train["label"].map(label_map)
  test["label"] = test["label"].map(label_map)

  return train, test


class CSVDataset(Dataset):
  def __init__(
      self, n_train_per_label: int = 20, n_test_per_label: int = 10, *args, **kwargs
  ) -> None:
      super().__init__(*args, **kwargs)
      self.n_train_per_label = n_train_per_label
      self.n_test_per_label = n_test_per_label

      self._create_train_test_split_and_ensure_labels()

  def _create_train_test_split_and_ensure_labels(self) -> None:
      """Perform a train/test split that ensures labels in `dev` are also in `train`."""
      # Read the data
      train_df, test_df = read_data_and_subset_to_categories()

      # Sample for each label
      train_samples_df = pd.concat(
          [group.sample(n=self.n_train_per_label) for _, group in train_df.groupby("label")]
      )
      test_samples_df = pd.concat(
          [group.sample(n=self.n_test_per_label) for _, group in test_df.groupby("label")]
      )

      # Set DSPy class variables
      self._train = train_samples_df.to_dict(orient="records")
      self._dev = test_samples_df.to_dict(orient="records")


# Limit to a small dataset to showcase the value of bootstrapping
dataset = CSVDataset(n_train_per_label=3, n_test_per_label=1)

# Create train and test sets containing DSPy examples
# Note that we must specify the expected input value name
train_dataset = [example.with_inputs("text") for example in dataset.train]
test_dataset = [example.with_inputs("text") for example in dataset.dev]
unique_train_labels = {example.label for example in dataset.train}

print(len(train_dataset), len(test_dataset))
print(f"Train labels: {unique_train_labels}")
print(train_dataset[0])

24 8
Train labels: {'ship', 'earn', 'crude', 'trade', 'acq', 'grain', 'money-fx', 'interest'}
Example({'text': 'manufacturers hanover mhc raises prime rate manufacturers hanover trust co became the third major u s bank to increase its prime rate to pct from matching a move initiated yesterday by citibank and chase manhattan the bank the main subsidiary of manufacturers hanover corp said the new rate is effective today reuter', 'label': 'interest'}) (input_keys={'text'})
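
The per-label sampling performed by `CSVDataset` can be sketched without pandas as follows. This is a simplified illustration rather than the notebook's code: `sample_per_label` and the toy rows are hypothetical, and a fixed seed is assumed so that the sample is reproducible.

```python
import random
from collections import defaultdict


def sample_per_label(rows, n_per_label, seed=0):
    """Return n_per_label randomly chosen rows for every distinct label."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in rows:
        by_label[row["label"]].append(row)

    sampled = []
    for label in sorted(by_label):
        group = by_label[label]
        if len(group) < n_per_label:
            raise ValueError(f"label {label!r} has only {len(group)} rows")
        sampled.extend(rng.sample(group, n_per_label))
    return sampled


# Hypothetical toy rows standing in for the Reuters records.
labels = ["acq", "crude", "earn"] * 4
rows = [{"text": f"doc {i}", "label": label} for i, label in enumerate(labels)]

train_rows = sample_per_label(rows, n_per_label=3)
print(len(train_rows), sorted({r["label"] for r in train_rows}))
```

In the pandas version, passing `random_state` to `group.sample` gives the same kind of reproducibility, which makes accuracy numbers easier to compare across runs.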

Set up DSPy Signature and Module

Finally, we will define our task: text classification.

There are a variety of ways you can provide guidelines to DSPy signature behavior. Currently, DSPy allows users to specify:

A high-level goal via the class docstring.
A set of input fields, with optional metadata.
A set of output fields with optional metadata.

DSPy will then leverage this information to inform optimization.

In the example below, note that we simply list the expected labels in the desc of the output field of the TextClassificationSignature class. From this initial state, we'll use DSPy to improve our classifier's accuracy.

class TextClassificationSignature(dspy.Signature):
  text = dspy.InputField()
  label = dspy.OutputField(
      desc=f"Label of predicted class. Possible labels are {unique_train_labels}"
  )


class TextClassifier(dspy.Module):
  def __init__(self):
      super().__init__()
      self.generate_classification = dspy.Predict(TextClassificationSignature)

  def forward(self, text: str):
      return self.generate_classification(text=text)

Run it!

Hello World

Let's demonstrate predicting via the DSPy module and associated signature. The program has correctly learned our labels from the signature desc field and generates reasonable predictions.

from copy import copy

# Initialize our TextClassifier module
text_classifier = copy(TextClassifier())

message = "I am interested in space"
print(text_classifier(text=message))

message = "I enjoy ice skating"
print(text_classifier(text=message))

Prediction(
  label='interest'
)
Prediction(
  label='interest'
)

Review Traces

Open the MLflow UI and select the "DSPy Quickstart" experiment.
Go to the "Traces" tab to view the generated traces.

Now, you can observe how DSPy translates your query and interacts with the LLM. This feature is extremely valuable for debugging, iteratively refining components within your system, and monitoring models in production. While the module in this tutorial is relatively simple, the tracing feature becomes even more powerful as your model grows in complexity.

MLflow DSPy Trace

Compilation

Training

To train, we will leverage BootstrapFewShotWithRandomSearch, an optimizer that will take bootstrap samples from our training set and leverage a random search strategy to optimize our predictive accuracy.

Note that in the example below, we use a simple exact-match metric, defined in validate_classification, but DSPy metrics can contain complex and LM-based logic to properly evaluate accuracy.

from dspy.teleprompt import BootstrapFewShotWithRandomSearch


def validate_classification(example, prediction, trace=None) -> bool:
  return example.label == prediction.label


optimizer = BootstrapFewShotWithRandomSearch(
  metric=validate_classification,
  num_candidate_programs=5,
  max_bootstrapped_demos=2,
  num_threads=1,
)

compiled_pe = optimizer.compile(copy(TextClassifier()), trainset=train_dataset)

Going to sample between 1 and 2 traces per predictor.
Will attempt to bootstrap 5 candidate sets.
Average Metric: 19 / 24  (79.2): 100%|██████████| 24/24 [00:19<00:00,  1.26it/s]
New best score: 79.17 for seed -3
Scores so far: [79.17]
Best score so far: 79.17
Average Metric: 22 / 24  (91.7): 100%|██████████| 24/24 [00:20<00:00,  1.17it/s]
New best score: 91.67 for seed -2
Scores so far: [79.17, 91.67]
Best score so far: 91.67

 17%|█▋        | 4/24 [00:02<00:13,  1.50it/s]

Bootstrapped 2 full traces after 5 examples in round 0.
Average Metric: 21 / 24  (87.5): 100%|██████████| 24/24 [00:19<00:00,  1.21it/s]
Scores so far: [79.17, 91.67, 87.5]
Best score so far: 91.67

 12%|█▎        | 3/24 [00:02<00:18,  1.13it/s]

Bootstrapped 2 full traces after 4 examples in round 0.
Average Metric: 22 / 24  (91.7): 100%|██████████| 24/24 [00:29<00:00,  1.23s/it]
Scores so far: [79.17, 91.67, 87.5, 91.67]
Best score so far: 91.67

  4%|▍         | 1/24 [00:00<00:18,  1.27it/s]

Bootstrapped 1 full traces after 2 examples in round 0.
Average Metric: 22 / 24  (91.7): 100%|██████████| 24/24 [00:20<00:00,  1.18it/s]
Scores so far: [79.17, 91.67, 87.5, 91.67, 91.67]
Best score so far: 91.67

  8%|▊         | 2/24 [00:01<00:20,  1.10it/s]

Bootstrapped 1 full traces after 3 examples in round 0.
Average Metric: 22 / 24  (91.7): 100%|██████████| 24/24 [00:22<00:00,  1.06it/s]
Scores so far: [79.17, 91.67, 87.5, 91.67, 91.67, 91.67]
Best score so far: 91.67

  4%|▍         | 1/24 [00:01<00:30,  1.31s/it]

Bootstrapped 1 full traces after 2 examples in round 0.
Average Metric: 23 / 24  (95.8): 100%|██████████| 24/24 [00:25<00:00,  1.04s/it]
New best score: 95.83 for seed 3
Scores so far: [79.17, 91.67, 87.5, 91.67, 91.67, 91.67, 95.83]
Best score so far: 95.83

  4%|▍         | 1/24 [00:00<00:20,  1.12it/s]

Bootstrapped 1 full traces after 2 examples in round 0.
Average Metric: 22 / 24  (91.7): 100%|██████████| 24/24 [00:24<00:00,  1.03s/it]
Scores so far: [79.17, 91.67, 87.5, 91.67, 91.67, 91.67, 95.83, 91.67]
Best score so far: 95.83
8 candidate programs found.
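To make the log above easier to read, the following sketch replays the "best score so far" bookkeeping using only the scores printed in the log. Each score is the percentage of training examples the candidate got right (for example, 19 / 24 is 79.17). The seed numbering starting at -3 is an assumption inferred from the "seed -3", "seed -2", and "seed 3" messages; it would also explain why 8 candidates appear when num_candidate_programs=5, but treat that mapping as unconfirmed here.

```python
# Scores copied from the final "Scores so far" line in the log above.
scores = [79.17, 91.67, 87.5, 91.67, 91.67, 91.67, 95.83, 91.67]

# Assumption: candidates are numbered consecutively from seed -3 upward.
seeds = list(range(-3, -3 + len(scores)))

best_seed, best_score = None, float("-inf")
for seed, score in zip(seeds, scores):
    # Only a strictly higher score replaces the current best, which is why
    # the repeated 91.67 scores do not trigger a "New best score" message.
    if score > best_score:
        best_seed, best_score = seed, score
        print(f"New best score: {score} for seed {seed}")

print(f"{len(scores)} candidate programs, best seed {best_seed} with score {best_score}")
```

Under these assumptions, the replay reports new best scores at seeds -3, -2, and 3, matching the three "New best score" lines in the log.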

Compare Pre/Post Compiled Accuracy

Finally, let's explore how well our trained model can predict on unseen test data.

def check_accuracy(classifier, test_data=test_dataset) -> tuple[list[int], list]:
  """Return per-example correctness flags (1 or 0) and the raw predictions."""
  residuals = []
  predictions = []
  for example in test_data:
      prediction = classifier(text=example["text"])
      residuals.append(int(validate_classification(example, prediction)))
      predictions.append(prediction)
  return residuals, predictions


uncompiled_residuals, uncompiled_predictions = check_accuracy(copy(TextClassifier()))
print(f"Uncompiled accuracy: {np.mean(uncompiled_residuals)}")

compiled_residuals, compiled_predictions = check_accuracy(compiled_pe)
print(f"Compiled accuracy: {np.mean(compiled_residuals)}")

Uncompiled accuracy: 0.625
Compiled accuracy: 0.875

As shown above, our uncompiled accuracy is non-zero: the base LLM inferred the meaning of the classification labels from our initial prompt alone. However, with DSPy training, the prompts, demonstrations, and input/output signatures have been updated, bringing our model to 88% accuracy (0.875) on unseen data. That's a gain of 25 percentage points over the uncompiled 62.5%!

Let's take a look at each prediction in our test set.

for uncompiled_residual, uncompiled_prediction in zip(uncompiled_residuals, uncompiled_predictions):
  is_correct = "Correct" if bool(uncompiled_residual) else "Incorrect"
  prediction = uncompiled_prediction.label
  print(f"{is_correct} prediction: {' ' * (12 - len(is_correct))}{prediction}")

Incorrect prediction:    money-fx
Correct prediction:      crude
Correct prediction:      money-fx
Correct prediction:      earn
Incorrect prediction:    interest
Correct prediction:      grain
Correct prediction:      trade
Incorrect prediction:    trade

for compiled_residual, compiled_prediction in zip(compiled_residuals, compiled_predictions):
  is_correct = "Correct" if bool(compiled_residual) else "Incorrect"
  prediction = compiled_prediction.label
  print(f"{is_correct} prediction: {' ' * (12 - len(is_correct))}{prediction}")

Correct prediction:      interest
Correct prediction:      crude
Correct prediction:      money-fx
Correct prediction:      earn
Correct prediction:      acq
Correct prediction:      grain
Correct prediction:      trade
Incorrect prediction:    crude
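The two printouts above can be tallied by hand to confirm the reported accuracies. The sketch below transcribes each "Correct" or "Incorrect" line as 1 or 0, in order, and recomputes the numbers; the helper name accuracy is only for illustration.

```python
# 1 = correct, 0 = incorrect, transcribed in order from the printouts above.
uncompiled = [0, 1, 1, 1, 0, 1, 1, 0]
compiled = [1, 1, 1, 1, 1, 1, 1, 0]


def accuracy(residuals):
    """Fraction of correct predictions."""
    return sum(residuals) / len(residuals)


# Positions (0-based) that compilation fixed, and any it broke.
fixed = [i for i, (u, c) in enumerate(zip(uncompiled, compiled)) if c and not u]
regressed = [i for i, (u, c) in enumerate(zip(uncompiled, compiled)) if u and not c]

print(f"Uncompiled accuracy: {accuracy(uncompiled)}")  # 0.625
print(f"Compiled accuracy: {accuracy(compiled)}")      # 0.875
print(f"Gain: {accuracy(compiled) - accuracy(uncompiled)}")  # 0.25
print(f"Fixed examples: {fixed}, regressed examples: {regressed}")
```

In this run, compilation corrected the first and fifth test examples, introduced no regressions, and left the last example misclassified (as trade before compilation, as crude after).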

Log and Load the Model with MLflow

Now that we have a compiled model with higher classification accuracy, let's leverage MLflow to log this model and load it for inference.

import mlflow

with mlflow.start_run():
  model_info = mlflow.dspy.log_model(
      compiled_pe,
      name="model",
      input_example="what is 2 + 2?",
  )

Downloading artifacts:   0%|          | 0/7 [00:00<?, ?it/s]

Open the MLflow UI again and check that the compiled model is recorded in a new MLflow Run. You can now load the model back for inference using mlflow.dspy.load_model or mlflow.pyfunc.load_model.

💡 MLflow will remember the environment configuration stored in dspy.settings, such as the language model (LM) used during the experiment. This ensures excellent reproducibility for your experiment.

# Define input text
print("\n==============Input Text============")
text = test_dataset[0]["text"]
print(f"Text: {text}")

# Inference with original DSPy object
print("\n--------------Original DSPy Prediction------------")
print(compiled_pe(text=text).label)

# Inference with loaded DSPy object
print("\n--------------Loaded DSPy Prediction------------")
loaded_model_dspy = mlflow.dspy.load_model(model_info.model_uri)
print(loaded_model_dspy(text=text).label)

# Inference with MLflow PyFunc API
loaded_model_pyfunc = mlflow.pyfunc.load_model(model_info.model_uri)
print("\n--------------PyFunc Prediction------------")
print(loaded_model_pyfunc.predict(text)["label"])


==============Input Text============
Text: top discount rate at u k bill tender rises to pct

--------------Original DSPy Prediction------------
interest

--------------Loaded DSPy Prediction------------
interest

--------------PyFunc Prediction------------
interest

Next Steps

This example demonstrates how DSPy works. Below are some potential extensions for improving this project, both with DSPy and MLflow.

DSPy

Use real-world data for the classifier.
Experiment with different optimizers.
For more in-depth examples, check out the tutorials and documentation.

MLflow

Deploy the model using MLflow serving.
Use MLflow to experiment with various optimization strategies.

Happy coding!
