
MemAlign Optimizer (Experimental)

Experimental Feature

MemAlign is an experimental optimizer. The API may change in future releases.

MemAlign uses a dual-memory system inspired by human cognition to learn from natural language feedback. It offers significant speed and cost advantages over traditional prompt optimizers.

Fast Alignment

Up to 100× faster than traditional prompt optimizers like SIMBA, enabling rapid iteration on judge quality.

Lower Cost

Significantly lower cost per alignment cycle compared to traditional prompt optimizers.

Few-Shot Learning

Shows visible improvement with just a handful of examples—no need to front-load massive labeling efforts.

Dual-Memory System

Combines generalizable guidelines (semantic memory) with concrete examples (episodic memory) for robust alignment.

Requirements

For alignment to work:

  • Traces must contain human assessments (labels) with the same name as the judge (see the logging sketch after this list)
  • Natural language feedback (rationale) is highly recommended for better alignment
  • A mix of positive and negative labels is recommended
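
For example, a human label and rationale can be attached to an existing trace with mlflow.log_feedback. The sketch below assumes MLflow's assessment APIs and uses a hypothetical trace ID and reviewer; the feedback name must match the judge's name (here "politeness", matching the judge created in Basic Usage below).

python
import mlflow
from mlflow.entities import AssessmentSource, AssessmentSourceType

# Hypothetical ID of a trace produced by the application under evaluation
trace_id = "tr-1234567890abcdef"

# The feedback name must match the judge name ("politeness") for alignment to use it
mlflow.log_feedback(
    trace_id=trace_id,
    name="politeness",
    value=False,
    rationale="The response dismisses the user's question with sarcasm instead of answering it.",
    source=AssessmentSource(
        source_type=AssessmentSourceType.HUMAN,
        source_id="reviewer@example.com",
    ),
)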

How MemAlign Works

MemAlign maintains two types of memory:

  • Semantic Memory: Stores distilled guidelines extracted from feedback. When an expert explains their decision, MemAlign extracts generalizable rules like "Always evaluate safety based on intent, not just language."

  • Episodic Memory: Holds specific examples, particularly edge cases where the judge made mistakes. These serve as concrete anchors for situations that resist easy generalization.

When evaluating new inputs, MemAlign constructs a dynamic context by gathering all principles from semantic memory and retrieving the most relevant examples from episodic memory—similar to how human judges reference both a rulebook and case history.
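
The exact prompt construction is internal to MemAlign, but conceptually the assembled context looks like the sketch below: every distilled guideline is included, while episodic examples are filtered by similarity to the input being judged. The names, data structures, and retrieval function here are illustrative, not MemAlign's actual implementation.

python
from dataclasses import dataclass


@dataclass
class EpisodicExample:
    text: str        # the input/output pair that was judged
    label: bool      # the human label
    rationale: str   # the human explanation


def build_judge_context(
    new_input: str,
    guidelines: list[str],            # semantic memory: always included in full
    episodic: list[EpisodicExample],  # episodic memory: filtered by relevance
    retrieve,                         # similarity search (embedding-based in MemAlign)
    retrieval_k: int = 5,
) -> str:
    guideline_block = "\n".join(f"- {g}" for g in guidelines)
    nearest = retrieve(new_input, episodic, k=retrieval_k)
    example_block = "\n\n".join(
        f"Case: {ex.text}\nExpected label: {ex.label}\nRationale: {ex.rationale}"
        for ex in nearest
    )
    return f"Distilled guidelines:\n{guideline_block}\n\nRelevant past cases:\n{example_block}"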

Installation

MemAlign requires additional dependencies:

bash
pip install mlflow[genai] dspy jinja2 tqdm

Basic Usage

See make_judge documentation for details on creating judges.

python
import mlflow
from mlflow.genai.judges import make_judge
from mlflow.genai.judges.optimizers import MemAlignOptimizer

# Create a judge
judge = make_judge(
    name="politeness",
    instructions=(
        "Given a user question, evaluate if the chatbot's response is polite and respectful. "
        "Consider the tone, language, and context of the response.\n\n"
        "Question: {{ inputs }}\n"
        "Response: {{ outputs }}"
    ),
    feedback_value_type=bool,
    model="openai:/gpt-5-mini",
)

# Create the MemAlign optimizer
optimizer = MemAlignOptimizer(
    reflection_lm="openai:/gpt-5-mini",
)

# Retrieve traces with human feedback
traces = mlflow.search_traces(return_type="list")

# Align the judge
aligned_judge = judge.align(traces=traces, optimizer=optimizer)

Parameters

  • reflection_lm (str, required): Model used for extracting guidelines from feedback.
  • retrieval_k (int, default 5): Number of relevant examples to retrieve from episodic memory during inference.
  • embedding_model (str, default "openai:/text-embedding-3-small"): Model for episodic memory retrieval. Must be in <provider>:/<model-name> format.

Note: The number of parallel threads for LLM calls during guideline distillation can be configured via the MLFLOW_GENAI_OPTIMIZE_MAX_WORKERS environment variable (default: 8).
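
Putting these together, a customized optimizer might be configured as follows; the specific values are illustrative.

python
import os

from mlflow.genai.judges.optimizers import MemAlignOptimizer

# Optional: limit parallel LLM calls during guideline distillation
os.environ["MLFLOW_GENAI_OPTIMIZE_MAX_WORKERS"] = "4"

optimizer = MemAlignOptimizer(
    reflection_lm="openai:/gpt-5-mini",
    retrieval_k=10,  # retrieve more episodic examples per evaluation
    embedding_model="openai:/text-embedding-3-small",  # <provider>:/<model-name> format
)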

Inspecting Learned Knowledge

After alignment, you can inspect what the judge has learned by viewing the updated instructions:

python
# View the updated instructions with distilled guidelines
print(aligned_judge.instructions)
# Output includes appended guidelines like:
# "Distilled Guidelines (7):
# - Responses must be factually accurate...
# - Use neutral, descriptive language..."

Removing Feedback (Unalignment)

If requirements change or feedback was incorrect, you can selectively remove learned knowledge:

python
import mlflow

# Retrieve traces with outdated or incorrect feedback
traces_to_forget: list[mlflow.entities.Trace] = mlflow.search_traces(
    filter_string="tag.outdated = 'true'",
    return_type="list",
)

# Remove knowledge derived from those traces
updated_judge = aligned_judge.unalign(traces=traces_to_forget)

Debugging

To debug the optimization process, enable DEBUG logging:

python
import logging

logging.getLogger("mlflow.genai.judges.optimizers.memalign").setLevel(logging.DEBUG)
aligned_judge = judge.align(traces=traces, optimizer=optimizer)