LLM Playground

The LLM Playground is an in-browser chat surface for testing MLflow AI Gateway endpoints and prompts from the MLflow Prompt Registry without writing any code. Iterate on a system message, dial in sampling parameters, attach tools, request structured output, and replay multi-turn conversations, all from a single page in the MLflow UI.

Multi-turn Chat

Compose system, user, and assistant messages. Submitted replies are appended to the conversation so the next turn carries full context.

Sampling Parameters

Tune temperature, max_tokens, top_p, top_k, presence/frequency penalties, and stop sequences per request. Leave a field blank to use the provider's default.

Tools

Supply tool definitions and pick a tool choice strategy (auto or required) to test function-calling behavior.

Structured Output

Constrain responses to plain text, a free-form JSON object, or a strict JSON schema you paste in.

Template Variables

Use {{ variable }} placeholders inside any message. Values entered into the Variables drawer are substituted on submit, and the templates stay reusable for future runs.

Prompt Registry

Load any registered prompt version into the conversation with one click. A stored model config (model, temperature, max_tokens, response format) is applied automatically.

Prerequisites

Before opening the Playground, you need an AI Gateway chat endpoint. Follow Create and Manage Endpoints to register one. The Playground submits requests against the /gateway/mlflow/v1/chat/completions route.

Quickstart

Open the Playground entry in the left sidebar of the MLflow UI (URL path /playground).
Click the Select an endpoint dropdown in the top bar and pick a chat endpoint.
Type a message in the user composer. Use the role dropdown on a message to switch it to system or assistant, and the add/remove buttons to add more turns.
Click Submit. If anything required is missing (no endpoint, an empty message, an unfilled variable, or an invalid JSON setting), the Submit button is disabled and hovering it shows the exact list of blockers.
On success, the assistant's reply is appended to the conversation and a fresh empty user message is added for the next turn. Submit again to continue the conversation.
Click Clear conversation to reset the chat to a single empty user message.

warning

If a request fails, an inline error alert appears showing the HTTP status and the response body returned by the gateway.

Settings

Parameters, Tools, and Response format live in the Settings drawer in the top bar.

Parameters

Every field is optional. Unset fields are omitted from the request, so the provider's defaults apply.

Parameter	Type	Notes
`temperature`	`float`	Sampling temperature. Higher values are more random.
`max_tokens`	`int`	Maximum tokens to generate in the response.
`top_p`	`float`	Nucleus sampling probability mass.
`top_k`	`int`	Top-k sampling cutoff.
`presence_penalty`	`float`	Penalize tokens already present in the context.
`frequency_penalty`	`float`	Penalize tokens by how often they appear so far.
`stop`	`list[str]`	Stop sequences that end generation.

warning

Not every provider accepts every parameter. Values the provider doesn't understand surface as a request error in the inline alert.

Tools

The Tools section accepts a JSON array of tool definitions plus a tool choice strategy:

None (default) — tools are not sent with the request.
Auto — the model decides whether to call a tool.
Required — the model must call one of the supplied tools.

When tool choice is Auto or Required, at least one valid tool definition is required. The Submit button stays disabled with a "Add at least one tool definition" or "Fix the Tools JSON" blocker until the JSON parses as a non-empty array.

Response format

The Response format section offers three modes for structured output:

Text — natural-language output (default).
JSON — the model returns valid JSON. Sent as response_format: { type: "json_object" }.
JSON schema — the model output conforms to a schema you paste into the editor. The Playground submits it as response_format: { type: "json_schema", json_schema: { name: "response_schema", schema: <your schema>, strict: true } }.

JSON schema mode requires a valid JSON object. An empty or malformed schema blocks submission and the parser error is shown inline.

Variables

Any message can include {{ variable }} placeholders. The Playground scans every message on each keystroke and exposes the detected variables in the Variables drawer. Enter a value for each one. Values are substituted at submit time, while the message templates themselves stay unchanged for reuse.

Submission is blocked until every detected variable has a non-empty value.

Loading prompts from the Prompt Registry

Click Load prompt from registry in the top bar to pick from prompts registered in the Prompt Registry:

Chat-typed prompts populate the full conversation with their system / user / assistant turns.
Text-typed prompts load as a single user message.
If the prompt version has a stored model config, the matching sampling parameters and response-format settings are applied at the same time. A toast confirms Loaded <name> v<version> with settings.

AI Gateway — create and manage the endpoints the Playground talks to.
Prompt Registry — register and version the prompts you load into the Playground.

Multi-turn Chat

Sampling Parameters

Tools

Structured Output

Template Variables

Prompt Registry

Prerequisites​

Quickstart​

Settings​

Parameters​

Tools​

Response format​

Variables​

Loading prompts from the Prompt Registry​

Related docs​

Prerequisites

Quickstart

Settings

Parameters

Tools

Response format

Variables

Loading prompts from the Prompt Registry

Related docs