How experiments work
Create a variant
Navigate to your agent’s Branches tab and click Create branch. The branch starts as a copy of your current configuration. Modify anything — system prompt, voice, tools, knowledge base, LLM, evaluation criteria, or language.
Example: testing a shorter system prompt
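A branch is a copy of the baseline that differs in exactly one setting. The field names below ("system_prompt", "llm", "voice") are illustrative assumptions, not the platform's actual configuration schema; the point is the copy-then-modify pattern:

```python
# Illustrative only: these field names are assumptions, not the
# platform's real configuration schema.
baseline = {
    "name": "main",
    "llm": "gpt-4o",
    "voice": "default",
    "system_prompt": (
        "You are a billing support agent for Acme Co. Always greet the "
        "caller, confirm their account, answer billing questions step by "
        "step, repeat key details back, and close by asking whether "
        "anything else is needed. Be polite and concise at all times."
    ),
}

# A branch starts as a copy of the current configuration; the experiment
# changes exactly one thing: here, a shorter system prompt.
variant = {
    **baseline,
    "name": "shorter-prompt",
    "system_prompt": "You are a concise, polite billing support agent for Acme Co.",
}
```

Because everything except the prompt is inherited from the baseline, any metric difference can be attributed to the prompt change alone.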
Route traffic to the variant
Click Edit traffic split in the Branches panel and assign a percentage of conversations to your branch. Percentages across all branches must total exactly 100%. Traffic routing is deterministic based on conversation ID, so the same user always reaches the same branch for the duration of an experiment. This prevents the confusing experience of a user getting a different agent on each call.
Example traffic split
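Deterministic routing can be sketched as a stable hash of the conversation ID mapped into percentage buckets. This is an illustrative implementation, not the platform's actual algorithm:

```python
import hashlib

def route(conversation_id: str, splits: dict) -> str:
    """Deterministically assign a conversation to a branch.

    `splits` maps branch name to percentage; values must total 100.
    Hashing the conversation ID (rather than choosing randomly on each
    call) means the same user reaches the same branch for the whole
    experiment.
    """
    if sum(splits.values()) != 100:
        raise ValueError("branch percentages must total exactly 100")
    # Stable hash -> a bucket from 0 to 99.
    bucket = int(hashlib.sha256(conversation_id.encode()).hexdigest(), 16) % 100
    cumulative = 0
    for branch, pct in splits.items():
        cumulative += pct
        if bucket < cumulative:
            return branch

# A conservative 90/10 split for a new variant:
splits = {"main": 90, "shorter-prompt": 10}
```

Calling `route("conv_123", splits)` returns the same branch every time, which is the property that keeps a user's experience consistent across calls.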
Measure the impact
Click See analytics from the Branches panel to compare metrics between branches side by side. The analytics view shows the same metrics as the main analytics dashboard.
Give the experiment enough time to accumulate statistically meaningful data before drawing conclusions.
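One way to check whether a difference between branches is meaningful rather than noise is a two-proportion z-test on a rate metric such as containment. A minimal sketch using only the standard library:

```python
from math import sqrt, erf

def two_proportion_z(success_a, n_a, success_b, n_b):
    """Two-sided z-test for a difference in rates between two branches.

    success_a / n_a: e.g. contained conversations / total on the baseline.
    success_b / n_b: the same counts on the variant.
    Returns (z, p_value); a small p-value suggests a real difference.
    """
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2))).
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

For example, a baseline containing 420 of 600 conversations (70%) against a variant containing 380 of 500 (76%) yields a p-value below 0.05, while identical rates yield a p-value of 1.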
| Metric | What to watch for |
|---|---|
| CSAT | Did user satisfaction improve or drop? |
| Containment rate | Is the variant resolving more or fewer conversations? |
| Conversion | Did the change affect goal completion rates? |
| Average handling time | Is the variant faster or slower? |
| Median agent response latency | Did latency change with the new LLM or tools? |
| Cost per agent resolution | Is the variant more or less expensive to operate? |
What you can test
| Configuration area | Examples |
|---|---|
| System prompt | Tone, length, persona, instructions |
| Workflow | Conversation flow logic, branching conditions |
| Voice | Different voices, speaking style, speed |
| Tools | Adding, removing, or reordering tool calls |
| Knowledge base | Different document sets or chunking strategies |
| LLM | GPT vs. Claude vs. Gemini, or different model versions |
| Evaluation criteria | Testing new success definitions before rolling them out |
| Language | Comparing performance across locales |
Best practices
Start with a hypothesis
Define what you expect to happen and why before you create a branch. For example: “Shortening the system prompt will reduce average handling time without affecting CSAT, because the current prompt contains redundant instructions the model already follows by default.” A hypothesis makes it easier to interpret results and decide whether a change is worth keeping.
Change one thing at a time
If you change the system prompt and the voice in the same branch, you won’t know which change drove the result. Keep each experiment focused on a single variable.
Set up evaluation criteria first
Experiments are most useful when you have evaluation criteria configured before you start. Without success scoring, you can only compare operational metrics like latency and cost — not whether the agent is actually helping users.
Start with small traffic (5–10%)
Route a small percentage of traffic to your variant initially. This limits the impact if the change performs worse than expected, and lets you catch obvious regressions before scaling up.
Give experiments enough time
Don’t conclude an experiment after a handful of conversations. Wait until you have enough data to see consistent patterns — typically at least a few hundred conversations, depending on your traffic volume.
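"Enough data" can be made concrete with a standard power calculation. This sketch estimates the per-branch sample size needed to detect an absolute lift in a rate metric at roughly 5% significance and 80% power, using the normal approximation:

```python
from math import ceil, sqrt

def sample_size_per_branch(p_baseline, lift, alpha_z=1.96, power_z=0.84):
    """Rough conversations-per-branch needed to detect an absolute `lift`
    in a rate (e.g. containment) starting from `p_baseline`.

    alpha_z = 1.96 corresponds to two-sided 5% significance;
    power_z = 0.84 corresponds to 80% power.
    """
    p1, p2 = p_baseline, p_baseline + lift
    p_bar = (p1 + p2) / 2
    n = ((alpha_z * sqrt(2 * p_bar * (1 - p_bar))
          + power_z * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2) / lift ** 2
    return ceil(n)
```

Under these assumptions, detecting a 5-point lift from a 70% baseline takes roughly 1,250 conversations per branch, while a 10-point lift needs only about 300, which is why "a few hundred conversations" is a reasonable floor for all but the largest expected effects.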
Keep experiments short-lived
Long-running experiments complicate your version history and make it harder to run follow-up tests. Once you have a clear winner, promote it and close the branch.
Common use cases
Prompt optimization
Test a more concise system prompt to reduce cost and latency without sacrificing answer quality.
Voice selection
Compare two voices on the same conversation flow to find which one users respond to better.
LLM comparison
Measure the cost and quality tradeoffs between different language models for your specific use case.
Knowledge base updates
Validate that a new document set improves containment rate before replacing the existing one.
Next steps
Analytics
Understand the metrics you will use to evaluate experiment results.
Conversation analysis
Set up evaluation criteria before running your first experiment.

