How testing works
Each test consists of a simulated user turn (or a sequence of turns) and a set of assertions about how the agent should respond. When you run a test suite, XUNA AI executes every scenario against your agent's current live configuration and reports pass/fail results alongside the agent's actual responses, so you can catch regressions from prompt edits, knowledge base updates, or tool changes.

Create a test
Open the Testing tab
Go to xuna.ai/app/agents, select your agent, and click Testing.
Create a new test scenario
Click New test. Give the test a descriptive name that identifies the scenario — for example, Refund request - eligible order or Out-of-scope question escalation.

Define the user input
Enter the simulated user message or sequence of messages. Write inputs that reflect real conversations from your Call History rather than ideal phrasings.
Example user input
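A realistic multi-turn input might look like the following. This is purely illustrative; the order number and scenario are invented for the example:

```
User: Hi, I ordered a pair of headphones last week and they arrived broken.
User: The order number is #48213. Can I get a refund?
```

Pulling phrasings like these from real transcripts in your Call History keeps tests representative of what your agent actually sees.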
Set the expected outcome
Define what a correct response looks like. You can assert on:
- Contains — the response includes a specific phrase or piece of information
- Does not contain — the response avoids a phrase (useful for brand safety checks)
- Evaluation criterion — the response meets a named success criterion you have already configured
Example assertion
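The Contains and Does not contain checks amount to simple substring matching against the agent's response. The sketch below illustrates that logic in Python; it is an illustration of how such assertions behave, not XUNA AI's actual implementation, and the function name and assertion-type strings are invented for the example:

```python
def check_assertion(response: str, kind: str, value: str) -> bool:
    """Evaluate one test assertion against the agent's response.

    kind is "contains" or "does_not_contain" (hypothetical labels);
    matching is case-insensitive, as a user-facing check usually is.
    """
    text = response.lower()
    needle = value.lower()
    if kind == "contains":
        return needle in text
    if kind == "does_not_contain":
        return needle not in text
    raise ValueError(f"unsupported assertion type: {kind}")

response = "Your refund has been processed and will arrive in 3-5 business days."
print(check_assertion(response, "contains", "refund has been processed"))      # True
print(check_assertion(response, "does_not_contain", "cannot help with that"))  # True
```

A Does not contain assertion passes only when the phrase is absent, which is why it works well for brand-safety phrases you never want the agent to say.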
Run tests
From the dashboard
Click Run all tests in the Testing tab to execute every test in the suite. Results appear inline — green for pass, red for fail — with the agent's actual response shown alongside your assertion. Click a failed test to see the full conversation transcript and understand why the assertion failed.

From the CLI
You can trigger test runs from the command line — useful for scripting or automating test checks before you push agent changes.

Via the API
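A CLI invocation might look like the following. The command name and flags shown here are hypothetical placeholders; check the XUNA CLI reference for the actual syntax:

```
xuna tests run --agent <agent-id> --suite <suite-name>
```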
Submitting a test run through the API returns a run_id you can poll to retrieve results once execution is complete.
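The poll loop for a run_id can be sketched as follows. The status-fetching step is injected as a callable so the sketch is self-contained; in practice it would issue an authenticated request to the XUNA API, and the state names and response fields used here are assumptions, not documented API:

```python
import time

def poll_run(run_id, fetch_status, interval=2.0, timeout=60.0):
    """Poll a test run until it finishes, then return its final status.

    fetch_status: callable mapping a run_id to a status dict. The
    "state" field and its "completed"/"failed" values are assumed
    names for this sketch, not the documented XUNA API schema.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(run_id)
        if status.get("state") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"test run {run_id} did not finish within {timeout}s")

# Stubbed fetcher that reports "running" twice, then "completed".
states = iter(["running", "running", "completed"])
fake_fetch = lambda run_id: {"state": next(states), "run_id": run_id}
result = poll_run("run_123", fake_fetch, interval=0.01)
print(result["state"])  # completed
```

Polling with a timeout and a short sleep keeps a CI job from hanging indefinitely if a run stalls.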
Maintain your test suite
Keep tests focused on behavior that matters to your users, not implementation details of the prompt. If a test becomes flaky after a legitimate improvement to the agent, update the assertion rather than deleting the test.

Next steps
Conversation analysis
Configure success criteria that your tests can assert against.
Experiments
Run controlled A/B tests to measure the impact of changes on production traffic.

