How testing works
Each test consists of a simulated user turn (or a sequence of turns) and a set of assertions about how the agent should respond. When you run a test suite, XUNA AI executes every scenario against your agent's current live configuration and reports pass/fail results alongside the agent's actual responses, so you can catch regressions from prompt edits, knowledge base updates, or tool changes.

Create a test
Open the Testing tab
Go to xuna.ai/app/agents, select your agent, and click Testing.
Create a new test scenario
Click New test. Give the test a descriptive name that identifies the scenario — for example, Refund request - eligible order or Out-of-scope question escalation.

Define the user input
Enter the simulated user message or sequence of messages. Write inputs that reflect real conversations from your Call History rather than ideal phrasings.
Example user input
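A realistic multi-turn input might look like the following. This is purely illustrative; the order number and scenario are invented for the example:

```
User: Hi, I ordered a pair of headphones last week and they arrived broken.
User: The order number is #48213. Can I get a refund?
```

Pulling phrasings like these from real transcripts in your Call History keeps tests representative of what your agent actually sees.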
Set the expected outcome
Define what a correct response looks like. You can assert on:
- Contains — the response includes a specific phrase or piece of information
- Does not contain — the response avoids a phrase (useful for brand safety checks)
- Evaluation criterion — the response meets a named success criterion you have already configured
Example assertion
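The Contains and Does not contain checks amount to simple substring matching against the agent's response. The sketch below illustrates that logic in Python; it is an illustration of how such assertions behave, not XUNA AI's actual implementation, and the function name and assertion-type strings are invented for the example:

```python
def check_assertion(response: str, kind: str, value: str) -> bool:
    """Evaluate one test assertion against the agent's response.

    kind is "contains" or "does_not_contain" (hypothetical labels);
    matching is case-insensitive, as a user-facing check usually is.
    """
    text = response.lower()
    needle = value.lower()
    if kind == "contains":
        return needle in text
    if kind == "does_not_contain":
        return needle not in text
    raise ValueError(f"unsupported assertion type: {kind}")

response = "Your refund has been processed and will arrive in 3-5 business days."
print(check_assertion(response, "contains", "refund has been processed"))      # True
print(check_assertion(response, "does_not_contain", "cannot help with that"))  # True
```

A Does not contain assertion passes only when the phrase is absent, which is why it works well for brand-safety phrases you never want the agent to say.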
Run tests
From the dashboard
Click Run all tests in the Testing tab to execute every test in the suite. Results appear inline — green for pass, red for fail — with the agent's actual response shown alongside your assertion. Click a failed test to see the full conversation transcript and understand why the assertion failed.

From the CLI
You can trigger test runs from the command line — useful for scripting or automating test checks before you push agent changes.

Via the API
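A CLI invocation might look like the following. The command name and flags shown here are hypothetical placeholders; check the XUNA CLI reference for the actual syntax:

```
xuna tests run --agent <agent-id> --suite <suite-name>
```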
Submitting a test run through the API returns a run_id you can poll to retrieve results once execution is complete.
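The poll loop for a run_id can be sketched as follows. The status-fetching step is injected as a callable so the sketch is self-contained; in practice it would issue an authenticated request to the XUNA API, and the state names and response fields used here are assumptions, not documented API:

```python
import time

def poll_run(run_id, fetch_status, interval=2.0, timeout=60.0):
    """Poll a test run until it finishes, then return its final status.

    fetch_status: callable mapping a run_id to a status dict. The
    "state" field and its "completed"/"failed" values are assumed
    names for this sketch, not the documented XUNA API schema.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status(run_id)
        if status.get("state") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"test run {run_id} did not finish within {timeout}s")

# Stubbed fetcher that reports "running" twice, then "completed".
states = iter(["running", "running", "completed"])
fake_fetch = lambda run_id: {"state": next(states), "run_id": run_id}
result = poll_run("run_123", fake_fetch, interval=0.01)
print(result["state"])  # completed
```

Polling with a timeout and a short sleep keeps a CI job from hanging indefinitely if a run stalls.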
Maintain your test suite
Keep tests focused on behavior that matters to your users, not implementation details of the prompt. If a test becomes flaky after a legitimate improvement to the agent, update the assertion rather than deleting the test.

Next steps
Conversation analysis
Configure success criteria that your tests can assert against.
Experiments
Run controlled A/B tests to measure the impact of changes on production traffic.

