## Set up success evaluation
Success evaluation scores each conversation as `success`, `failure`, or `unknown` based on criteria you define. The agent's LLM applies the criteria to the transcript after the call ends, so there is no runtime cost to latency.
### Open the Analysis tab
Go to xuna.ai/app/agents, select your agent, and click Analysis.
### Add an evaluation criterion
Under **Success evaluation**, click **Add criterion**. Give it a short machine-readable name, for example `solved_user_inquiry`.

### Write the evaluation prompt
Describe what the evaluator should look for. Be specific about what constitutes success and what constitutes failure.
**Example criterion**
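The exact wording is up to you; a prompt for a `solved_user_inquiry` criterion might read something like this (illustrative only, not a required format):

```text
Mark the conversation as a success if the agent fully resolved the
user's inquiry without needing a human handoff. Mark it as a failure
if the user's question was left unanswered, the agent gave incorrect
information, or the user asked for a human agent. If the conversation
ended before the user stated an inquiry, return unknown.
```

Note that the prompt spells out all three outcomes, including when to return `unknown`, so the evaluator is not forced to guess on short or truncated calls.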
### Outcome values
Each evaluation criterion produces one of three outcomes:

| Outcome | Meaning |
|---|---|
| `success` | All success criteria were met |
| `failure` | One or more criteria were not met |
| `unknown` | The evaluator could not determine an outcome, for example because the conversation was too short or the user disconnected unexpectedly |
You can filter conversations by evaluation outcome in the Analytics dashboard to review `unknown` results.
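If you export conversation records and aggregate outcomes yourself, exclude `unknown` results from the success rate so undecidable calls don't drag the metric down. A minimal Python sketch; the `evaluation` field name and record shape here are assumptions for illustration, not the platform's documented export schema:

```python
from collections import Counter

def success_rates(conversations):
    """Compute per-criterion success rate, ignoring unknown outcomes.

    Each record is assumed to carry an "evaluation" dict mapping
    criterion name -> "success" | "failure" | "unknown".
    """
    counts = {}
    for convo in conversations:
        for criterion, outcome in convo["evaluation"].items():
            counts.setdefault(criterion, Counter())[outcome] += 1
    rates = {}
    for criterion, tally in counts.items():
        decided = tally["success"] + tally["failure"]  # unknown excluded
        rates[criterion] = tally["success"] / decided if decided else None
    return rates

conversations = [
    {"evaluation": {"solved_user_inquiry": "success"}},
    {"evaluation": {"solved_user_inquiry": "failure"}},
    {"evaluation": {"solved_user_inquiry": "success"}},
    {"evaluation": {"solved_user_inquiry": "unknown"}},
]
print(success_rates(conversations))
```

A criterion with only `unknown` outcomes yields `None` rather than `0`, which keeps "no data" distinguishable from "always failing".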
You can define multiple evaluation criteria per agent. Each criterion is scored independently, so you can track success across different dimensions, for example both `solved_user_inquiry` and `maintained_brand_tone`.

## Set up data collection
Data collection extracts specific information from the transcript and stores it as structured fields on each conversation record. Use these fields to track trends, feed downstream systems, or filter conversations in Call History.

### Configure the field
Set the data type, identifier, and a description that tells the extractor what to look for.
**Example field**
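A filled-in field might look like this (the identifier and description are examples, not required values):

```text
Identifier:  user_inquiry_topic
Data type:   string
Description: The main topic of the user's inquiry, stated in a few
             words, for example "billing dispute" or "password reset".
             If no clear topic emerges, leave this field empty.
```

As with evaluation prompts, telling the extractor what to do when the information is absent reduces spurious or hallucinated values.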
### Choose a data type
Select the type that matches the value you want to capture:

| Type | Use for |
|---|---|
| `string` | Free-text values like questions, complaints, or topics |
| `number` | Numeric values like order amounts or queue positions |
| `boolean` | Yes/no flags like whether the user agreed to terms |
## Search conversation history
The Call History tab supports two search modes:

- **Keyword search** matches exact words or phrases in transcripts.
- **Semantic search** finds conversations by meaning, not exact wording. Searching "billing problem" surfaces conversations where users said "I was charged twice" or "my invoice is wrong".
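The difference between the two modes can be sketched with a toy example: keyword search does literal matching, while semantic search compares embedding vectors by cosine similarity. The vectors below are hand-picked stand-ins for a real embedding model, chosen so billing-related phrases cluster together:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

# Toy embeddings standing in for a real model: the two billing phrases
# sit close together even though they share no keywords.
EMBEDDINGS = {
    "I was charged twice":         [0.9, 0.1, 0.0],
    "my invoice is wrong":         [0.8, 0.2, 0.1],
    "what are your opening hours": [0.0, 0.1, 0.9],
}

def keyword_search(query, transcripts):
    return [t for t in transcripts if query.lower() in t.lower()]

def semantic_search(query_vec, threshold=0.8):
    return [t for t, v in EMBEDDINGS.items() if cosine(query_vec, v) >= threshold]

transcripts = list(EMBEDDINGS)
print(keyword_search("billing problem", transcripts))   # [] : no literal match
billing_vec = [0.85, 0.15, 0.05]  # pretend embedding of "billing problem"
print(semantic_search(billing_vec))  # both billing phrases
```

Keyword search returns nothing because "billing problem" never appears verbatim; semantic search recovers both billing conversations because their vectors point in nearly the same direction as the query's.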
## Next steps
- **Analytics dashboard**: view aggregate metrics and filter conversations by evaluation outcome.
- **Testing**: run automated tests to validate your evaluation criteria before deploying.

