The system prompt is the primary instruction set your agent follows in every conversation. It defines the agent’s persona, the scope of topics it can discuss, how it should behave, and what it should do when it encounters edge cases. A well-written system prompt is the single most impactful configuration change you can make.

Anatomy of a system prompt

A strong system prompt covers four areas:

Identity — Who the agent is and what it represents.
Goal — The primary task the agent should accomplish.
Constraints — What the agent should and should not do.
Tone — How the agent should communicate.

You are Mason, a client support specialist for Northstar Home Services.

Your role is to assist customers with appointment scheduling, service questions, billing inquiries, and basic troubleshooting for residential services.

Guidelines:

* Only provide information related to Northstar Home Services and its offerings.
* Do not estimate costs or timelines unless they are confirmed in the system.
* If a customer is upset or confused, respond calmly and acknowledge the issue before continuing.
* If the request requires account changes, disputes, or technical support beyond your scope, escalate to a live representative.

Tone: Clear, confident, and conversational. Keep responses brief, natural, and easy to follow in a voice interaction.

Write prompts in the same conversational register you want the agent to use. Because responses are spoken aloud, avoid bullet points and long prose in the prompt — they often leak through into the agent’s speech.

# Personality
You are a support representative for [Company Name]. You are friendly, professional, and focused on delivering results.

# Goal
Guide customers to fast, effective resolutions by retrieving account details and initiating returns or credits when warranted.

# Guardrails
Never disclose personal customer information outside of the active session. Always confirm the caller's identity prior to pulling up any account data.

# Tone
Keep replies brief (three sentences or fewer) unless the customer asks for a more thorough walkthrough.

Reliability improves when instructions are organized into clearly separated sections with intentional hierarchy. AI systems naturally assign greater weight to high-priority categories like operational constraints and safety rules, while distinct formatting helps prevent instructions from blending together or influencing unrelated parts of the workflow.
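As a minimal sketch of this sectioned layout, the Python helper below assembles a prompt from named sections and renders each one under its own heading. The function name `build_system_prompt` is a hypothetical helper for illustration, not part of any platform API.

```python
def build_system_prompt(sections: dict[str, str]) -> str:
    """Render each section as a '# Heading' block, separated by blank lines."""
    blocks = []
    for heading, body in sections.items():
        body = body.strip()
        if not body:
            # Skip empty sections so the prompt has no dangling headings.
            continue
        blocks.append(f"# {heading}\n{body}")
    return "\n\n".join(blocks)


prompt = build_system_prompt({
    "Personality": "You are a support representative for [Company Name].",
    "Goal": "Guide customers to fast, effective resolutions.",
    "Guardrails": "Always confirm the caller's identity before pulling up account data.",
    "Tone": "Keep replies brief (three sentences or fewer).",
})
print(prompt)
```

Keeping sections in a dictionary makes it easier to review or update one section without touching the others.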

Prioritize Clarity and Brevity

Write instructions using direct, actionable language with minimal excess wording. Focus only on the information required for the agent to perform correctly and avoid repetitive or unnecessary phrasing.

Why this improves performance: Short, well-structured instructions are easier for AI systems to interpret consistently. Reducing unnecessary language lowers the chance of confusion, conflicting behaviors, or unintended outputs.

# Tone

Communicate naturally and approachably while maintaining a polished and professional customer

Reinforce High-Priority Instructions

Draw attention to mission-critical actions by clearly labeling them as essential or high priority within the prompt structure. Reiterating the most important operational rules in multiple sections can improve consistency and reduce the likelihood of them being ignored.

Why this improves reliability: In larger prompts, AI models can shift focus toward newer context as conversations evolve. Strategic emphasis and selective repetition help maintain adherence to the most important instructions throughout execution.

# Goals

Complete customer verification before viewing or discussing account-specific records. This is a critical requirement.
Retrieve order information and assist with delivery, tracking, or status-related inquiries.
Initiate refund workflows only when the request qualifies under approved policies.

# Guardrails

Do not access, reference, or modify account information until identity confirmation has been successfully completed. This rule must always be followed.

Text Formatting & Speech Optimization

Voice synthesis systems generally perform best when processing fully written language instead of raw symbols, numbers, or special characters. Inputs such as “#”, “@”, currency symbols, phone numbers, or numeric strings can sometimes lead to inaccurate pronunciations, distorted speech output, or unintended voice behavior. To improve spoken accuracy, XUNA converts complex text patterns into speech-friendly language before audio generation occurs. For example:

2500 → “two thousand five hundred”
support@xuna.ai → “support at xuna dot ai”
$99 → “ninety nine dollars”
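To make these conversions concrete, here is a small Python sketch that reproduces the three examples above. It is an illustration only and does not represent XUNA's actual normalizer. The function names are hypothetical, and the sketch handles only emails, whole-dollar amounts, and integers of up to six digits.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999999."""
    if not 0 <= n < 1_000_000:
        raise ValueError("this sketch only supports 0..999999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + (" " + ONES[ones] if ones else "")
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        return ONES[hundreds] + " hundred" + (" " + number_to_words(rest) if rest else "")
    thousands, rest = divmod(n, 1000)
    return number_to_words(thousands) + " thousand" + (" " + number_to_words(rest) if rest else "")


def normalize_for_speech(text: str) -> str:
    """Rewrite emails, whole-dollar amounts, and short integers as spoken words."""
    # Emails first: read "@" as "at" and "." as "dot".
    text = re.sub(
        r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
        lambda m: m.group(0).replace("@", " at ").replace(".", " dot "),
        text,
    )

    # Whole-dollar amounts of up to six digits; cents are out of scope here.
    def dollars(m: re.Match) -> str:
        n = int(m.group(1))
        return f"{number_to_words(n)} {'dollar' if n == 1 else 'dollars'}"

    text = re.sub(r"\$(\d{1,6})\b", dollars, text)
    # Remaining standalone integers of up to six digits; longer runs are left as-is.
    return re.sub(r"\b\d{1,6}\b", lambda m: number_to_words(int(m.group(0))), text)


print(normalize_for_speech("Email support@xuna.ai about the $99 plan."))
```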

XUNA provides multiple text normalization methods depending on your preferred balance between speed, transcript formatting, and speech consistency.

Available Normalization Modes

XUNA supports configurable normalization behavior through the `text_normalization_type` setting inside the agent configuration panel.

`system_prompt` (Default)

This mode instructs the language model to rewrite symbols, abbreviations, and numerical values into spoken-word format before sending content to the voice engine.

Benefits

No additional processing delay
Lightweight and fast
Works well for most conversational use cases

Considerations

AI-generated normalization may occasionally be inconsistent
Conversation transcripts display fully written phrases instead of original formatting
- Example: “five hundred dollars” instead of “$500”

If occasional formatting inconsistencies appear, consider upgrading to a more capable language model or reinforcing formatting instructions directly within the system prompt.

`xuna_normalizer`

This mode applies XUNA’s dedicated speech optimization layer after AI generation and before voice rendering.

Benefits

Higher consistency and pronunciation accuracy
Preserves natural transcript formatting
- Example: “$500” remains visible in transcripts
Keeps system prompts cleaner and less instruction-heavy

Considerations

Adds a small amount of processing latency before speech playback

For workflows where transcript readability and professional formatting are important, the XUNA normalization engine is generally the preferred configuration.

Where to Configure This

Inside the XUNA platform, navigate to: Agent Settings → Voice Configuration → Advanced Voice Options. From there, locate the text normalization settings near the bottom of the voice configuration panel and select the strategy that best matches your use case.

Structured Input Handling for Tools

When the `system_prompt` normalization option is enabled, the AI may convert symbols, special characters, and numeric values into spoken-language equivalents during conversation generation. For example:

john@gmail.com may be interpreted as “john at gmail dot com”
404-555-1212 may appear as “four zero four five five five one two one two”

Voice transcriptions can also introduce inconsistent formatting depending on pronunciation, pauses, or speech recognition behavior. As a result, automation tools and workflow actions may receive conversational text instead of properly formatted structured values. To improve reliability, tool parameters should clearly define the exact format required for execution. Include explicit formatting instructions and sample values directly within the parameter description whenever possible.
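One way to guard against this on the tool side is to convert spoken-form values back into structured ones before execution. The Python sketch below illustrates the idea for the two examples above. The function names are hypothetical, and a production system would need to handle far more variation than this.

```python
import re

DIGIT_WORDS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def spoken_to_email(spoken: str) -> str:
    """Convert 'john at gmail dot com' into 'john@gmail.com'."""
    text = spoken.strip().lower()
    text = re.sub(r"\s+at\s+", "@", text)
    text = re.sub(r"\s+dot\s+", ".", text)
    # Remove any spaces left over from transcription pauses.
    return text.replace(" ", "")


def spoken_to_digits(spoken: str) -> str:
    """Convert spoken digits into a digit string; written digits pass through."""
    digits = []
    for token in re.split(r"[\s-]+", spoken.strip().lower()):
        if token in DIGIT_WORDS:
            digits.append(DIGIT_WORDS[token])
        elif token.isdigit():
            digits.append(token)
        else:
            # Fail loudly rather than guess at an unknown token.
            raise ValueError(f"unrecognized token: {token!r}")
    return "".join(digits)


print(spoken_to_email("john at gmail dot com"))
print(spoken_to_digits("four zero four five five five one two one two"))
```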

Guardrails

Clearly define all mandatory behavioral rules the model must always follow in a dedicated # Guardrails section. Language models are specifically trained to treat this heading as high-priority instruction space, making it more effective for enforcing critical constraints and compliance requirements. This section should include any non-optional rules related to:

Safety and policy compliance
Restricted or prohibited actions
Response formatting requirements
Privacy and security expectations
Escalation or fallback behavior
Tool usage limitations
Brand or tone restrictions

Centralizing these rules improves maintainability, consistency, and auditability. It also reduces the likelihood of important constraints being overlooked when prompts become large or complex.

# Guardrails

Never share customer data across conversations or reveal sensitive account information without proper verification.
Never process refunds over $500 without supervisor approval.
Never make promises about delivery dates that aren't confirmed in the order system.
Acknowledge when you don't know an answer instead of guessing.
If a customer becomes abusive, politely end the conversation and offer to escalate to a supervisor.

For additional guidance on implementing strong behavioral constraints, refer to the Guardrails guide. It covers best practices for structuring rules, handling unsafe requests, enforcing policy boundaries, and improving model reliability through layered instruction design.

Tool Configuration for Reliability

Agents designed to manage transactional or multi-step workflows become significantly more effective when they have access to external tools. These tools allow the agent to retrieve real-time information, interact with third-party systems, and complete actions on the user’s behalf. Just as important as prompt design is the way tools are configured and described. Well-defined, action-focused tool descriptions help the model:

Select the correct tool for a task
Supply accurate parameters
Understand expected outcomes
Recover more effectively from failures or invalid responses

Tool definitions should be explicit, concise, and written from the perspective of what the tool accomplishes rather than how it is implemented. Clear configuration improves reliability, reduces hallucinated tool usage, and increases consistency during complex workflows.

Describe tools precisely with detailed parameters

When creating a tool, add descriptions to all parameters. This helps the LLM construct tool calls accurately.

Tool description: “Looks up customer order status by order ID and returns current status, estimated delivery date, and tracking number.”

Parameter descriptions:

order_id (required): “The unique order identifier, formatted as written characters (e.g., ‘ORD123456’)”
include_history (optional): “If true, returns full order history including status changes”

Why this matters for reliability: Parameter descriptions act as inline documentation for the model. They clarify format expectations, required vs. optional fields, and acceptable values.
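For illustration, the same tool could be written as a JSON-Schema-style definition. The exact shape a platform expects may differ, so treat the dictionary layout below as an assumption. The `undocumented_parameters` helper is a hypothetical check that flags parameters without a description.

```python
import json

# Assumed layout: a JSON-Schema-style tool definition, as used by many
# function-calling interfaces. Field names may differ on your platform.
get_order_status_tool = {
    "name": "getOrderStatus",
    "description": (
        "Looks up customer order status by order ID and returns current "
        "status, estimated delivery date, and tracking number."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The unique order identifier, formatted as "
                               "written characters (e.g., 'ORD123456')",
            },
            "include_history": {
                "type": "boolean",
                "description": "If true, returns full order history "
                               "including status changes",
            },
        },
        "required": ["order_id"],
    },
}


def undocumented_parameters(tool: dict) -> list[str]:
    """Return the names of parameters that lack a non-empty description."""
    properties = tool["parameters"]["properties"]
    return [
        name for name, spec in properties.items()
        if not spec.get("description", "").strip()
    ]


print(json.dumps(get_order_status_tool, indent=2))
print("Missing descriptions:", undocumented_parameters(get_order_status_tool))
```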

Explain when and how to use each tool in the system prompt

Clearly define in your system prompt when and how each tool should be used. Don’t rely solely on tool descriptions—provide usage context and sequencing logic.

# Tools

You have access to the following tools:

## `getOrderStatus`

Use this tool when a customer asks about their order. Always call this tool before providing order information—never rely on memory or assumptions.

**When to use:**

- Customer asks "Where is my order?"
- Customer provides an order number
- Customer asks about delivery estimates

**How to use:**

1. Collect the order ID from the customer
2. Call `getOrderStatus` with the order ID
3. Present the results to the customer in natural language

**Error handling:**
If the tool returns "Order not found", ask the customer to verify the order number and try again.

## `processRefund`

Use this tool only after verifying:

1. Customer identity has been confirmed
2. Order is eligible for refund (within 30 days, not already refunded)
3. Refund amount is under $500 (escalate to supervisor if over $500)

**Required before calling:**

- Order ID (from `getOrderStatus`)
- Refund reason code
- Customer confirmation

This step is important: Always confirm refund details with the customer before calling this tool.
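Prompt instructions like these are more dependable when the same preconditions are also enforced in code before the refund tool runs. The Python sketch below is one possible guard. The `RefundRequest` fields and return values are hypothetical, and because the rules above do not say how to treat a refund of exactly $500, the sketch conservatively escalates it.

```python
from dataclasses import dataclass

REFUND_WINDOW_DAYS = 30       # "within 30 days", treated as inclusive (assumption)
SUPERVISOR_THRESHOLD = 500.0  # amounts at or above this are escalated (assumption)


@dataclass
class RefundRequest:
    identity_verified: bool
    days_since_order: int
    already_refunded: bool
    amount: float
    customer_confirmed: bool


def refund_decision(req: RefundRequest) -> str:
    """Return 'process', 'escalate', or 'blocked: <reason>'."""
    if not req.identity_verified:
        return "blocked: identity not verified"
    if req.already_refunded:
        return "blocked: already refunded"
    if req.days_since_order > REFUND_WINDOW_DAYS:
        return "blocked: outside refund window"
    if req.amount >= SUPERVISOR_THRESHOLD:
        return "escalate"
    if not req.customer_confirmed:
        return "blocked: awaiting customer confirmation"
    return "process"


print(refund_decision(RefundRequest(True, 10, False, 120.0, True)))
```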

Specify Expected Formats in Tool Parameter Descriptions

When a tool expects structured inputs such as email addresses, phone numbers, account IDs, confirmation codes, or dates, the required format should be clearly defined within the parameter description. Including an example helps the model provide correctly formatted values consistently. This becomes especially important in voice and conversational systems, where speech-to-text normalization may convert structured data into spoken-language forms. For example:

"john dot smith at gmail dot com" instead of john.smith@gmail.com
"five five five one two three four" instead of 5551234

Explicit formatting guidance improves parameter accuracy, reduces tool invocation failures, and helps the agent normalize user input before execution. Examples:

Email format: user@example.com
Phone number format: +1-555-123-4567
Date format: YYYY-MM-DD
Order ID format: ORD-12345

Providing clear input expectations makes tool usage more reliable and minimizes errors caused by ambiguous or transcription-altered values.

## `lookupAccount` tool parameters

- `email` (required): "The customer's email in standard email format, e.g. 'john.smith@company.com'."
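Format expectations can also be checked in code before a tool call is executed. The following Python sketch validates values against the four example formats listed above. The patterns are deliberately simple and the `is_valid` helper is hypothetical, so adapt both to the formats your tools actually require.

```python
import re
from datetime import datetime

# Patterns mirror the example formats above; they are illustrative, not exhaustive.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "phone": re.compile(r"\+1-\d{3}-\d{3}-\d{4}"),
    "order_id": re.compile(r"ORD-\d{5}"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
}


def is_valid(kind: str, value: str) -> bool:
    """Check a parameter value against its expected format."""
    if PATTERNS[kind].fullmatch(value) is None:
        return False
    if kind == "date":
        # The pattern checks the shape; strptime checks it is a real calendar date.
        try:
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            return False
    return True


print(is_valid("email", "john.smith@gmail.com"))
print(is_valid("email", "john dot smith at gmail dot com"))
```

A failed check is a useful signal for the agent to ask the customer to repeat or spell out the value instead of calling the tool with a malformed parameter.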

Handle Tool Call Failures Gracefully

External tools may occasionally fail due to network interruptions, invalid parameters, service outages, authentication issues, or missing data. Your system prompt should include explicit recovery instructions so the agent can respond safely and consistently when failures occur. Without clear failure-handling behavior, models may attempt to guess missing information, fabricate successful outcomes, or continue a workflow using incorrect assumptions. Defining recovery behavior helps prevent hallucinations and improves production reliability. Recommended guardrails include:

Never invent tool results when a request fails
Acknowledge the failure clearly and transparently
Retry when appropriate and safe to do so
Request clarification if required inputs are missing or invalid
Offer fallback actions or alternative paths when possible
Preserve workflow context so the interaction can continue smoothly after recovery

Example recovery instructions:

“If a tool returns an error, do not fabricate a response.”
“Explain the failure briefly and ask the user how they would like to proceed.”
“Retry transient failures once before escalating.”
“If required data is unavailable, request the missing information explicitly.”

Clear failure-handling policies make agents more predictable, trustworthy, and resilient in real-world environments.

# Tool error handling

If any tool call fails or returns an error:

1. Acknowledge the issue to the customer: "I'm having trouble accessing that information right now."
2. Do not guess or make up information
3. Offer alternatives:
   - Try the tool again if it might be a temporary issue
   - Offer to escalate to a human agent
   - Provide a callback option
4. If the error persists after 2 attempts, escalate to a supervisor

**Example responses:**

- "I'm having trouble looking up that order right now. Let me try again... [retry]"
- "I'm unable to access the order system at the moment. I can transfer you to a specialist who can help, or we can schedule a callback. Which would you prefer?"
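The retry-then-escalate rule above can be mirrored in the integration layer. This Python sketch assumes a hypothetical `ToolError` exception raised by your tool wrappers and a limit of two attempts, matching the prompt. It returns an explicit escalation status instead of inventing a result.

```python
from typing import Any, Callable


class ToolError(Exception):
    """Hypothetical exception raised by a tool wrapper when a call fails."""


def call_with_retry(tool: Callable[[], Any], max_attempts: int = 2) -> dict:
    """Call `tool` up to `max_attempts` times, then signal escalation."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return {"status": "ok", "result": tool(), "attempts": attempt}
        except ToolError as exc:
            # Remember the failure and try again if attempts remain.
            last_error = str(exc)
    # Never fabricate a result: report the failure so the agent can escalate.
    return {"status": "escalate", "error": last_error, "attempts": max_attempts}


def always_down() -> str:
    raise ToolError("order system unavailable")


print(call_with_retry(always_down))
```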

Architecture Patterns for Enterprise Agents

Strong prompts and reliable tools are essential, but enterprise-grade agents also need a well-designed architecture. In production environments, agents often manage workflows that are too complex for a single, all-purpose prompt to handle effectively.

Keep Agents Specialized

Avoid giving one agent too many responsibilities. Broad instructions, oversized context windows, and large knowledge scopes can increase latency, reduce accuracy, and make behavior harder to predict. Each agent should have:

A focused purpose
A clearly defined knowledge base
A limited set of tools
Specific success criteria
Well-scoped responsibilities

Specialized agents are more reliable because they encounter fewer edge cases, make decisions within a narrower domain, and are easier to evaluate. They are also simpler to test, debug, monitor, and improve over time.

A general-purpose agent that tries to handle everything is harder to maintain and more likely to fail in production than a system of specialized agents with clear responsibilities and handoff logic. Focused agents are easier to scale, test, monitor, and optimize because each operates within a smaller, more predictable domain.

Use Dispatcher and Specialist Patterns

Architecture pattern:

Dispatch Agent: Routes incoming requests to appropriate specialist agents based on intent classification
Functional or Specialist Agents: Handle domain-specific tasks (billing, scheduling, technical support, etc.)
Human in the Loop: Defined handoff criteria for complex or sensitive cases

For complex workflows, use a multi-agent architecture built around a Dispatch Agent, Functional or Specialist Agents, and Human in the Loop escalation paths. The Dispatch Agent analyzes intent and routes requests to the appropriate Functional Agents, each responsible for a focused domain such as billing, scheduling, or technical support. For sensitive, high-risk, or unresolved cases, clearly defined escalation rules should transfer control to a Human in the Loop for review or intervention. This architecture improves reliability by reducing prompt complexity, limiting unnecessary context, and allowing each agent to operate within a narrowly defined responsibility set. It also makes systems easier to maintain, evaluate, and optimize through independently managed specialists and domain-specific performance metrics.

Define Clear Handoff Criteria

In multi-agent systems, clearly define when control should transfer between agents or escalate to a human operator (Human in the Loop). Handoff rules should be based on specific conditions such as user intent, task complexity, missing information, failed tool calls, security concerns, or low-confidence responses. Well-defined handoff logic improves workflow reliability, prevents unnecessary loops between agents, and ensures sensitive or high-risk interactions are handled appropriately. It also helps maintain context continuity as requests move across different parts of the system.

# Goal

Route customer requests to the appropriate functional agent based on intent.

## Routing logic

**Billing specialist agent:** Customer mentions payment, invoice, refund, charge, subscription, or account balance
**Technical specialist agent:** Customer reports error, bug, issue, not working, broken
**Scheduling specialist agent:** Customer wants to book, reschedule, cancel, or check appointment
**Human escalation:** Customer is angry, requests supervisor, or issue is unresolved after 2 specialist attempts

## Handoff process

1. Classify customer intent based on first message
2. Provide brief acknowledgment: "I'll connect you with our [billing/technical/scheduling] team."
3. Transfer conversation with context summary:
   - Customer name
   - Primary issue
   - Any account identifiers already collected
4. Do not repeat information collection that already occurred
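The routing logic above can be approximated in code for testing or as a fallback. The Python sketch below uses plain keyword matching as a stand-in for a real intent classifier. The `route` function and the `"unknown"` result for unmatched messages are assumptions, and detecting an angry customer is left out because it needs more than keywords.

```python
import re

# Keywords are taken from the routing rules above. Order matters: the first
# matching agent wins, so a message that mentions both a refund and an error
# is sent to billing.
ROUTES = {
    "billing": ["payment", "invoice", "refund", "charge", "subscription", "account balance"],
    "technical": ["error", "bug", "issue", "not working", "broken"],
    "scheduling": ["book", "reschedule", "cancel", "appointment"],
}


def route(message: str, specialist_attempts: int = 0) -> str:
    """Return 'billing', 'technical', 'scheduling', 'human', or 'unknown'."""
    text = message.lower()
    # Human escalation: supervisor request or two unresolved specialist attempts.
    if "supervisor" in text or specialist_attempts >= 2:
        return "human"
    for agent, keywords in ROUTES.items():
        # Match at a word start so "charge" also covers "charged".
        if any(re.search(rf"\b{re.escape(keyword)}", text) for keyword in keywords):
            return agent
    # Assumption: unmatched messages are flagged so the dispatcher can ask a
    # clarifying question instead of guessing.
    return "unknown"


print(route("I was charged twice on my invoice"))
print(route("Can I reschedule my appointment?"))
```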

# Personality

You are a billing specialist for Acme Corp. You handle payment issues, refunds, and subscription changes.

# Goal

Resolve billing inquiries by:

1. Verifying customer identity
2. Looking up account and billing history
3. Processing refunds (under $500) or escalating (over $500)
4. Updating subscription settings when requested

# Guardrails

Never access account information without identity verification.
Never process refunds over $500 without supervisor approval.
If the customer's issue is not billing-related, transfer back to the dispatch agent.

Large Language Model Selection for Enterprise Reliability

Choosing the right model depends on the specific performance requirements of your application, especially around latency, accuracy, reasoning depth, and tool-calling consistency. Different models provide different tradeoffs between response speed, operational cost, contextual understanding, and workflow reliability. Larger models typically offer stronger reasoning, better instruction adherence, and more reliable tool usage, making them well suited for complex workflows and high-accuracy tasks. Smaller models, however, often provide lower latency and reduced operational costs, making them ideal for high-volume or real-time interactions. Enterprise systems should evaluate models based on:

Response latency requirements
Tool-calling reliability
Instruction-following consistency
Context window needs
Reasoning complexity
Cost efficiency at scale

In many cases, the most effective architecture combines multiple models, using lightweight models for simple tasks and more capable models for advanced reasoning or critical decision-making workflows.

Understand the Tradeoffs

When selecting a model for production systems, it’s important to balance latency, reasoning capability, cost, and tool-calling performance based on the needs of your workflow.

Latency: Smaller models typically respond faster, making them better suited for high-frequency interactions, lightweight workflows, and real-time experiences.
Accuracy: Larger models generally provide stronger reasoning, better instruction adherence, and improved performance on complex or multi-step tasks, though they often come with increased latency and operational cost.
Tool-Calling Reliability: Models vary in how consistently they handle structured outputs and function calls. Some models perform well with minimal guidance, while others may require stricter prompting and more explicit parameter definitions to achieve reliable execution.

Model Recommendations by Use Case

Based on large-scale enterprise deployments, different models perform better depending on the balance between latency, reasoning capability, tool-calling reliability, and operational cost.

GPT-4o / GLM 4.5 Air — Balanced Enterprise Performance
Recommended as a strong default for general-purpose enterprise agents where speed, accuracy, and cost efficiency all matter. These models provide reliable tool-calling performance with moderate latency, making them well suited for customer support, scheduling, order management, and general inquiry workflows.
Gemini 2.5 Flash Lite — Ultra-Low Latency Workloads
Best suited for lightweight, high-frequency interactions where response speed is the primary requirement. This model is highly cost-effective at scale and works well for routing, triage, simple FAQs, appointment confirmations, and basic information collection, though it may be less effective for complex reasoning or advanced tool orchestration.
Claude Sonnet 4 / 4.5 — Advanced Reasoning and Orchestration
Designed for complex workflows that require deeper reasoning, nuanced decision-making, and reliable multi-step tool execution. These models typically deliver stronger performance on technically challenging or high-risk tasks, including troubleshooting, compliance-sensitive operations, financial guidance, and advanced escalation handling, with the tradeoff of higher latency and cost.

Benchmark With Your Actual Prompts

Model performance can vary significantly depending on prompt design, workflow complexity, and tool usage patterns. Before selecting a production model, evaluate multiple candidates using the exact prompts and workflows your system will run in production. A reliable benchmarking process should include:

Testing 2–3 candidate models against the same system prompt
Evaluating real user interactions or high-quality synthetic test cases
Measuring latency, response accuracy, and tool-calling success rates
Comparing operational cost against workflow reliability

The goal is to identify the best balance between speed, reasoning capability, consistency, and cost based on your specific production requirements rather than relying solely on benchmark scores or generalized model comparisons.
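A comparison like this can be scripted once per-test-case results have been collected. The Python sketch below assumes each run is recorded as a dictionary with `latency_ms`, `correct`, and `tool_ok` fields. The model names and numbers are placeholder data, not measured results. It selects the fastest model that meets minimum accuracy and tool-success thresholds.

```python
from statistics import mean
from typing import Optional


def summarize(runs: list[dict]) -> dict:
    """Aggregate per-test-case results for one candidate model."""
    return {
        "avg_latency_ms": mean([r["latency_ms"] for r in runs]),
        "accuracy": mean([1.0 if r["correct"] else 0.0 for r in runs]),
        "tool_success_rate": mean([1.0 if r["tool_ok"] else 0.0 for r in runs]),
    }


def pick_model(results: dict[str, list[dict]],
               min_accuracy: float,
               min_tool_success: float) -> Optional[str]:
    """Return the lowest-latency model that meets both thresholds, if any."""
    summaries = {name: summarize(runs) for name, runs in results.items()}
    eligible = [
        name for name, s in summaries.items()
        if s["accuracy"] >= min_accuracy and s["tool_success_rate"] >= min_tool_success
    ]
    if not eligible:
        return None
    return min(eligible, key=lambda name: summaries[name]["avg_latency_ms"])


# Placeholder data for two hypothetical candidates.
results = {
    "model_a": [
        {"latency_ms": 900, "correct": True, "tool_ok": True},
        {"latency_ms": 1100, "correct": True, "tool_ok": True},
    ],
    "model_b": [
        {"latency_ms": 300, "correct": True, "tool_ok": True},
        {"latency_ms": 500, "correct": False, "tool_ok": True},
    ],
}
print(pick_model(results, min_accuracy=0.9, min_tool_success=0.9))
```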

A/B Testing

Production reliability comes from continuous iteration and structured testing. Even well-designed prompts can fail in real-world scenarios, so long-term performance depends on identifying weaknesses, refining behavior, and validating improvements over time.

Configure Evaluation Criteria

Define measurable evaluation criteria for each agent to track performance and detect regressions as workflows evolve. Common metrics include:

Task Completion Rate: Percentage of requests successfully resolved
Escalation Rate: Percentage of interactions requiring human intervention
Tool Success Rate: Reliability of tool execution and structured outputs
Response Accuracy: Consistency and correctness of generated responses
User Satisfaction: Feedback scores or resolution quality indicators
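Several of these metrics can be computed directly from conversation logs. The Python sketch below assumes a hypothetical record shape with `resolved`, `escalated`, `tool_calls`, and `tool_failures` fields. Response accuracy and user satisfaction usually need labeled data or survey scores, so they are left out.

```python
def agent_metrics(conversations: list[dict]) -> dict:
    """Compute completion, escalation, and tool success rates from logs."""
    total = len(conversations)
    if total == 0:
        raise ValueError("no conversations to evaluate")
    tool_calls = sum(c["tool_calls"] for c in conversations)
    tool_failures = sum(c["tool_failures"] for c in conversations)
    return {
        "task_completion_rate": sum(1 for c in conversations if c["resolved"]) / total,
        "escalation_rate": sum(1 for c in conversations if c["escalated"]) / total,
        # Undefined when no tools were called at all.
        "tool_success_rate": (
            (tool_calls - tool_failures) / tool_calls if tool_calls else None
        ),
    }


# Placeholder log entries for illustration.
sample_log = [
    {"resolved": True, "escalated": False, "tool_calls": 3, "tool_failures": 0},
    {"resolved": True, "escalated": False, "tool_calls": 2, "tool_failures": 1},
    {"resolved": False, "escalated": True, "tool_calls": 4, "tool_failures": 1},
    {"resolved": True, "escalated": False, "tool_calls": 1, "tool_failures": 0},
]
print(agent_metrics(sample_log))
```

Tracking the same figures before and after each prompt change makes regressions visible early.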

Analyze Failure Patterns

When an agent underperforms, review failed or low-satisfaction interactions to identify recurring issues and behavioral gaps. Examples:

Incorrect responses → Strengthen or clarify prompt instructions
Poor intent recognition → Add examples or simplify wording
Edge-case failures → Introduce additional guardrails
Frequent tool errors → Improve parameter definitions and recovery logic

Reviewing real conversation transcripts is one of the most effective ways to uncover hidden reliability issues. Avoid broad prompt rewrites whenever possible. Instead, isolate and improve the specific components causing failures. Best practices:

Identify the exact prompt section or tool definition responsible
Test updates against known failure cases
Make one change at a time to isolate impact
Re-run the same evaluations to confirm improvements
Monitor for unintended regressions after deployment

Incremental, test-driven refinement leads to more stable and predictable agent behavior over time.

Avoid making multiple prompt changes at the same time. Updating several variables simultaneously makes it difficult to determine which modification caused an improvement, introduced a regression, or had no impact at all. Isolated, incremental changes make testing more reliable and simplify debugging and optimization.

Configure Data Collection

Configure your agent to capture structured summaries and key metadata from each conversation. Collecting interaction data makes it easier to identify recurring user intents, analyze workflow performance, detect failure patterns, and improve prompts based on real production usage. Useful data points may include:

User intent categories
Task completion outcomes
Escalation events
Tool usage patterns
Failed interactions
User satisfaction indicators
Common edge cases or unsupported requests

Consistent data collection enables iterative optimization by providing measurable insight into how agents perform over time and where reliability improvements are needed most.
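As a rough illustration of what a structured summary might look like, the Python sketch below defines a per-conversation record and serializes it as one JSON line. The `ConversationRecord` fields are assumptions chosen to mirror the data points listed above, not a platform schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class ConversationRecord:
    conversation_id: str
    intent: str                      # user intent category
    resolved: bool                   # task completion outcome
    escalated: bool                  # whether a human took over
    tools_used: list[str] = field(default_factory=list)
    failed_tools: list[str] = field(default_factory=list)
    satisfaction_score: Optional[int] = None  # None when no feedback was given


def to_json_line(record: ConversationRecord) -> str:
    """Serialize a record as a single JSON line for log storage."""
    return json.dumps(asdict(record), sort_keys=True)


record = ConversationRecord(
    conversation_id="conv-001",  # placeholder identifier
    intent="billing",
    resolved=True,
    escalated=False,
    tools_used=["getBillingDetails"],
)
print(to_json_line(record))
```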

Use Simulation for Regression Testing

Before deploying prompt updates to production, test changes against a predefined set of known scenarios to identify regressions early. Simulation-based testing helps verify that new prompt modifications improve behavior without breaking existing workflows or introducing unintended side effects. Your regression test set should include:

Common user requests
Previously failed interactions
Edge cases and ambiguous inputs
Tool-calling workflows
Escalation scenarios
Safety and guardrail checks

Consistently testing against the same benchmark scenarios makes it easier to compare results over time and validate whether a change improves overall reliability.
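A regression set can be as simple as a list of inputs paired with phrases the reply must or must not contain. The Python sketch below assumes the agent can be called as a function from message to reply. The `stub_agent` here is a stand-in so the example runs on its own. Real evaluations usually need richer checks than substring matching.

```python
from typing import Callable


def run_regression(agent: Callable[[str], str], scenarios: list[dict]) -> list[str]:
    """Return the names of scenarios whose replies violate their checks."""
    failures = []
    for scenario in scenarios:
        reply = agent(scenario["input"]).lower()
        missing = [p for p in scenario.get("must_include", []) if p.lower() not in reply]
        leaked = [p for p in scenario.get("must_not_include", []) if p.lower() in reply]
        if missing or leaked:
            failures.append(scenario["name"])
    return failures


def stub_agent(message: str) -> str:
    """Stand-in for a real agent call, used only to make the sketch runnable."""
    if "order" in message.lower():
        return "Can you share your order ID so I can look that up?"
    return "Your refund will arrive tomorrow."


scenarios = [
    {"name": "order_lookup_asks_for_id", "input": "Where is my order?",
     "must_include": ["order id"]},
    {"name": "no_unconfirmed_dates", "input": "When do I get my money back?",
     "must_not_include": ["tomorrow"]},
]
print(run_regression(stub_agent, scenarios))
```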

Production Considerations

Enterprise agents need safeguards that go beyond prompt design. Production systems should include clear error handling, compliance controls, monitoring, and fallback behavior so agents can continue operating safely when workflows fail, tools return incomplete data, or sensitive cases require human review.

Handle Errors Across All Tool Integrations

Every external tool or API call introduces a potential point of failure. Production agents should include explicit error-handling instructions to ensure failures are communicated clearly, safely, and consistently. Common failure scenarios include:

Network Failures:
“I’m having trouble connecting to the system right now. Let me try again.”
Missing Data:
“I’m unable to find that information. Please verify the details and try again.”
Timeout Errors:
“The request is taking longer than expected. I can retry or escalate this to a specialist.”
Permission or Access Errors:
“I don’t have access to that information. Let me connect you with someone who can assist further.”

Clear recovery messaging improves user trust, prevents hallucinated responses, and ensures workflows fail gracefully instead of producing incorrect or misleading outputs.
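One way to keep these messages consistent is to map failure types to fixed responses in the integration layer. The Python sketch below uses built-in exception classes as stand-ins for the four categories above. That mapping and the `message_for` helper are assumptions, and real tool clients typically raise their own error types.

```python
# Ordered (exception type, spoken message) pairs; the first match wins.
FALLBACK_MESSAGES = [
    (TimeoutError,
     "The request is taking longer than expected. I can retry or escalate this to a specialist."),
    (ConnectionError,
     "I'm having trouble connecting to the system right now. Let me try again."),
    (PermissionError,
     "I don't have access to that information. Let me connect you with someone who can assist further."),
    (LookupError,
     "I'm unable to find that information. Please verify the details and try again."),
]
DEFAULT_MESSAGE = "I'm having trouble accessing that information right now."


def message_for(error: Exception) -> str:
    """Pick a safe customer-facing message for a tool failure."""
    for exc_type, message in FALLBACK_MESSAGES:
        if isinstance(error, exc_type):
            return message
    # Unknown failures get a generic message rather than a guess.
    return DEFAULT_MESSAGE


print(message_for(KeyError("order not found")))
```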

Example Prompts

The following examples demonstrate how the reliability principles covered in this guide can be applied to real-world enterprise workflows. Each example highlights key concepts such as prompt structure, guardrails, tool configuration, escalation handling, and multi-agent coordination to show how reliable production systems are designed in practice.

Example 1: Billing and Subscription support - Functional agent

# Personality

You are a billing support specialist for CloudTech, a B2B SaaS platform.
You are professional, detail-oriented, and focused on resolving account and subscription questions accurately.
You explain billing information clearly and avoid unnecessary financial jargon.

# Environment

You are assisting customers via phone support.
Customers may have questions about invoices, payment failures, plan changes, renewals, or billing discrepancies.
You have access to billing tools and the customer account database.

# Tone

Keep responses clear and concise.
Use a calm, reassuring tone with brief acknowledgments such as “I can help with that” or “Let me review this.”
Explain billing details in plain language.
Confirm understanding before making any account or subscription changes.

# Goal

Resolve billing and subscription requests safely and accurately:

1. Verify customer identity using email and account ID
2. Identify the billing issue or requested account change
3. Review account and invoice details using the appropriate billing tool
4. Explain the outcome clearly or escalate if the issue requires manual review

This step is important: Always verify identity before accessing billing information or discussing account details.

# Guardrails

Never access or disclose billing details without identity verification.
Never make subscription changes without explicit customer confirmation.
Never guess about charges, refunds, taxes, or payment status.
If billing records are unclear or incomplete, escalate to a billing specialist.
Do not collect or repeat full payment card numbers.

# Tools

## `verifyCustomerIdentity`

**When to use:** At the start of every conversation before accessing billing or account data.

**Parameters:**

- `email` (required): Customer email in standard written format, such as `user@company.com`. Convert spoken format into written format: “at” → `@`, “dot” → `.`, and remove unnecessary spaces.
- `account_id` (optional): Account ID if provided by the customer.

**Error handling:**
If verification fails, ask the customer to confirm the spelling of their email and try again.

## `getBillingDetails`

**When to use:** After identity verification and once the customer explains the billing issue.

**Parameters:**

- `account_id` (required): From the `verifyCustomerIdentity` response.
- `billing_topic` (required): Type of billing request, such as `invoice`, `payment_status`, `subscription_plan`, `renewal`, or `refund_review`.

**Usage:**

1. Confirm the billing topic with the customer
2. Retrieve billing details using the verified account ID
3. Review the results before explaining charges or next steps

**Error handling:**
If billing details cannot be retrieved, say: “I’m having trouble accessing those billing details right now. I can try again or escalate this to our billing team.”

# Error Handling

If any tool call fails:

1. Acknowledge: “I’m having trouble accessing that information right now.”
2. Do not guess, estimate, or invent billing details
3. Offer to retry once, then escalate if the issue persists
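
The retry-then-escalate policy in the Error Handling section above can also be enforced in the code that runs your tools, so the agent is never handed partial data to guess from. The following is a minimal sketch under stated assumptions: `call_with_single_retry` is a hypothetical helper, and a tool is modeled as a zero-argument callable. Neither is part of the XUNA AI SDK.

```python
from typing import Callable, Optional, Tuple

# Wording taken from the example prompt; adjust to match your own agent.
ACK_MESSAGE = "I'm having trouble accessing that information right now."


def call_with_single_retry(
    tool: Callable[[], dict],
) -> Tuple[Optional[dict], Optional[str]]:
    """Call a tool, retry once on failure, then signal escalation.

    Returns (result, None) on success, or (None, message) when both
    attempts fail, so the caller has no data to improvise from.
    """
    for _attempt in range(2):  # first call plus exactly one retry
        try:
            return tool(), None
        except Exception:
            # A real integration would catch narrower error types and log them.
            continue
    return None, ACK_MESSAGE + " Let me escalate this to our billing team."
```

Returning the escalation message together with an empty result keeps the "do not guess" rule structural: when the tool fails twice, there is simply nothing for the agent to paraphrase.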

Example 2: Refund and Support - Functional agent

# Personality

You are a refund support specialist for RetailCo.
You are calm, empathetic, and focused on resolving refund requests efficiently while following company policy.
You prioritize clarity, professionalism, and customer trust throughout every interaction.

# Goal

Handle refund requests using the following workflow:

1. Verify customer identity using order number and email
2. Retrieve order details using the `getOrderDetails` tool
3. Confirm refund eligibility based on return policy requirements
4. Process approved refunds or escalate high-risk cases when necessary

This step is important: Always verify refund eligibility before approving or processing any refund request.

# Guardrails

Never process refunds outside the approved return window without supervisor authorization.
Never approve refunds over $500 without escalation. This step is important.
Never access order or payment details before identity verification.
Never speculate about refund approvals or timelines.
If a customer becomes hostile or requests exceptions outside policy, remain professional and offer escalation to a supervisor.

# Tools

## `verifyIdentity`

**When to use:** At the beginning of every conversation before accessing order information.

**Parameters:**

- `order_id` (required): Order ID in uppercase alphanumeric format, such as `ORD123456`. Convert spoken formatting into written format by spelling letters correctly, converting spoken digits to numbers, and removing spaces.
- `email` (required): Customer email in standard written format, such as `john.smith@retailco.com`. Convert spoken formatting by replacing “at” with `@`, “dot” with `.`, and removing unnecessary spaces.

**Error handling:**
If verification fails, ask the customer to repeat or confirm their order number and email address.

## `getOrderDetails`

**When to use:** After successful identity verification.

**Returns:**
- Order date
- Purchased items
- Total order amount
- Refund eligibility status
- Previous refund history

**Usage:**

1. Retrieve order details after verification
2. Confirm the item and issue being referenced
3. Review refund eligibility before discussing outcomes

**Error handling:**
If the order cannot be located, ask the customer to verify the order information and retry once.

## `processRefund`

**When to use:** Only after confirming the order is eligible for refund processing.

**Required checks before calling:**

- Identity successfully verified
- Order falls within the approved refund period
- Order is eligible for refund
- Refund has not already been issued
- Refund amount is under $500

**Parameters:**

- `order_id` (required): Verified order ID
- `reason_code` (required): One of `defective`, `wrong_item`, `late_delivery`, or `changed_mind`

**Usage:**

1. Confirm refund details with the customer before processing
2. Explain expected refund timing and payment method
3. Wait for customer confirmation
4. Process the refund using the tool

**Error handling:**
If refund processing fails, acknowledge the issue and escalate:  
“I’m unable to process the refund right now. Let me escalate this to a supervisor for further assistance.”

# Error Handling

If any tool call fails:

1. Acknowledge the issue clearly
2. Do not guess or fabricate order or refund details
3. Retry once when appropriate
4. Escalate unresolved issues to a supervisor
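
The "Required checks before calling" list for `processRefund` can be mirrored by a guard in the tool backend, so a refund is blocked even if the model skips a step. This is a sketch only: the `RefundRequest` fields and the `failed_checks` helper are hypothetical names chosen for illustration, and the limit follows the "under $500" wording of the checklist.

```python
from dataclasses import dataclass
from typing import List

REFUND_LIMIT_USD = 500  # mirrors the "under $500" check in the example prompt


@dataclass
class RefundRequest:
    """Hypothetical snapshot of what is known before calling processRefund."""
    identity_verified: bool
    within_refund_period: bool
    eligible: bool
    already_refunded: bool
    amount_usd: float


def failed_checks(req: RefundRequest) -> List[str]:
    """Return the names of the required checks that did not pass."""
    checks = {
        "identity_verified": req.identity_verified,
        "within_refund_period": req.within_refund_period,
        "eligible": req.eligible,
        "not_already_refunded": not req.already_refunded,
        "amount_under_limit": req.amount_usd < REFUND_LIMIT_USD,
    }
    return [name for name, passed in checks.items() if not passed]


request = RefundRequest(True, True, True, False, 620.00)
problems = failed_checks(request)
if problems:
    print("Escalate instead of refunding:", problems)
```

Returning the list of failed checks, rather than a single boolean, gives the agent a concrete reason to relay when it escalates to a supervisor.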

Demonstrated Principles

✓ Specialized agent scope focused exclusively on refund handling
✓ Structured workflow instructions defined in the # Goal section
✓ Reinforced critical policies such as verification and refund approval limits
✓ Clear tool configuration with explicit usage requirements and validation steps
✓ Structured parameter formatting guidance for emails and order IDs
✓ Dedicated error-handling instructions for tool failures and recovery
✓ Clearly defined escalation rules for high-risk or policy-sensitive requests
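
The parameter formatting guidance for emails and order IDs can be backed by a normalization step in the tool layer, so a transcription slip does not reach your backend. The sketch below is illustrative: the function names and the digit-word table are assumptions, and the simple word matching will misread an address that genuinely contains "at" or "dot" as separate spoken words.

```python
import re

# Spoken digit words mapped to written digits (illustrative, English only).
DIGIT_WORDS = {
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def normalize_spoken_email(spoken: str) -> str:
    """Convert 'john dot smith at retailco dot com' to written form."""
    text = spoken.strip().lower()
    text = re.sub(r"\s+at\s+", "@", text)   # spoken "at" becomes @
    text = re.sub(r"\s+dot\s+", ".", text)  # spoken "dot" becomes .
    return re.sub(r"\s+", "", text)         # drop any remaining spaces


def normalize_spoken_order_id(spoken: str) -> str:
    """Convert 'o r d one two three' to an uppercase ID such as ORD123."""
    tokens = re.split(r"[\s\-]+", spoken.strip().lower())
    return "".join(DIGIT_WORDS.get(token, token) for token in tokens if token).upper()


print(normalize_spoken_email("john dot smith at retailco dot com"))
print(normalize_spoken_order_id("o r d one two three four five six"))
```

Normalizing in code does not replace the prompt instructions. It is a second line of defense for cases where the model passes through a partially converted value.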

Formatting Best Practices

Prompt formatting plays a major role in how effectively a language model interprets instructions and prioritizes behavior. Recommended best practices include:

Use Markdown headings: Organize prompts using `#` for primary sections and `##` for subsections
Prefer bulleted lists: Break instructions into smaller, scannable steps for better readability and instruction parsing
Use whitespace intentionally: Separate sections and logical instruction groups with blank lines to improve structure
Keep headings in sentence case: Use formats like `# Goal` instead of `# GOAL`
Maintain consistent formatting: Apply the same heading styles, spacing, and list patterns throughout the prompt

Well-structured prompts are easier to maintain, easier to audit, and generally produce more reliable model behavior in production environments.

Voice-specific writing tips

System prompts for voice agents differ from those for text chatbots. Keep these in mind:

Keep instructions action-oriented

Tell the agent what to do, not what to avoid. “Confirm the customer’s order number before proceeding” is clearer than “Don’t skip order confirmation.”

Use pronunciation hints for tricky terms

If your brand name or product has unusual pronunciation, spell it phonetically in the prompt. For example: “Our product is called Qwirl (pronounced ‘kwerl’).”

Specify response length

Voice responses should be short. Add an instruction like: “Keep each response to 2-3 sentences. Ask one follow-up question at a time.”

Define escalation behavior explicitly

Agents work best when they know exactly when to stop trying. State the escalation condition clearly: “If you cannot answer after two attempts, transfer the call using the transfer tool.”

Using dynamic variables in the prompt

You can inject per-session data into the prompt using {{variable_name}} placeholders. These are resolved at conversation start before the agent processes anything.

prompt-with-variables.txt

You are a billing assistant for {{company_name}}.

The customer's name is {{customer_name}} and their account tier is {{account_tier}}.

If the customer has a {{account_tier}} plan, they are eligible for priority support.

Pass the variable values when starting the conversation. See Personalization for details.
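
To preview how a template reads once its placeholders are filled, you can render it locally before starting a conversation. The sketch below is only a local approximation: `render_prompt` is a hypothetical helper, not an SDK function, and the platform's own resolution rules (for example, how it treats missing variables) may differ.

```python
import re

# Matches {{variable_name}} placeholders made of letters, digits, and underscores.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_prompt(template: str, variables: dict) -> str:
    """Substitute {{name}} placeholders, failing loudly if a value is missing."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(variables))
    if missing:
        raise KeyError(f"missing dynamic variables: {missing}")
    return PLACEHOLDER.sub(lambda match: str(variables[match.group(1)]), template)


template = (
    "You are a billing assistant for {{company_name}}.\n"
    "The customer's name is {{customer_name}} and their account tier is {{account_tier}}."
)
preview = render_prompt(
    template,
    {"company_name": "Acme Corp", "customer_name": "Dana", "account_tier": "premium"},
)
print(preview)
```

Raising on a missing variable during a local preview surfaces typos in placeholder names before they can reach a live conversation.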

Prompt size limit

The maximum system prompt size is 2 MB. If you need to supply large amounts of reference information, use the knowledge base instead — it is retrieved dynamically and does not count toward the prompt limit.
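
A quick size check before uploading can catch an oversized prompt early. This sketch makes two assumptions that you should confirm for your account: that "2 MB" means 2 × 1024 × 1024 bytes, and that size is measured on the UTF-8 encoded prompt text. `prompt_size_ok` is a hypothetical helper.

```python
# Assumption: the 2 MB limit is 2 MiB of UTF-8 encoded text.
MAX_PROMPT_BYTES = 2 * 1024 * 1024


def prompt_size_ok(prompt: str) -> bool:
    """Return True if the encoded prompt fits within the assumed limit."""
    return len(prompt.encode("utf-8")) <= MAX_PROMPT_BYTES


print(prompt_size_ok("You are Aria, a customer support agent for Acme Corp."))
```

Measuring encoded bytes rather than characters matters for prompts with non-ASCII text, where one character can occupy several bytes.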

Setting the prompt via API

from xuna_ai import XunaAI

# Create the SDK client
client = XunaAI()

# Update the agent's system prompt; the innermost "prompt" key holds the prompt text
agent = client.conversational_ai.agents.update(
    agent_id="your-agent-id",
    conversation_config={
        "agent": {
            "prompt": {
                "prompt": "You are Aria, a customer support agent for Acme Corp..."
            }
        }
    }
)

Changes to the system prompt take effect immediately for new conversations. Conversations already in progress continue with the prompt that was active at their start.

Frequently Asked Questions

How can I maintain consistency across multiple agents?

Use shared prompt templates for common sections such as guardrails, error handling, formatting standards, and response behavior. Centralizing reusable components helps maintain consistent behavior across specialist agents while simplifying updates and long-term maintenance.
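
One way to centralize shared sections is to assemble each agent's prompt from a common set of blocks at build time. The sketch below is an assumption about how you might organize this in your own tooling: `SHARED_SECTIONS` and `build_prompt` are hypothetical names, and the section text is abbreviated from the examples in this guide.

```python
from typing import Dict, Optional

# Sections reused verbatim by every specialist agent.
SHARED_SECTIONS: Dict[str, str] = {
    "Guardrails": "Never guess or fabricate information.\n"
                  "Escalate when records are unclear or incomplete.",
    "Error Handling": "If any tool call fails, acknowledge the issue, "
                      "retry once, then escalate.",
}


def build_prompt(
    personality: str,
    goal: str,
    extra_sections: Optional[Dict[str, str]] = None,
) -> str:
    """Combine agent-specific sections with the shared ones under # headings."""
    sections = {"Personality": personality, "Goal": goal}
    sections.update(SHARED_SECTIONS)
    sections.update(extra_sections or {})  # agent-specific additions or overrides
    return "\n\n".join(f"# {title}\n\n{body}" for title, body in sections.items())


billing_prompt = build_prompt(
    "You are a billing support specialist.",
    "Resolve billing and subscription requests safely and accurately.",
)
refund_prompt = build_prompt(
    "You are a refund support specialist for RetailCo.",
    "Handle refund requests according to the return policy.",
)
```

With this layout, a change to a shared guardrail is made once in `SHARED_SECTIONS` and picked up by every agent the next time its prompt is rebuilt and uploaded.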

What should every production prompt include?

At a minimum, production prompts should define:

The agent’s role or personality
The primary workflow or objective
Core guardrails and behavioral restrictions
Tool instructions and error-handling behavior when tools are used

Even lightweight agents benefit from clear structure and explicit operational rules.

How should tool deprecation be handled?

Introduce replacement tools before removing existing ones. Update prompts to prioritize the new tool while keeping the older version available as a temporary fallback. Monitor usage and remove deprecated tools only after confirming they are no longer actively used.

Do prompts need to change across different models?

Most well-structured prompts transfer effectively across modern models, but model-specific tuning can still improve performance. Differences in reasoning style, latency, and tool-calling behavior may require adjustments to formatting, examples, or instruction clarity.

How long should a system prompt be?

Apart from the 2 MB maximum on prompt size, there is no fixed recommended length, but excessively large prompts increase latency, cost, and complexity. Keep prompts focused and intentional. If a prompt grows too large, consider splitting responsibilities across multiple specialized agents or moving reference material into an external knowledge source such as the knowledge base.

How do I balance consistency with adaptability?

Keep core instructions, goals, and guardrails stable while allowing flexibility in tone and response style based on the user’s behavior or communication style. Conditional instructions can help agents adapt dynamically without compromising reliability.

Can prompts be updated after deployment?

Yes. Production prompts should evolve over time as new edge cases, workflows, and failure patterns emerge. Test prompt updates in staging environments before deploying changes to production systems.

How can hallucinations be reduced when tools fail?

Include explicit recovery instructions for every tool integration. Reinforce policies such as “never guess or fabricate information” throughout both the guardrails section and tool-specific error handling. Testing failure scenarios during development is critical for validating safe recovery behavior.

Next Steps

This guide provides the foundation for building reliable enterprise agents through structured prompting, tool configuration, testing, and architectural design. From here, continue expanding your system with:

Workflow Design: Build multi-agent orchestration and specialist routing logic
Evaluation Systems: Configure success metrics and performance monitoring
Data Collection: Capture structured insights from production conversations
Testing Pipelines: Implement regression testing and simulation workflows
Guardrails: Strengthen moderation and behavioral safety systems
Privacy & Compliance: Enforce secure data handling and regulatory requirements
Case Studies: Analyze real production agent implementations and deployment patterns
