The voice you choose shapes how your agent is perceived. XUNA AI gives you access to over 5,000 voices across 31 languages — from professional and neutral to expressive and character-driven. You can select a voice in the dashboard or via API, and override it per session for personalized experiences.

Choosing a voice

Browse the full voice library in the XUNA AI Voice Library. Voices are organized by:
  • Gender — Male, female, or neutral.
  • Age — Young, middle-aged, or old.
  • Accent — American, British, Australian, and many more.
  • Use case — Narration, conversational, customer service, etc.
  • Language — Filtered by language support.
Once you find a voice, copy its voice ID and use it in your agent configuration.
from xuna_ai import XunaAI

client = XunaAI()

agent = client.conversational_ai.agents.update(
    agent_id="your-agent-id",
    conversation_config={
        "tts": {
            "voice_id": "JBFqnCBsd6RMkjVDRZzb",  # your chosen voice ID
        }
    }
)

Supported languages

XUNA AI Conversational AI supports 31 languages for both speech recognition and synthesis. Set the agent’s primary language to ensure the ASR model is tuned for the correct language.
English, Spanish, French, German, Italian, Portuguese, Polish, Dutch, Russian, Japanese, Korean, Chinese (Mandarin), Arabic, Hindi, Turkish, Swedish, Norwegian, Danish, Finnish, Czech, Slovak, Romanian, Hungarian, Ukrainian, Greek, Bulgarian, Croatian, Catalan, Hebrew, Malay, and Indonesian.
agent = client.conversational_ai.agents.update(
    agent_id="your-agent-id",
    conversation_config={
        "agent": {
            "language": "en",  # ISO 639-1 language code
        }
    }
)

Automatic language detection

If your agent serves multilingual users, you can enable automatic language detection. The agent detects the user’s language from their first utterance and switches to matching speech recognition and synthesis automatically. Enable language detection through the tools configuration using the built-in language detection system tool.
Automatic language detection works best when users speak a full sentence. Short utterances like “hi” may not provide enough signal to detect language reliably.
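As a sketch, enabling the system tool might look like the following. The exact schema varies by SDK version, and the "system" tool type and "language_detection" tool name here are assumptions — check the tools configuration reference for your version.

```python
def build_language_detection_config(default_language: str) -> dict:
    """Build a conversation_config that enables automatic language detection.

    The default language is used for ASR until the user's first utterance
    provides enough signal to switch languages.
    """
    return {
        "agent": {
            "language": default_language,  # fallback before detection succeeds
            "prompt": {
                "tools": [
                    # Hypothetical shape for the built-in system tool
                    {"type": "system", "name": "language_detection"},
                ]
            },
        }
    }

config = build_language_detection_config("en")
```

The config can then be passed as `conversation_config` in an agents update call, as in the earlier examples.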

Voice settings

Fine-tune how the voice sounds using these settings:
  • Stability (0.0 – 1.0) — How consistent the voice sounds across sentences. Higher = more consistent, lower = more expressive.
  • Similarity boost (0.0 – 1.0) — How closely the synthesized voice matches the original voice clone.
  • Style exaggeration (0.0 – 1.0) — Amplifies the style of the voice. Use sparingly — high values can distort quality.
For customer support agents, use a stability of 0.7–0.8 and similarity boost of 0.75. This produces clear, consistent speech without sounding robotic.
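Applying the recommended customer-support values could look like the sketch below. The `stability` and `similarity_boost` key names under the `tts` block are assumptions; confirm the exact field names in the API reference.

```python
def support_agent_tts_config(voice_id: str) -> dict:
    """TTS settings tuned for customer support: clear and consistent
    without sounding robotic."""
    return {
        "tts": {
            "voice_id": voice_id,
            "stability": 0.75,         # midpoint of the 0.7-0.8 recommendation
            "similarity_boost": 0.75,  # recommended value for support agents
        }
    }

tts_config = support_agent_tts_config("JBFqnCBsd6RMkjVDRZzb")
```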

Overriding voice per session

You can change the voice for a specific conversation at session start without modifying the agent’s default configuration. This is useful when you want to personalize voices by user preference or locale. See Personalization for how to pass per-session overrides.
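Conceptually, a per-session override merges session-specific values on top of the agent's defaults without persisting them. The following is a minimal sketch of that merge; the actual override payload shape is documented in Personalization, and the voice ID used here is hypothetical.

```python
import copy

def apply_session_overrides(agent_config: dict, overrides: dict) -> dict:
    """Return a per-session config: agent defaults with overrides merged on top.

    The agent's stored configuration is left untouched.
    """
    merged = copy.deepcopy(agent_config)
    for section, values in overrides.items():
        merged.setdefault(section, {}).update(values)
    return merged

default_config = {"tts": {"voice_id": "JBFqnCBsd6RMkjVDRZzb"}}
session_config = apply_session_overrides(
    default_config,
    {"tts": {"voice_id": "voice-for-spanish-users"}},  # hypothetical voice ID
)
```

Because the merge copies the defaults first, each session can pick a voice by user preference or locale while the agent's stored configuration stays unchanged.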