The WebSocket API is the lowest-level integration point for XUNA AI Conversational AI. Use it when you need to build integrations outside the browser — on a server, embedded hardware, or a platform not covered by the official SDKs. You send raw PCM audio and receive transcripts, agent audio, and conversation events over a persistent connection.

Connection

Connect to the WebSocket endpoint:
wss://api.xuna.ai/v1/convai/conversation
Public agents — append ?agent_id=YOUR_AGENT_ID to the URL:
wss://api.xuna.ai/v1/convai/conversation?agent_id=YOUR_AGENT_ID
Private agents — use a signed URL instead of the base endpoint. Generate the signed URL on your server (see Authentication) and connect to it directly:
wss://api.xuna.ai/v1/convai/conversation?token=SIGNED_TOKEN
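Both URL shapes can be assembled with the standard URL API. A minimal sketch (the helper names are illustrative, not part of the API):

```javascript
// Build the WebSocket URL for a public agent (agent_id) or a
// private agent (signed token generated on your server).
const BASE_URL = 'wss://api.xuna.ai/v1/convai/conversation';

function publicAgentUrl(agentId) {
  const url = new URL(BASE_URL);
  url.searchParams.set('agent_id', agentId);
  return url.toString();
}

function privateAgentUrl(signedToken) {
  const url = new URL(BASE_URL);
  url.searchParams.set('token', signedToken);
  return url.toString();
}
```

Using the URL API rather than string concatenation ensures agent IDs and tokens are query-encoded correctly.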

Audio format

All audio sent to and received from the WebSocket uses the same format:
Encoding: PCM 16-bit signed integer
Sample rate: 16,000 Hz
Channels: Mono
Byte order: Little-endian
Transport encoding: Base64
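If your capture pipeline produces Float32 samples (as the Web Audio API does), they must be converted to this format before sending. A sketch in Node.js (your audio library may already emit Int16 samples, in which case only the base64 step applies):

```javascript
// Convert Float32 samples in [-1, 1] to 16-bit signed little-endian
// PCM, then base64-encode for the `audio` message payload.
function floatTo16BitPcmBase64(samples) {
  const buf = Buffer.alloc(samples.length * 2);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp
    // Scale to the signed 16-bit range and write little-endian.
    buf.writeInt16LE(Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), i * 2);
  }
  return buf.toString('base64');
}
```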

Message types — client to server

Your client sends JSON messages to control the session and stream audio.

conversation_initiation_client_data

Send this as the first message after connecting to configure the session. All fields are optional.
{
  "type": "conversation_initiation_client_data",
  "conversation_config_override": {
    "agent": {
      "prompt": { "prompt": "You are a helpful assistant." },
      "first_message": "Hello! How can I help?",
      "language": "en"
    },
    "tts": {
      "voice_id": "your_custom_voice_id"
    }
  }
}

audio

Stream microphone audio to the server. Send chunks continuously while the user is speaking.
{
  "type": "audio",
  "audio_event": {
    "audio_base_64": "<base64-encoded PCM audio>"
  }
}
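Raw PCM can be split into these messages with a small generator. A sketch (the chunk size is a reasonable default, not an API requirement; 4,096 bytes at 16 kHz 16-bit mono is 128 ms of audio):

```javascript
// Yield `audio` messages for consecutive chunks of a raw PCM buffer.
function* audioMessages(pcm, chunkBytes = 4096) {
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    yield {
      type: 'audio',
      audio_event: {
        audio_base_64: pcm.subarray(offset, offset + chunkBytes).toString('base64'),
      },
    };
  }
}
```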

user_activity

Send this when you detect user activity (e.g., a keystroke or gesture) to signal that the user is present. Useful for non-audio interactions.
{
  "type": "user_activity"
}

pong

Respond to server ping messages to keep the connection alive.
{
  "type": "pong",
  "event_id": 123
}

Message types — server to client

The server sends JSON messages for conversation events and agent audio.

conversation_initiation_metadata

Sent immediately after the connection is established. Contains the conversation ID and the audio format the agent will use for output.
{
  "type": "conversation_initiation_metadata",
  "conversation_initiation_metadata_event": {
    "conversation_id": "conv_abc123",
    "agent_output_audio_format": "pcm_16000"
  }
}

audio

Agent speech as a base64-encoded PCM chunk. Decode and play it back in order.
{
  "type": "audio",
  "audio_event": {
    "audio_base_64": "<base64-encoded PCM audio>",
    "event_id": 1
  }
}
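Because each chunk carries an event_id, you can buffer out-of-order arrivals and release audio strictly in sequence. A minimal ordering buffer (the class name is illustrative; it assumes event_ids start at 1 and increment by 1):

```javascript
// Buffers incoming agent audio chunks and releases them strictly in
// event_id order, even if messages arrive out of sequence.
class AudioChunkQueue {
  constructor() {
    this.pending = new Map(); // event_id -> decoded PCM Buffer
    this.nextId = 1;          // next event_id to release
    this.output = [];         // in-order PCM Buffers ready for playback
  }

  push(audioEvent) {
    this.pending.set(
      audioEvent.event_id,
      Buffer.from(audioEvent.audio_base_64, 'base64')
    );
    // Drain every chunk that is now contiguous with what was released.
    while (this.pending.has(this.nextId)) {
      this.output.push(this.pending.get(this.nextId));
      this.pending.delete(this.nextId);
      this.nextId++;
    }
  }

  // Call this on an `interruption` message to drop buffered audio.
  clear() {
    this.pending.clear();
    this.output = [];
  }
}
```

The `clear()` method is what an `interruption` handler would call to stop playback of stale chunks.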

agent_response

The agent’s response text. Arrives alongside or slightly before the corresponding audio chunks.
{
  "type": "agent_response",
  "agent_response_event": {
    "agent_response": "Hello! How can I help you today?"
  }
}

user_transcript

Transcription of the user’s speech. Use this to display what the user said in your UI.
{
  "type": "user_transcript",
  "user_transcription_event": {
    "user_transcript": "What are your opening hours?"
  }
}

ping

Sent by the server periodically to verify the connection is alive. Respond with a pong message using the same event_id.
{
  "type": "ping",
  "ping_event": {
    "event_id": 123
  }
}

interruption

Sent when the user interrupts the agent mid-response. Stop playing any buffered audio chunks when you receive this message.
{
  "type": "interruption"
}

Example: Node.js client

The following example shows a minimal WebSocket client that connects, sends audio from a file, and logs transcripts and agent responses.
import WebSocket from 'ws';
import fs from 'fs';

const AGENT_ID = 'YOUR_AGENT_ID';
const ws = new WebSocket(
  `wss://api.xuna.ai/v1/convai/conversation?agent_id=${AGENT_ID}`
);

ws.on('open', () => {
  console.log('Connected');

  // Send session configuration
  ws.send(JSON.stringify({ type: 'conversation_initiation_client_data' }));

  // Stream audio from a file (replace with live microphone input)
  const audio = fs.readFileSync('input.raw');
  const chunkSize = 4096;
  let offset = 0;

  const interval = setInterval(() => {
    if (offset >= audio.length) {
      clearInterval(interval);
      return;
    }
    const chunk = audio.subarray(offset, offset + chunkSize);
    ws.send(
      JSON.stringify({
        type: 'audio',
        audio_event: { audio_base_64: chunk.toString('base64') },
      })
    );
    offset += chunkSize;
  }, 100);
});

ws.on('message', (data) => {
  const message = JSON.parse(data.toString());

  switch (message.type) {
    case 'conversation_initiation_metadata':
      console.log('Conversation ID:', message.conversation_initiation_metadata_event.conversation_id);
      break;
    case 'user_transcript':
      console.log('User:', message.user_transcription_event.user_transcript);
      break;
    case 'agent_response':
      console.log('Agent:', message.agent_response_event.agent_response);
      break;
    case 'ping':
      ws.send(JSON.stringify({ type: 'pong', event_id: message.ping_event.event_id }));
      break;
    case 'interruption':
      console.log('Interrupted — clear audio buffer');
      break;
  }
});

ws.on('close', () => console.log('Disconnected'));
ws.on('error', (err) => console.error('Error:', err));
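
In production you will likely want to reconnect after an unexpected close. A sketch of an exponential-backoff delay schedule (the base and cap are arbitrary choices, not API requirements):

```javascript
// Delay before reconnect attempt `attempt` (0-based), doubling each
// time up to a ceiling.
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}
```

A close handler would schedule `setTimeout(connect, backoffDelayMs(attempt++))` and reset `attempt` to 0 once a connection succeeds.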

Summary of message types

Direction         Type                                  Purpose
Client → Server   conversation_initiation_client_data   Configure session overrides
Client → Server   audio                                 Stream microphone audio
Client → Server   user_activity                         Signal user presence
Client → Server   pong                                  Respond to server ping
Server → Client   conversation_initiation_metadata      Conversation ID and output audio format
Server → Client   audio                                 Agent speech chunks
Server → Client   agent_response                        Agent response text
Server → Client   user_transcript                       User speech transcript
Server → Client   ping                                  Keepalive check
Server → Client   interruption                          User interrupted the agent

Next steps