Connection
Connect to the WebSocket endpoint and pass your agent ID by appending ?agent_id=YOUR_AGENT_ID to the URL.
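As a rough sketch of the step above, the query parameter can be appended with the standard URL class instead of string concatenation, so the agent ID is percent-encoded if it ever contains reserved characters. The endpoint below is a hypothetical placeholder, not the real address; substitute the endpoint URL for your deployment. The helper name buildAgentUrl is also illustrative.

```javascript
// Hypothetical placeholder endpoint: replace with the real WebSocket URL.
const PLACEHOLDER_ENDPOINT = "wss://example.invalid/agent";

// Return the endpoint with ?agent_id=... appended (value is URL-encoded).
function buildAgentUrl(endpoint, agentId) {
  const url = new URL(endpoint);
  url.searchParams.set("agent_id", agentId);
  return url.toString();
}

const agentUrl = buildAgentUrl(PLACEHOLDER_ENDPOINT, "YOUR_AGENT_ID");
```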
Audio format
All audio sent to and received from the WebSocket uses the same format:

| Parameter | Value |
|---|---|
| Encoding | PCM 16-bit signed integer |
| Sample rate | 16,000 Hz |
| Channels | Mono |
| Byte order | Little-endian |
| Transport encoding | Base64 |
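To make the format in the table concrete, here is a minimal sketch that converts floating-point samples to 16-bit signed little-endian PCM and base64-encodes the result, plus the reverse step for received chunks. It assumes your capture pipeline already produces mono samples at 16,000 Hz in the range -1 to 1; resampling and downmixing are out of scope. The function names are illustrative.

```javascript
// Encode mono float samples in [-1, 1] as base64 PCM (16-bit signed, little-endian).
function floatToPcm16Base64(samples) {
  const buf = Buffer.alloc(samples.length * 2); // 2 bytes per sample
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Scale each half separately so -1 maps to -32768 and 1 maps to 32767.
    const value = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    buf.writeInt16LE(Math.round(value), i * 2);
  }
  return buf.toString("base64");
}

// Decode a base64 PCM chunk back into 16-bit samples.
function pcm16Base64ToInt16(b64) {
  const buf = Buffer.from(b64, "base64");
  const out = new Int16Array(Math.floor(buf.length / 2));
  for (let i = 0; i < out.length; i++) {
    out[i] = buf.readInt16LE(i * 2);
  }
  return out;
}

// 16,000 samples/s x 2 bytes per sample x 1 channel = 32,000 bytes per second.
const BYTES_PER_SECOND = 16000 * 2 * 1;
```

At this rate, one second of raw audio is 32,000 bytes before base64 encoding, which adds roughly a third in size.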
Message types — client to server
Your client sends JSON messages to control the session and stream audio.
conversation_initiation_client_data
Send this as the first message after connecting to configure the session. All fields are optional.
audio
Stream microphone audio to the server. Send chunks continuously while the user is speaking.
user_activity
Send this when you detect user activity (e.g., a keystroke or gesture) to signal that the user is present. This is useful for non-audio interactions.
pong
Respond to server ping messages to keep the connection alive.
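Streaming audio continuously usually means cutting the PCM stream into short, fixed-duration chunks. The sketch below splits a raw PCM buffer into chunks of about 100 ms and wraps each one in a JSON message. The chunk duration is an arbitrary choice, and the message layout is an assumption: this section does not spell out the JSON fields, so the type discriminator and the audio payload field used here are hypothetical names to be checked against the API Reference.

```javascript
const SAMPLE_RATE = 16000;  // Hz, per the audio format table
const BYTES_PER_SAMPLE = 2; // 16-bit mono

// Split raw PCM bytes into chunks of roughly chunkMs milliseconds each.
function chunkPcm(pcm, chunkMs = 100) {
  const chunkBytes = Math.floor((SAMPLE_RATE * chunkMs) / 1000) * BYTES_PER_SAMPLE;
  if (chunkBytes <= 0) throw new RangeError("chunkMs is too small");
  const chunks = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    chunks.push(pcm.subarray(offset, offset + chunkBytes));
  }
  return chunks;
}

// ASSUMPTION: "type" and "audio" are hypothetical field names, not confirmed here.
function audioMessage(chunk) {
  return JSON.stringify({ type: "audio", audio: chunk.toString("base64") });
}
```

Each string returned by audioMessage would then be sent over the open socket in order, paced at roughly real time.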
Message types — server to client
The server sends JSON messages for conversation events and agent audio.
conversation_initiation_metadata
Sent immediately after the connection is established. Contains the conversation ID and the audio format the agent will use for output.
audio
Agent speech as a base64-encoded PCM chunk. Decode and play it back in order.
agent_response
The agent’s response text. Arrives alongside or slightly before the corresponding audio chunks.
user_transcript
Transcription of the user’s speech. Use this to display what the user said in your UI.
ping
Sent by the server periodically to verify the connection is alive. Respond with a pong message using the same event_id.
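The keepalive exchange can be sketched as a small pure function that turns an incoming ping into the matching pong. The only detail taken from this page is that the pong echoes the ping's event_id; where that field sits in the JSON, and the use of a type field to name the message, are assumptions.

```javascript
// Build the pong reply for a raw server message, or return null if it is not a ping.
// ASSUMPTION: the message name is in "type" and the ID is a top-level "event_id".
function pongFor(rawMessage) {
  const msg = JSON.parse(rawMessage);
  if (msg.type !== "ping") return null;
  return JSON.stringify({ type: "pong", event_id: msg.event_id });
}
```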
interruption
Sent when the user interrupts the agent mid-response. Stop playing any buffered audio chunks when you receive this message.
Example: Node.js client
A minimal WebSocket client connects, sends audio from a file, and logs transcripts and agent responses.
Summary of message types
| Direction | Type | Purpose |
|---|---|---|
| Client → Server | conversation_initiation_client_data | Configure session overrides |
| Client → Server | audio | Stream microphone audio |
| Client → Server | user_activity | Signal user presence |
| Client → Server | pong | Respond to server ping |
| Server → Client | conversation_initiation_metadata | Session ID and output audio format |
| Server → Client | audio | Agent speech chunks |
| Server → Client | agent_response | Agent response text |
| Server → Client | user_transcript | User speech transcript |
| Server → Client | ping | Keepalive check |
| Server → Client | interruption | User interrupted the agent |
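One way to handle the server-to-client rows of this table is a single dispatch function that routes each message by type. The sketch below queues audio in arrival order, answers pings, logs text events, and drops buffered audio on interruption. It is not the official client: the field names (type, audio, text, event_id placement) are assumptions, and PlaybackQueue is a hypothetical stand-in for a real audio output.

```javascript
// Hypothetical playback buffer: chunks are kept in arrival order.
class PlaybackQueue {
  constructor() { this.chunks = []; }
  push(pcm) { this.chunks.push(pcm); }
  clear() { this.chunks.length = 0; }
}

// Route one raw JSON message from the server.
// ASSUMPTION: "type", "audio", and "text" are illustrative field names.
function handleServerMessage(raw, { queue, send, log }) {
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case "conversation_initiation_metadata":
      log("session started");
      break;
    case "audio":
      // Decode base64 PCM and queue it for in-order playback.
      queue.push(Buffer.from(msg.audio, "base64"));
      break;
    case "agent_response":
    case "user_transcript":
      log(`${msg.type}: ${msg.text}`);
      break;
    case "ping":
      // Echo the same event_id back to keep the connection alive.
      send(JSON.stringify({ type: "pong", event_id: msg.event_id }));
      break;
    case "interruption":
      // The user spoke over the agent: discard audio not yet played.
      queue.clear();
      break;
    default:
      log(`unhandled message type: ${msg.type}`);
  }
}
```

With a WebSocket library, this function would typically be called from the socket's message handler, with send bound to the socket's own send method.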
Next steps
- For React apps, use the React SDK, which handles the WebSocket protocol for you.
- For phone deployments, see Phone & Telephony.
- For the full API, see the API Reference.

