Voice Agents overview | Speechify API

Voice Agents let you put a talking, listening AI in your product in under five minutes. An agent is a reusable definition (prompt, voice, tools, evaluation criteria) that your users can talk to over the web or, coming soon, over a phone line.

What you get

Speechify voices — every voice from our catalog is available, including cloned voices.
Low-latency realtime pipeline — sub-2s perceived per-turn latency across the full conversation loop (speech in → agent response → speech out).
Tools — let the agent call your backend (webhook tools), run code on the caller’s device (client tools), or invoke built-ins like end_call and transfer_to_number.
Full transcripts — every turn persisted with timestamps and tool traces.
Post-call evaluation — LLM-graded criteria and structured data extraction run automatically after hang-up.

How it fits together

  Browser / SDK                Speechify API                 Speechify Realtime
  ─────────────                ─────────────                 ──────────────────
       │  POST /v1/agents/{id}/conversations
       │─────────────────────────────────────▶│
       │                                       │  provision session + dispatch agent
       │                                       │───────────────────────────────────▶│
       │                                       │                                    │
       │  { conversation, room, token, url }  │                                    │
       │◀─────────────────────────────────────│                                    │
       │                                                                            │
       │  wss:// connect (token)                                                    │
       │───────────────────────────────────────────────────────────────────────────▶│
       │                                                                            │
       │◀═════════ real-time audio + transcript ═══════════════════════════════════▶│
       │                                                                            │
       │                                       │  session.ended                      │
       │                                       │◀───────────────────────────────────│
       │                                       │
       │                                       │  evaluate + mark completed

Your server calls POST /v1/agents/{id}/conversations. We provision a realtime voice session, dispatch the agent, and return a short-lived token and URL. Your browser or SDK then connects to the session using that token. Audio, transcripts, and tool calls all flow over the session; when the session ends, our server receives the lifecycle events, runs the evaluation, and persists the transcript.
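The server-side step of this flow could be sketched roughly as follows. This is a minimal illustration, not the official client: the base URL, the bearer-token header, and the empty JSON request body are all assumptions, and the helper names (`build_start_request`, `parse_session`, `start_conversation`) are hypothetical. Only the endpoint path and the four response fields (conversation, room, token, url) come from the diagram above.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Any

# Assumption: placeholder base URL; substitute the real API host.
API_BASE = "https://api.example.com"


@dataclass
class ConversationSession:
    """The four fields the diagram shows in the response."""
    conversation: Any
    room: Any
    token: str  # short-lived; hand it to the browser or SDK promptly
    url: str    # the wss:// endpoint the client connects to


def build_start_request(agent_id: str, api_key: str,
                        base: str = API_BASE) -> urllib.request.Request:
    """Build the POST /v1/agents/{id}/conversations request."""
    path = f"/v1/agents/{urllib.parse.quote(agent_id, safe='')}/conversations"
    return urllib.request.Request(
        url=base + path,
        data=b"{}",  # assumption: no required body fields
        method="POST",
        headers={
            # Assumption: bearer-token auth; check the API reference.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def parse_session(body: bytes) -> ConversationSession:
    """Validate the response and pull out what the client needs."""
    payload = json.loads(body)
    required = ("conversation", "room", "token", "url")
    missing = [key for key in required if key not in payload]
    if missing:
        raise ValueError(f"response is missing fields: {missing}")
    return ConversationSession(*(payload[key] for key in required))


def start_conversation(agent_id: str, api_key: str) -> ConversationSession:
    """Call the API from your server and return the session details."""
    request = build_start_request(agent_id, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_session(response.read())
```

Under these assumptions, your backend would call `start_conversation(...)` and pass only `token` and `url` to the browser, so the API key never leaves your server.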

What to read next

Quickstart

Create an agent and place your first test call.

Tools

Give your agent access to your backend, the caller’s device, or built-in actions like end_call.

Webhooks

Receive conversation.started, conversation.ended, and message.created webhooks.

API Reference

Full schemas for /v1/agents, /v1/tools, and /v1/conversations.