Voice Agents overview

Real-time voice conversations powered by the Speechify API

Voice Agents let you put a talking, listening AI in your product in under five minutes. An agent is a reusable definition — prompt, voice, tools, evaluation criteria — that your users can hold a conversation with over the web or (coming soon) a phone line.

What you get

  • Speechify voices — every voice from our catalog is available, including cloned voices.
  • Low-latency realtime pipeline — sub-2s perceived per-turn latency across the full conversation loop (speech in → agent response → speech out).
  • Tools — let the agent call your backend (webhook tools), run code on the caller’s device (client tools), or invoke built-ins like end_call and transfer_to_number.
  • Full transcripts — every turn persisted with timestamps and tool traces.
  • Post-call evaluation — LLM-graded criteria and structured data extraction run automatically after hang-up.

How it fits together

Browser / SDK                          Speechify API                          Speechify Realtime
─────────────                          ─────────────                          ──────────────────
│ POST /v1/agents/{id}/conversations   │                                      │
│─────────────────────────────────────▶│                                      │
│                                      │ provision session + dispatch agent   │
│                                      │─────────────────────────────────────▶│
│                                      │                                      │
│ { conversation, room, token, url }   │                                      │
│◀─────────────────────────────────────│                                      │
│                                                                             │
│ wss:// connect (token)                                                      │
│────────────────────────────────────────────────────────────────────────────▶│
│                                                                             │
│◀══════════════════════ real-time audio + transcript ═══════════════════════▶│
│                                                                             │
│                                      │ session.ended                        │
│                                      │◀─────────────────────────────────────│
│                                      │
│                                      │ evaluate + mark completed

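Your server calls POST /v1/agents/{id}/conversations. The Speechify API provisions a real-time voice session, dispatches the agent, and returns a short-lived token and URL together with the conversation and room details. Your browser or SDK then connects to the session over wss:// using that token. Audio, transcripts, and tool calls all flow over the session. The Speechify API receives the lifecycle events (such as session.ended), runs the evaluation, persists the transcript and evaluation results, and marks the conversation completed.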
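
The server-side half of this flow can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the official client: the base URL (API_BASE), the bearer-token Authorization header, and the empty JSON request body are placeholders that this page does not confirm, and build_start_request, parse_session, and ConversationSession are hypothetical names. Only the endpoint path and the four response fields (conversation, room, token, url) come from the diagram above.

```python
import json
import urllib.parse
import urllib.request
from dataclasses import dataclass
from typing import Any

# ASSUMPTION: placeholder host; substitute the real API base URL.
API_BASE = "https://api.example.com"


@dataclass
class ConversationSession:
    """The four fields shown in the diagram's response payload."""
    conversation: Any  # inner shape is not specified on this page
    room: Any          # inner shape is not specified on this page
    token: str         # short-lived token for the realtime session
    url: str           # wss:// URL the browser or SDK connects to


def build_start_request(agent_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) POST /v1/agents/{id}/conversations."""
    safe_id = urllib.parse.quote(agent_id, safe="")
    return urllib.request.Request(
        url=f"{API_BASE}/v1/agents/{safe_id}/conversations",
        data=b"{}",  # ASSUMPTION: an empty JSON body is accepted
        method="POST",
        headers={
            # ASSUMPTION: bearer-token auth; check the API reference.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def parse_session(body: bytes) -> ConversationSession:
    """Pick the documented fields out of the JSON response body."""
    payload = json.loads(body)
    return ConversationSession(
        conversation=payload["conversation"],
        room=payload["room"],
        token=payload["token"],
        url=payload["url"],
    )


def start_conversation(agent_id: str, api_key: str) -> ConversationSession:
    """Send the request and return the session details for the client."""
    request = build_start_request(agent_id, api_key)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_session(response.read())
```

Your server would call start_conversation and hand only token and url to the browser or SDK, which keeps the API key off the client. Because the token is short-lived, request a fresh session for each conversation rather than caching one.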