Streaming

Generate and play audio in real time using chunked transfer encoding

Overview

The streaming endpoint delivers audio chunks as they’re generated, so your application can start playback before the full audio is ready. This is ideal for long-form content and low-latency applications.

                   Speech endpoint           Stream endpoint
  Character limit  2,000                     20,000
  Response format  Base64 JSON + metadata    Raw audio chunks
  Playback start   After full generation     Immediately
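
For contrast, a non-streaming request returns the finished clip as a Base64 string inside a JSON body, so playback can only begin after the whole payload has been decoded. The sketch below assumes the SDK exposes a speech method at client.tts.audio.speech whose response carries an audio_data field; treat those names as illustrative and check the speech endpoint reference for the exact signature.

import base64

from speechify import Speechify

client = Speechify()

# Non-streaming request: the full clip arrives as Base64 inside JSON.
# Method and field names are assumptions; verify them against the SDK.
response = client.tts.audio.speech(
    input="Short text, up to the 2,000 character limit.",
    voice_id="george",
    audio_format="mp3",
)

# Decode the Base64 payload before saving or playing it.
with open("speech.mp3", "wb") as f:
    f.write(base64.b64decode(response.audio_data))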

Usage

from speechify import Speechify

client = Speechify()

with client.tts.audio.stream(
    input="Your long-form text content here...",
    voice_id="george",
    audio_format="mp3",
) as stream:
    # Write chunks to file as they arrive
    with open("output.mp3", "wb") as f:
        for chunk in stream:
            f.write(chunk)
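
Writing to a file is not required; because chunks arrive while synthesis is still running, they can be piped straight into a player for immediate playback. A minimal sketch, assuming FFmpeg's ffplay is installed and reading MP3 data from stdin:

import subprocess

from speechify import Speechify

client = Speechify()

# Start ffplay reading from stdin ("-"); -nodisp hides the video window
# and -autoexit quits once the input ends.
player = subprocess.Popen(
    ["ffplay", "-autoexit", "-nodisp", "-loglevel", "quiet", "-"],
    stdin=subprocess.PIPE,
)

with client.tts.audio.stream(
    input="Your long-form text content here...",
    voice_id="george",
    audio_format="mp3",
) as stream:
    # Forward each chunk to the player as soon as it arrives.
    for chunk in stream:
        player.stdin.write(chunk)

player.stdin.close()
player.wait()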

Supported audio formats

  Format  Content type  Notes
  MP3     audio/mpeg    Best compatibility
  OGG     audio/ogg     Good compression, open format
  AAC     audio/aac     Apple ecosystem
  PCM     audio/pcm     Raw audio, lowest latency

WAV format is not available for streaming. Use the speech endpoint for WAV output.
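
If a WAV file is the end goal, one workaround is to stream PCM and write the WAV container locally with Python's standard wave module. The channel count, sample width, and sample rate below are assumptions for illustration; match them to whatever the PCM stream actually delivers per the API reference.

import wave

from speechify import Speechify

client = Speechify()

with client.tts.audio.stream(
    input="Your long-form text content here...",
    voice_id="george",
    audio_format="pcm",
) as stream:
    with wave.open("output.wav", "wb") as wav_file:
        # Assumed PCM parameters: mono, 16-bit samples, 24 kHz.
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(24000)
        # Append raw PCM chunks as audio frames; the wave module fixes
        # up the header sizes when the file is closed.
        for chunk in stream:
            wav_file.writeframes(chunk)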

Use cases

  • Automated podcast generation: Transform articles or blog posts into spoken audio for distribution
  • Assistive technology: Convert on-screen text to spoken audio in real time
  • Voice agents: Generate conversational responses with minimal latency
  • Audiobook production: Process full chapters without hitting the speech endpoint's 2,000-character limit

Error handling

If an error occurs during synthesis after the stream has started, the connection closes without an error message — this is a limitation of HTTP chunked responses. Errors before streaming starts return standard HTTP status codes.

To handle mid-stream failures:

  • Check the total bytes received against the expected audio length
  • Implement retry logic for the remaining text, as in the sketch below
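
One practical pattern, sketched below, is to synthesize sentence-sized segments and buffer each one, so a failed segment can be discarded and retried in full. The segmentation and retry policy here are illustrative rather than part of the SDK.

import re

from speechify import Speechify

client = Speechify()

def synthesize_segment(text, retries=2):
    """Return the complete audio for one segment, retrying mid-stream failures."""
    for attempt in range(retries + 1):
        buffer = bytearray()
        try:
            with client.tts.audio.stream(
                input=text,
                voice_id="george",
                audio_format="mp3",
            ) as stream:
                for chunk in stream:
                    buffer.extend(chunk)
            return bytes(buffer)
        except Exception:
            # The connection closed mid-stream: discard the partial audio
            # and retry the whole segment so nothing is silently missing.
            print(f"Segment failed after {len(buffer)} bytes, retrying...")
    raise RuntimeError("Segment failed after all retries")

# Naive sentence split; swap in a real segmenter for production text.
segments = re.split(r"(?<=[.!?])\s+", "Your long-form text content here...")

with open("output.mp3", "wb") as f:
    for segment in segments:
        f.write(synthesize_segment(segment))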

Example projects

See our Examples Repository for complete browser and server-side streaming demos.