Streaming

Generate and play audio in real time using chunked transfer encoding

Overview

The streaming endpoint delivers audio chunks as they’re generated, so your application can start playback before the full audio is ready. This is ideal for long-form content and low-latency applications.

                   Speech endpoint           Stream endpoint
  Character limit  2,000                     20,000
  Response format  Base64 JSON + metadata    Raw audio chunks
  Playback start   After full generation     Immediately
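
For contrast, a non-streaming request returns the finished clip as a Base64 string inside a JSON body, so playback can only begin after the whole payload has been decoded. The sketch below assumes the SDK exposes a speech method at client.tts.audio.speech whose response carries an audio_data field; treat those names as illustrative and check the speech endpoint reference for the exact signature.

import base64

from speechify import Speechify

client = Speechify()

# Non-streaming request: the full clip arrives as Base64 inside JSON.
# Method and field names are assumptions; verify them against the SDK.
response = client.tts.audio.speech(
    input="Short text, up to the 2,000 character limit.",
    voice_id="george",
    audio_format="mp3",
)

# Decode the Base64 payload before saving or playing it.
with open("speech.mp3", "wb") as f:
    f.write(base64.b64decode(response.audio_data))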

Usage

from speechify import Speechify

client = Speechify()

with client.tts.audio.stream(
    input="Your long-form text content here...",
    voice_id="george",
    audio_format="mp3",
) as stream:
    # Write chunks to file as they arrive
    with open("output.mp3", "wb") as f:
        for chunk in stream:
            f.write(chunk)
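
Writing to a file is not required; because chunks arrive while synthesis is still running, they can be piped straight into a player for immediate playback. A minimal sketch, assuming FFmpeg's ffplay is installed and reading MP3 data from stdin:

import subprocess

from speechify import Speechify

client = Speechify()

# Start ffplay reading from stdin ("-"); -nodisp hides the video window
# and -autoexit quits once the input ends.
player = subprocess.Popen(
    ["ffplay", "-autoexit", "-nodisp", "-loglevel", "quiet", "-"],
    stdin=subprocess.PIPE,
)

with client.tts.audio.stream(
    input="Your long-form text content here...",
    voice_id="george",
    audio_format="mp3",
) as stream:
    # Forward each chunk to the player as soon as it arrives.
    for chunk in stream:
        player.stdin.write(chunk)

player.stdin.close()
player.wait()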

Supported audio formats

  Format  Content type  Notes
  MP3     audio/mpeg    Best compatibility
  OGG     audio/ogg     Good compression, open format
  AAC     audio/aac     Apple ecosystem
  PCM     audio/pcm     Raw audio, lowest latency

WAV format is not available for streaming. Use the speech endpoint for WAV output.
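
If a WAV file is the end goal, one workaround is to stream PCM and write the WAV container locally with Python's standard wave module. The channel count, sample width, and sample rate below are assumptions for illustration; match them to whatever the PCM stream actually delivers per the API reference.

import wave

from speechify import Speechify

client = Speechify()

with client.tts.audio.stream(
    input="Your long-form text content here...",
    voice_id="george",
    audio_format="pcm",
) as stream:
    with wave.open("output.wav", "wb") as wav_file:
        # Assumed PCM parameters: mono, 16-bit samples, 24 kHz.
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(24000)
        # Append raw PCM chunks as audio frames; the wave module fixes
        # up the header sizes when the file is closed.
        for chunk in stream:
            wav_file.writeframes(chunk)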

Use cases

  • Automated podcast generation: Transform articles or blog posts into spoken audio for distribution
  • Assistive technology: Convert on-screen text to spoken audio in real time
  • Voice agents: Generate conversational responses with minimal latency
  • Audiobook production: Process full chapters without hitting the speech endpoint's 2,000-character limit

Error handling

If an error occurs during synthesis after the stream has started, the connection closes without an error message — this is a limitation of HTTP chunked responses. Errors before streaming starts return standard HTTP status codes.

To handle mid-stream failures:

  • Check the total bytes received against the expected audio length
  • Implement retry logic for the remaining text, as in the sketch below
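
One practical pattern, sketched below, is to synthesize sentence-sized segments and buffer each one, so a failed segment can be discarded and retried in full. The segmentation and retry policy here are illustrative rather than part of the SDK.

import re

from speechify import Speechify

client = Speechify()

def synthesize_segment(text, retries=2):
    """Return the complete audio for one segment, retrying mid-stream failures."""
    for attempt in range(retries + 1):
        buffer = bytearray()
        try:
            with client.tts.audio.stream(
                input=text,
                voice_id="george",
                audio_format="mp3",
            ) as stream:
                for chunk in stream:
                    buffer.extend(chunk)
            return bytes(buffer)
        except Exception:
            # The connection closed mid-stream: discard the partial audio
            # and retry the whole segment so nothing is silently missing.
            print(f"Segment failed after {len(buffer)} bytes, retrying...")
    raise RuntimeError("Segment failed after all retries")

# Naive sentence split; swap in a real segmenter for production text.
segments = re.split(r"(?<=[.!?])\s+", "Your long-form text content here...")

with open("output.mp3", "wb") as f:
    for segment in segments:
        f.write(synthesize_segment(segment))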

Example projects

See our Examples Repository for complete browser and server-side streaming demos.