Streaming
Generate audio using HTTP chunked transfer encoding
Overview
The streaming endpoint allows you to generate audio using HTTP chunked transfer encoding. This approach enables clients to play audio as soon as the first chunk is received, making it ideal for large documents or when minimizing initial playback latency is critical.
Key benefits:
- Start playback immediately when the first chunk arrives
- Process larger text inputs (up to 20,000 characters)
- Receive continuous audio delivery until synthesis completes
Use cases
- Transform written content like news articles or blog posts into audio for distribution.
- Convert on-screen text to spoken audio in real time for users with visual impairments.
API endpoint
The streaming endpoint functions similarly to the /v1/audio/speech endpoint but with two key differences:
- It only returns audio data (no metadata)
- It uses HTTP chunked transfer encoding for progressive delivery
For detailed endpoint documentation, including accepted payload and output formats, refer to our API Reference.
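As a rough sketch of how a client might consume the chunked response, the Python below builds a POST request and copies the body to a sink while it is still being read. The URL `https://api.example.com/v1/audio/stream`, the payload fields `input` and `voice_id`, and the `Authorization` and `Accept` header values are placeholders assumed for illustration. The API Reference remains the authority on the real path, payload, and output formats.

```python
import json
import urllib.request
from typing import BinaryIO, Iterator

# Hypothetical URL: the real streaming path is listed in the API Reference.
STREAM_URL = "https://api.example.com/v1/audio/stream"


def build_stream_request(text: str, voice_id: str, token: str) -> urllib.request.Request:
    """Build the POST request. Field and header names are assumed, not confirmed."""
    body = json.dumps({"input": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        STREAM_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
            "Accept": "audio/mpeg",  # assumed output format
        },
    )


def iter_audio(response: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes in pieces instead of buffering the whole body."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means the body has ended
            return
        yield chunk


def stream_to_sink(text: str, voice_id: str, token: str, sink: BinaryIO) -> int:
    """Send the request and write each piece to `sink` as soon as it is read."""
    written = 0
    with urllib.request.urlopen(build_stream_request(text, voice_id, token)) as resp:
        for chunk in iter_audio(resp):
            sink.write(chunk)
            written += len(chunk)
    return written
```

Here `sink` can be an open file or the stdin pipe of an audio player process. Writing each piece as it is read is what lets playback start before synthesis has finished.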
Technical specifications
If your text exceeds 20,000 characters, you’ll need to split it into multiple requests.
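One possible way to do the splitting is sketched below. The helper name `split_text` and the preference for cutting at sentence or word boundaries are illustrative choices, not part of the API. The sketch also assumes the limit is counted the way Python's `len` counts characters (Unicode code points).

```python
MAX_CHARS = 20_000  # per-request limit stated above


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split `text` into pieces of at most `limit` characters.

    Prefers to cut right after sentence-ending punctuation, then at a
    space, and falls back to a hard cut when no boundary is found.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    if not text:
        return []
    pieces = []
    start = 0
    while len(text) - start > limit:
        window = text[start:start + limit]
        # Last sentence end inside the window, if there is one.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation with its sentence
        else:
            cut = window.rfind(" ")
        if cut <= 0:
            cut = limit  # no usable boundary: hard cut
        pieces.append(window[:cut])
        start += cut
    pieces.append(text[start:])
    return pieces
```

Each piece can then be sent as its own request and the resulting audio played back in order. Joining the pieces reproduces the original text exactly, so nothing is dropped at the boundaries.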
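If an error occurs during synthesis, after the response has started streaming, the connection closes without an error message. By that point the HTTP status code has already been sent, so a chunked response has no way to report the failure. Errors detected before streaming begins are returned with an appropriate HTTP status code and error message.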
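Because a failure during synthesis surfaces only as a dropped connection, the client has to detect it at the transport level. The sketch below relies on Python's `http.client` raising `IncompleteRead` when a chunked body ends without its terminating chunk. Whether a failed synthesis always ends the stream this way is an assumption, and `StreamInterrupted` and `read_stream` are hypothetical names.

```python
import http.client
from typing import BinaryIO, Callable


class StreamInterrupted(Exception):
    """Raised when the audio stream ends before the body is complete."""

    def __init__(self, bytes_received: int):
        super().__init__(f"stream closed early after {bytes_received} bytes")
        self.bytes_received = bytes_received


def read_stream(
    response: BinaryIO,
    on_chunk: Callable[[bytes], None],
    chunk_size: int = 4096,
) -> int:
    """Pass audio to `on_chunk`; raise StreamInterrupted on an early close."""
    total = 0
    try:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                return total  # clean end of stream
            on_chunk(chunk)
            total += len(chunk)
    except http.client.IncompleteRead as exc:
        # Bytes read before the drop are still delivered to the caller.
        if exc.partial:
            on_chunk(exc.partial)
            total += len(exc.partial)
        raise StreamInterrupted(total) from exc
    except ConnectionError as exc:
        # The connection was reset or aborted mid-transfer.
        raise StreamInterrupted(total) from exc
```

A caller might catch `StreamInterrupted`, tell the user that the audio is incomplete, and retry the request, since no error body is available to explain the cause.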
Browser streaming demos
We provide complete demo projects in our Speechify AI API Examples Repository on GitHub. These examples demonstrate:
- Authenticating end-users
- Using a Speechify API key to issue access tokens
- Authorizing requests to the Speechify AI API
- Playing streaming audio in the browser