Streaming

Generate audio using HTTP chunked transfer encoding

This feature is currently experimental

Overview

The streaming endpoint generates audio and delivers it using HTTP chunked transfer encoding. Clients can begin playback as soon as the first chunk is received, which makes the endpoint well suited to large documents and to cases where initial playback latency must be kept to a minimum.

Key benefits:

  • Start playback immediately when the first chunk arrives
  • Process larger text inputs (up to 20,000 characters)
  • Receive continuous audio delivery until synthesis completes

Use cases

Automated podcast generation

Transform written content like news articles or blog posts into audio for distribution.

Assistive technology

Convert on-screen text to spoken audio in real-time for users with visual impairments.

API endpoint

The streaming endpoint functions similarly to the /v1/audio/speech endpoint but with two key differences:

  1. It only returns audio data (no metadata)
  2. It uses HTTP chunked transfer encoding for progressive delivery

For detailed endpoint documentation, including accepted payload and output formats, refer to our API Reference.
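As a rough illustration of progressive delivery, the sketch below posts text to a streaming endpoint and writes audio bytes to a sink as they arrive, rather than waiting for the full response. The URL, the "input" payload field, the bearer-token header, and the use of the Accept header to select a format are all assumptions made for this example; the API Reference is the authority on the real request shape.

```python
import json
import urllib.request
from typing import BinaryIO, Callable

# Hypothetical URL: substitute the streaming endpoint from the API Reference.
STREAM_URL = "https://api.example.com/v1/audio/stream"


def build_request(text: str, token: str,
                  audio_format: str = "audio/mpeg") -> urllib.request.Request:
    """Build a POST request for streamed audio.

    The "input" field and the Accept-based format selection are assumed,
    not confirmed by this page.
    """
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        STREAM_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Accept": audio_format,
        },
    )


def stream_to(sink: BinaryIO, request: urllib.request.Request,
              opener: Callable = urllib.request.urlopen,
              chunk_size: int = 4096) -> int:
    """Copy the response body to `sink` piece by piece; return bytes written.

    The standard library decodes chunked transfer encoding transparently,
    so each read() returns audio bytes as soon as they are available.
    """
    total = 0
    with opener(request) as response:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # empty bytes means the body has ended
                break
            sink.write(chunk)  # a player could consume these bytes right away
            total += len(chunk)
    return total
```

The `opener` parameter exists only so the function can be exercised without a network connection; in normal use the default `urllib.request.urlopen` is enough. The response carries audio bytes only, so there is no metadata to parse.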

Technical specifications

Specification       Details
Character limit     20,000 characters per request (approximately 30 paragraphs or 2-5 book chapters)
Supported formats   audio/mpeg, audio/ogg, audio/aac

If your text exceeds 20,000 characters, you’ll need to split it into multiple requests.
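One possible way to split long text is sketched below: it packs whole sentences into pieces of at most 20,000 characters and falls back to a hard cut only when a single sentence is longer than the limit. The function name and the sentence-boundary heuristic are illustrative choices, and the sketch assumes the limit is counted the same way Python counts string length.

```python
import re

MAX_CHARS = 20_000  # per-request character limit for the streaming endpoint


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    Sentences are kept whole where possible and rejoined with a single
    space, so original line and paragraph breaks are not preserved.
    """
    # Split after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces: list[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit has to be cut mid-sentence.
        while len(sentence) > limit:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            pieces.append(current)
            current = sentence
    if current:
        pieces.append(current)
    return pieces
```

Each piece can then be sent as its own request, and the resulting audio played or stored in order.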

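If an error occurs during synthesis, after the response has started streaming, the connection closes without sending an error message. This is a limitation of HTTP chunked responses: the status code has already been sent by the time the failure happens. Errors detected before streaming begins are returned with an appropriate HTTP status code and error message.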
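Because a mid-stream failure arrives without an error message, the client has to infer it from how the connection ends. The sketch below is one way to do that in Python, assuming the server drops the connection before sending the terminating chunk; in that case the standard library's http.client raises IncompleteRead. The function name and the True/False return convention are illustrative.

```python
import http.client
from typing import BinaryIO


def copy_stream(response, sink: BinaryIO, chunk_size: int = 4096) -> bool:
    """Copy a streamed body to `sink`.

    Returns True if the stream ended cleanly and False if the connection
    was cut off part-way, in which case the audio in `sink` is incomplete.
    """
    try:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                return True  # clean end of stream
            sink.write(chunk)
    except (http.client.IncompleteRead, ConnectionError) as exc:
        # IncompleteRead carries any bytes read before the cut-off.
        partial = getattr(exc, "partial", b"")
        if partial:
            sink.write(partial)
        return False
```

A False result is the cue to retry the request or to tell the user that the audio is incomplete. Failures reported with an HTTP status code surface earlier, when the request is opened, for example as urllib.error.HTTPError when using urllib.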

Browser streaming demos

We provide complete demo projects in our Speechify AI API Examples Repository on GitHub. These examples demonstrate:

  • Authenticating end-users
  • Using a Speechify API key to issue access tokens
  • Authorizing requests to the Speechify AI API
  • Playing streaming audio in the browser