🚧

Experimental

This guide explains how to integrate HTTP chunked transfer streaming for audio generation. The endpoint accepts text input and returns an audio stream using HTTP chunked transfer encoding, so clients can start playing audio as soon as the first chunk is received. This makes it suitable for large documents and for cases where latency until the first chunk is critical.

API Endpoint

The API endpoint is similar to the /v1/audio/speech endpoint, but it returns only audio data and uses HTTP chunked transfer encoding to deliver it. Audio is returned as soon as the first chunk is available and is sent continuously until all of the provided text has been synthesized. This endpoint also accepts more input per request than /v1/audio/speech: the current limit is 20,000 characters, which is around 30 paragraphs or 2-5 book chapters.

For detailed endpoint documentation, including accepted payload and output, please refer to our API Reference.
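
As an illustration of the flow described above, here is a minimal Python sketch that builds a request and consumes the response chunk by chunk instead of waiting for the full body. The URL, the "input" payload field, the Bearer authorization scheme, and the use of the Accept header to select the format are all assumptions made for this example; the API Reference is the authority on the real values.

```python
import io
import json
import urllib.request
from typing import BinaryIO, Iterator

# Hypothetical URL: the real streaming endpoint is listed in the API Reference.
STREAM_URL = "https://api.example.com/v1/audio/stream"


def build_stream_request(text: str, token: str,
                         audio_format: str = "audio/mpeg") -> urllib.request.Request:
    """Build a POST request that asks for a streamed audio response."""
    body = json.dumps({"input": text}).encode("utf-8")  # "input" is an assumed field name
    return urllib.request.Request(
        STREAM_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
            "Accept": audio_format,  # assumed way of choosing the output format
        },
    )


def iter_audio_chunks(response: BinaryIO, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield audio bytes as they arrive rather than buffering the whole body."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # an empty read means the stream has ended
            break
        yield chunk


def save_stream(response: BinaryIO, sink: BinaryIO) -> int:
    """Write every chunk to `sink` as it arrives and return the byte count."""
    total = 0
    for chunk in iter_audio_chunks(response):
        sink.write(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Offline demo: BytesIO stands in for the HTTP response object.
    fake_response = io.BytesIO(b"\x00" * 10_000)
    print(save_stream(fake_response, io.BytesIO()))
```

Against a real server, the response object returned by `urllib.request.urlopen(build_stream_request(text, token))` can be passed to `save_stream` directly, because Python's HTTP client decodes the chunked framing transparently. A player would feed each chunk to its audio decoder instead of writing it to a file.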

Limitations

  • Max characters: 20,000. Longer input text has to be split across multiple requests.
  • Supported Formats:
    • audio/mpeg
    • audio/ogg
    • audio/wav
    • audio/aac
  • Error handling: Errors detected before streaming begins return an appropriate HTTP status code with an error message. If an error occurs during synthesis, after the response has started, the connection is closed without an error message because of the limitations of HTTP chunked responses.
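
The character limit and the mid-stream error behavior listed above can both be handled on the client side. The following is a rough Python sketch, not an official client: `split_text` and `copy_stream` are hypothetical helper names, and the boundary preference (paragraph, then sentence, then word) is a design choice made for this example. It also relies on Python's `http.client` reporting a prematurely closed chunked body as `IncompleteRead`, which is that library's usual behavior.

```python
import http.client
import io
from typing import BinaryIO, List, Tuple

MAX_CHARS = 20_000  # per-request limit stated in the list above


def split_text(text: str, limit: int = MAX_CHARS) -> List[str]:
    """Split text into pieces of at most `limit` characters, preferring
    paragraph, then sentence, then word boundaries."""
    pieces = []
    rest = text
    while len(rest) > limit:
        window = rest[:limit]
        cut = -1
        for sep in ("\n\n", ". ", " "):
            pos = window.rfind(sep)
            if pos > 0:
                cut = pos + len(sep)  # keep the separator with the earlier piece
                break
        if cut <= 0:
            cut = limit  # no boundary found: fall back to a hard cut
        pieces.append(rest[:cut])
        rest = rest[cut:]
    if rest:
        pieces.append(rest)
    return pieces


def copy_stream(response: BinaryIO, sink: BinaryIO,
                chunk_size: int = 4096) -> Tuple[int, bool]:
    """Copy audio to `sink` and return (bytes_written, completed).

    `completed` is False when the connection closed before the stream ended,
    which is the only signal available once synthesis fails mid-response.
    """
    written = 0
    try:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                return written, True
            sink.write(chunk)
            written += len(chunk)
    except (http.client.IncompleteRead, ConnectionError) as exc:
        # IncompleteRead carries the bytes received before the cut-off.
        partial = getattr(exc, "partial", b"")
        sink.write(partial)
        return written + len(partial), False


if __name__ == "__main__":
    # Offline demo: BytesIO stands in for a response that ends cleanly.
    print(len(split_text("word " * 10_000)))
    print(copy_stream(io.BytesIO(b"\x00" * 100), io.BytesIO()))
```

Each piece returned by `split_text` can be sent as its own request and the resulting audio played or concatenated in order. When `copy_stream` reports an incomplete stream, the caller can decide whether to retry the request or keep the partial audio.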

Example Applications

  • Automated Podcast Generation: Transform written content, such as news articles or blog posts, into podcast episodes that are saved automatically for later distribution.
  • Assistive Technology Tools: For users with visual impairments, tools can convert on-screen text into speech in real time, enhancing accessibility.

Node.js Streaming Recipe

See the Recipes section for a list of examples that demonstrate various uses of the API.

Browser Streaming Demo Projects

We have prepared complete demo projects, available in the Speechify AI API Examples Repository on GitHub. They cover the core topics, including:

  • authenticating your end-users,
  • using your Speechify API key to issue access tokens to your public client application,
  • using access tokens to authorize the requests to the Speechify AI API,
  • playing streaming audio in the browser.