Text to Speech API

Lifelike speech in 50+ languages from a single API call. Stream long-form audio, clone any voice from a 10-30 second sample, and control delivery with SSML.

Your first request

Python

from speechify import Speechify

client = Speechify()  # reads SPEECHIFY_API_KEY from the environment
response = client.tts.audio.speech(
    input="Welcome to Speechify.",
    voice_id="george",
    audio_format="mp3",
)

with open("welcome.mp3", "wb") as f:
    f.write(response.audio_data)

Grab a key at console.speechify.ai/api-keys and set SPEECHIFY_API_KEY in your environment. Then walk through the Quickstart for the full five-minute tour.

Set up

Install an SDK

Run pip install speechify-api for Python or npm install @speechify/api for TypeScript. Both SDKs read SPEECHIFY_API_KEY from the environment automatically.

Authenticate

A single API key, sent as an Authorization: Bearer header, works for every endpoint. Manage and rotate keys in the console.
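For callers not using an SDK, the Bearer scheme can be sketched with the standard library alone. This is a minimal illustration, not the documented request format: the URL below is a placeholder, and the JSON field names are assumed to mirror the SDK keyword arguments from the first request. Check the API Reference for the real endpoint and schema.

```python
import json
import os
import urllib.request

# Assumption: placeholder URL, not the real endpoint.
API_URL = "https://api.example.com/v1/audio/speech"


def build_speech_request(api_key: str, text: str, voice_id: str = "george") -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the Bearer key."""
    # Assumption: body fields mirror the SDK's keyword arguments.
    payload = json.dumps(
        {"input": text, "voice_id": voice_id, "audio_format": "mp3"}
    ).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=payload,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Read the key from the environment, as the SDKs do.
request = build_speech_request(
    os.environ.get("SPEECHIFY_API_KEY", "missing-key"),
    "Welcome to Speechify.",
)
```

Passing the resulting object to urllib.request.urlopen would send it. Keeping construction separate from sending makes the header easy to inspect in tests.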

Build with TTS

Streaming

Start playback before the full audio is generated. Up to 20,000 characters per request.
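Text longer than the 20,000-character limit has to be split before it is sent. The helper below is one possible sketch of that pre-processing step. It is not part of the SDK, and the function name and the sentence-boundary heuristic are illustrative choices.

```python
import re

MAX_CHARS = 20_000  # per-request limit stated above


def split_for_streaming(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Break after ., ! or ? followed by whitespace; the whitespace itself is dropped.
    pieces = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for piece in pieces:
        # A single sentence longer than the limit is cut at the limit.
        while len(piece) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(piece[:limit])
            piece = piece[limit:]
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, with playback of the first chunk starting while later chunks are still being generated.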

Voice cloning

Clone any voice from a 10-30 second sample. Cloned voices work across every supported language.
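Before uploading a sample, it can help to confirm that it falls inside the 10-30 second window. The sketch below checks an uncompressed WAV file with Python's standard wave module. It assumes the sample is a WAV file, and the function name is hypothetical. The upload call itself is not shown because its signature is not given here.

```python
import wave


def sample_duration_ok(wav_file, min_s: float = 10.0, max_s: float = 30.0) -> bool:
    """Return True if the WAV sample lasts between min_s and max_s seconds.

    `wav_file` may be a path or a binary file-like object.
    """
    with wave.open(wav_file, "rb") as w:
        duration = w.getnframes() / w.getframerate()
    return min_s <= duration <= max_s
```

For example, sample_duration_ok("my_voice.wav") returns False for a 5-second clip, so the problem surfaces before any network call is made.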

SSML and emotion

Fine-grained control over pitch, rate, pauses, emphasis, and 13 emotion presets.
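As a rough illustration of rate, pitch, and pause control, the helper below assembles an SSML string from the standard W3C elements speak, prosody, and break. Which elements and attribute values the API accepts is an assumption here. The emotion presets are left out because their markup is not described in this overview.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text: str, rate: str = "medium", pitch: str = "medium", pause_ms: int = 0) -> str:
    """Wrap text in <prosody>, optionally followed by a pause, inside <speak>."""
    # escape() protects &, < and > so user text cannot break the markup.
    body = (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>"
        f"{escape(text)}</prosody>"
    )
    if pause_ms > 0:
        body += f'<break time="{pause_ms}ms"/>'
    return f"<speak>{body}</speak>"


ssml = build_ssml("Terms & conditions apply.", rate="slow", pause_ms=300)
```

The resulting string would be passed as the request's input in place of plain text, assuming the endpoint detects SSML from the speak root element.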

Speech marks

Word-level timestamps for highlighting, captions, and audio-text sync.

Models and languages

Two models cover the API's use cases. simba-english is the flagship English model, with the highest quality, the lowest streaming latency, and full SSML and emotion control. simba-multilingual handles 50+ languages, including mixed-language input. The same voice IDs work across every language, so no separate cloning is required.
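The choice between the two models can be expressed as a small rule. The sketch below is one reading of the description above: English-only input goes to simba-english, and everything else goes to simba-multilingual. The function name and the use of language codes such as "en-US" are illustrative, and how the model is passed to the API is not shown here.

```python
def pick_model(languages: set[str]) -> str:
    """Choose a model name from the set of language codes present in the input."""
    # English-only input gets the flagship English model.
    if languages and all(code.lower().startswith("en") for code in languages):
        return "simba-english"
    # Mixed-language, non-English, or unknown input falls back to multilingual.
    return "simba-multilingual"
```

For instance, pick_model({"en-US"}) selects simba-english, while pick_model({"en-US", "es-ES"}) selects simba-multilingual because the input mixes languages.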

See Models and Language Support for the full matrix.

Resources

API Reference

Full endpoint schemas, parameters, and response shapes.

Examples

End-to-end demo projects on GitHub.

Console

Manage API keys, voices, and billing.
