Voice Cloning | Speechify API

Overview

Clone any voice from a 10-30 second audio sample. The cloned voice captures accent, speaking style, and tone. There’s no limit to the number of voices you can create.

Requirements

Parameter	Requirement
Duration	10-30 seconds recommended (must be under 1 minute)
File size	Under 5MB
Quality	Clear speech, minimal background noise
Language	Any supported language

Use a good microphone and a quiet room. The clone quality depends directly on sample quality.
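A sample can be checked against these limits locally before it is uploaded. The sketch below is an illustration, not part of the Speechify SDK: `check_sample` and `wav_duration_seconds` are hypothetical helper names, "5MB" is read conservatively as 5,000,000 bytes, and 10 seconds and 1 minute are treated as the lower and upper duration bounds. Duration is only measured for uncompressed WAV files, because the Python standard library cannot decode MP3.

```python
import os
import wave

# Assumptions: decimal megabytes, and 10 s / 60 s as the duration bounds.
MAX_BYTES = 5_000_000
MIN_SECONDS = 10.0
MAX_SECONDS = 60.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_sample(path):
    """Return a list of problems; an empty list means no problem was found."""
    problems = []

    size = os.path.getsize(path)
    if size >= MAX_BYTES:
        problems.append(f"file is {size} bytes; must be under {MAX_BYTES}")

    # Duration can only be measured here for WAV input.
    if path.lower().endswith(".wav"):
        seconds = wav_duration_seconds(path)
        if seconds < MIN_SECONDS:
            problems.append(f"sample is {seconds:.1f}s; aim for at least {MIN_SECONDS:.0f}s")
        elif seconds >= MAX_SECONDS:
            problems.append(f"sample is {seconds:.1f}s; must be under {MAX_SECONDS:.0f}s")

    return problems
```

Calling `check_sample("voice-sample.wav")` returns an empty list when nothing is wrong, so the result can gate the upload step.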

Create a cloned voice

Python

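# `client` is an already-initialized Speechify API client.
voice = client.tts.voices.create(
    name="My Voice",
    sample=open("voice-sample.mp3", "rb"),  # audio sample, opened in binary mode
    consent=True,  # confirms you have the right to clone this voice
)

# Keep the ID; it is what you pass to the speech endpoints.
print(f"Created voice: {voice.id}")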

Consent is required: setting consent to True confirms that the voice belongs to you or to someone you represent.
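The creation call can be wrapped so that the sample file is always closed and the upload never happens without explicit consent. This is a minimal sketch: `clone_voice` is a hypothetical helper name, `client` is assumed to be an already-configured Speechify API client, and the only SDK call used is the `client.tts.voices.create(...)` call shown above.

```python
def clone_voice(client, sample_path, name, consent):
    """Create a cloned voice from a local audio file and return the new voice.

    `client` is assumed to be an already-configured Speechify API client.
    """
    if consent is not True:
        # Hypothetical client-side guard: refuse to upload without consent.
        raise ValueError("consent must be explicitly confirmed before cloning a voice")

    # The context manager closes the file even if the request raises.
    with open(sample_path, "rb") as sample_file:
        return client.tts.voices.create(
            name=name,
            sample=sample_file,
            consent=True,
        )
```

A call such as `voice = clone_voice(client, "voice-sample.mp3", "My Voice", consent=True)` then behaves like the direct call, with the file handle released afterwards.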

Use the cloned voice

Pass the voice ID from the creation response to any speech endpoint:

Python

response = client.tts.audio.speech(
    input="This is my cloned voice speaking.",
    voice_id=voice.id,
    audio_format="mp3",
)

Cloned voices also appear in the voices list alongside Speechify’s built-in voices.
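To keep the generated speech, the audio in the response has to be written to a file. The sketch below assumes the response exposes the audio as a base64-encoded string in an `audio_data` attribute; this page does not state the response shape, so verify it against the API reference before relying on it. `save_speech` is a hypothetical helper name.

```python
import base64


def save_speech(response, path):
    """Decode the audio in a speech response and write it to `path`.

    Assumption: `response.audio_data` is a base64-encoded string.
    Returns the number of bytes written.
    """
    audio_bytes = base64.b64decode(response.audio_data)
    with open(path, "wb") as out:
        out.write(audio_bytes)
    return len(audio_bytes)
```

With the request above, `save_speech(response, "cloned-voice.mp3")` would produce a playable MP3 if the assumption about `audio_data` holds.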

Delete a voice

Remove voices you no longer need:

Python

client.tts.voices.delete(voice.id)
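When several voices need to be removed at once, the delete call can be looped so that one failure does not stop the rest. This is a sketch under assumptions: `delete_voices` is a hypothetical helper, `client` is an already-configured Speechify API client, and because this page does not document the SDK's error types, the helper catches the generic `Exception`.

```python
def delete_voices(client, voice_ids):
    """Delete each voice ID; return (deleted_ids, failures_by_id)."""
    deleted = []
    failed = {}
    for voice_id in voice_ids:
        try:
            client.tts.voices.delete(voice_id)
        except Exception as exc:  # specific SDK error types are not assumed here
            failed[voice_id] = str(exc)
        else:
            deleted.append(voice_id)
    return deleted, failed
```

The returned `failed` mapping makes it easy to retry or report the IDs that could not be deleted.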

Console UI

You can also clone voices through the Speechify Console without writing code:

Upload a sample: import an existing audio file containing the voice.
Record a sample: record directly from your browser.

Sample recording tips

If you need text to read while recording, try this:

Listening is like riding a storytelling rollercoaster, where you can lean back and enjoy the ride without having to steer. The speaker’s voice becomes your trusty guide, leading you through twists and turns. It’s like having a personal audiobook adventure just for you! So, buckle up, and let the fun begin!

For best results:

Speak naturally at a consistent pace
Avoid whispering or shouting
Minimize pauses longer than 2 seconds
Record in a quiet environment without echo
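The 2-second pause guideline can be checked on a recording before upload. The sketch below is a rough illustration using only the Python standard library: it handles 16-bit mono WAV files only, `longest_pause_seconds` is a hypothetical helper, and the silence `threshold` (a peak amplitude out of 32767) is an arbitrary starting value that would need tuning for a real microphone and room.

```python
import array
import sys
import wave


def longest_pause_seconds(path, threshold=500, window_ms=50):
    """Return the longest near-silent stretch, in seconds, of a 16-bit mono WAV."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected a 16-bit mono WAV file")
        rate = wav.getframerate()
        samples = array.array("h", wav.readframes(wav.getnframes()))

    # WAV data is little-endian; swap bytes on big-endian machines.
    if sys.byteorder == "big":
        samples.byteswap()

    window = max(1, rate * window_ms // 1000)
    longest = current = 0
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        if max(abs(s) for s in chunk) < threshold:
            # Quiet window: extend the current run of silence.
            current += len(chunk)
            longest = max(longest, current)
        else:
            current = 0
    return longest / rate
```

A result above 2.0 suggests re-recording or trimming the sample; background noise louder than the threshold will hide pauses, so treat the number as a hint rather than a measurement.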