Models
Speechify's text-to-speech models address different needs, from plain text reading to multilingual output and emotional tone control. This page describes the models available through the API.
Simba English
Simba English is Speechify's standard text-to-speech model, designed to deliver clear, natural voice output for reading text. It focuses on a consistent user experience and supports both fine-tuning and zero-shot voice cloning. Its audio output is distinctly different from that of the other models.
Key Features
- Voice Clarity: Produces clear and natural speech.
- Consistency: Maintains uniform quality across all outputs.
- Zero-shot voice cloning: Creates a voice clone from a short audio sample.
- Fine-tuning: Creates a voice clone from hours of the speaker's audio, providing significantly better results than zero-shot voice cloning.
Supported Languages
- English
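As a rough illustration of selecting this model in an API call, the sketch below assembles a request body without sending it. The field names `input`, `voice_id`, and `model`, the identifier `simba-english`, and the helper `build_speech_request` are assumptions made for this example and are not confirmed by this page; check the API reference for the actual names.

```python
import json


def build_speech_request(text: str, voice_id: str, model: str = "simba-english") -> dict:
    """Assemble a JSON-serializable body for a text-to-speech request.

    All field names and the default model identifier are hypothetical.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "input": text,         # hypothetical field: the text to read aloud
        "voice_id": voice_id,  # hypothetical field: a stock or cloned voice
        "model": model,        # hypothetical identifier for Simba English
    }


body = build_speech_request("Hello from Simba English.", voice_id="example-voice")
print(json.dumps(body, indent=2))
```

Keeping the model as an explicit parameter means the same helper could target another model by changing a single argument.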
Simba Multilingual
Experimental
Simba Multilingual supports all of the languages listed below and can mix multiple languages within a single sentence. Its audio output is distinctly different from that of the other models.
Key Features
- Language Flexibility: Supports multiple languages within a single sentence.
- Zero-shot voice cloning: Creates a voice clone from a short audio sample.
- Fine-tuning: Creates a voice clone from hours of the speaker's audio, providing significantly better results than zero-shot voice cloning.
Supported Languages
- English
- Spanish
- French
Simba Turbo
Experimental
Simba Turbo is a text-to-speech model that emphasizes faster processing and control over the emotional tone of the voice output. It is tailored for users who need quick response times and want to adjust the emotional undertone to match the context of the text being read. Its audio output is distinctly different from that of the other models.
Key Features
- Speed: Delivers faster processing to reduce wait times.
- Emotional Control: Enables control over emotional expressions to match the context of the text.
- Speech Cadence Control: Allows adjustment of speech flow for dynamic presentations.
- Zero-shot voice cloning: Creates a voice clone from a short audio sample.
- Fine-tuning: Creates a voice clone from hours of the speaker's audio, providing significantly better results than zero-shot voice cloning.
Supported Languages
- English
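The model descriptions on this page can be condensed into a small lookup for choosing a model by language and required features. This is only a sketch: the feature keys (such as `emotional_control`) and the helper `candidates` are names invented for the example, and the data simply mirrors the feature and language lists on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelInfo:
    name: str
    languages: frozenset
    features: frozenset
    experimental: bool


# Voice cloning options listed for every model on this page.
CLONING = {"zero_shot_cloning", "fine_tuning"}

MODELS = [
    ModelInfo("Simba English", frozenset({"English"}),
              frozenset(CLONING | {"voice_clarity", "consistency"}), False),
    ModelInfo("Simba Multilingual", frozenset({"English", "Spanish", "French"}),
              frozenset(CLONING | {"mixed_language"}), True),
    ModelInfo("Simba Turbo", frozenset({"English"}),
              frozenset(CLONING | {"fast_processing", "emotional_control",
                                   "cadence_control"}), True),
]


def candidates(language, required_features=(), allow_experimental=True):
    """Return names of models that cover the language and every required feature."""
    needed = set(required_features)
    return [
        m.name
        for m in MODELS
        if language in m.languages
        and needed <= m.features
        and (allow_experimental or not m.experimental)
    ]


print(candidates("French"))                          # ['Simba Multilingual']
print(candidates("English", ["emotional_control"]))  # ['Simba Turbo']
print(candidates("English", allow_experimental=False))  # ['Simba English']
```

The `allow_experimental` flag reflects the Experimental label on Simba Multilingual and Simba Turbo, so a caller can restrict the choice to the non-experimental model.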
Updated 7 months ago