Models

Choose the right text-to-speech model for your use case

Available models

| Model | ID | Languages | Voice cloning | Best for |
| --- | --- | --- | --- | --- |
| Simba English | simba-english | English only | Zero-shot + fine-tuning | Production English TTS with highest quality |
| Simba Multilingual | simba-multilingual | 50+ languages | Zero-shot + fine-tuning | Multi-language or mixed-language content |

Pass the model ID as the model parameter in your API calls. If omitted, the API defaults to simba-english.

Python

# Assumes `client` is an authenticated SDK client (see Quickstart and Authentication).
response = client.tts.audio.speech(
    input="Bonjour, comment allez-vous?",
    voice_id="george",
    audio_format="mp3",
    model="simba-multilingual",
)

Simba English

Optimized for English text-to-speech with the highest quality output.

  • Clear, natural-sounding speech
  • Consistent quality across outputs
  • Full support for SSML and emotion control
  • Zero-shot voice cloning from short audio samples
  • Fine-tuned voice cloning from hours of speaker audio (contact sales)

Simba Multilingual

This model is currently experimental and may be subject to changes.

Supports multiple languages, including mixing languages within a single sentence.

  • 6 fully supported languages, 17 in beta, 26 coming soon
  • Automatic language detection when the language parameter is omitted
  • Zero-shot voice cloning works across all supported languages
  • Fine-tuned voice cloning available (contact sales)

See Language Support for the full list.
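The automatic language detection described above can be illustrated with a small helper that only sends the language parameter when the caller knows it. This is a minimal sketch: the helper name `speech_params` is hypothetical, and the value format shown for `language` (for example "es-ES") is an assumption, so check Language Support for the accepted values.

```python
def speech_params(text, voice_id, language=None):
    """Build keyword arguments for a Simba Multilingual request.

    Hypothetical helper for illustration; it is not part of any SDK.
    """
    params = {
        "input": text,
        "voice_id": voice_id,
        "audio_format": "mp3",
        "model": "simba-multilingual",
    }
    # Leave `language` out entirely so the API can detect it automatically.
    # The code format (e.g. "es-ES") is an assumption made for this sketch.
    if language is not None:
        params["language"] = language
    return params


auto_detect = speech_params("Bonjour, comment allez-vous?", "george")
explicit = speech_params("Hola, ¿cómo estás?", "george", language="es-ES")
```

Either dictionary could then be unpacked into the call shown earlier, for example `client.tts.audio.speech(**auto_detect)`, assuming `client` is an authenticated SDK client.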

Voice cloning

Both models support two tiers of voice cloning:

| Tier | Input | Quality | Availability |
| --- | --- | --- | --- |
| Zero-shot | 10-30 second audio sample | Good | Self-serve via API or Console |
| Fine-tuned | Hours of speaker audio | Best | Contact sales |

See Voice Cloning for implementation details.
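Before uploading a sample for zero-shot cloning, it can help to check its length locally against the 10-30 second range in the table above. The sketch below uses only Python's standard `wave` module and assumes the sample is a WAV file. The function and constant names are hypothetical, and the range is treated as a client-side guideline, since this page does not say whether the API rejects samples outside it.

```python
import wave

# Range taken from the table above; treated here as a guideline only.
ZERO_SHOT_MIN_SECONDS = 10.0
ZERO_SHOT_MAX_SECONDS = 30.0


def wav_duration_seconds(path):
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_zero_shot_range(duration_seconds):
    """Return True if the duration falls inside the 10-30 second range."""
    return ZERO_SHOT_MIN_SECONDS <= duration_seconds <= ZERO_SHOT_MAX_SECONDS
```

A typical use would be `fits_zero_shot_range(wav_duration_seconds("sample.wav"))`, where `sample.wav` is a placeholder path.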

FAQ

Which model should I use?

Use Simba English if your content is English-only; it produces the highest-quality output. Use Simba Multilingual if you need non-English languages or mixed-language content.
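That rule of thumb can be written as a small selection function. This is a sketch rather than an official helper: `choose_model` is a hypothetical name, and the language-code format ("en", "en-US", "fr-FR") is an assumption made for illustration.

```python
ENGLISH_MODEL = "simba-english"
MULTILINGUAL_MODEL = "simba-multilingual"


def choose_model(language_codes):
    """Pick a model ID for content written in the given languages.

    `language_codes` is an iterable of codes such as "en" or "fr-FR";
    the code format is an assumption made for this sketch.
    """
    # Reduce each code to its primary language subtag ("en-US" -> "en").
    primary = {code.split("-")[0].lower() for code in language_codes}
    # English-only content (or no language given) maps to the English model,
    # which is also the API default when `model` is omitted.
    if primary <= {"en"}:
        return ENGLISH_MODEL
    return MULTILINGUAL_MODEL


print(choose_model(["en-US"]))        # simba-english
print(choose_model(["en", "fr-FR"]))  # simba-multilingual
```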

Can I switch models without changing my code?

Yes. Just change the model parameter. All other parameters (voice, format, SSML) work the same across models.
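As a rough illustration, the two requests below are built by the same function and differ only in the `model` value (apart from the text itself). `build_speech_request` is a hypothetical helper; the parameter names and the voice ID are taken from the example earlier on this page.

```python
def build_speech_request(text, model="simba-english"):
    """Build keyword arguments for a speech request (illustrative helper)."""
    return {
        "input": text,
        "voice_id": "george",
        "audio_format": "mp3",
        "model": model,
    }


english = build_speech_request("Hello, how are you?")
multilingual = build_speech_request(
    "Bonjour, comment allez-vous?", model="simba-multilingual"
)

# Apart from the input text, only the model ID changes between the two.
changed = {
    key for key in english
    if key != "input" and english[key] != multilingual[key]
}
print(sorted(changed))  # ['model']
```

Each dictionary could be passed along as `client.tts.audio.speech(**english)`, assuming `client` is an authenticated SDK client and that the chosen voice is available on the target model.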

Do both models support the same voices?

Built-in system voices may differ between models. Cloned voices work with both models.