Language Support

Languages supported by Speechify Text-to-Speech API

Speechify text-to-speech models can synthesize speech in multiple languages. The API handles both single-language text and mixed-language input.

Fully Supported Languages

The following languages are fully supported for speech synthesis:

Language                Code
English                 en
French                  fr-FR
German                  de-DE
Spanish                 es-ES
Portuguese (Brazil)     pt-BR
Portuguese (Portugal)   pt-PT

Beta Languages

The following languages are currently in beta (we’re actively improving them and welcome feedback):

Language                Code
Arabic                  ar-AE
Danish                  da-DK
Dutch                   nl-NL
Estonian                et-EE
Finnish                 fi-FI
Greek                   el-GR
Hebrew                  he-IL
Hindi                   hi-IN
Italian                 it-IT
Japanese                ja-JP
Norwegian               nb-NO
Polish                  pl-PL
Russian                 ru-RU
Swedish                 sv-SE
Turkish                 tr-TR
Ukrainian               uk-UA
Vietnamese              vi-VN

Coming Soon

We will soon support these additional languages:

Language                Code
Belarusian              be-BY
Bengali                 bn-IN
Bulgarian               bg-BG
Cantonese               zh-HK
Catalan                 ca-ES
Croatian                hr-HR
Czech                   cs-CZ
Filipino                fil-PH
Georgian                ka-GE
Gujarati                gu-IN
Hungarian               hu-HU
Indonesian              id-ID
Japanese                ja-JP
Korean                  ko-KR
Malay                   ms-MY
Mandarin                zh-CN
Marathi                 mr-IN
Nepali                  ne-NP
Persian                 fa-IR
Romanian                ro-RO
Serbian                 sr-RS
Slovak                  sk-SK
Tamil                   ta-IN
Telugu                  te-IN
Thai                    th-TH
Urdu                    ur-PK

We’re actively working on expanding this list and will update this document as new languages are added to the platform.
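
If an application needs to know whether a code is fully supported or still in beta before making a request, one option is to keep a local copy of the two tables above. The sketch below is only an illustration: the names FULLY_SUPPORTED, BETA, and support_tier are hypothetical, and the hard-coded sets reflect this document at the time of writing, so they will go stale as languages are added.

```python
# Local snapshot of the "Fully Supported" and "Beta" tables in this document.
# Assumption: these sets are maintained by hand and may lag behind the platform.
FULLY_SUPPORTED = {"en", "fr-FR", "de-DE", "es-ES", "pt-BR", "pt-PT"}

BETA = {
    "ar-AE", "da-DK", "nl-NL", "et-EE", "fi-FI", "el-GR", "he-IL", "hi-IN",
    "it-IT", "ja-JP", "nb-NO", "pl-PL", "ru-RU", "sv-SE", "tr-TR", "uk-UA",
    "vi-VN",
}


def support_tier(code: str) -> str:
    """Return "full", "beta", or "unavailable" for a language code."""
    if code in FULLY_SUPPORTED:
        return "full"
    if code in BETA:
        return "beta"
    # Covers "coming soon" languages and anything not listed at all.
    return "unavailable"
```

A caller might use support_tier("fr-FR") to decide whether to warn users that a language is still being improved.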

Using the language Parameter

Our speech synthesis endpoints (/v1/audio/speech and /v1/audio/stream) accept an optional language parameter. Its value should be a locale code in language-REGION form (e.g., en-US, fr-FR).
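
As a rough client-side sanity check, the shape of a locale code can be tested before it is sent. The sketch below is an assumption about the format, based only on the codes listed in this document (a two- or three-letter lowercase language, optionally followed by a hyphen and a two-letter uppercase region); the helper name looks_like_locale is hypothetical, and the API itself may accept or reject values differently.

```python
import re

# Assumption: codes look like "fr-FR" or "fil-PH", or a bare language such as "en".
_LOCALE_RE = re.compile(r"[a-z]{2,3}(?:-[A-Z]{2})?")


def looks_like_locale(code: str) -> bool:
    """Return True if `code` has the shape of a locale code such as "en-US"."""
    return _LOCALE_RE.fullmatch(code) is not None
```

This only checks the shape of the string. It does not tell you whether the language is actually supported.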

When to specify the language:

  • Known single language: If you know the input text is entirely in one language, providing the language parameter will result in better audio quality.
  • Unknown or mixed language: If you’re unsure of the input language or the text contains multiple languages, omit the language parameter. Speechify models will automatically detect and handle the language(s) in the input.
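
The two cases above can be sketched as a small request builder that includes language only when the caller provides it. Only the /v1/audio/speech path and the optional language parameter come from this document. The base URL placeholder, the bearer-token header, the "input" and "voice_id" field names, and the function name build_speech_request are assumptions made for illustration, so check the API reference for the actual request shape.

```python
import json
import urllib.request
from typing import Optional

# Hypothetical placeholder; replace with the real API base URL.
API_BASE = "https://api.example.com"


def build_speech_request(
    text: str,
    voice_id: str,
    api_key: str,
    language: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/audio/speech."""
    # Assumed field names; only "language" is named in this document.
    payload = {"input": text, "voice_id": voice_id}
    if language is not None:
        # Known single language: pass the locale code explicitly.
        payload["language"] = language
    # Unknown or mixed language: the key is omitted so the models can detect it.
    return urllib.request.Request(
        API_BASE + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

For French-only text, a caller would pass language="fr-FR". For text that mixes languages, the argument is simply left out. The request could then be sent with urllib.request.urlopen.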

Voice Cloning Languages

There are no language limitations for voice cloning. Speechify can produce high-quality cloned voices from short samples (approximately 1 minute of speech is recommended) and use the same voice to synthesize speech in any supported language.