Speech Synthesis Markup Language (SSML)
SSML is an XML-based markup language for controlling pitch, rate, pauses, emphasis, and emotion in synthesized speech. Wrap your content in a <speak> tag:
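As a minimal sketch in Python, the wrapping step might look like the following. The helper name `wrap_in_speak` is hypothetical and not part of any SDK; it assumes the body has already been escaped.

```python
import xml.etree.ElementTree as ET


def wrap_in_speak(body: str) -> str:
    """Wrap an already-escaped SSML fragment in a <speak> root element."""
    return f"<speak>{body}</speak>"


ssml = wrap_in_speak("Hello, world!")

# Parsing confirms the result is well-formed XML with <speak> as its root.
root = ET.fromstring(ssml)
print(ssml)
```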
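Transforming text into SSML requires escaping the following characters so that they are read as text rather than markup:
& becomes &amp;
< becomes &lt;
> becomes &gt;
" becomes &quot;
' becomes &apos;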
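One possible way to perform this escaping in Python is with the standard library's `xml.sax.saxutils.escape`. The helper name `escape_for_ssml` is an illustrative assumption. `escape` handles `&`, `<`, and `>` by default, and the extra mapping covers the two quote characters.

```python
from xml.sax.saxutils import escape


def escape_for_ssml(text: str) -> str:
    """Escape the five XML special characters in plain text."""
    # escape() replaces &, < and > first, then applies the extra entities,
    # so the ampersands inside &quot; and &apos; are not escaped again.
    return escape(text, {'"': "&quot;", "'": "&apos;"})


raw = 'Tom & Jerry say "1 < 2"'
ssml = f"<speak>{escape_for_ssml(raw)}</speak>"
print(ssml)
```

Escaping should be applied to the text content only, before any SSML tags are added around it; running it over a finished SSML string would escape the tags themselves.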
The prosody tag controls the expressiveness of synthesized speech by manipulating pitch, rate, and volume.
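A small sketch of building a prosody element in Python, assuming the three attributes are named `pitch`, `rate`, and `volume` as in the W3C SSML specification. The `prosody` helper is hypothetical and only emits the attributes that are actually set.

```python
from typing import Optional


def prosody(text: str,
            pitch: Optional[str] = None,
            rate: Optional[str] = None,
            volume: Optional[str] = None) -> str:
    """Wrap escaped text in a <prosody> element with the given attributes."""
    attrs = {"pitch": pitch, "rate": rate, "volume": volume}
    # Skip unset attributes so the synthesizer falls back to its defaults.
    attr_str = "".join(f' {k}="{v}"' for k, v in attrs.items() if v is not None)
    return f"<prosody{attr_str}>{text}</prosody>"


ssml = f"<speak>{prosody('This is slower and quieter.', rate='slow', volume='-6dB')}</speak>"
print(ssml)
```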
Parameters
pitch: Adjusts the pitch of speech delivery.
Values:
Named levels: x-low, low, medium (default), high, x-high
Percentage: -83% to +100% (e.g., +20%, -30%)
rate: Alters speech speed.
Values:
Named levels: x-slow, slow, medium (default), fast, x-fast
Percentage: -50% to +9900% (e.g., +20%, -30%)
volume: Controls speech loudness.
Values:
Named levels: silent, x-soft, medium (default), loud, x-loud
Decibels: dB suffix (e.g., -6dB)
Percentage: (e.g., +20%, -30%)
The break tag controls pausing between words, following the W3C specification.
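The following Python sketch shows one way to emit a break element, assuming the W3C attribute names `strength` and `time`. The `break_tag` helper and its `time_ms` argument are hypothetical; the checks mirror the strength names and the 0-10 second limit given under Parameters.

```python
from typing import Optional

STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}


def break_tag(strength: Optional[str] = None, time_ms: Optional[int] = None) -> str:
    """Return a self-closing <break> element for a named strength or a duration."""
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {strength}")
        return f'<break strength="{strength}"/>'
    if time_ms is not None:
        # The documented limit is 0-10 seconds, i.e. 0-10000 ms.
        if not 0 <= time_ms <= 10_000:
            raise ValueError("time must be between 0 and 10 seconds")
        return f'<break time="{time_ms}ms"/>'
    return "<break/>"


ssml = f"<speak>Wait for it{break_tag(time_ms=500)} there it is.</speak>"
print(ssml)
```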
Parameters
strength: Specifies pause strength.
Values:
none: 0ms
x-weak: 250ms
weak: 500ms
medium: 750ms
strong: 1000ms
x-strong: 1250ms
time: Specifies pause duration (0-10 seconds).
Values:
ms suffix (e.g., 100ms)
s suffix (e.g., 1s)
The emphasis tag adds or removes emphasis from the enclosed text. It changes delivery much as prosody does, but without requiring you to set individual attributes.
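A short Python sketch of the emphasis tag, assuming the attribute is named `level` as in the W3C SSML specification. The `emphasis` helper is hypothetical and rejects levels other than the three documented ones.

```python
LEVELS = {"reduced", "moderate", "strong"}


def emphasis(text: str, level: str = "moderate") -> str:
    """Wrap escaped text in an <emphasis> element at the given level."""
    if level not in LEVELS:
        raise ValueError(f"unknown emphasis level: {level}")
    return f'<emphasis level="{level}">{text}</emphasis>'


ssml = f"<speak>I {emphasis('really', level='strong')} mean it.</speak>"
print(ssml)
```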
Parameters
level: Specifies emphasis level.
Values:
reduced
moderate
strong
The sub tag replaces the pronunciation of the enclosed text, following the W3C specification.
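As an illustration, a substitution could be built in Python like this, assuming the replacement text goes in the `alias` attribute as defined by the W3C SSML specification. The `sub` helper is hypothetical; `quoteattr` from the standard library quotes and escapes the attribute value.

```python
from xml.sax.saxutils import quoteattr


def sub(text: str, alias: str) -> str:
    """Return a <sub> element that is displayed as `text` but spoken as `alias`."""
    # quoteattr adds the surrounding quotes and escapes special characters.
    return f"<sub alias={quoteattr(alias)}>{text}</sub>"


ssml = f"<speak>{sub('W3C', 'World Wide Web Consortium')} publishes the standard.</speak>"
print(ssml)
```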
Parameters
alias: Specifies the text to be spoken instead of the enclosed text.
The speechify:style tag controls the emotion of the voice. See Emotion Control for the full list of 13 supported emotions and best practices.
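The sketch below shows how a styled passage might be assembled in Python. The attribute name `emotion` is an assumption, since this section does not state it, so check the Emotion Control page before relying on it. The `style` helper is hypothetical, and the set of values is copied from the list under Parameters.

```python
EMOTIONS = {
    "angry", "cheerful", "sad", "terrified", "relaxed", "fearful", "surprised",
    "calm", "assertive", "energetic", "warm", "direct", "bright",
}


def style(text: str, emotion: str) -> str:
    """Wrap escaped text in a speechify:style element (attribute name assumed)."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion}")
    # ASSUMPTION: the emotion is passed via an attribute called "emotion".
    return f'<speechify:style emotion="{emotion}">{text}</speechify:style>'


ssml = f"<speak>{style('We won the game!', 'cheerful')}</speak>"
print(ssml)
```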
Parameters
Sets the voice emotion. Values: angry, cheerful, sad, terrified, relaxed, fearful, surprised, calm, assertive, energetic, warm, direct, bright.