Speech Synthesis Markup Language (SSML)

Control speech synthesis with markup language

SSML is an XML-based markup language for controlling pitch, rate, pauses, emphasis, and emotion in synthesized speech. Wrap your content in a <speak> tag:

<speak>Your content to be synthesized here</speak>

Escaping Characters

When converting plain text to SSML, escape the following characters so that they are read as text rather than as XML markup:

Character	Escaped Form
`&`	`&amp;`
`>`	`&gt;`
`<`	`&lt;`
`"`	`&quot;`
`'`	`&apos;`

<!-- Original: Some "text" with 5 < 6 & 4 > 8 in it -->
<speak>Some &quot;text&quot; with 5 &lt; 6 &amp; 4 &gt; 8 in it</speak>
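
The escaping rules above can be applied programmatically. The following is a minimal sketch in Python using the standard library's `xml.sax.saxutils.escape`; the function name `to_ssml` is a hypothetical helper for illustration, not part of any Speechify SDK.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; quotes need an extra mapping.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def to_ssml(text: str) -> str:
    """Escape plain text and wrap it in a <speak> element (hypothetical helper)."""
    return f"<speak>{escape(text, QUOTE_ENTITIES)}</speak>"


print(to_ssml('Some "text" with 5 < 6 & 4 > 8 in it'))
```

Because `escape` replaces `&` before anything else, the entities it produces are not escaped a second time.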

Supported SSML Tags

prosody

The prosody tag controls the expressiveness of synthesized speech by manipulating pitch, rate, and volume.

<speak>
    This is a normal speech pattern.
    <prosody pitch="high" rate="fast" volume="+20%">
        I'm speaking with a higher pitch, faster than usual, and louder!
    </prosody>
    Back to normal speech pattern.
</speak>

Parameters

pitch

string

Adjusts the pitch of speech delivery.

Values:

x-low, low, medium (default), high, x-high
Percentage adjustments: -83% to +100% (e.g., +20%, -30%)

rate

string

Alters speech speed.

Values:

x-slow, slow, medium (default), fast, x-fast
Percentage adjustments: -50% to +9900% (e.g., +20%, -30%)

volume

string

Controls speech loudness.

Values:

silent, x-soft, medium (default), loud, x-loud
Decibel adjustments: Number with dB suffix (e.g., -6dB)
Percentage adjustments (e.g., +20%, -30%)
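
The documented ranges for pitch and rate can be checked on the client before a request is sent. The sketch below is one possible way to do that in Python; `prosody` and `_validate` are hypothetical helpers, and requiring an explicit sign on percentages is an assumption based on the examples above. Volume is left out to keep the sketch short.

```python
import re
from xml.sax.saxutils import escape

PITCH_LEVELS = {"x-low", "low", "medium", "high", "x-high"}
RATE_LEVELS = {"x-slow", "slow", "medium", "fast", "x-fast"}
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def _validate(name, value, levels, low, high):
    """Accept a named level or a signed percentage within [low, high]."""
    if value in levels:
        return value
    # Assumption: percentages carry an explicit sign, as in +20% or -30%.
    match = re.fullmatch(r"([+-]\d+(?:\.\d+)?)%", value)
    if match and low <= float(match.group(1)) <= high:
        return value
    raise ValueError(f"invalid {name}: {value!r}")


def prosody(text, pitch=None, rate=None):
    """Wrap escaped text in a <prosody> element (hypothetical helper)."""
    checks = (
        ("pitch", pitch, PITCH_LEVELS, -83, 100),   # documented pitch range
        ("rate", rate, RATE_LEVELS, -50, 9900),     # documented rate range
    )
    attrs = "".join(
        f' {name}="{_validate(name, value, levels, low, high)}"'
        for name, value, levels, low, high in checks
        if value is not None
    )
    return f"<prosody{attrs}>{escape(text, QUOTE_ENTITIES)}</prosody>"


print(prosody("Faster and higher.", pitch="high", rate="+20%"))
```

A value outside the documented range, such as `pitch="-90%"`, raises `ValueError` in this sketch instead of being sent to the service.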

break

The break tag inserts a pause between words, following the W3C SSML specification.

<speak>
    Sometimes it can be useful to add a longer pause at the end of the sentence.
    <break strength="medium" />
    Or <break time="100ms" /> sometimes in the <break time="1s" /> middle.
</speak>

Parameters

strength

string

Specifies pause strength.

Values:

none: 0ms
x-weak: 250ms
weak: 500ms
medium: 750ms
strong: 1000ms
x-strong: 1250ms

time

string

Specifies pause duration (0-10 seconds).

Values:

Milliseconds: ms suffix (e.g., 100ms)
Seconds: s suffix (e.g., 1s)
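
As an illustration of the strength table and the 0-10 second limit, here is a small Python sketch that builds a break tag and rejects out-of-range durations. `break_tag` is a hypothetical helper; accepting exactly one of `strength` or `time` is a simplification made for this sketch, not a documented restriction.

```python
import re

# Pause length in milliseconds for each named strength, as listed above.
STRENGTH_MS = {
    "none": 0, "x-weak": 250, "weak": 500,
    "medium": 750, "strong": 1000, "x-strong": 1250,
}


def break_tag(*, strength=None, time=None):
    """Return a <break /> tag for a named strength or an explicit duration."""
    if (strength is None) == (time is None):
        raise ValueError("pass exactly one of strength or time")
    if strength is not None:
        if strength not in STRENGTH_MS:
            raise ValueError(f"unknown strength: {strength!r}")
        return f'<break strength="{strength}" />'
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(ms|s)", time)
    if not match:
        raise ValueError(f"invalid time: {time!r}")
    # Normalize to milliseconds so both suffixes share one range check.
    millis = float(match.group(1)) * (1 if match.group(2) == "ms" else 1000)
    if not 0 <= millis <= 10_000:
        raise ValueError("time must be between 0 and 10 seconds")
    return f'<break time="{time}" />'


print(break_tag(time="100ms"))
print(break_tag(strength="medium"))
```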

emphasis

The emphasis tag adds or removes emphasis from text, modifying speech similarly to prosody but without setting individual attributes.

<speak>
    I already told you I <emphasis level="strong">really like</emphasis> that person.
</speak>

Parameters

level

string

Specifies emphasis level.

Values:

reduced
moderate
strong

sub

The sub tag substitutes a spoken alias for the contained text, following the W3C SSML specification.

<speak>
    For detailed information, please read the <sub alias="Frequently Asked Questions">FAQ</sub> section.
</speak>

Parameters

alias

string, required

Specifies text to be spoken instead of enclosed text.
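
Since the alias is an attribute value, it needs attribute-safe quoting when generated from arbitrary text. A minimal Python sketch, assuming a hypothetical helper named `sub` and using the standard library's `quoteattr` to quote and escape the value:

```python
from xml.sax.saxutils import escape, quoteattr

QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def sub(text: str, alias: str) -> str:
    """Build a <sub> element; alias is required (hypothetical helper)."""
    if not alias:
        raise ValueError("alias is required")
    # quoteattr() adds the surrounding quotes and escapes the value.
    return f"<sub alias={quoteattr(alias)}>{escape(text, QUOTE_ENTITIES)}</sub>"


print(sub("FAQ", "Frequently Asked Questions"))
```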

speechify:style

The speechify:style tag controls emotion of the voice. See Emotion Control for the full list of 13 supported emotions and best practices.

<speak>
    <speechify:style emotion="cheerful">Great news! Your order shipped!</speechify:style>
</speak>

Parameters

emotion

string

Sets the voice emotion. Values: angry, cheerful, sad, terrified, relaxed, fearful, surprised, calm, assertive, energetic, warm, direct, bright.
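
A client can catch misspelled emotion names before sending a request. The following Python sketch checks against the 13 values listed above; `style` is a hypothetical helper, and the set is copied from this page rather than fetched from the service.

```python
from xml.sax.saxutils import escape

# The 13 emotion values listed in this section.
EMOTIONS = frozenset({
    "angry", "cheerful", "sad", "terrified", "relaxed", "fearful",
    "surprised", "calm", "assertive", "energetic", "warm", "direct", "bright",
})
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def style(text: str, emotion: str) -> str:
    """Wrap escaped text in a speechify:style element (hypothetical helper)."""
    if emotion not in EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion!r}")
    body = escape(text, QUOTE_ENTITIES)
    return f'<speechify:style emotion="{emotion}">{body}</speechify:style>'


print(f"<speak>{style('Great news! Your order shipped!', 'cheerful')}</speak>")
```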

Examples

Basic SSML

<speak>Welcome to Speechify's Text-to-Speech service.</speak>
