Emotion Control

Precisely control the emotion of the voice used in speech synthesis.

The Speechify API lets you control the emotion of the synthesized voice, so speech can be tailored to a specific scenario and sound more natural and expressive. This document explains how to use the emotion attribute effectively.

Overview

Wrap text in the <speechify:style> tag and set its emotion attribute to choose the emotion the voice conveys:

<speak>
  <speechify:style emotion="angry">
    How many times do I have to tell you this?
  </speechify:style>
</speak>
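When the SSML is assembled in application code, the text should be XML-escaped before it is placed inside the tag. The following is a minimal Python sketch of that step. `style_ssml` is a hypothetical helper name, not part of any Speechify SDK, and it only builds a string; how the result is sent to the API is out of scope here.

```python
from xml.sax.saxutils import escape, quoteattr


def style_ssml(text: str, emotion: str) -> str:
    """Wrap text in a <speechify:style> element inside <speak>.

    Hypothetical helper for illustration; it only builds a string.
    """
    # Escape &, <, and > so arbitrary text cannot break the markup.
    body = escape(text)
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return (
        "<speak>"
        f"<speechify:style emotion={quoteattr(emotion)}>{body}</speechify:style>"
        "</speak>"
    )


print(style_ssml("How many times do I have to tell you this?", "angry"))
```

Escaping matters most when the text comes from users or a database, where a stray `&` or `<` would otherwise produce malformed SSML.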

Supported Emotions

The Speechify API supports the following emotions:

angry: Forceful, intense expression
cheerful: Upbeat, positive tone
sad: Downcast, melancholic delivery
terrified: Extreme fear expression
relaxed: Calm, at-ease delivery
fearful: Anxious, worried tone
surprised: Astonished, unexpected reaction
calm: Tranquil, peaceful delivery
assertive: Confident, authoritative tone
energetic: Dynamic, lively expression
warm: Friendly, inviting delivery
direct: Straightforward, clear tone
bright: Optimistic, cheerful delivery
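Because the set of emotions is fixed, it can be checked on the client before a request is made. The sketch below is one way to do that in Python. The names `SUPPORTED_EMOTIONS` and `validate_emotion` are hypothetical, and trimming and lowercasing the input is a convenience of this sketch: this document does not say whether the API itself is case-sensitive.

```python
# Emotion names copied from the list above.
SUPPORTED_EMOTIONS = frozenset({
    "angry", "cheerful", "sad", "terrified", "relaxed", "fearful",
    "surprised", "calm", "assertive", "energetic", "warm", "direct", "bright",
})


def validate_emotion(emotion: str) -> str:
    """Return the normalized emotion name, or raise ValueError if it is not listed."""
    # Normalization is an assumption of this sketch, not documented API behavior.
    normalized = emotion.strip().lower()
    if normalized not in SUPPORTED_EMOTIONS:
        options = ", ".join(sorted(SUPPORTED_EMOTIONS))
        raise ValueError(f"Unsupported emotion {emotion!r}; expected one of: {options}")
    return normalized


print(validate_emotion(" Cheerful "))  # cheerful
```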

Best Practices for Emotion Control

Match Text with Emotion

The chosen text should align with the selected emotion for natural-sounding speech.

<speak>
  <speechify:style emotion="angry">
    I told you not to do that!
  </speechify:style>
</speak>

If the text contradicts the emotion, the output may feel unnatural or less expressive.

Sentence Length Matters

Short sentences carry emotion better than long, complex ones.

Consider breaking long sentences into shorter ones to maximize the emotional effect:

<speak>
  <speechify:style emotion="fearful">
    No! This can't be happening. I can't believe it.
  </speechify:style>
</speak>
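To find sentences worth shortening before synthesis, a script can flag any sentence above a word limit. This is a minimal Python sketch with hypothetical names. The 12-word limit is an arbitrary assumption, not a documented threshold, and the sentence splitter is a naive heuristic that will also split after an ellipsis or an abbreviation.

```python
import re

MAX_WORDS = 12  # arbitrary limit for this sketch; tune it by listening to the output


def split_sentences(text: str) -> list[str]:
    """Split after '.', '!' or '?' when followed by whitespace (naive heuristic)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def long_sentences(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return the sentences whose word count exceeds max_words."""
    return [s for s in split_sentences(text) if len(s.split()) > max_words]


sample = (
    "No! This can't be happening. I can't believe that after all of the "
    "careful planning we did together everything has fallen apart."
)
for sentence in long_sentences(sample):
    print("Consider shortening:", sentence)
```

The script only reports candidates. Rewriting them into shorter sentences remains an editorial decision.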

Use Expressive Punctuation

Punctuation plays a critical role in emotional delivery:

Use exclamation marks (!) for heightened emotions like anger, excitement, or surprise.

Use question marks (?) for uncertainty, curiosity, or disbelief.

Use ellipses (...) for hesitation, sadness, or suspense.

Example:

<speak>
  <speechify:style emotion="fearful">
    What... what was that sound?
  </speechify:style>
</speak>

Emotion Examples

The following example shows how text, punctuation, and emotion work together to produce the desired result.

<speak>
  <speechify:style emotion="angry">Stop it! Right now!</speechify:style>
</speak>

Advanced Considerations

For more advanced functionality, refer to the SSML Documentation to explore how <speechify:style> integrates with broader SSML capabilities, such as <prosody>, <break>, and <emphasis>.

Common Pitfalls

Avoid these common mistakes when using emotion control:

Using emotions like “angry” or “cheerful” with neutral or contradictory text can result in awkward speech. Occasionally that mismatch is intentional, for example to make speech sound sarcastic. Otherwise, keep the emotion setting and the text aligned.

While punctuation enhances expressiveness, overusing it can make speech sound unnatural.

Avoid long-winded sentences, as they can dilute the emotional emphasis.
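Punctuation overuse can be caught with a simple pattern check before text is sent for synthesis. The Python sketch below uses hypothetical names. Its definition of overuse (two or more consecutive `!` or `?` characters, or four or more consecutive dots) is an assumption made for illustration, not a rule documented by Speechify.

```python
import re

# Runs of two or more '!' or '?' characters, or four or more dots.
# What counts as "overuse" is an assumption of this sketch.
OVERUSE = re.compile(r"[!?]{2,}|\.{4,}")


def punctuation_overuse(text: str) -> list[str]:
    """Return each run of punctuation that matches the overuse pattern."""
    return OVERUSE.findall(text)


print(punctuation_overuse("Stop it!!! Right now?!"))        # ['!!!', '?!']
print(punctuation_overuse("What... what was that sound?"))  # []
```

A three-dot ellipsis and single marks pass the check, so the examples earlier in this document are not flagged.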