Emotion Control
Precisely control the emotion of the voice used in speech synthesis.
The Speechify API lets you control the emotion of the voice used in speech synthesis, so you can produce more natural and expressive speech tailored to specific scenarios. This document focuses on how to use the emotion attribute effectively.
Overview
The <speechify:style> tag allows you to control the emotion of the voice, creating more expressive and natural-sounding speech synthesis.
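As a minimal sketch, the Python snippet below builds an SSML string that wraps text in <speechify:style>. It assumes the emotion is passed through the tag's emotion attribute and that the markup sits inside a standard SSML <speak> root. The helper name style_ssml is hypothetical and is not part of any Speechify SDK.

```python
from xml.sax.saxutils import escape, quoteattr


def style_ssml(text: str, emotion: str) -> str:
    """Wrap text in <speechify:style> inside a <speak> root (assumed layout)."""
    # Escape &, <, and > so the text cannot break the surrounding markup.
    body = escape(text)
    return (
        "<speak>"
        f"<speechify:style emotion={quoteattr(emotion)}>{body}</speechify:style>"
        "</speak>"
    )


ssml = style_ssml("I can't believe we won!", "cheerful")
print(ssml)
```

The resulting string is what you would send as the synthesis input. Escaping matters because a stray & or < in the text would otherwise produce invalid markup.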
Supported Emotions
Speechify API supports a range of emotions to enhance your speech synthesis:
angry: Forceful, intense expression
cheerful: Upbeat, positive tone
sad: Downcast, melancholic delivery
terrified: Extreme fear expression
relaxed: Calm, at-ease delivery
fearful: Anxious, worried tone
surprised: Astonished, unexpected reaction
calm: Tranquil, peaceful delivery
assertive: Confident, authoritative tone
energetic: Dynamic, lively expression
warm: Friendly, inviting delivery
direct: Straightforward, clear tone
bright: Optimistic, cheerful delivery
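To catch typos before a request is sent, the emotion name can be checked against the list above. The sketch below is one way to do that in Python. The names SUPPORTED_EMOTIONS and check_emotion are hypothetical, and lowercasing the input is an assumption, since this document does not say whether the attribute is case-sensitive.

```python
# Emotion names copied from the list above.
SUPPORTED_EMOTIONS = frozenset({
    "angry", "cheerful", "sad", "terrified", "relaxed", "fearful", "surprised",
    "calm", "assertive", "energetic", "warm", "direct", "bright",
})


def check_emotion(name: str) -> str:
    """Return the normalized emotion name, or raise ValueError if it is not listed."""
    normalized = name.strip().lower()  # assumption: lowercase names are expected
    if normalized not in SUPPORTED_EMOTIONS:
        raise ValueError(f"unsupported emotion: {name!r}")
    return normalized


print(check_emotion(" Cheerful "))  # cheerful
```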
Best Practices for Emotion Control
Match Text with Emotion
The chosen text should align with the selected emotion for natural-sounding speech.
If the text contradicts the emotion, the output may feel unnatural or less expressive.
Sentence Length Matters
Shorter sentences yield better emotional expressiveness than longer, complex ones.
Consider breaking long sentences into smaller ones to maximize the emotional effect.
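One way to apply this is to flag sentences that are candidates for rewriting. The sketch below uses a naive split on sentence-ending punctuation and an arbitrary 12-word threshold. Both are assumptions chosen for illustration, not limits documented by Speechify.

```python
import re


def long_sentences(text: str, max_words: int = 12) -> list[str]:
    """Return sentences with more than max_words words (naive split on ., !, ?)."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]


before = ("I told you yesterday and the day before that the report was due on "
          "Friday and you still did not send it!")
after = ("I told you yesterday. I told you the day before. "
         "The report was due Friday! You still did not send it!")

print(long_sentences(before))  # the single long sentence is flagged
print(long_sentences(after))   # []
```

The second version says the same thing in four short sentences, which gives each one its own emotional beat.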
Use Expressive Punctuation
Punctuation plays a critical role in enhancing emotional delivery:
Exclamation points (!): Use for heightened emotions like anger, excitement, or surprise.
Question marks (?): Use for uncertainty, curiosity, or disbelief.
Ellipses (...): Use for hesitation, sadness, or suspense.
Example:
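The snippet below pairs each punctuation mark with an emotion it tends to suit and renders the SSML for it. The sample sentences are made up for illustration, and the emotion attribute on <speechify:style> inside a <speak> root is the assumed markup layout.

```python
# Made-up sample lines: (emotion, text ending in the matching punctuation).
samples = [
    ("surprised", "Wait, you are already here!"),
    ("fearful", "Did you hear that noise?"),
    ("sad", "I thought we had more time..."),
]

rendered = [
    f'<speak><speechify:style emotion="{emotion}">{text}</speechify:style></speak>'
    for emotion, text in samples
]

for ssml in rendered:
    print(ssml)
```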
Emotion Examples
Below are practical examples demonstrating how text, punctuation, and emotion interact to produce desired results.
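The following sketch collects a few illustrative pairings in one place. The sample texts are hypothetical and were written to follow the practices above (short sentences, punctuation that suits the emotion). The render helper is likewise hypothetical and assumes the emotion attribute on <speechify:style> inside a <speak> root.

```python
from xml.sax.saxutils import escape

# Hypothetical sample lines, one per emotion.
EXAMPLES = {
    "angry": "How many times do I have to say it?! Stop. Right now!",
    "cheerful": "Good morning! What a beautiful day!",
    "sad": "I miss her... I really do.",
    "terrified": "Something is in the house! Please... hurry!",
    "calm": "Breathe in. Breathe out. There is no rush.",
}


def render(emotion: str, text: str) -> str:
    """Build the SSML for one emotion and text pair (assumed markup layout)."""
    return (
        f'<speak><speechify:style emotion="{emotion}">'
        f"{escape(text)}</speechify:style></speak>"
    )


for emotion, text in EXAMPLES.items():
    print(render(emotion, text))
```

Swapping the emotions between two entries, for instance reading the "sad" line as "cheerful", is a quick way to hear how much the result depends on text and emotion agreeing.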
Advanced Considerations
For more advanced functionality, refer to the SSML Documentation to explore how <speechify:style> integrates with broader SSML capabilities, such as <prosody>, <break>, and <emphasis>.
Common Pitfalls
Emotion Misalignment
Using emotions like “angry” or “cheerful” with neutral or contradictory text can result in awkward speech. Sometimes that is the desired outcome, for example to make speech sound sarcastic. Otherwise, keep the emotion setting and the text aligned.
Overuse of Punctuation
While punctuation enhances expressiveness, overusing it can make speech sound unnatural.
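If input text comes from users or other uncontrolled sources, a small cleanup pass can help. The sketch below collapses repeated marks before the text is wrapped in SSML. What counts as overuse is an assumption here, and tame_punctuation is a hypothetical helper name.

```python
import re


def tame_punctuation(text: str) -> str:
    """Collapse runs of ! or ? to one mark and runs of 4+ dots to an ellipsis."""
    text = re.sub(r"([!?])\1+", r"\1", text)  # "!!!" -> "!", "???" -> "?"
    text = re.sub(r"\.{4,}", "...", text)     # "......" -> "..."
    return text


print(tame_punctuation("No way!!!! Really???? I guess so......"))
# No way! Really? I guess so...
```

Mixed marks such as "?!" are left alone, since only runs of the same character are collapsed.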
Long Sentences
Avoid long-winded sentences, as they can dilute the emotional emphasis.