Emotion Control
With the Speechify API, you can control the emotion of the voice used in speech synthesis, which helps produce more natural and expressive speech for specific scenarios. This document explains how to use the emotion attribute effectively.
The <speechify:style> tag controls the emotion of the voice through its emotion attribute.
<speak>
<speechify:style emotion="angry">
How many times do I have to tell you this?
</speechify:style>
</speak>
Supported emotions are:
- angry
- cheerful
- sad
- terrified
- relaxed
- fearful
- surprised
- calm
- assertive
- energetic
- warm
- direct
- bright
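When SSML is generated programmatically, it can help to validate the emotion name before sending a request. The following is a minimal sketch, not part of the Speechify API: the helper name style_ssml and the constant SUPPORTED_EMOTIONS are hypothetical, and the emotion names are copied from the list above.

```python
from xml.sax.saxutils import escape

# Emotion names copied from the supported list in this document.
SUPPORTED_EMOTIONS = frozenset({
    "angry", "cheerful", "sad", "terrified", "relaxed", "fearful",
    "surprised", "calm", "assertive", "energetic", "warm", "direct", "bright",
})


def style_ssml(text: str, emotion: str) -> str:
    """Wrap text in <speak> and <speechify:style> (hypothetical helper)."""
    if emotion not in SUPPORTED_EMOTIONS:
        raise ValueError(f"unsupported emotion: {emotion!r}")
    # Escape &, < and > so the text cannot break the surrounding markup.
    body = escape(text)
    return (
        "<speak>"
        f'<speechify:style emotion="{emotion}">{body}</speechify:style>'
        "</speak>"
    )
```

Calling style_ssml("Stop it! Right now!", "angry") returns the SSML as a single string, and an unknown name such as "happy" raises ValueError instead of producing a request that may be rejected.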
Best Practices for Emotion Control
1. Match Text with Emotion
The chosen text should align with the selected emotion. For example:
- For angry, use forceful and direct language like: "I told you not to do that!"
- For cheerful, use positive and uplifting language like: "What a wonderful surprise!"
If the text contradicts the emotion, the output may feel unnatural or less expressive.
2. Sentence Length Matters
Shorter sentences yield better emotional expressiveness compared to longer, complex ones. Consider breaking long sentences into smaller ones to maximize the emotional effect.
Example:
- Better: "No! This can't be happening. I can't believe it."
- Less Expressive: "I cannot believe this is happening to me at this very moment."
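One way to apply this guideline before synthesis is to flag sentences that exceed a word budget. The sketch below is illustrative only: the function names and the 8-word default are assumptions, not values recommended by Speechify, and the splitter is a simple regular expression rather than a full sentence tokenizer.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Split on whitespace that follows '.', '!' or '?' (naive splitter)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def long_sentences(text: str, max_words: int = 8) -> list[str]:
    """Return sentences with more than max_words whitespace-separated words."""
    # max_words is an arbitrary threshold chosen for illustration.
    return [s for s in split_sentences(text) if len(s.split()) > max_words]
```

With the default threshold, the "Better" example above produces no flagged sentences, while the "Less Expressive" example is returned as a single 12-word sentence. Note that the splitter also breaks after an ellipsis followed by a space.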
3. Use Expressive Punctuation
Punctuation plays a critical role in enhancing emotional delivery. The following marks are especially effective:
- Exclamation Points (!) for heightened emotions like anger, excitement, or surprise.
- Question Marks (?) for uncertainty, curiosity, or disbelief.
- Ellipses (...) for hesitation, sadness, or suspense.
Example:
<speak>
<speechify:style emotion="fearful">
What... what was that sound?
</speechify:style>
</speak>
Emotion Examples
Below are practical examples demonstrating how text, punctuation, and emotion interact to produce desired results:
Angry
<speak>
<speechify:style emotion="angry">
Stop it! Right now!
</speechify:style>
</speak>
Cheerful
<speak>
<speechify:style emotion="cheerful">
Congratulations! You did it!
</speechify:style>
</speak>
Sad
<speak>
<speechify:style emotion="sad">
I can't believe it's over...
</speechify:style>
</speak>
Surprised
<speak>
<speechify:style emotion="surprised">
Wait, what? Are you serious?
</speechify:style>
</speak>
Advanced Considerations
For more advanced functionality, refer to the SSML Documentation to explore how <speechify:style> integrates with broader SSML capabilities, such as <prosody>, <break>, and <emphasis>.
Common Pitfalls
- Emotion Misalignment: Using emotions like "angry" or "cheerful" with neutral or contradictory text can result in awkward speech. Sometimes that is the desired outcome, for example to make speech sound sarcastic. Otherwise, keep the emotion setting and the text aligned.
- Overuse of Punctuation: While punctuation enhances expressiveness, overusing it can make speech sound unnatural.
- Long Sentences: Avoid long-winded sentences, as they can dilute the emotional emphasis.
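The punctuation pitfall can be checked mechanically before text is sent for synthesis. The following sketch is a hypothetical lint helper, not a Speechify feature: the name punctuation_runs and the default limit of two consecutive "!" or "?" marks are assumptions chosen for illustration.

```python
import re


def punctuation_runs(text: str, max_run: int = 2) -> list[str]:
    """Return runs of '!'/'?' longer than max_run and dot runs longer than '...'."""
    flagged = []
    # Find each unbroken run of exclamation/question marks, or of dots.
    for run in re.findall(r"[!?]+|\.+", text):
        # A three-dot ellipsis is allowed; anything longer is flagged.
        limit = 3 if run.startswith(".") else max_run
        if len(run) > limit:
            flagged.append(run)
    return flagged
```

The examples in this document ("Stop it! Right now!", "What... what was that sound?") pass with an empty result, whereas text such as "No!!!! Really?!?!" returns both runs so they can be trimmed.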