Creating a Natural Voice using Text to Speech

Author: Sara / November 8, 2019

If you’ve used one of the newer text to speech (TTS) services, you’ve witnessed the huge improvements this industry has seen in the past decade. The voices we have today are much more lifelike than those most people associate with “text to speech.” You can produce even better-quality files by following a few simple steps.

Work sentence by sentence

Most high-quality TTS editors can generate several sentences at once, but if you’re determined to get the best sound, try creating one sentence at a time. Often you’ll hear a huge improvement in both intonation and pausing when you work through each sentence individually. Plus, working with separate clips makes it easier to add silences between sentences in post-production (more on this below).
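If your script is long, a few lines of Python can do the splitting for you. This is a minimal sketch, not a feature of any TTS editor: `split_sentences` is a hypothetical helper, and its rule (break after a period, question mark, or exclamation point followed by a space) is deliberately naive.

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split text after '.', '!' or '?' when followed by whitespace.

    Assumption: this naive rule is good enough for a short script.
    It will wrongly split abbreviations such as "Dr. Smith".
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

script = (
    "Text to speech has improved a lot. "
    "Is it lifelike? Listen for yourself!"
)

for number, sentence in enumerate(split_sentences(script), start=1):
    # Each printed line is meant to be rendered as its own clip.
    print(f"{number:02d}: {sentence}")
```

Check the output by eye before you start rendering, since abbreviations and decimals can fool a rule this simple.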

Add silences

Silences between words and sentences create rhythmic, natural-sounding speech. As living, breathing beings, voice actors take natural pauses to inhale. In your TTS editor, you can cue the artificial intelligence (AI) to replicate these pauses by adding commas, periods, dashes, and ellipses. Think of these punctuation marks as percussive notes, not as grammatical tools, and you’ll be well on your way to generating natural AI voice recordings.

Let me give a brief example. For this first clip, I entered the following text into WellSaid Studio, using punctuation in a grammatically minded way:

Text to speech is a scalable alternative to traditional voice acting.

Created using WellSaid Studio

Now, listen to the same sentence with percussive punctuation marks added to create an appealing rhythm. Notice how the sentence, while grammatically incorrect, has a natural-sounding cadence to it:

Text to speech, is a scalable alternative, to traditional voice acting.

Created using WellSaid Studio
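If you’re preparing many sentences, you could script the percussive commas instead of typing them by hand. The sketch below is only an illustration: `add_pauses` is a hypothetical helper, and you still have to decide by ear which words should be followed by a pause.

```python
import re

def add_pauses(sentence: str, pause_after: list[str], mark: str = ",") -> str:
    """Insert a pause mark after the first occurrence of each listed word.

    Words that are already followed by punctuation are left alone.
    """
    for word in pause_after:
        pattern = rf"\b{re.escape(word)}\b(?![,.;:!?])"
        sentence = re.sub(
            pattern, lambda match: match.group(0) + mark, sentence, count=1
        )
    return sentence

plain = "Text to speech is a scalable alternative to traditional voice acting."
print(add_pauses(plain, ["speech", "alternative"]))
```

Passing a different `mark`, such as an ellipsis, lets you try longer pauses with the same list of words.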

Use inventive spelling

Modern TTS services are built on neural networks. As a result, they work predictively, which means they sometimes mispronounce words. This often happens with heteronyms: words that are spelled the same but pronounced differently. Think about “read” as in “I can read!” and “read” as in “I haven’t read this book yet.” Abbreviations like “CEO” or “USC” are also frequently mispronounced. A neural AI voice may read these as funny short words rather than pronouncing the individual letters.

To get the right results, spell phonetically. You’ll sometimes need to be explicit with the text to voice editor about how you want a word pronounced, just as you would do with a voice actor. “Read” might need to be entered as “reed,” and “CEO” as “see eeh oh.”
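Once you’ve found respellings that work, you can keep them in a small lookup table and apply them to every script before it goes into the editor. This is a sketch under a few assumptions: `respell` is a hypothetical helper, the “USC” respelling is my own guess rather than a tested value, and matching is case-sensitive so that only the capitalized abbreviations are replaced.

```python
import re

# Respellings found by ear. "see eeh oh" comes from the example above;
# "you ess see" is an untested guess used only for illustration.
RESPELLINGS = {
    "CEO": "see eeh oh",
    "USC": "you ess see",
}

def respell(text: str, respellings: dict[str, str]) -> str:
    """Replace whole-word, case-sensitive matches with their respelling."""
    if not respellings:
        return text
    # Try longer keys first so a short key never shadows a longer one.
    keys = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: respellings[match.group(1)], text)

print(respell("Our CEO studied at USC.", RESPELLINGS))
```

A word like “read” doesn’t belong in a table like this, because the right spelling depends on the sentence. Respell those by hand.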

Play with intonation

Punctuation marks not only add pausing, they also change intonation and play an important role when building a voiceover track for an eLearning course. If you want a specific word emphasized, try putting it in quotation marks. If you want a different intonation than the one you’re hearing, try seLECTive caps or ALL caps. You can also insert commas and periods before or after the word you want emphasized, as long as the resulting pause is acceptable.

Using the same example sentence I showed you above, I added some intonation marks to achieve a more lively rendering. “Scalable” is unusual enough that the editor needs a little help, so I entered “scaelable” to prompt the right phonemes.

Here’s the sentence and the audio result:

Text to speech, is a scaelable alternative, to “traditional” VOIce acting.

Created using WellSaid Studio

Edit post-production

You don’t need to be an expert to give your WAV files a final polish in a sound editor. Many basic, inexpensive audio editing apps let you add pauses in post-production. Add some silence at the start of your clips to mimic a voice actor’s inhale. Add a small amount of silence between your clips as well, and you’ve got quality, human-sounding audio production on your hands.
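If you’d rather script this step than do it in a sound editor, Python’s built-in `wave` module can add the silences for you. This is a minimal sketch, assuming uncompressed PCM WAV clips that all share the same channel count, sample width, and sample rate. The `join_clips` helper and the default pause lengths (0.3 seconds of lead-in, 0.5 seconds between clips) are my own placeholders, so adjust them by ear.

```python
import wave

def silence(params, seconds: float) -> bytes:
    """Return `seconds` of silent PCM audio matching the given WAV params."""
    frame_count = int(params.framerate * seconds)
    # 8-bit WAV samples are unsigned, so their silent value is 0x80, not 0x00.
    sample_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    return sample_byte * (frame_count * params.nchannels * params.sampwidth)

def join_clips(clip_paths, out_path, lead_in=0.3, gap=0.5):
    """Write one WAV: lead-in silence, then the clips with gaps between them."""
    with wave.open(clip_paths[0], "rb") as first:
        params = first.getparams()
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        out.writeframes(silence(params, lead_in))
        for index, path in enumerate(clip_paths):
            with wave.open(path, "rb") as clip:
                clip_format = (clip.getnchannels(), clip.getsampwidth(),
                               clip.getframerate())
                if clip_format != (params.nchannels, params.sampwidth,
                                   params.framerate):
                    raise ValueError(f"{path} has a different audio format")
                out.writeframes(clip.readframes(clip.getnframes()))
            if index < len(clip_paths) - 1:
                out.writeframes(silence(params, gap))

# Example call (the file names are placeholders):
# join_clips(["sentence_01.wav", "sentence_02.wav"], "voiceover.wav")
```

The function stops with an error if the clips don’t share a format, because mixing sample rates in one file would change the speed of the audio.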

Credits

Photo by palesa on Unsplash
Music by purple-planet
