Text-To-Speech Application

McMahon Bolton

Dec 29, 2022 • 5 min read

Creating speech software is hard. Whatever you want a computer to say, it is difficult to make it sound natural rather than obviously machine-generated.

You've probably noticed that text to speech (TTS) is common in modern technology. Screen readers and voice assistants use it to read messages aloud in apps such as Slack, WhatsApp, and Facebook Messenger, and many websites and online services use it too. One of the most familiar uses is the automated phone line run by businesses and brands. When you call one, a synthesized voice reads out the options available to you. For example, if you are calling to place an order, you might be asked "Would you like to order today or tomorrow?", and your options are "Today" or "Tomorrow." With so much demand for TTS solutions, it is no wonder that so many platforms, devices, and applications focus on producing a natural-sounding voice.

If you're looking to tackle this problem yourself, consider using an open-source project called [Alfred], which stands for “Alfred is a nimble text-to-speech engine that can be built in a few hours.”[1]

Alfred was designed to be simple to use and extensible. It was built with accessibility in mind and provides a variety of features, including:

Text-to-Speech Synthesis
Voice Acting
Pause
Reduce
Reverse
Adjust Pitch
Volume Control
Multiple Voices
Custom Envelopes
Read Text
Spell Check
Text Content Categorization
Document Classification
Custom Tone
Auto-complete
Auto-save
And More!
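
The post doesn't show Alfred's own API, so the sketch below does not use it. Instead, it shows how several of the listed features (pauses, pitch, and volume, plus speaking rate) are commonly expressed in SSML, the W3C Speech Synthesis Markup Language that many engines accept as input. The `build_ssml` helper and its option names are hypothetical; check your engine's documentation for which SSML elements and attribute values it actually supports.

```python
import xml.etree.ElementTree as ET

PROSODY_KEYS = ("pitch", "rate", "volume")  # attributes of the SSML <prosody> element


def build_ssml(segments):
    """Build an SSML string from (text, options) pairs.

    Hypothetical option keys: "pitch", "rate", "volume" (copied onto a
    <prosody> element) and "pause_ms" (a <break> inserted after the text).
    Segments are concatenated as-is, so include any spaces you need.
    """
    speak = ET.Element("speak")
    last = None  # most recent child element; plain text attaches to its tail
    for text, options in segments:
        attrs = {k: str(options[k]) for k in PROSODY_KEYS if k in options}
        if attrs:
            last = ET.SubElement(speak, "prosody", attrs)
            last.text = text
        elif last is None:
            speak.text = (speak.text or "") + text
        else:
            last.tail = (last.tail or "") + text
        if "pause_ms" in options:
            last = ET.SubElement(speak, "break", {"time": f"{options['pause_ms']}ms"})
    # ElementTree escapes &, < and > in the text for us.
    return ET.tostring(speak, encoding="unicode")


print(build_ssml([
    ("Welcome back.", {"pause_ms": 500}),
    ("This part is higher and louder.", {"pitch": "+10%", "volume": "loud"}),
]))
# <speak>Welcome back.<break time="500ms" /><prosody pitch="+10%" volume="loud">This part is higher and louder.</prosody></speak>
```

A bare `<speak>` root is accepted by many cloud engines; a strictly conforming SSML document also declares attributes such as `version` and `xml:lang` on it.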

With all of its extensibility and power comes a steep learning curve. If you're looking to dive into the world of text to speech, here are a few tips on how to get started.

The Good, The Bad, And The Ugly Of Text-to-Speech

When it comes to text-to-speech, there are three main things to keep in mind: the good, the bad, and the ugly. We will now discuss each one separately.

The Good

When developing a text-to-speech application, the first thing to keep in mind is that not all text is created equal. Some texts are easier to convert into speech than others. Plain text with no formatting, such as “Hello world,” is relatively easy to speak, while numbers, abbreviations, and dates take more work because the engine has to decide how each one should be read aloud. If your input is simple, a few hours of work can give you a basic but functional text-to-speech application.
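
Part of what makes some text harder is normalization: before synthesis, numbers and abbreviations have to be expanded into the words a speaker would actually say. The following is a deliberately tiny sketch of that step, assuming English input. The abbreviation table is hypothetical and only integers up to 999 are spelled out; real engines use much larger rule sets plus context, since "St." can mean "Street" or "Saint" depending on where it appears.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
        "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
        "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]
# Hypothetical abbreviation table for illustration only.
ABBREVIATIONS = {"Dr.": "Doctor", "Mr.": "Mister", "approx.": "approximately"}

# One to three digits not touching letters, other digits, or a separator
# followed by a digit, so "2022", "1,000", "3.5", and "A4" are left alone.
SMALL_INT = re.compile(r"(?<![\w.,])\d{1,3}(?!\w|[.,]\d)")


def number_to_words(n):
    """Spell out an integer from 0 to 999 in English."""
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text):
    """Expand known abbreviations and small integers into speakable words."""
    for abbr, full in ABBREVIATIONS.items():
        text = re.sub(r"\b" + re.escape(abbr), full, text)
    return SMALL_INT.sub(lambda m: number_to_words(int(m.group())), text)


print(normalize("Dr. Lee has 3 cats and 115 books."))
# Doctor Lee has three cats and one hundred fifteen books.
```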

If you're working with an experienced developer, you might also consider using templates or samples. Templates are pre-made chunks of text containing example sentences or phrases that you cut and paste into your own text, which makes converting text into speech more efficient and less error-prone. Samples are similar, but they contain only a few words in place of full examples, leaving more room for you to add your own flavor.
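
Templates are also handy in code when the spoken text has a fixed shape with a few variable slots, as in the automated phone line example earlier. This sketch uses Python's standard `string.Template`; the prompt names and slot names are made up for illustration.

```python
from string import Template

# Hypothetical prompts for an automated phone line.
PROMPTS = {
    "menu": Template("Would you like to order $option_a or $option_b?"),
    "confirm": Template("Your order of $quantity $item will arrive $day."),
}


def render_prompt(name, **slots):
    """Fill the named template; substitute() raises KeyError if a slot is missing."""
    return PROMPTS[name].substitute(**slots)


print(render_prompt("menu", option_a="today", option_b="tomorrow"))
# Would you like to order today or tomorrow?
```

Failing loudly on a missing slot is usually better than having the voice read a placeholder aloud; `safe_substitute()` is the lenient alternative that leaves unfilled placeholders in place.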

The Bad

Just because your text is easy to convert into speech does not mean that it will sound good when spoken by a computer. The sound of a text-to-speech application depends on many factors, one of which is the voice used to utter the text.

If you're trying to create a human-like voice, consider using natural-sounding samples or recordings of real humans speaking. You can also run the synthesized audio through speech recognition software to check that the words the computer "hears" match the text you intended. There are various free and open-source speech recognition applications available online, many of which can be installed on a mobile device.
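
One way to automate that check is to compare the recognizer's transcript with the text you meant to say using word error rate (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the intended text into the transcript, divided by the number of intended words. The sketch below only does the comparison; obtaining the transcript from a recognizer is assumed to happen elsewhere, and the review threshold is an arbitrary placeholder.

```python
import re


def words(text):
    """Lower-case and split into words, ignoring punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference, hypothesis):
    """Word-level edit distance between the texts, divided by reference length."""
    ref, hyp = words(reference), words(hypothesis)
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete a reference word
                         cur[j - 1] + 1,          # insert a transcript word
                         prev[j - 1] + (r != h))  # substitute (free if equal)
        prev = cur
    return prev[-1] / max(len(ref), 1)


def needs_review(intended, transcript, threshold=0.1):
    """Flag prompts whose round-trip transcript differs too much (threshold is arbitrary)."""
    return word_error_rate(intended, transcript) > threshold


print(word_error_rate("Would you like to order today?", "would you like to order to day"))
# 0.333... (one substitution plus one insertion, over 6 intended words)
```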

The Ugly

While not necessarily a bad thing in itself, an “ugly” text-to-speech application is one that does not sound like a human speaking. One way to create a more lifelike experience for your users is to add audio effects such as pitch shifting, speed changes, and changes in volume.
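
Volume and speed changes can be sketched directly on raw audio. The sketch below assumes mono 16-bit PCM samples held in a plain Python list (for example, decoded from a WAV file) and uses the simplest possible methods: a decibel gain with clipping, and nearest-neighbour resampling. This naive speed change also shifts pitch, the same way fast-forwarding a tape does; changing speed and pitch independently needs a time-stretching algorithm such as a phase vocoder.

```python
def apply_gain_db(samples, gain_db):
    """Scale 16-bit PCM samples by gain_db decibels, clipping to the int16 range."""
    factor = 10 ** (gain_db / 20)  # +6 dB is roughly double the amplitude
    return [max(-32768, min(32767, round(s * factor))) for s in samples]


def change_speed(samples, speed):
    """Nearest-neighbour resampling: speed=2.0 keeps every second sample.

    Played back at the original sample rate, the result is shorter and
    higher-pitched (2.0 raises the pitch by an octave).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    n_out = int(len(samples) / speed)
    return [samples[min(int(i * speed), len(samples) - 1)] for i in range(n_out)]


print(apply_gain_db([1000, -1000, 0], 20))  # [10000, -10000, 0]
print(change_speed(list(range(10)), 2.0))   # [0, 2, 4, 6, 8]
```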

If you're looking for a fully featured text-to-speech application, consider investing in a premium text-to-speech engine from a brand-name company. The good news is that these companies usually offer good SDKs (software development kits) that make integrating their products into your application relatively straightforward.
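
Whichever vendor you pick, it helps to keep its SDK at arm's length. A common design is to define a small interface of your own and write one adapter per engine, so moving between a free engine and a premium one touches a single class. The sketch below shows the shape of that design only; `SpeechEngine`, `Narrator`, and `FakeEngine` are hypothetical names, and a real adapter would call the vendor's SDK inside `synthesize`.

```python
from typing import Protocol


class SpeechEngine(Protocol):
    """Anything that can turn text into audio bytes (hypothetical interface)."""

    def synthesize(self, text: str) -> bytes: ...


class FakeEngine:
    """Stand-in engine for tests: the 'audio' is just the UTF-8 bytes of the text."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def synthesize(self, text: str) -> bytes:
        self.calls.append(text)
        return text.encode("utf-8")


class Narrator:
    """Application code depends only on SpeechEngine, not on a vendor SDK."""

    def __init__(self, engine: SpeechEngine) -> None:
        self.engine = engine

    def read_aloud(self, paragraphs: list[str]) -> list[bytes]:
        # Skip blank paragraphs so the engine is never asked to speak nothing.
        return [self.engine.synthesize(p) for p in paragraphs if p.strip()]


engine = FakeEngine()
print(Narrator(engine).read_aloud(["Hello.", "  ", "Goodbye."]))  # [b'Hello.', b'Goodbye.']
```

The fake engine also lets you test the rest of the application without network access or audio hardware.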

Getting Started With Text-to-Speech In Android

As mentioned above, there are various free and open-source alternatives to purchasing a premium text-to-speech product. One great option for Android users is Open Source TTS (Text-to-Speech Synthesis for Android).

This application generates speech using the text-to-speech engine installed on your device. The app itself is free, and the only “extras” you'll need to purchase are the voice types you're interested in using (such as female, male, or child voices).
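
If you drive a text-to-speech engine from your own code instead of through an app, long documents usually need to be split first: Android's `TextToSpeech` API, for instance, reports its per-call limit through `TextToSpeech.getMaxSpeechInputLength()`. The splitting logic is shown here in Python for brevity; `chunk_text` is a hypothetical helper, and the limit you pass in should come from whichever engine you use.

```python
import re


def chunk_text(text, max_len):
    """Split text into pieces of at most max_len characters, preferring sentence ends."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # A single sentence longer than the limit is hard-split as a fallback.
        while len(sentence) > max_len:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_len])
            sentence = sentence[max_len:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_len:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


print(chunk_text("One. Two. Three.", 9))  # ['One. Two.', 'Three.']
```

Splitting on sentence boundaries keeps pauses in natural places. The simple regex does split after abbreviations such as "Dr.", so a real splitter would need the same kind of abbreviation handling as text normalization.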

To install Open Source TTS on your Android device, launch the Google Play Store and search for “Text-to-Speech Synthesis”. Once the app's page loads, tap the “Install” button to begin the process.

When the download finishes, open up the app and you'll be presented with a screen asking you to choose a voice.

From here, you can choose from one of the voices available on Google Assistant or download additional voices from the Google Play Store.
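
Apps that pick a voice programmatically face the same choice. A common approach is to look for an exact language-and-region match first and fall back to any voice in the same language. Android's Java `Locale.toString()` writes tags like `en_GB`, while iOS's `AVSpeechSynthesisVoice` uses BCP 47 tags like `en-GB`, so this sketch normalizes underscores to hyphens. The voice list and the `pick_voice` helper are hypothetical.

```python
def normalize_tag(tag):
    """Lower-case a language tag and use hyphens, so "en_GB" and "en-GB" compare equal."""
    return tag.replace("_", "-").lower()


def pick_voice(voices, preferred):
    """Pick a voice name for the preferred language tag.

    voices: list of (name, language_tag) pairs (hypothetical data).
    Tries an exact match first, then any voice with the same primary language.
    Returns None when nothing matches.
    """
    wanted = normalize_tag(preferred)
    primary = wanted.split("-")[0]
    for name, tag in voices:
        if normalize_tag(tag) == wanted:
            return name
    for name, tag in voices:
        if normalize_tag(tag).split("-")[0] == primary:
            return name
    return None


voices = [("Voice A", "en-US"), ("Voice B", "en_GB"), ("Voice C", "fr-FR")]
print(pick_voice(voices, "en-GB"))  # Voice B (exact match)
print(pick_voice(voices, "en-AU"))  # Voice A (same language, other region)
print(pick_voice(voices, "de-DE"))  # None
```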

You can also use the Google Search bar to find and listen to samples of any voice.

Getting Started With Text-to-Speech In iOS

On the other hand, iOS users can opt to use either the built-in text-to-speech engine or [Alfred] to create custom voices. With [Alfred], you can choose from over 60 different voices, including ones that replicate celebrity accents, and download them all directly to your iPhone.

If you decide to go with the built-in voice, open Settings, go to Accessibility > Spoken Content, and turn on “Speak Selection” (to hear text you select) or “Speak Screen” (to hear everything on the screen).

You can then open the “Voices” section on the same screen and choose the voice you'd like to use from the many languages offered by Apple.

Once you've made your selection, the built-in text-to-speech engine uses that voice whenever it reads text aloud. You can download additional voices from the same Voices screen, and some apps on the App Store offer voices of their own.

Which One Should You Go With?

Deciding which text-to-speech application to use can be difficult. The choice largely depends on your needs and what kind of experience you're looking for.

If you plan to use text-to-speech on a regular basis, Open Source TTS is a great option, as the app itself is free and offers a variety of voices, including ones that can be used for legal documents, such as those found on the Regulus platform. If the idea of creating your own voice appeals to you, consider exploring Alfred, which offers an easy-to-use template feature that makes creating a custom voice a snap.

Ultimately, not all texts are created equal, and choosing the right tool for your text can be difficult. If you're looking for a solution that's easy to use and produces good-quality output, you can't go wrong with Open Source TTS.
