SSML, TTS, and STEM Accessibility: What Educators Need To Know

SSML can be a powerful tool for making STEM lessons accessible, but you don’t have to be an expert. Here’s what you need to know before you dive in.

July 16, 2024 by Amy Foxwell

If you’ve dipped your toes into the world of text-to-speech (TTS) accessibility, you might have come across something called SSML. Short for Speech Synthesis Markup Language, SSML provides a way for web developers to improve the quality of TTS outputs when web readers or screen readers speak their content aloud.

If you’re an educator building STEM lessons for your students and you want to make sure those lessons can be read smoothly by a TTS engine, you might be wondering at this point whether you need to get familiar with SSML. On paper, it sounds like a great solution for your needs—and in some situations, it can be.

However, if this seems intimidating, don’t panic. There are two simple ways to make your math expressions TTS-ready that should work for most of your use cases. Then you can use SSML, if you choose, to fine-tune your students’ listening experience.

But let’s start with the basics first.

Getting started with TTS for math can feel overwhelming, but our experts at ReadSpeaker are always happy to share their guidance. Start with our two STEM accessibility webinars.

Turning Speech Synthesis Markup Language (SSML) to Voice for STEM Accessibility

What is SSML?

SSML is an XML-based markup language. Just as HTML (a close relative of XML) lets developers control how a web page looks, SSML gives developers a way to control how their content sounds when a TTS reader speaks it aloud.

If you’ve ever seen behind-the-scenes XML-based code, SSML code will probably look familiar to you. Here’s an example of how a developer might use a handful of SSML tags to tell a text-to-speech program how to pronounce certain elements on the page:
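
The sample markup did not carry over into this version of the article. The following is a sketch reconstructed from the spoken output described below, not necessarily the original listing. The audio URL is a placeholder, and the text inside the audio tag is a fallback that would be spoken if the file could not be played.

```xml
<speak>
  Here are <say-as interpret-as="characters">SSML</say-as> samples.
  I can pause <break time="3s"/>.
  I can play a sound
  <audio src="https://www.example.com/MY_MP3_FILE.mp3">didn't get your MP3 audio file</audio>.
  I can speak in cardinals. Your number is <say-as interpret-as="cardinal">10</say-as>.
  Or I can speak in ordinals. You are <say-as interpret-as="ordinal">10</say-as> in line.
  Or I can even speak in digits. The digits for ten are <say-as interpret-as="characters">10</say-as>.
  I can also substitute phrases, like the <sub alias="World Wide Web Consortium">W3C</sub>.
  Finally, I can speak a paragraph with two sentences.
  <p><s>This is sentence one.</s><s>This is sentence two.</s></p>
</speak>
```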

A TTS program will recognize the tags used above and follow the instructions. The final speech output would be:

Here are S S M L samples. I can pause [3 second pause]. I can play a sound [audio file plays].
I can speak in cardinals. Your number is ten.
Or I can speak in ordinals. You are tenth in line.
Or I can even speak in digits. The digits for ten are one oh.
I can also substitute phrases, like the World Wide Web Consortium.
Finally, I can speak a paragraph with two sentences. This is sentence one. This is sentence two.

For a helpful beginner’s tutorial on SSML, including a breakdown and explanation of the code behind this sample text, check out Google’s SSML documentation. Microsoft offers similar SSML documentation that can help you learn the most useful SSML elements and how to customize pronunciation to your liking.

Amazon’s SSML developer documentation is geared toward programmers writing apps for Amazon Alexa, so you’ll want to steer clear of the Amazon-specific API commands. However, we’ll include it here as a valuable resource because a quick skim of the docs should give you an idea of everything that SSML is capable of, if you’re willing to put in the time and effort to learn it.

Try not to get bogged down by the complexity of those tutorials.

While you could spend hours mastering SSML concepts like prosody, speaking rate, phonemes, and customization of different voices (if that sort of thing piques your interest), you really only need to get the gist of how to accomplish your accessibility goals. Spend a few minutes glancing through the tutorials and making notes of the basic functionality and tags that seem helpful for your situation, and then you’ll be off to the races.

We recommend jotting down the syntax for the following, as you’ll likely use them frequently:

  • embedding an audio file: <audio src="(the source file for your audio clip)"> </audio>
  • the “say as” command: <say-as interpret-as="(how you’d like the text to be interpreted)"> </say-as>
  • and the different interpretations you could use within the “say as” command, such as “characters,” “ordinal,” and “cardinal,” as in the example above
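
Filled in with real values, those notes might look like the sketch below. The file URL and the sentences are made up for illustration, and the set of interpret-as values an engine accepts varies from one TTS platform to another, so check your platform’s documentation before relying on any of them.

```xml
<speak>
  <!-- Plays a recorded clip from a placeholder URL. -->
  <audio src="https://www.example.com/audio/lesson-intro.mp3">Lesson introduction.</audio>

  <!-- "characters" spells the text out letter by letter. -->
  The unit is <say-as interpret-as="characters">kg</say-as>.

  <!-- "ordinal" reads 3 as "third". -->
  Now look at the <say-as interpret-as="ordinal">3</say-as> problem.

  <!-- "cardinal" reads 12 as "twelve". -->
  The answer is <say-as interpret-as="cardinal">12</say-as>.
</speak>
```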

Why is SSML useful for STEM accessibility?

Math expressions are notoriously difficult for TTS engines to understand and speak. Part of the reason is that many people simply don’t understand how to create accessible math content.

Typically, educators create a web page or a document with images of math equations embedded throughout the text. The problem is that TTS programs can’t read the content inside images. They’ll read through the plain text just fine but then skip over an image that has no text alternative, as if it doesn’t exist. This leaves visually impaired and dyslexic students to try to interpret the image themselves with no assistive technology.

SSML solves this problem because it gives lesson creators a way to tell the TTS program exactly what to say. If there’s a complex math expression displayed in a visual image within your lesson, you can use SSML to encode pauses, characters, and the correct phonetic pronunciation of symbols as they appear in the image exactly as they would sound if you were speaking them aloud in the classroom.
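
As a rough illustration, here is one way the quadratic formula could be written out for a TTS engine. The wording, pause lengths, and choice of tags are assumptions made for this sketch; adjust them to match how you would say the expression in class.

```xml
<speak>
  x equals
  <break time="300ms"/>
  negative b, plus or minus the square root of
  <!-- Spell out "ac" so it is read as two letters, not as a word. -->
  b squared minus four <say-as interpret-as="characters">ac</say-as>,
  <break time="500ms"/>
  <!-- Mark the lone letter so it is not read as the article "a". -->
  all over two <say-as interpret-as="characters">a</say-as>.
</speak>
```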

In fact, you could even create high-quality recordings of your own voice and have the code fetch your custom .wav audio files using the <audio> tag that appears in the example code above. This is particularly helpful if you ever encounter a situation where the synthesized speech of a TTS program is difficult to understand.
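
A minimal sketch of that approach follows. The URL and file name are placeholders, and accepted audio formats differ between TTS platforms, so a .wav file may need converting. In standard SSML, the text inside the audio tag is spoken only if the recording cannot be played.

```xml
<speak>
  Listen to the quadratic formula.
  <audio src="https://www.example.com/audio/quadratic-formula.wav">
    x equals negative b, plus or minus the square root of
    b squared minus four a c, all over two a.
  </audio>
</speak>
```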

SSML also helps educators maintain a level of control over how their material is presented to students. For example, if you want students to hear x³ as “x cubed” or “x to the third power” rather than “x superscript three” or “x three,” as would likely happen by default, you can customize the audio content to your liking.

You’d simply write…

<speak> x cubed </speak>

…and the program will say “x cubed.”
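
If you would rather keep the written expression in your markup, the sub tag offers an alternative: the engine speaks the alias in place of the enclosed text. The sentence here is invented for illustration.

```xml
<speak>
  <!-- The engine speaks "x cubed" in place of the enclosed text. -->
  The volume of a cube with side x is <sub alias="x cubed">x³</sub>.
</speak>
```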

Do you need to learn SSML to make your math content accessible for students who need audio assistance?

No, you don’t need to learn SSML.

If you want to make your math and science content accessible without having to learn any new coding syntax or languages, you can do that.

The simplest way to make your STEM lessons accessible is to use the “what you see is what you get” (WYSIWYG) equation editor in your LMS of choice.

The equation editor will help you build math expressions that retain the proper formatting when viewed as part of the lesson. Then, all you have to do is find your LMS’s export options and choose MathML, LaTeX, or MathType. If you’re using a STEM-focused TTS engine such as ours at ReadSpeaker, the program will automatically convert your work to MathJax behind the scenes so it can be read aloud easily.

Alternatively, you could create your lesson in a document editor such as Microsoft Word.

Use the built-in equation editor in Word or upload your own custom images to make your math expressions look how you want them to. Next, right click on the uploaded or generated image and edit the alt text. Input the phrases, characters, symbols, and any formatting notes you’d like the TTS program to read. Be sure to give it a listen before you finalize the lesson just to make sure your expressions make sense when spoken out loud. And then you’re good to go!

Either of these options will be sufficient for most use cases. However, you may encounter some instances where you’d like more control over the voice output. You may want to add extra pauses, change how the program reads acronyms and abbreviations, or insert a particular audio clip that you think would be helpful. In these situations, it can be a huge time-saver if you have a little bit of SSML knowledge in your back pocket.
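
For instance, a short sketch combining an extra pause with custom readings for an abbreviation and an acronym might look like this. The sentences are invented for illustration, and the exact reading of each tag can differ between TTS engines.

```xml
<speak>
  <!-- Read the unit abbreviation as a full phrase. -->
  The concentration is 0.5 <sub alias="moles per liter">mol/L</sub>.

  <!-- Give students a moment before the next idea. -->
  <break time="1s"/>

  <!-- Spell the acronym out letter by letter. -->
  <say-as interpret-as="characters">DNA</say-as> stores genetic information.
</speak>
```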

It’s worth noting that SSML also lets you label the language of your content. SSML doesn’t translate anything for you, but if you offer lessons in more than one language, or an English lesson quotes a phrase in another language, language labels tell the TTS engine which voice and pronunciation rules to use. Chinese, French, Spanish, Arabic, German, Italian, Portuguese, Japanese, and plenty of other languages can be labeled this way, provided your TTS engine has a voice for them. The label goes in the xml:lang attribute: the value for U.S. English is en-US, and there are long lists of language codes to choose from, so anyone, anywhere can obtain value from your lessons.
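
A sketch of language labeling in SSML 1.1, where the language is carried by the xml:lang attribute on the speak element and, for a passage in another language, on a lang element. Support for the lang element and for individual languages depends on your TTS engine and its voices, and the French sentence is only an illustration.

```xml
<speak version="1.1"
       xmlns="http://www.w3.org/2001/10/synthesis"
       xml:lang="en-US">
  <!-- The document default is U.S. English. -->
  In French, the Pythagorean theorem is introduced like this:
  <!-- This passage is labeled as French (France). -->
  <lang xml:lang="fr-FR">Le théorème de Pythagore.</lang>
</speak>
```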

Also remember: Your TTS platform can make or break the user experience!

ReadSpeaker was designed with STEM in mind. That’s because most other TTS platforms fall extremely short when it comes to science and math content. With ReadSpeaker, you can use our customizable pronunciation library to make sure your students have the best learning experience possible. And with our broad support of MathML, LaTeX, MathType, SSML, and a number of popular LMS editors, you can build lessons however you want and make them audio accessible with ease.

Contact us to ask questions or get started right away.