EAI Faculty Member’s Startup Uses AI to Create a Voice for the Voiceless

EAI faculty member's research uses AI to give the voiceless a voice.

By: Tyler Wells Lynch

For Rupal Patel, AI offers an opportunity to endow speech-impaired individuals with a voice of their own. Patel, a core faculty member at the Institute for Experiential AI (EAI), is the founder of VocalID, a company that creates custom synthetic voices using a combination of machine learning and speech blending. Put simply, the technology allows people who have lost or are losing their voice to reclaim it.

“When we first started this work, there were only generic-sounding voices, like what an ATM sounds like,” says Patel, who is also a computer science professor at Northeastern University. “So we created a way to make a synthetic voice sound more personalized by crowdsourcing voices from everyday healthy people. And then we had this technique where we would mix voices together to create a unique voice for someone who needed it.”


Your (Synthetic) Voice

Speech synthesis has been around since the 1930s, when Bell Labs invented the vocoder. But as something more akin to a musical instrument than a vocal prosthesis, the vocoder had limited commercial use. In 1968, scientists unveiled the first genuine “text-to-speech” system, which could convert text input into spoken language. The technology eventually found its way to the most famous scientist in the world, Stephen Hawking.

But until recently, these tools were prohibitively expensive, and the degree of personalization they allowed was minimal: just a small selection of generic, “robotic-sounding” voices. In Stephen Hawking’s case, one of those voices became so closely tied to his celebrity that it became, in effect, “his voice.”

That limited choice had more to do with the lack of available speech data than with a genuine preference for robotic voices. In the digital era, speech data is abundant. All that data has allowed companies like VocalID to sample real voices in order to model new ones that sound both natural and unique. Better yet, it allows people to sample their own voices in order to create one that is wholly their own.

People with severe speech disorders are not always entirely speechless. They can often still, for example, modulate or control the melody of their voice. In building VocalID, Patel and her team created a system that made use of whatever remaining control a patient had over their voice. That way, even severely impaired individuals could retain some of what made their voice unique.


Beyond Speechlessness

While Patel’s research interests are in better understanding speech production and the reasons why it breaks down, her technology has applications that extend beyond speech therapy and text-to-speech.

Last year, VocalID was acquired by Veritone, another synthetic voice company, with the idea of developing branded voices for creative purposes. The company is also working on a project with Cameo, a personalized video app, to make videos for kids featuring the voices of their favorite cartoon characters. Other applications for voice acting are also in the works. 

“My main interests are in better understanding natural speech production but also what happens when it breaks down for both kids as well as for adults,” Patel says. “And then building machines that can then compensate for or somehow enhance communication so that people can continue to interact and communicate with others.”

Part of that mission involves increased awareness of voice banking. As Patel explains, the most successful applications of VocalID require a “bank” of voice recordings that can be fed, as data, into the machine learning models that create new voices.


A New Era

Patel looks forward to a new era of innovation in conversational technologies. Natural language processing (NLP) has advanced at breakneck speed in recent years, with language models and chatbots like OpenAI’s ChatGPT capturing everyone’s attention.

“I think large language models like ChatGPT and AI technology in general have incredible implications for individuals living with communication impairments,” Patel says. “This will allow for a new type of therapeutic intervention, where speech and language practice are necessary for improvement. Chatbots might also be useful to compensate for impairments that may exclude the individual from participating in educational or vocational opportunities.”

Of course, a lot of this technology is not ready for prime time. For every potential benefit there is an alarming flaw. But as Patel says, AI tools like VocalID and, yes, ChatGPT, are almost certain to help level the playing field for people with communication disabilities. And that’s something to celebrate.

Learn more about Patel’s research.