AI Can Finally Pronounce My Name Correctly

It’s a fairly simple task to pronounce my name correctly, Saahil Desai. Saahil sounds like sawmill, while Desai sounds like decide with the last part removed. That’s all there is to it. Unfortunately, more often than not, people manage to completely mangle my name. The most common mistake is pronouncing it as Sa-heel, which is at least an honest effort, unlike its mutant counterpart that sounds like seal. Some people come up with pronunciations that defy all logic. Once, a college classmate read my name and confidently greeted me with “Hi, Seattle.”

But the mispronunciations that bother me the most don’t come from humans; they come from bots. Throughout the day, Siri reads my text messages to me through my AirPods, and she somehow manages to turn my name into Sa-hul. It’s even worse with the AI service I use to transcribe interviews. It has identified me by various names that sound like rejected members of a failed British boy band: Nigel, Sal, Michael, Daniel, Scott Hill. Silicon Valley aims to revolutionize the world with its products, but it seems that also involves changing people’s names. At least, that’s what I used to think.

But then I discovered an AI voice called Adam from Eleven Labs, a startup specializing in voice cloning. It not only pronounces my name correctly, it does it better than I can. You see, Saahil comes from Sanskrit, a language I don’t speak. The end result is a delightful sense of familiarity, akin to finding a souvenir keychain with your name on it, except it’s in the realm of technology.

In addition to chatbots capable of writing haiku and artbots that can recreate a pizza in Picasso’s style, the advent of generative AI has brought about voicebots that can finally get my name right. Just as ChatGPT learns from internet posts, Eleven Labs has trained its voices on an extensive collection of audio clips to replicate human speech, with over 500,000 hours of audio compared to the tens or hundreds of hours used in earlier models. Mati Staniszewski, CEO of Eleven Labs, explained, “We have spent the last two years developing a new foundational model for speech. It means our model is context-aware and language agnostic and therefore better able to pick up on nuances like names, as well as delivering the intonation and emotions that reflect the textual input.” Newer voicebots incorporate data from websites dedicated to pronunciation and can even recognize correct name pronunciations from audiobooks, podcasts, or YouTube videos.

Companies like Amazon, Google, Meta, and Microsoft are also working on more advanced voicebots, though the results are still mixed. I tested the same sentence, “C’mon, it’s not that hard to say Saahil Desai,” on AI voice programs from each of them. While they all managed to handle Desai correctly, none of them pronounced Saahil perfectly. Amazon’s Polly software, perhaps worse than Siri, thought my name was something like Saaaaal. Both Google Cloud and Microsoft Azure were acceptable but still slightly mispronounced Saahil, giving it a foreign sound. Nothing compares to Eleven Labs, but Meta’s Voicebox, a breakthrough in generative AI for speech, came very close.

Now, computers can pronounce a wide range of names beyond just my own. “I noticed the same thing the other day when my student and I created a recording on Eleven Labs of CNN’s Anderson Cooper saying ‘Professor Hany Farid is a complete and total dips**t’ (it’s a long story),” shared Hany Farid, a computer scientist at UC Berkeley. “I was surprised at how well it pronounced my name. I’ve also noticed that it correctly pronounces the names of my non-American students.” Tricky names like Lupita Nyong’o and Timothée Chalamet also trip up many people, but Eleven Labs handles them with ease. Unfortunately, Pete Buttigieg’s last name becomes unintentionally humorous when pronounced by Eleven Labs.

The fact that AI voices can now pronounce uncommon names is no small achievement. They face the same challenges in pronunciation that often stump humans. Names like Giannis Antetokounmpo don’t adhere to the rules of English, while even simpler names can have multiple pronunciations or spellings. A name may still not sound quite right to our ears if an AI voice lacks the human-like color and texture. Previous generations of voice assistants, such as Siri, Alexa, and Google Assistant, simply didn’t have enough information to navigate these complexities. However, advancements in deep-learning techniques inspired by the human brain have allowed AI to better analyze patterns in pitch, rhythm, and intonation.

Herein lies the paradox of AI today. While the technology can introduce biases that exclude certain users (voice assistants more frequently misidentify words from Black speakers compared to white speakers), it can also alleviate smaller feelings of alienation. Hearing bots mispronounce my name repeatedly reminds me that my devices don’t seem to consider me in their design, even though Saahil Desai is a common name in India. My blue iPhone 12, a device that holds more of me than anything else, still manages to get the most basic aspect of my identity wrong.

However, living in a world where bots understand and pronounce our names correctly is also eerie. The same voice-cloning technology used by Eleven Labs has been employed to create believable deepfakes. AI-generated voices can mimic anyone, from a rude Taylor Swift to Joe Rogan and Ben Shapiro debating Ratatouille, or even Emma Watson reading a section of Mein Kampf. An AI scam pretending to be someone you know becomes far more convincing when the voice on the other end pronounces your name just like your loved ones do.

Once it became clear that Eleven Labs couldn’t be stumped, I decided to test it further by adding my middle name, Abhijit. The result was a terrible jumble of syllables that would never fool me. Alright, fine, I admit that saying Saahil Abhijit Desai is actually quite challenging.
