In a three-part series by NPR’s Planet Money, Kenny Malone and Jeff Guo explored the ability of Artificial Intelligence to produce an entire podcast. They started with scripts from OpenAI’s ChatGPT Plus, but they needed a voice. To create the most realistic AI voice possible, they turned to WellSaid Labs.
Here at WellSaid Labs, we knew we were taking on more than just creating a synthetic voice with our proprietary Generative AI technology. This was an opportunity to share with Planet Money’s audience why and how WellSaid builds and publishes AI voices deliberately and responsibly. To us, this is not a toy or a trivial game of memes.
AI Voice Is Not A Game, and WellSaid Means Business
WellSaid Labs creates the highest-quality AI voices from the voices of real people. This means we have a social, ethical, and commercial responsibility to safeguard the individuals whose likenesses we use to create these voice avatars. This includes Planet Money’s former host Robert Smith, who volunteered to have his voice cloned by WellSaid Labs for this series. Before the fun started and we set out to wow the hosts of Planet Money and their audience, we needed to work with Mr. Smith to ensure we had his explicit consent to use his voice for this project.
From the first conversation we had with Robert to the immense feeling of gratitude and accomplishment when we saw Robert’s reaction in a screenshot shared by his colleague Kenny, this project has made us proud to say: WellSaid Labs is the first AI company to broadcast on national radio. These are real use cases, enabled by the highest-quality AI voices and delivered deliberately and ethically. Move on, meme generators; WellSaid means business.
How Planet Money Built an AI Podcast Script
Planet Money was very creative in the story they chose for their first AI podcast. They explored whether artificial intelligence tools will soon replace knowledge workers such as podcast hosts like themselves. Meta and clever on their part.
To the surprise of perhaps only those living under a rock, they used ChatGPT to write intros, dialogues, and even jokes. Once they sorted out how to prompt OpenAI’s tech to be as funny as a rock, they decided to write an entire podcast with it. Okay, in all fairness to rocks, their humor is more solid. You can listen to the entire episode about creating the script on Planet Money’s website.
They ended up choosing an interesting example to expand on their knowledge worker replacement premise: the history of telephone operators who were displaced by automation in the early 1900s. ChatGPT generated a script that also included a radio drama set against that historical backdrop.
Spoiler alert: some of those “facts” were a bit less than factual when checked for accuracy.
Time to Clone Robert’s Voice: Enter WellSaid Labs
What’s a radio show or podcast without a voice? To complete their production, Planet Money took things to the next level by using an AI-generated voice. They cloned former host Robert Smith’s voice with WellSaid Labs. Get ready; you’re in for a treat.
We don’t cut corners when it comes to creating AI voices, not for fame or money. NPR had to run this production in accordance with WellSaid Labs’ ethical requirements. First, we had to obtain explicit consent from Robert. A WellSaid representative met with the real Robert Smith to discuss the details of the series and get his approval to proceed. The agreement was to create an AI voice using recordings of Robert from previous Planet Money episodes. WellSaid then committed to deleting the Voice Avatar entirely after the series finale.
Matt Hocking, who co-founded WellSaid Labs with CTO Michael Petrochuk, describes how earlier attempts to create synthetic voices patched speech together from pieces of existing recordings. This approach is known as concatenative speech synthesis.
In contrast to the legacy approaches to speech synthesis, with a WellSaid AI Voice Model, Robert’s voice avatar will be able to reproduce any English word in a much more natural-sounding way — by leveraging a predictive model built with Generative AI. Synthetic Robert will be able to say words the AI has never seen while still sounding as natural as the real Robert Smith.
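To make the contrast concrete, here is a toy sketch of the concatenative idea in Python. It is an illustration only and not how WellSaid or any production system works: the `RECORDED_UNITS` table, its made-up sample values, and the `concatenative_speech` helper are all hypothetical, and real concatenative systems stitched together much smaller units than whole words and smoothed the joins. What the sketch does show is the core limitation: a word that was never recorded cannot be spoken at all.

```python
# Toy illustration only. Each "recording" is a short list of made-up
# audio sample values keyed by the word it contains (an assumption for
# readability; real systems used sub-word units).
RECORDED_UNITS = {
    "hello": [0.2, 0.2, 0.5],
    "planet": [0.1, 0.3, 0.2],
    "money": [0.4, 0.1],
}


def concatenative_speech(text, units=RECORDED_UNITS):
    """Patch together stored snippets, one per word, in spoken order."""
    samples = []
    for word in text.lower().split():
        if word not in units:
            # The legacy limitation: nothing to play for an unrecorded word.
            raise KeyError(f"no recording available for {word!r}")
        samples.extend(units[word])
    return samples


if __name__ == "__main__":
    print(concatenative_speech("Hello Planet Money"))
```

A generative voice model replaces the lookup table with a learned model that predicts audio from text, which is why it can pronounce words that were never in the recordings.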
Understanding the Power of Generative AI Voices
Planet Money hosts quickly realized the magnitude of the responsibility they were signing up for when asking WellSaid to create an AI voice of a beloved colleague. They suddenly understood why we had so many protections throughout the process and discussed the ethical concerns of irresponsible voice cloning. These protections extend to the content moderation policies that protect the WellSaid ecosystem from dangerous and vile content creation.
“While the underlying technology of Generative AI is software (0s and 1s, bits, and data), what you hear is a replica of a real human’s voice. Because of this, we spend a lot of attention and effort doing our best to steward this technology the best we can. To expose voice cloning capabilities on the internet without guardrails is irresponsible, lazy, and unethical. We are in the business of empowering businesses and creatives to make mission-critical content for their editorials. We stand against the misuse of this technology, and this is why you’ve never seen a marketing stunt by WellSaid cloning anyone’s (dead or alive) voice without their consent or allowing people to create AI voices of celebrities and nations’ leaders just for giggles.”
Martín Ramírez, CRO
Building NPR’s AI Voice Model
WellSaid used recordings from previous Planet Money episodes hosted by Robert Smith to create his AI voice. Senior Machine Learning Engineer Rhyan Johnson led the project. Rhyan was in charge of ensuring every technical aspect was buttoned up and that our AI was given the right direction to get the best synthetic version of the enthusiastic and charismatic Mr. Smith. The tallest of orders, that was! After a few weeks of magical and methodical work, Rhyan presented Synthetic Robert to Kenny and Jeff. To say this was a delight falls short of conveying their awe and excitement.
After a few thousand training steps, Synthetic Robert sounds like it has been held captive under the sea. As WellSaid’s core AI continues to train Synthetic Robert’s voice model, the uncanny valley quickly shrinks. The voice model goes from a static-y, unintelligible mumble to an indistinguishable replica of Robert’s voice.
This is one of the strongest moments of “WOW!” any technology has ever achieved.
Buckle up for the upcoming flex. Since we spun out of the Allen Institute for Artificial Intelligence, many companies have attempted to do what we do. Some are, to their credit, better at PR than AI. WellSaid Labs is one of the few companies with 100% proprietary architecture, methodology, and principles, which allow us to achieve human parity like no other provider, particularly over long-form content.
Extracting all the properties that make a human voice unique and contextualizing its delivery for a given domain is a complex scientific problem. There are no corners to cut. This pursuit is even more challenging when you choose to innovate ethically. And these are some of the many reasons why it is so fulfilling to see how people react to hearing their own synthetic voices. The wow, the awe, and the trust they put in us are all essential to why WellSaid is the leading Generative AI voice company.
Robert Smith Meets “Robert Smith,” or Synthetic Robert
Synthetic Robert is now ready to meet his fully organic namesake. Kenny and Jeff set up a video call to make the introduction. The anticipation is high, expectations are blurry, and the real Robert is not 100% sure what to expect.
And then, this happened:
The shock and amazement in his (actual) voice are palpable. He quickly hit upon the potential of this technology to expand his work capacity. He joked about letting his avatar do the show while he’s at the beach, but that isn’t far from reality.
Voice actors who create WellSaid Voice Avatars can earn money from all kinds of projects while they take on more creative pursuits.
The Final Cut of NPR’s First AI Podcast and Takeaways
With a podcast script generated by ChatGPT and narrated by a WellSaid AI Voice Avatar, Planet Money made history: the first-ever National Public Radio editorial production made by Artificial Intelligence. We celebrate Kenny Malone and Jeff Guo for this innovative journalistic accomplishment. One for the books!
Having spent hours speaking with Kenny, we knew he would not be satisfied with us just doing what we do best. He would put it to the test. To truly judge the quality of the WellSaid AI voice, Kenny Malone co-hosted a Planet Money episode with Synthetic Robert.
What do you think?
WellSaid Labs did not cherry-pick which renderings NPR used in their final cut. We trained their content producers on how to use WellSaid Studio and let them create what they needed.
A Few Final Words (literally “final” for Synthetic Robert)
Innovation and disruption are achievable without cutting corners. This opportunity adds to WellSaid’s five-year history of building the best Generative AI voice without compromising our integrity.
As for Synthetic Robert, we hope you enjoyed listening to him. Pretty soon, he (it?) will be no more, as its purpose has been fulfilled. Bon voyage into cryptographic deletion, Synthetic Robert!