In the RTÉ archives, there are almost four decades of Charlie Bird’s voice. Last year, his wife Claire took time off work to begin the long process of sifting through recordings, narrowing them down to three hours of clear, crisp audio. Charlie’s voice from the past will ensure his voice in the future.
Last October, the veteran broadcaster received a diagnosis of motor neuron disease (MND), the degenerative condition that is already eroding his ability to speak. By the time he sets out to climb Croagh Patrick for charity in April, he estimates, it could be gone altogether.
Existing replacement technology includes stock artificial voices: an Irish woman’s voice or a man’s voice with an English accent. For Bird, however, there is a potential alternative – after an RTÉ producer put him in touch with a couple of Irish tech innovators. They have developed an artificial intelligence-driven simulation that means Bird will be able to carry on talking as normal, even when he can no longer speak.
The technology is cutting edge despite appearing straightforward. A person records themselves talking and feeds clean audio into an algorithm that analyses and reproduces the content. When they type a sentence into a laptop or tablet, it comes out of a speaker a few seconds later, indistinguishable from the real thing.
“It’s given me a whole new lease of life,” Bird says of what lies ahead. “I can now talk to my kids, my grandkids, with my own voice.” With his own voice now audibly deteriorating, he is clearly buoyed by the technology. His mission now is twofold – to promote its potential use among anyone with a condition that threatens their power of speech, and to encourage them to begin “banking” their words as soon as they can.
Based on existing software, the solution has been custom-developed by Keith Davey, founder of Marino Software in Dublin, and Trevor Vaugh, assistant professor at the Department of Design Innovation in NUI Maynooth. The pair had developed a similar voice substitute for an episode of RTÉ’s Big Life Fix three years ago. Today, the available technology has improved three-fold, rendering a cloned voice virtually indistinguishable from its human source.
As well as the option to type in sentences, Bird will place “beacons” in key areas that can detect his presence and automatically present stock phrases on his device – everything from wanting a coffee in the kitchen, to calling his dog Tiger for a walk, or even ordering a pint in his local (where one beacon is due to be positioned).
“Start recording your voice now so that in two years you can have a bank of your own voice,” he says, zeroing in on what, for him, is the key message in promoting the software.
Progressive neurological conditions
The voice-banking approach can be used by anybody with a progressive neurological condition. MND is an obvious one given its symptoms can progress rapidly, but it can also be an option for those whose speech is affected by surgery for head and neck cancers.
There are existing options, but many people are uncomfortable with them. “Off-the-shelf voices” are limited, and include a male voice with an English accent. For those pushing enhanced technology, the importance of retaining a person’s natural sound is key.
“I think we confuse communication,” says Prof Vaugh. “We sometimes think the words are enough, no matter who says them or what is said. But it’s part of who we are, it’s more than just words. It’s how you say them, it’s the meaning behind them.”
He notes that Stephen Hawking – whose synthesised vocals are undoubtedly the most familiar – rejected the option to upgrade his technology as it had come to form a central part of his identity. For others, the options are getting better but still require development.
“Basically very smart people have created very smart algorithms,” says Prof Vaugh. “It’s very hard to get funding for what we are doing. I think a lot of the funding agencies are looking for patents and IP (intellectual property) and they don’t see the impact it can have on a human as the most important thing.”
At Marino Software they are just beginning to consider how best to attract finance that would allow the technology to benefit people on a broad scale – but their vision is clear.
Davey explains that the entire process of isolating audio and feeding it into a computer currently takes a few weeks but could be reduced to a matter of days. Ultimately he would like to see it available as a simple app that can be used on a home computer.
The bulk of the work is in isolating natural, clean audio. The algorithm looks for tonality in the voice by breaking the words down into phonemes, the distinct units of sound in languages. To some extent it is a computer impersonation, but a clinical one.
What is critical, and sometimes difficult, Davey says, is that users need to record themselves sounding natural – all too often people have a tendency to adopt a formal “recording” voice, which can hamper the desired result of preserving a familiar identity.
“You are trying to find the best representation of audio that will be fed into the algorithm,” he says. “You have to do a lot of trial and error.”
In Charlie Bird’s case, the opposite was the problem – audio delivered and recorded professionally was almost too good, but they got there in the end thanks to hours of editing by his wife Claire. Bird plays a sample of a sentence typed into his device, something about a gin and tonic, and taking Tiger for a walk. It is almost disturbingly precise and demonstrates the power of maintaining such an intimate part of human identity.
“The whole point of this is that we want this to be available to anyone who has any issues with their voice,” Claire says. “I feel like I am communicating with him. It’s his voice.”