Fagone’s article showed that Stephen Hawking identified strongly with his synthetic voice even
though he didn’t physically produce it, and changing it would be like having Hawking “change a
physical part of himself” (Fagone). This idea raises questions about the line between physical and
digital extensions of ourselves, and what parts we have ownership of. Soon, this question of
ownership may be asked not only by those with synthetic voices and body parts, but also by the
majority of us who rely on natural methods to produce sound. Various technological innovations have
converged in the emergence of the deepfake, and new technology like Google Duplex could
allow deepfakes to affect our everyday lives in ways previously unimaginable.
Deepfakes are defined in an article from CSO as “fake video or audio recordings that look and
sound just like the real thing” (Porup). So far, the limit of my interaction with deepfakes has been
with comedic videos on the internet of politicians saying things blatantly against their official stances,
or with celebrity face swaps such as The Rock’s face in Miley Cyrus’s music video for “Wrecking
Ball” (see below). Innovations in visual manipulation are currently more advanced than audio
manipulations, as face swap technologies are available to most consumers with a smartphone or
access to Snapchat. However, voice technologies are quickly catching up, with many companies
creating software that allows individuals to record their own voice for synthetic use. One group,
Lyrebird, has created a product which allows an individual to create a synthetic voice double which
sounds almost identical to a real voice. Their technology goes one step further with their Overdub
feature, which “allows you to replace recorded words and phrases with synthesized speech that’s
tonally blended with the surrounding audio” (Descript). Lyrebird’s website has a feature that
allowed me to type in any word to replace what is already in the sentence, and Overdub could recite
the word with the correct intonation, making it nearly impossible to distinguish from a real voice.
This technology is already approaching the uncanny valley. With it, the combination of accurate
voice deepfakes and visual manipulation would mean we would no longer be able to distinguish what
is real on the internet. Public addresses from politicians and messages from celebrities could be faked,
leaving the public unsure which sources to trust. Google Duplex’s programs could take this
problem to a whole new level. With the combination of deepfake technology and Duplex’s AI
conversational functions, deepfakes could interact with us in real time with no human involvement.
Duplex technology could mean that a hacker might one day steal photos and audio
of an individual and manipulate them into a simulated deepfake FaceTime session, which would mean we
would lose our ability to trust even those around us. This is the true horror that awaits us in the
uncanny valley, as these technologies can lead to the accurate simulation of our family and friends,
causing us to question even our closest loved ones.
Fagone, Jason. “Exclusive: The Silicon Valley Quest to Preserve Stephen Hawking's Voice.” San Francisco Chronicle. San Francisco Chronicle, August 3, 2018. https://www.sfchronicle.com/bayarea/article/The-Silicon-Valley-quest-to-preserve-Stephen-12759775.php#photo-15234266.
Lyrebird. “Lyrebird: Ultra-Realistic Voice Cloning and Text to Speech.” Descript. Descript. Accessed October 5, 2019. https://www.descript.com/lyrebird-ai.
Porup, J.M. “Deepfake Videos: How and Why They Work - and What Is at Risk.” CSO Online. IDG Communications, April 10, 2019. https://www.csoonline.com/article/3293002/deepfake-videos-how-and-why-they-work.html.
YouTube. Google, October 3, 2019. https://www.youtube.com/watch?v=sJsNc8vEI3A.