Simulated Voice Technologies Make Deep Fakes a Whole Lot Scarier (Miller 2019)

Fagone’s article showed that Stephen Hawking identified strongly with his synthetic voice even
though he didn’t physically produce it; changing it would have been like having Hawking “change a
physical part of himself” (Fagone). This idea raises questions about the line between the physical and
digital extensions of ourselves, and about which parts we truly own. Soon, this question of
ownership may be asked not only by those with synthetic voices and body parts, but also by the
majority of us who rely on natural means to produce sound. Various technological innovations have
converged in the emergence of the deepfake, and new technology like Google Duplex could
allow deepfakes to affect our everyday lives in ways previously unimaginable.

Deepfakes are defined in an article from CSO as “fake video or audio recordings that look and
sound just like the real thing” (Porup). So far, my interactions with deepfakes have been limited to
comedic internet videos of politicians saying things blatantly against their official stances,
or to celebrity face juxtapositions such as The Rock’s face in Miley Cyrus’s music video for “Wrecking
Ball” (see below). Innovations in visual manipulation are currently more advanced than those in audio
manipulation, as face-swap technologies are available to most consumers with a smartphone or
access to Snapchat. However, voice technologies are quickly catching up, with many companies
creating software that allows individuals to record their own voices for synthetic use. One company,
Lyrebird (now part of Descript), has created a product that lets an individual create a synthetic voice double which
sounds almost identical to the real voice. Their technology goes one step further with the Overdub
feature, which “allows you to replace recorded words and phrases with synthesized speech that’s
tonally blended with the surrounding audio” (Descript). Lyrebird’s website has a demo that
let me type in any word to replace one already in a sentence, and Overdub could recite
the new word with the correct intonation, making it nearly impossible to distinguish from a real voice.
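
To make the Overdub idea more concrete, here is a minimal sketch of how word-level replacement might work under the hood, written in Python. This illustrates the general technique only, not Descript’s actual implementation or API: synth_word(), the speaker embedding, and the timestamped word list are hypothetical stand-ins, stubbed with placeholder audio so the sketch runs end to end.

    # Minimal sketch of word-level overdubbing: cut out one word,
    # synthesize a replacement in the speaker's voice, and crossfade
    # the splice points so the edit blends with the surrounding audio.
    # synth_word() is a hypothetical stand-in for a voice-clone model.
    import numpy as np

    SAMPLE_RATE = 16_000  # samples per second

    def synth_word(word: str, speaker_embedding: np.ndarray) -> np.ndarray:
        """Hypothetical voice-clone synthesizer. A real system would
        condition on the speaker embedding; here we return quiet noise
        of a plausible length so the example is runnable."""
        n = int((0.3 + 0.05 * len(word)) * SAMPLE_RATE)
        return 0.01 * np.random.randn(n).astype(np.float32)

    def crossfade(a: np.ndarray, b: np.ndarray, overlap: int) -> np.ndarray:
        """Blend the tail of `a` into the head of `b` to avoid an audible click."""
        fade = np.linspace(0.0, 1.0, overlap, dtype=np.float32)
        mixed = a[-overlap:] * (1.0 - fade) + b[:overlap] * fade
        return np.concatenate([a[:-overlap], mixed, b[overlap:]])

    def overdub(audio, words, replace_idx, new_word, speaker_embedding,
                overlap=160):  # 160 samples = 10 ms at 16 kHz
        """Replace one word in `audio`. `words` holds (word, start, end)
        sample offsets, as a forced aligner might produce."""
        _, start, end = words[replace_idx]
        synthesized = synth_word(new_word, speaker_embedding)
        head = crossfade(audio[:start], synthesized, overlap)
        return crossfade(head, audio[end:], overlap)

    # Usage: one second of placeholder audio where "cat" spans samples 6000-9600.
    audio = 0.01 * np.random.randn(SAMPLE_RATE).astype(np.float32)
    words = [("the", 0, 4000), ("cat", 6000, 9600), ("sat", 11000, 15000)]
    edited = overdub(audio, words, 1, "dog", np.zeros(256, dtype=np.float32))
    print(f"edited clip: {edited.shape[0] / SAMPLE_RATE:.2f} s")

The crossfades at the splice points are one plausible reading of what the quote above calls tonal blending: without them, the inserted word would start and stop with an audible click that gives the edit away.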

This technology is already approaching the uncanny valley. With it, the combination of accurate
voice deepfakes and visual manipulation would mean we could no longer distinguish what
is real on the internet. Public addresses from politicians and messages from celebrities could be faked,
leaving the public unsure of which sources to trust. Google Duplex could take this
problem to a whole new level. By combining deepfake technology with Duplex’s conversational
AI, deepfakes could interact with us in real time with no human behind them.
Duplex technology could mean that any hacker could one day hypothetically steal photos and audio
of an individual and manipulate them into a simulated deepfake FaceTime session, meaning we
would lose the ability to trust even those around us. This is the true horror that awaits us in the
uncanny valley: these technologies can lead to accurate simulations of our family and friends,
causing us to question even our closest loved ones.




Fagone, Jason. “Exclusive: The Silicon Valley Quest to Preserve Stephen Hawking's Voice.” San Francisco Chronicle, August 3, 2018. https://www.sfchronicle.com/bayarea/article/The-Silicon-Valley-quest-to-preserve-Stephen-12759775.php#photo-15234266.

Lyrebird. “Lyrebird: Ultra-Realistic Voice Cloning and Text to Speech.” Descript. Accessed October 5, 2019. https://www.descript.com/lyrebird-ai.

Porup, J.M. “Deepfake Videos: How and Why They Work - and What Is at Risk.” CSO Online. IDG Communications, April 10, 2019. https://www.csoonline.com/article/3293002/deepfake-videos-how-and-why-they-work.html.

YouTube. Google, October 3, 2019. https://www.youtube.com/watch?v=sJsNc8vEI3A.

Comments

Maggie said…
I think your criticism of 'deepfakes' is completely valid, and I think we can tie this back to some of our discussion on why AI is also so scary. Like we mentioned in class, the idea that there is something so similar to human likeness and yet is not human is incredibly scary to humans (some more so than others - I'm thinking about the activity last class where we lined up on a spectrum). This makes me wonder why some versions of deepfake (like Siri, maybe? Would that count?) are acceptable and yet others are too frightening. However, I think in the case of Stephen Hawking, the essential part of his synthetic voice was that it was his agency to choose what he wanted to sound like. Yes, an analysis of the automation of his voice can hint at the uncanny valley, but my first impression of the article when I read it was that Stephen Hawking was being given the chance to dictate how he wanted to sound, and thus how he wanted to be perceived. I think for an individual who was physically hindered like he was, decisions like these are more monumental than most others can perceive, and thus I think in this specific instance, Hawking's ability to choose a specific synthetic voice was incredibly valuable and important.
Sydney Otis said…
Your post reminds me of the spectrum activity we did in class the other day. I agree that these technologies are progressing quickly and it may soon be hard to differentiate between the real and the deepfake, even when it concerns those close to us. I am curious as to what, if anything, can be done about the progression of these technologies. Even if a large percentage of the population is frightened by this, will big corporations continue to build upon these systems, or will smaller groups with the means continue to capitalize on the benefits they can reap from them? Is there any way that the progression of these technologies can be halted past a certain point? I wonder whether there will be a point where these technologies will be declared unsafe or problematic enough to lead to a nationwide response.
Elaine Kim said…
What effects could visual and audio manipulation have on our perceptions of our own identity? While these new technologies are currently being used to create voice doubles of politicians and celebrities, with the accessibility of programs like Lyrebird, people could easily create audio clones of their friends, family, and even themselves. If we or other people can create visual and audio clones of us, what does that mean for our understanding of identity? Identity can be defined by the physical body and mind-matter. Current technology does not allow us to clone either of those components of identity; however, it does allow us to create audio and visual clones that can trick audiences into thinking those components exist. If we see a video of our friend talking to us in a voice that sounds like our friend, we are going to believe that it is a video of our friend; we will believe that there is an identity to the being we see on the screen. Visual and audio manipulation allows us to create identities. But the emergence of visual and audio manipulation may mean that we need to start being skeptical of technology we used to trust.
Eileen Cho said…
In a previous blog post, Sydney discussed how complete immersion in technology can lead to people avoiding real problems in the physical/real world. In class, we roughly divided the world into three (the virtual, the physical, and the real) and came to the conclusion that living in any of these three worlds should be considered valid. However, after reading this blog post, it seems that deepfakes are introducing more realities. Before, I did not even think about how people could start casually using deepfakes as a way to, well, fake things. With technology such as Lyrebird becoming more and more accessible, it is easier than ever to manipulate not only one's own reality, but also the realities of the many people around us, on a larger scale than ever before. As long as technology is used, these programs will exist and continue to be used, so can we do anything about these growing issues without completely avoiding technology as a whole?
Roschan Rao said…
I really liked your line about "physical and digital extensions of ourselves." I think you're right, that digital voices blur the line between what's ours and what's not. For example, would we consider Google Duplex to be a digital extension of ourselves? Does Stephen Hawking feel the same way about his voice as people with disabilities feel about their avatars on Second Life? Is The Rock's face on Miley Cyrus' music video an extension of The Rock? Of Miley Cyrus? Of the creator of the video? Of the software used to accomplish it? Maybe some combination? I think your post really raises interesting questions of ownership in this digital world, and maybe it's the uncertainty of ownership that makes us so uncomfortable with deepfakes and technologies like Google Duplex & Lyrebird.
Toni said…
Do you think there are any positives to these seemingly very sinister technologies? Are there ways in which these technologies have the potential to provide more equitable access across differing abilities or socioeconomic strata? For example, things such as Ashley Too (from Black Mirror). Though the story is fictitious, the premise is very real. Do you think that these technologies provide an opportunity for folks who are introverted or who struggle socially to find in their idols, or in a humanoid other, a “friend”? Would that inherently be a bad thing?
Dylan said…
As someone who is absolutely terrified of AI and deepfakes, I would like to ponder Toni's question: are there any positives? I honestly think any positives I could come up with, like Toni's example of friendship for social introverts, could easily become negatives if the AI is strong enough to develop its own cognition. What would we try to accomplish with an AI friend for introverts? A perfect human who is easy to talk to, hypes you up, and who you know is a robot but is natural enough to seem human? I think in order to advance the technology enough to make the AI seem human, the AI would need to have the capability to mess up, be mean, and be selfish: things real humans do. I think it boils down to a thought I had about life in the virtual world - if you run from your woes by changing your external circumstances, your woes will usually find ways to follow you and pop up. So, even if an introvert made friends with a realistic AI, they might still face problems with other humans. I don't know - it's so nuanced!