Real-Time Voice Cloning

Spedracer

Thoughts on the future of voice-cloning tools?
I played around with this for a couple hours. It's fun. It still doesn't sound quite right but it's certainly better than it was only a few years ago.
The general idea is to build a pretrained model off of a large set of voice clips, which takes care of the common characteristics associated with speech. All you have to do is supply 5 seconds of audio and it attempts to replicate that voice through text-to-speech.
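To make the "5 seconds of audio" part concrete, here is a toy sketch of the idea, not the actual tool's code. In the real thing a pretrained neural encoder turns the clip into a voice "fingerprint" that conditions the text-to-speech model. Everything below is an assumption made up for illustration: the two hand-picked features (energy and zero-crossing rate) stand in for the learned encoder, and the names `frame_features`, `embed`, `cosine`, and `tone` are hypothetical, with sine tones standing in for voices.

```python
import math

def frame_features(frame):
    # Two crude per-frame features (a stand-in for a learned encoder):
    # average energy and zero-crossing rate.
    energy = sum(x * x for x in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / (len(frame) - 1)]

def embed(samples, frame_len=400):
    # Split the clip into fixed-size frames, average the per-frame
    # features over the whole clip, then L2-normalize the result.
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    feats = [frame_features(f) for f in frames]
    mean = [sum(col) / len(feats) for col in zip(*feats)]
    norm = math.sqrt(sum(v * v for v in mean)) or 1.0
    return [v / norm for v in mean]

def cosine(a, b):
    # Dot product of two unit-length vectors is their cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def tone(freq_hz, seconds=5, rate=16000):
    # Hypothetical "voice": a pure sine tone, 5 seconds at 16 kHz.
    return [math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(seconds * rate)]

low_a = embed(tone(110))   # one low "voice"
low_b = embed(tone(115))   # a slightly different low "voice"
high = embed(tone(880))    # a much higher "voice"

print(cosine(low_a, low_b) > cosine(low_a, high))
```

The two low tones end up closer to each other than to the high one, which is the property a real speaker encoder is trained to have: clips from similar voices land near each other, so a short sample is enough to place a new voice.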
There are a lot of issues, however. The provided trained model is based on audiobook narrations, which don't sound like natural speech. "Sometimes" it sounds accurate, depending on the voice. Other times you'll get these long pauses of white noise and other artifacts. It's really finicky. I figure people with naturally monotone voices would be easier to clone.
There are a lot of implications worth discussing. I know the internet already larped about the morality of deepfakes or whatever. I like the idea of using it to reduce costs for voice acting, but right now we're sitting in the uncanny valley of voice synthesis.
 
Might be useful to me at some point if it's further refined in the next year or two.
 
I have already given my thoughts on the matter
Ya know, given the rate of tech advancement, it's entirely probable that by the time the most crucial cast members start dying off, voice synthesizers will have gotten to the point where execs could just have an unpaid intern mumble lines into a mic and get pitch-perfect Hank Azaria at his prime. Hell, perhaps the execs are already hoping the OG cast dies/retires so they can roll out their 'new and improved' voice actors for a fraction of the cost.

I have a terrible feeling that The Simpsons is going to outlive all of us... and that the last thing we hear before we die will be the resurrected voice of Phil Hartman making a joke about twerking drag kids and Lisa's first period.
 
Combine this with CGI people, deepfakes, and a bunch of other bullshit and we won't even need to hire actors for movies at this rate.
The future's gonna be a weird or scary place when we get to a point where anyone can use a program to mimic a person and their speech pattern. I don't even want to think about what powerful people will be using it for.
 
I've been really interested in this. This company's program seems awesome and doesn't have that robotic "ting" to it:

It's not free and they seem to be selective about who they allow to use it, but they let you play around with some samples to get a sense of what it can do.
 
Combine this with CGI people, deepfakes, and a bunch of other bullshit and we won't even need to hire actors for movies at this rate.
It sounds like a novel concept now, but just wait until movies become even more characterless and boring, with reboots starring old dead celebrities, or maybe middle-aged or older ones happy to sell the likeness of themselves in their prime.
 
All of a sudden, that bit in Policenauts about cloning dead celebs doesn't sound so far-fetched.
 
This is more likely to screw people already in the public eye rather than the ordinary guy, due to the requirement for source data.
Of course, fabricating a "leaked recording" is going to lead to some blowup soon. You can even claim bad sound quality due to supposed surreptitious recording...
 
Who knew, Jordan Peterson is all about that late 90's gangster rap
 
I can't into computers, but would it be possible to create some program/AI/whatever that will examine whether or not something is a deepfake?
Yes and no. That's kind of the idea behind adversarial networks. It's a byproduct of training something like a deepfake model.
The problem is there's no general solution and I don't think there will be. If you built the world's best deepfake detection tool, someone could use that as their own discriminator network to build the world's best deepfake model. I wouldn't worry about it. Honestly, deepfakes are probably the least of our worries when China or the US will probably use machine learning for some fucked up shit.
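If the "detector becomes the forger's training signal" argument sounds abstract, here is a minimal toy sketch of it. This is not a real GAN and not how any actual deepfake tool works: the "detector" is just a made-up threshold around a made-up real-data value of 5.0, and `detector`, `fake_score`, and `train_forger` are hypothetical names for illustration only.

```python
def fake_score(sample, real_mean=5.0):
    # Continuous signal behind the verdict: distance from what real data
    # looks like (the value 5.0 is an arbitrary assumption).
    return abs(sample - real_mean)

def detector(sample, tolerance=0.5):
    # Toy detector: flags a sample as fake if it is too far from real data.
    return fake_score(sample) > tolerance

def train_forger(start=0.0, step=0.25, max_rounds=100):
    # The forger only ever queries the detector's score and nudges its
    # output in whichever direction lowers that score.
    output = start
    rounds = 0
    while detector(output) and rounds < max_rounds:
        candidates = [output - step, output + step]
        output = min(candidates, key=fake_score)
        rounds += 1
    return output, rounds

forged, rounds = train_forger()
print(detector(0.0))     # the untrained output gets caught
print(detector(forged))  # the trained output slips past the same detector
```

The forger never needs to know how the detector works inside, it just needs access to the score. That is the same reason a public, high-quality detector ends up doubling as a free training signal.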
 