Elevenlabs.io - AI Generate anyone's voice with a minute of audio saying how much they hate trannies

Man, Twilight Sparkle has fallen on some hard fucking times, bros.



 
>803 pages
Right, see you in a year or so.

Ok, but seriously, it took me 6 hours to generate those 5 minutes of audio (and then re-generate certain sentences that came out wrong the first time, and then splice all the sentences together so that the whole thing sounds at least mostly natural). Taking on this entire doorstopper at that rate would take forever, and I just don't think I'm ready for such a massive undertaking.
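For what it's worth, the splicing step can be scripted instead of done by hand in an editor. A minimal sketch using only Python's standard `wave` module (assuming every take was exported with the same sample rate, channel count, and sample width):

```python
import wave

def splice_wavs(paths, out_path):
    """Concatenate several WAV takes (same format) into one file, in order."""
    with wave.open(out_path, "wb") as out:
        params_set = False
        for path in paths:
            with wave.open(path, "rb") as take:
                if not params_set:
                    # Copy format (channels, sample width, rate) from the first take;
                    # the frame count in the header is patched automatically on close.
                    out.setparams(take.getparams())
                    params_set = True
                out.writeframes(take.readframes(take.getnframes()))
```

Re-generated sentences just need to overwrite their file before the list is spliced; crossfades or silence trimming would need a proper audio library on top of this.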
 
I came across this just now:
Has anyone ever tried this? This one's fairly new, so I can't find much information about it. How does it compare to other currently existing AI-powered TTS applications (especially to TorToiSe)? Because I do kinda want to do the entire Matthew Harris manifesto, but it will take me months to get there at the pace TorToiSe works.
 
There is this new open source TTS called OpenVoice that's been floating around, although the actual output quality sounds like your average Zoom call
 
> There is this new open source TTS called OpenVoice that's been floating around, although the actual output quality sounds like your average Zoom call
@IamnottheNSA would this be good enough for transferring the style of the Francis E. Dec narrator to the Matthew C. Harris manifesto, or is it too slow/inaccurate? I've cloned the repo at https://github.com/myshell-ai/OpenVoice and I've downloaded the checkpoints in case anything happens to them.
 
> @IamnottheNSA would this be good enough for transferring the style of the Francis E. Dec narrator to the Matthew C. Harris manifesto, or is it too slow/inaccurate? I've cloned the repo at https://github.com/myshell-ai/OpenVoice and I've downloaded the checkpoints in case anything happens to them.
I can't see where I can train/finetune my own model. And, from my experience, using the default speaker models does not produce good results, especially with the Dec narrator's voice.
Can I reuse the model I finetuned for TorToiSe?
 
> Can I reuse the model I finetuned for TorToiSe?
Probably not. You do need a "base speaker" TTS model, however, so if you're having trouble getting any model at all (no matter how it sounds) to speak the words, this won't help. To use this, you put in `reference_speaker.mp3` as the Dec narrator's voice, then hook up the base speaker to say the words. If your problem is the base speaker and you can find *any* base speaker that works well enough, I'm willing to hook it up to this program.
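To make the division of labor concrete: in OpenVoice's design, one model says the words and a separate converter re-colors the voice to match the reference clip. The sketch below is a toy stand-in for that structure (the function names and the dict "audio" are my own, not the real OpenVoice API), just to show why a bad base speaker can't be rescued by the converter:

```python
# Toy model of the two-stage pipeline: base speaker decides WHAT is said,
# tone-color conversion only changes HOW it sounds.

def base_speaker_tts(text):
    # Stand-in for any base TTS model that can pronounce the words at all.
    return {"words": text, "timbre": "base"}

def tone_color_convert(audio, reference_timbre):
    # Stand-in for the converter: keeps the words, swaps in the
    # timbre extracted from reference_speaker.mp3.
    return {"words": audio["words"], "timbre": reference_timbre}

raw = base_speaker_tts("some line from the manifesto")
out = tone_color_convert(raw, reference_timbre="dec-narrator")
```

If `base_speaker_tts` garbles the words, the converter faithfully re-colors the garbled words; that's the failure mode being discussed here.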
 
> Probably not, you do need a "base speaker" TTS model however, so if you're having problems with getting any model at all (no matter how it sounds) to speak the words this won't help. To use this you need to put in `reference_speaker.mp3` as the Dec narrator's voice, then hook up the base speaker to say the words. If your problem is the base speaker, if you can find *any* base speaker that works well enough I'm willing to hook it up to this program.
Well shit. Tried the local version and it just errors out. Tried the colab version and it sounds nothing like Boyd Britton and his Dec narrations. Sounds like a whinier, robotic Asston, without Tardski hollering in the back.

By the way, this was the reference sound clip:


To be fair, I do kinda want to do the manifesto, so I started doing it with TorToiSe anyway, no matter how long it takes. I'm already halfway through part 1 of the manifesto. That's 3 hours of audio.
This thing is gonna be BIG.
 
> Sounds like a whinier, robotic Asston, without Tardski hollering in the back.
You can set the source mode to "angry", play with the speed, and put in a longer style sample clip (recommended), but it might not be worth it. If you can find a base TTS model that gets the basic speed and tone right and requires fewer do-overs than TorToiSe, it could speed you up; if not, there's no point.
 
> You can set the source mode to "angry" and play with the speed and put in a longer style sample clip (recommended), but it might not be worth it.
Yeah, I did exactly that, and the result was an angry, whiny, slightly less robotic Asston. Still not what I need.
Oh well, TorToiSe it is then. I just got a new video card too, so that's going to speed things up a bunch.
 
> I just got a new video card too, so that's going to speed things up a bunch.
If only it were possible to run the actual TTS in two separate processes, so it generated outputs in parallel on both GPUs (or maybe pool VRAM from both GPUs, however that would work). Because right now I'd have to build a second computer just to put that second card to work on TTS generation too.
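Splitting the job across two cards is actually doable at the script level, no VRAM pooling needed: load one TTS instance per GPU and feed them alternating sentences. A hedged sketch of just the scheduling part (the `synth` callable is a hypothetical stand-in for a real per-GPU TTS call, e.g. a TorToiSe model pinned to `cuda:0`/`cuda:1`):

```python
from concurrent.futures import ThreadPoolExecutor

def render_parallel(chunks, synth, n_gpus=2):
    """Round-robin text chunks across n_gpus and synthesize concurrently.

    synth(gpu_id, text) stands in for a real per-GPU TTS call.
    Results come back in the original chunk order, ready for splicing.
    """
    jobs = [(i % n_gpus, chunk) for i, chunk in enumerate(chunks)]
    with ThreadPoolExecutor(max_workers=n_gpus) as pool:
        return list(pool.map(lambda job: synth(*job), jobs))
```

Threads are enough if the synthesis call releases the GIL while the GPU works; otherwise the same round-robin split works with one worker process per card (e.g. restricted via `CUDA_VISIBLE_DEVICES`).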

Regardless, I am currently re-finetuning my TorToiSe model - about 20% of the time it pronounces NIGGER as "Niger". This is going to be a problem going forward, when you consider the following:
[attachment: NIGGER.png]

Also it doesn't seem to know the word "kike".
And when you're training/finetuning a model, having two GPUs on one system does help a lot.
 