Elevenlabs.io - AI Generate anyone's voice with a minute of audio saying how much they hate trannies

Well, now that 11 paywalled their system, has anyone tried Tortoise? It's much slower, but the results are interesting.

Here's a clip I generated of Kevin Conroy:
 
I'm tempted to go for the trial version, or even buy it, but on principle I don't have faith in buying from companies that ask directly for a credit card. It's usually a bad idea, since disputing any unwanted charges becomes a pain.
When they locked it down they seem to have changed something about voice cloning too. My old ones work fine but everything new I add, like, they're not even trying.
 
Well, now that 11 paywalled their system, has anyone tried Tortoise? It's much slower, but the results are interesting.
I have. I've been searching for a free TTS system for ages; I went through Festival and some of the other Linux offerings, and they're so robotic and static-prone that they're actually damaging to the ears. I browse the news via Newsboat, an RSS reader for the command line. My goal has always been to run a macro that pipes the article to a TTS program or server and get the article back as an mp3. That's particularly useful for stuff like Glenn Greenwald's Substack, which can be excessively verbose.
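The Newsboat side of that can be a one-line macro in its config; something like the following, where `article2mp3.sh` is a hypothetical wrapper script around the extract-and-speak pipeline (the `macro` syntax itself is Newsboat's):

```
macro t set browser "article2mp3.sh %u" ; open-in-browser ; set browser "firefox %u"
```

Pressing the macro prefix (comma by default) followed by `t` on an article would then hand its URL to the script instead of a browser.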

I already have a functional system that uses Mozilla's standalone Readability library: https://github.com/mozilla/readability and pipes the article to gTTS: https://github.com/pndurette/gTTS. This gives me a rather bland voice with a lot of inflection errors and spikes in volume, because it's cheating the system by using Google Translate's speaking function and joining all of the paragraphs into one large sound file.
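A minimal sketch of that gTTS stage, synthesizing per paragraph instead of one joined blob (the helper and file names are my own; only the `gTTS(...)` constructor and `.save(...)` come from the gTTS API, and each call hits Google Translate's endpoint):

```python
def split_paragraphs(text):
    """Split extracted article text into non-empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


if __name__ == "__main__":
    # gTTS is imported here so the helper above stays dependency-free.
    import sys
    from gtts import gTTS

    article = sys.stdin.read()  # e.g. piped from a readability extractor
    for i, para in enumerate(split_paragraphs(article)):
        gTTS(text=para, lang="en").save(f"part{i:03d}.mp3")
```

Keeping each paragraph as its own mp3 also avoids the volume spikes at paragraph seams that come from gluing everything into one request.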

At the time I had some issues with Readability throwing errors over some Google botnet shit embedded in news webpages, so I used a clean article that didn't have any of that built in; it was very clean, bare-bones HTML, and the article came out very cleanly. I piped this to TortoiseTTS and let it rip.

It produced 15 mp3s and a combined.wav of them all. The process took about 2 h 30 min of rendering time on my 1060 6GB. The result is fantastic, though I have no idea who the "mol" example voice is emulating.
The article is 52 lines, 634 characters. This isn't exactly a speedy proposition even for a desktop with a dedicated GPU; I can't imagine how laptop users would handle it.

I suppose I could start rendering a long article before I go to bed and listen to it the next day after 12 hours of rendering, but that removes the spontaneity of what I read/listen to.
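Those numbers pencil out to roughly 14 seconds of render time per character; a quick sketch of the arithmetic (the 3,000-character figure for a "long article" is my own assumption):

```python
# Figures reported above: 634 characters took ~2 h 30 min on a GTX 1060 6GB.
chars = 634
seconds = 2 * 3600 + 30 * 60        # 9000 s total
sec_per_char = seconds / chars      # ~14.2 s per character


def render_hours(n_chars, rate=sec_per_char):
    """Projected render time in hours for an article of n_chars."""
    return n_chars * rate / 3600


# A ~3000-character article lands right around the overnight 12-hour mark:
print(f"{render_hours(3000):.1f} h")  # prints "11.8 h"
```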
 
Did you try the voice cloning part? This is what I used: https://git.ecker.tech/mrq/ai-voice-cloning/wiki/Home
 
@anustart76
I just noticed there's a fork of Tortoise-tts called Tortoise-tts-fast. I can attest that it's much faster than the base Tortoise-tts; I used the ultra_fast setting and I don't have too much of a problem with the output.

Original: Tortoise, fast preset, voice: mol, approx. render time 2 h 30 min
New: Tortoise-tts-fast, ultra_fast preset, voice: mol, approx. 10-15 min render time


Yes, it has some errors in the inflection here and there, but it's entirely acceptable. The render time isn't obscenely bad; this is quite within striking range for my use case.
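For comparison's sake, the speedup works out to roughly an order of magnitude; a sanity check on the numbers above:

```python
# Render times reported above, in minutes.
slow = 150                  # original Tortoise, fast preset
fast_lo, fast_hi = 10, 15   # Tortoise-tts-fast, ultra_fast preset

speedup_lo = slow / fast_hi  # 10x at worst
speedup_hi = slow / fast_lo  # 15x at best

# Per-character throughput for the same 634-character article:
chars = 634
sec_per_char = fast_hi * 60 / chars  # ~1.4 s/char, down from ~14 s/char
```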
 
I have no interest in doing the voice cloning, I just want a reasonably good voice to read excessively long articles aloud. Getting some celebrity to shout nigger and other offensive shit is for teenagers.
I'm not interested in getting celebrities to say any of that. I just wanted to see if I could make Kevin Conroy's voice live on.
I'm familiar with the -fast fork, I'm curious if it can be integrated into the mrq repo.
 
When they locked it down they seem to have changed something about voice cloning too. My old ones work fine but everything new I add, like, they're not even trying.
The two slider settings they give you don't make any sense to me. If I push a slider even slightly either way, I can get wildly different results. I cloned two male voices, one UK and one US (one before and one after they locked down accounts; I joined a day before they closed it off). Both voices had a habit of turning into different generic US males, and the US one, which had fewer samples, also turned into a woman at random sometimes. I just can't make sense of it. My guess is the US one suffered from high-pitched noise left over from where I removed background vocals, but that's an entirely uninformed thought of mine.
Their service is very simple to use but not transparent, and that sort of thing always pisses me off.
 
Now this is interesting: I managed to take the finetune training file and the conditioning-latents file that the mrq repo generates and plug them into the -fast fork, with impressive (and impressively fast) results. The only problem is that the -fast fork has no instructions on where to actually put those models so you can use them.
 
So I'm still having trouble getting Tortoise to install. The installer hangs every time it tries to run a git clone into the AppData temp folder (I think it only does this on the "whisper" module).
Further, if my network gets disrupted during the first-time startup of the GUI (my problems are probably mostly shit internet), adamw and bitsandbytes fail to install but are assumed to be successfully installed on subsequent runs. What do I need to change to force a retry?

Note: my only other experience with these kinds of programs is the NovelAI webui, so I've done some minor troubleshooting before, but my knowledge of programming is a ten-year-old 101-level Fortran class.
 
Which repo are you installing? In all likelihood, you're going to need to download and cache those files manually.
Also I recommend installing PyEnv to manage your Python versions.
 
  • Like
Reactions: MrJokerRager
The one you're using (ai-voice-cloning). I want to manually install the modules that won't download, but I don't know exactly where to put them. I'll eventually figure it out by poking at it, or I'll be somewhere with good enough internet to download in one go.
I'll look into pyenv, though I don't know why it would help.

Edit: whisper cloned over fine. Now I need to get bitsandbytes and adamw to attempt a redownload. I just need to find where they are and remove the incomplete installs, I think.
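Python can report where a module would be imported from, which helps locate a half-finished install before deleting it (the helper name is my own; `importlib.util.find_spec` is the stdlib mechanism):

```python
import importlib.util


def locate(module_name):
    """Return the filesystem path a module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return spec.origin if spec else None
```

Once the leftovers are gone, `pip install --force-reinstall --no-cache-dir <package>` should make pip redownload and reinstall from scratch instead of trusting its cache.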
 