Does anyone know the best free site or software for A.I. text-to-speech voice generation? Every free site I've tried has obnoxious stipulations or screws you over by letting you pick from only X voices and record only X times before you have to pay.
I guess that's just the direction everything is going in: you just don't own the software anymore.
Your best bet is to do things locally. You don't need an RTX GPU to run things through RVC Applio, and right now installing it is as easy as downloading a prepackaged 5 GB folder. It has built-in text-to-speech with literally hundreds of voices to use. As for voice models, you can pick and choose from the ones people have posted here. Use it to your heart's content.
It's genuinely a miracle this thing has been free to use for so long. Different developers have been working on it since, I think, June 2023, and it's had nothing but steady improvements ever since. It feels like the Stable Diffusion of voice cloning. I hope I don't jinx it.
I've tried the online version, and I can't get it to say any text I want or even select a different voice. So either I'm missing what may be the most obvious thing in the world, or it's simply not as user-friendly as nerds always claim.
What exactly do I do to load a voice and have it say what I want it to fucking say? That, plus some pitch-shifting and other basic bitch features, is all I need. That's it. I don't need to jerk off to Hazbin Hotel characters talking to me or any other weird shit. I need something simple and easy to understand. Literally, that's all I need.
Every time I've tried to use A.I. anything it's always a pain in the ass, you have to play with it, or it simply does not work because of lazy programmers or autists that have zero idea how to make anything that's user friendly.
It's true that you sort of just have to know what to do. The tech progresses so quickly that any and all tutorials on YT are outdated by now: Applio 3.0 came out this month, while anything on YT is at best from 3 months ago.
I checked the online version, and after pressing Refresh, some new voice models do pop up when you click the Voice Model tab. They're a bunch of weird shit that sounds like ass. I tried to import a nicer model, but it wouldn't get added to the list. I suggest you just use the local version instead, which works fine.
Try this guide; I hope it helps!
Applio usage 101
0. Download the precompiled version of the software (version 3.0.4, 6.1 GB as of February 2024) and extract the zip. It's fine if it's not on your C: partition; mine is on D: and it works okay.
1. Run run-applio.bat. A command console will appear, followed by a new Applio tab in your browser.
2. Pick and download a model from that model repository. There are more repositories out there, but I think it's the easiest one to navigate. You can't do any work without voice models; they're what Applio is for.
a. The more epochs, the better (anything below 300 sounds like ass, 600 is good, and some people go up to 1000).
b. Filter the results by RMVPE. All those different names are settings (pitch-extraction methods) used in training, and RMVPE gives the best-sounding results. Mangio, Crepe, and Harvest are all older training settings and are pretty much outdated. If nothing pops up, then maybe try RVC or KITS.AI.
3. Go to ApplioV3.0.5\logs and extract the zip there. Inside should be two files: an .index file and a .pth file. Paste them into the appropriate folders, as shown in the attached image.
4. Go back to Applio and click Refresh. The voice model and index file should now appear in the voice model selection. Pick them both: even if it looks like the index file loads automatically, it really doesn't, so pick it manually from the list too.
5. You can adjust the speaker's Pitch in Advanced Settings; it's one of the first sliders. The rest is technical stuff that doesn't matter for us.
You can experiment with it as much as you need, but going beyond -12 or +12 gives very exaggerated results. If you're doing something like having a female voice model sing a male vocalist's song, you get the best results by sliding the pitch up to +12, and vice versa.
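If you'd rather script the file shuffling from steps 3 and 4, here is a minimal sketch using only Python's standard library. Instead of extracting straight into logs, it unpacks the zip into a temporary folder and copies the two files over. It assumes one folder per model under `logs`; the attached image is the authority on where the files actually go, so adjust the hypothetical `install_model` helper (not part of Applio) to match. You still need to click Refresh and pick both files afterwards.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_model(zip_path: Path, logs_dir: Path, model_name: str) -> tuple[Path, Path]:
    """Copy the .pth and .index files from a downloaded model zip into logs_dir/model_name.

    ASSUMPTION: one folder per model under Applio's logs directory.
    Compare with the folder layout your Applio version expects before using this.
    """
    with tempfile.TemporaryDirectory() as tmp, zipfile.ZipFile(zip_path) as zf:
        zf.extractall(tmp)
        # Some zips wrap the files in an extra folder, so search recursively.
        pth_files = sorted(Path(tmp).rglob("*.pth"))
        index_files = sorted(Path(tmp).rglob("*.index"))
        if len(pth_files) != 1 or len(index_files) != 1:
            raise ValueError(
                f"expected one .pth and one .index in {zip_path}, "
                f"found {len(pth_files)} and {len(index_files)}"
            )
        target = logs_dir / model_name
        target.mkdir(parents=True, exist_ok=True)
        # shutil.copy2 returns the destination path as a string.
        pth = Path(shutil.copy2(pth_files[0], target))
        index = Path(shutil.copy2(index_files[0], target))
    return pth, index


# Hypothetical usage; substitute your own download and install paths:
# install_model(Path("Downloads/my_voice.zip"), Path(r"D:\ApplioV3.0.5\logs"), "my_voice")
```

The helper refuses zips that don't contain exactly one of each file, since guessing which .pth goes with which .index is how you end up with a model that sounds wrong.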
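Why ±12 is the natural stopping point: assuming the Pitch slider is in semitones (RVC-based tools usually treat it that way, but check your version), a shift of n semitones multiplies the voice's frequency by 2^(n/12) in standard equal temperament. So +12 is exactly one octave up (double the frequency) and -12 is one octave down (half), which is why the female-model-singing-a-male-song case lands at +12. A tiny sketch of the arithmetic; the function names are made up for illustration:

```python
def pitch_ratio(semitones: float) -> float:
    """Frequency multiplier for a pitch shift of `semitones` (12-tone equal temperament)."""
    return 2 ** (semitones / 12)


def shifted_hz(base_hz: float, semitones: float) -> float:
    """Fundamental frequency after shifting base_hz by `semitones`."""
    return base_hz * pitch_ratio(semitones)


# Example: a voice around 120 Hz shifted up 12 semitones lands around 240 Hz.
for n in (-12, -5, 0, 5, 12):
    print(f"{n:+d} semitones -> x{pitch_ratio(n):.3f}, 120 Hz -> {shifted_hz(120, n):.1f} Hz")
```

Past ±12 you are asking for more than a doubling or halving of pitch, which is where the exaggerated, cartoonish results come from.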
Forgot to mention that, sorry. IIRC, versions before 3.0 had a way to select AMD compatibility, but even if that worked, you'd run into more roadblocks. I guess you just have to get lucky with the parts you bought for your computer.
I've only been able to listen to about 2 hours and 40 minutes of it so far, and unfortunately most of that was in the background while I worked on other things. The spot checks I did sounded good, and I can't remember hearing any silent parts, but at my general level of attention I won't be able to detect skips in the sequence. Detecting repeats is pretty much impossible without reading the manifesto at the same time, given how much repetition is already there. Edit: All right, I'm done. Very good job. It's absolutely the best AI-generated audio I've heard outside of clips under 5 minutes long, not that I'm familiar with 7-hour book chapters.
This is awesome! Is it possible for you to do the whole manifesto? I had the idea to do the whole thing with the Moonman voice, but there's no free AT&T Mike TTS.
The spot checks I did sounded good, and I can't remember hearing any silent parts, but at my general level of attention I won't be able to detect skips in the sequence.
I made sure there weren't any anomalies in the final result.
When you're generating shit with TorToiSe, there are two things to look out for:
1. It generates speech in roughly 15-second chunks, and there is anywhere from half a second to a whole second of silence between generated chunks in the final combined output.
2. There is garbage at the very end of almost every generated chunk, such as:
a. noticeable "artifacts";
b. repetitions, or garbled sound following sentences;
c. sometimes, sentences that are completely omitted from synthesis.
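One possible workaround for the first issue, not necessarily what was done here, is to trim the quiet edges of each chunk and rejoin the chunks with a short fixed pause. This sketch uses only Python's standard library and assumes the chunks are saved as mono 16-bit PCM WAV files; `join_chunks`, the amplitude threshold of 500, and the 150 ms pause are illustrative choices to tune by ear. It does nothing about the garbage at the ends of chunks, which still has to be caught by listening.

```python
import array
import sys
import wave


def read_mono16(path):
    """Read a mono, 16-bit PCM WAV file. Returns (samples, sample_rate)."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError(f"{path}: expected mono 16-bit PCM")
        rate = w.getframerate()
        samples = array.array("h", w.readframes(w.getnframes()))
    if sys.byteorder == "big":  # WAV sample data is little-endian
        samples.byteswap()
    return samples, rate


def trim_silence(samples, threshold=500):
    """Crude gate: drop leading/trailing samples quieter than `threshold` (16-bit scale)."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def join_chunks(paths, out_path, pause_ms=150, threshold=500):
    """Trim each chunk's quiet edges and concatenate them with a fixed pause in between."""
    combined = array.array("h")
    rate = None
    for i, path in enumerate(paths):
        samples, sr = read_mono16(path)
        if rate is None:
            rate = sr
        elif sr != rate:
            raise ValueError(f"{path}: sample rate {sr} differs from {rate}")
        if i > 0:
            combined.extend([0] * (rate * pause_ms // 1000))  # digital silence
        combined.extend(trim_silence(samples, threshold))
    if rate is None:
        raise ValueError("no input chunks given")
    if sys.byteorder == "big":
        combined.byteswap()
    with wave.open(out_path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(combined.tobytes())


# Hypothetical usage with made-up file names:
# join_chunks(["chunk_000.wav", "chunk_001.wav"], "combined.wav")
```

A per-sample threshold is blunt and can clip soft word endings, so if the joins sound abrupt, lower `threshold` before touching the pause length.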
This is a well-known issue, and AFAIK there is no fix for it. And while sometimes it works out in my favor (some of these garbled outputs sound really psychotic and emotional, I keep them in when that happens - check the attached audio for a good example), roughly 10% of all the output had to be re-generated and edited in (mostly because it insists on pronouncing NIGGER as "niger" when it loses clarity near the end of the chunk).
So I had to go through the entire thing and fix it up, which took a long time.
But hey, it's free. And regarding voice quality, it's still the best option out there. Outside of 11.ai, none of the other models even come close.
This is awesome! Is it possible for you to do the whole manifesto? I had the idea to do the whole thing with the Moonman voice, but there's no free AT&T Mike TTS.
I intend to. All I've seen other people do so far is try Book 1 and then give up right after, so this will be the first full audiobook version of the Matt Harris manifesto. Or, at the very least, the first AI-generated audiobook version of the Matt Harris manifesto.
Right now I'm about a quarter of the way through Book 2. Here's a little sample: