AI Singing/Music - General discussion for posting songs that have had their vocals changed by AI

Autistic Spergout

Survivor of the great Kiwi Farms DDoS #129037
kiwifarms.net
Joined
Jan 12, 2022
In the Elevenlabs.io thread, people were posting songs that had been run through an AI to make Kanye West and a few other singers sing different songs. I'll add more later.

That made me think, "How can I make my own?" It's actually very simple and fun once you get it all working.

To jump straight in, you can use the Google Colab link here (requires a Google account), which has nearly everything you need to get Kanye West AI singing.

But if you don't want to use Google Colab and would rather host locally:
Install https://github.com/34j/so-vits-svc-fork and its prerequisites. Instructions for getting it working are on the GitHub page. It has a few more features than the Google Colab version.

Thanks to @Colon capital V for suggesting Ultimate Vocal Remover. This tool is great but not perfect at separating vocals from songs. It occasionally leaves some of the instrumentals in, which is a little annoying but also funny when you listen to the AI trying to guess how to vocalize these errors. Using something like this is highly recommended.

The hardest part was finding models to use. Hugging Face has a few (Biden, Trump and Obama | GLaDOS, Bob Odenkirk and a few more | Cartman, David Bowie and a few more | Kanye West (Mega link) ). I do not know how to train models or even really where to start to train them so I can't really help there.

The left side is the most important. This is how I have mine currently set up, but ignore the Pitch slider: it needs to be tweaked depending on the source audio. If you don't want to fuck around with that, just tick "Auto predict F0". It doesn't give the best result, but it's good enough to get something that might be worth using. I have no idea what the other sliders do, so don't ask me.

1681802129827.png
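
For anyone wondering what tweaking the Pitch slider actually does to the audio, here is a rough sketch. It assumes the slider counts semitones (so 12 would be one octave), which I have not confirmed against the tool itself. The function names are made up for illustration and are not part of so-vits-svc-fork.

```python
def semitones_to_ratio(semitones: float) -> float:
    """Frequency ratio for a shift of `semitones` in 12-tone equal temperament."""
    return 2.0 ** (semitones / 12.0)


def shifted_hz(freq_hz: float, semitones: float) -> float:
    """Frequency you end up with after shifting `freq_hz` by `semitones`."""
    return freq_hz * semitones_to_ratio(semitones)


if __name__ == "__main__":
    # Print the multiplier for a few typical slider positions (assumed semitones).
    for shift in (-12, -5, 0, 5, 12):
        print(f"{shift:+d} semitones -> x{semitones_to_ratio(shift):.3f}")
```

So if the source singer sits roughly an octave above the voice model's comfortable range, a value around -12 would halve every frequency, which is probably a sensible starting point before fine-tuning by ear.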

I recommend turning "Auto play" off and just placing the output audio into Audacity (or something similar) along with the instrumentals. If you use the instrumental output from Ultimate Vocal Remover, it will be perfectly synced and ready to export. If you don't disable the auto play feature, you will be forced to listen to the output with no way of stopping it, and the audio can get really distorted, which can cause a painfully high-pitched noise.
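
If you would rather script the final mix than drag files into Audacity, something like the sketch below should be close. It is only an illustration using Python's standard `wave` module: it assumes both files are 16-bit PCM WAVs with the same sample rate and channel count, and that they already line up at sample zero (as the Ultimate Vocal Remover outputs should). The function names and the `vocal_gain` parameter are my own, not from any of the tools above.

```python
import array
import sys
import wave


def mix_samples(vocals, instrumental, vocal_gain=1.0):
    """Sum two sequences of 16-bit samples, clamping to the int16 range.

    The shorter track is padded with silence. Clamping avoids integer
    wrap-around, which is one source of harsh distortion.
    """
    length = max(len(vocals), len(instrumental))
    mixed = []
    for i in range(length):
        v = vocals[i] if i < len(vocals) else 0
        m = instrumental[i] if i < len(instrumental) else 0
        total = int(round(v * vocal_gain)) + m
        mixed.append(max(-32768, min(32767, total)))
    return mixed


def read_wav(path):
    """Return (params, samples) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wf:
        params = wf.getparams()
        if params.sampwidth != 2:
            raise ValueError("expected 16-bit PCM audio")
        samples = array.array("h")
        samples.frombytes(wf.readframes(params.nframes))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV data is little-endian
    return params, samples


def mix_files(vocal_path, instrumental_path, out_path, vocal_gain=1.0):
    """Mix a converted vocal track over an instrumental and write the result."""
    v_params, vocals = read_wav(vocal_path)
    i_params, inst = read_wav(instrumental_path)
    if (v_params.nchannels, v_params.framerate) != (i_params.nchannels, i_params.framerate):
        raise ValueError("channel count and sample rate must match")
    mixed = array.array("h", mix_samples(vocals, inst, vocal_gain))
    if sys.byteorder == "big":
        mixed.byteswap()
    with wave.open(out_path, "wb") as wf:
        wf.setnchannels(v_params.nchannels)
        wf.setsampwidth(2)
        wf.setframerate(v_params.framerate)
        wf.writeframes(mixed.tobytes())
```

Usage would be along the lines of `mix_files("vocals_ai.wav", "instrumental.wav", "final.wav", vocal_gain=0.9)` with hypothetical file names. If the AI output is mono and the instrumental is stereo, this will refuse to run, and converting one of them first (or just using Audacity) is the easier route.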

This is what I have done so far. Some of them are really bad but it's something
Shoop Shoop Song - Cher (Not the best example)

The Gambler - Kenny Rogers

I Think I'm Going to Kill Myself - Elton John

Sultans of Swing - Dire Straits

SMB3 Masturbatory Madness - I don't know, I got it off YouTube

Piano Man - Billy Joel

Now You're a Man - Orgazmo

Pomfpomfpomf =3
America Fuck Yeah! - Team America World Police

If there is something I missed and quite a few people are having the same issue, I'll help out and then fix the OP.

Also, just for the sake of sources, label your shit properly so others can find the original source, etc.
 
This is fun to mess with. Not a perfect result, but it made me laugh:



Cartman model from the OP + the We Are Number One stemset that got released a couple years back when Stefan Karl died

I'm curious what this would do combined with a very regular, perfectly on-key voice via speech synthesis and Melodyne. Might try it later if I get time.
 
The speech synthesis is going to be a huge factor. Taking a voice and putting it in a different dialect will always sound weird since it's basing the speech habits off someone else. Sometimes it's less apparent like the Biden version of Sultans of Swing because it's close to his speaking range and accent.
 
This took me about 30 minutes to get to a decent standard:
Chris Chan's - Captain's Log, Stardate January 15th, 2009
Joe Biden


Still has a lot of imperfections but it would be good enough to fool people

I spent a little over an hour on this one with no experience in Audacity
Chris wants his house off the Internet
Joe Biden

Donald Trump

These are god awful because the AI can't really handle the sudden and frequent pitch changes that Chris does. If it can then I'm retarded and disregard what I've said

And just for fun:
Kanye

Obama

Biden

Trump

I'm having too much fun with this and need to find more stupid things for them to sing
 
I keep trying to feed the fucking Colab thing the .pth model file for Cartman, but for some reason his model won't show up in the dropdown menu for which voice to synthesize. Granted, I went with a different Google Colab link, since the one listed in the OP kept harping on me to restart my runtime, along with other fucky things that I couldn't be bitched to figure out (Link here).

But regardless, I wanted someone to sing the "Don't Forget" track from Deltarune so Kanye it is.

Wanna try and get Jackson or someone that can hit those high notes relatively decently since this is p okay.

E: Got one with Jackson, seems to do a bit better with those long/high notes.
 
I was wondering if someone would start with Johnny Rebel and look at that, there it is.

Now someone needs to make one of these with some Vtuber's voice so I don't have to wait for GTA 6 to come out.
 
I found the video a while back and just changed the voice. I did post it a while ago but forgot to post it here

I should check out if there are new or improved models or if there are other changes
 
Making these seemed like a total waste of time, but I laughed making them so I guess there's that

Barack Obama - Ah'm a Nigger Man


Chris Chan - My Ding-A-Ling


Eric Cartman - Suck A Hyena's Dick


TF2 Engineer - Joe's Garage
 
Some things I've found:

* There is another framework called Retrieval-based Voice Conversion (RVC). It has a web UI here: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI. I find it a bit easier to use and less janky than the so-vits-svc-fork UI, which has a number of annoying bugs. RVC models will not work in so-vits-svc and vice versa. Download it from this link; on Windows, run "go-web.bat", and after some load time it should open the UI in a browser.

* There is a huge repo of voice models here: https://huggingface.co/QuickWick/Music-AI-Voices/tree/main Some of them are for RVC as mentioned above, some are for so-vits-svc. If a model is for RVC, it will say in the folder name (though I have seen some that say RVC but are actually for so-vits-svc. If it has a config.json file then it's for so-vits-svc, if not it's for RVC). Most of them are pop singers, rappers, anime voices etc. but there are some interesting ones like Chris Chan, Moonman, Microsoft Sam, characters from TF2, Spongebob, Simpsons, Family Guy, South Park etc.

* Some settings recommendations: crepe or harvest pitch prediction methods are the best but take longer, especially depending on your hardware. Depending on the model and the song, autopredict can give better results. For example with the Chris Chan model I prefer autopredict on because it makes it sound more like how Chris Chan might actually sing something, whereas when it's off the singing is unrealistically good.

* If you're getting too many other instruments in the extracted vocals from UVR5, or the model output isn't getting the right phonetics, try a different extraction model. Sometimes Demucs does better on some tracks than MDX-Net. As a side note, I've noticed the extraction results are better on songs with straightforward, professional mixing and production. I tried to separate vocals from King Cobra's songs and it was a complete failure, with Cobra's guitar licks constantly getting mixed in with the vocals.
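
The config.json rule of thumb from the model repo point above can be sketched as a small script, in case you have a pile of downloaded folders to sort through. This is only a heuristic, not anything official from either project: it assumes each model lives in its own folder and that the checkpoint is a .pth file, and `guess_framework` is a name I made up.

```python
from pathlib import Path


def guess_framework(model_dir):
    """Guess which framework a downloaded model folder is meant for.

    Heuristic: a folder with a .pth checkpoint AND a config.json is
    treated as so-vits-svc; a .pth checkpoint without config.json is
    treated as RVC. Anything without a .pth file is "unknown".
    """
    folder = Path(model_dir)
    has_checkpoint = any(folder.glob("*.pth"))
    if not has_checkpoint:
        return "unknown"
    if (folder / "config.json").is_file():
        return "so-vits-svc"
    return "rvc"


if __name__ == "__main__":
    import sys

    # Pass one or more model folders on the command line.
    for arg in sys.argv[1:]:
        print(f"{arg}: {guess_framework(arg)}")
```

Since some folders are mislabeled, it is probably worth trusting the file check over the folder name, and then confirming by actually loading the model in the matching UI.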

Now for some absolutely cursed audio, here is Yoko Ono singing "Gypsy" by Mercyful Fate
 
Here's a video tutorial on using RVC, plus a couple of sample ones.

 
I kept getting something akin to voice crack until I used the new rmvpe option instead of harvest or crepe. Will post back tomorrow with a neat result (if not, remind me and call me a faggot who didn't deliver).

Update: here it is, I'm quoting from another thread:
RVC really is pretty cool, I was able to train a pretty decent model in about 90 minutes.

I trained a model to reproduce the singing voice of Kevin Conroy/Batman (RIP). I always wanted to hear him sing Snake Eater, since he had a perfect theatrical singer type of voice that I always thought would suit a male version of the song perfectly. Sadly, he's gone so we'll never hear him sing it for real, but the AI came up with a surprisingly good result. The biggest weakness seems to be that it copies the original singer's accent almost exactly, or at least tries to, so the accent output doesn't match the new voice's source. It also doesn't like long silences and inserts weird noises, but those can be edited out easily.

I had to use the newer rmvpe option instead of harvest or crepe, because those kept causing some kind of artifacting that sounded like a voice crack.

Here's the solo version. I had to clean up the audio a little and add some effects to get it to sound the same way the original does.
View attachment 5211728

I also made a duet with Cynthia Harrell's original singing, which came out beautifully.
View attachment 5211729

For comparison, here's an actual track that Conroy recorded a few years ago of him singing "Am I Blue"
View attachment 5211730

And since I know you Jokers (pardon the pun) will ask for it, here's the output with the voice crack effect:
View attachment 5211732
 
So, I took RHCP's Otherside, lazily separated my side vocals, threw them through the thing with a Kanye model, spliced it back together in Audacity, and uhh, let's just say it does not like the parts with the chorus :story:
 
This actually makes her shitty apology more palatable.
 
Does anybody have clips of Ted Kaczynski speaking? I found an interview with him here and that's about it. I was thinking maybe other clips exist and it'd be possible to make him sing Virtual Insanity or some other song but I'm not sure if that interview alone is enough?
Also Ted sounds like a nerd
 