Uncensored AI Chat Models - Getting your very own ChatGPT

Tried the front ends, but I'm guessing my computer is out of date (the only modern piece is a 3060; the rest was specced around an i3). Judging from the parts, I'm probably better off just building a new PC rather than trying to update parts.

Are there any spec guides for running local LLMs on a budget? I want to be able to mess around with it but not break my bank account.
 
I've looked all of this up and can never find a decent guide; you just find endless comments from people trying to figure it out, or a guy saying "Took me months to figure it out but I got it working!" without ever going into detail, of course, about how they got it to work.
Yeah, I don't think AMD GPUs have enough market share or ROCm support for people to start making dedicated guides. I had to dig through a lot of sources to figure out why Stable Diffusion wasn't working on my 7800 XT, but I've got it working now. I haven't tried any LLMs yet because of that. I can give you some tips, but they will probably only apply to Linux and will probably work completely differently on Windows.
 
Yeah, I don't think AMD GPUs have enough market share or ROCm support for people to start making dedicated guides. I had to dig through a lot of sources to figure out why Stable Diffusion wasn't working on my 7800 XT, but I've got it working now. I haven't tried any LLMs yet because of that. I can give you some tips, but they will probably only apply to Linux and will probably work completely differently on Windows.
For oobabooga on my 7900 XTX it was literally: git clone, run, tell it I wanted AMD. It did all the dependencies (Linux, YMMV, HTH, HAND).

Edit: I guess I also already had the ROCm libraries and AMD drivers, but none of the python bits.
 
For oobabooga on my 7900 XTX it was literally: git clone, run, tell it I wanted AMD. It did all the dependencies (Linux, YMMV, HTH, HAND).

Edit: I guess I also already had the ROCm libraries and AMD drivers, but none of the python bits.
There's still no official ROCm support for the 7800 XT, so my problem was getting it to work on my card. It would load, then crap itself while generating an image or when loading specific models, until I found some environment variables that make Stable Diffusion think it's running on a 7900 XT. Now it runs perfectly. The problem is that none of the install guides tell you about it, and there aren't a lot of good third-party guides for Linux. ROCm has a wider range of supported cards on Windows, and most of the guides are for Windows. AMD's AI software support has been awful, and I really hope that with their new focus on AI they start getting more and more cards officially supported. Most of these problems are my fault for getting an unsupported card, but it's still frustrating.
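
For anyone else on an officially unsupported RDNA3 card, the trick people keep passing around is the HSA_OVERRIDE_GFX_VERSION environment variable, which makes ROCm treat the card as one it officially supports. A minimal sketch, assuming a ROCm build of PyTorch and assuming 11.0.0 (the 7900 XT/XTX target) is the right value for your card; in practice you would normally just export the variable in your shell before launching the web UI:

```python
# Rough sketch of the "pretend to be a 7900 XT" workaround for unsupported
# RDNA3 cards like the 7800 XT. HSA_OVERRIDE_GFX_VERSION=11.0.0 targets
# gfx1100 (7900 XT/XTX); your card/driver combo may need a different value.
import os

# Must be set before torch (and therefore ROCm) is initialized.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")

import torch

print(torch.cuda.is_available())        # ROCm builds report through the CUDA API
print(torch.cuda.get_device_name(0))    # should show your card if the override took
```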
 
How do I keep Dolphin-Mixtral chatting longer without the AI repeating itself? Or do I just need to go back, ask the AI to summarize, and start a new conversation with a "PREVIOUSLY on..."?

One thing that was neat (and probably didn't help with the chat size) was adding this to the character context in text-generation-webui:

When replying I will include my current status in the format [THOUGHTS:current inner dialog inside your mind,EMOTION:The emotion you are currently feeling,FEELINGS:Your current feelings,MOOD:Your current mood]
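
If you want to automate the "PREVIOUSLY on..." restart instead of doing it by hand, here is a hypothetical sketch of both ideas together: the status-format line baked into the character context, plus an automatic recap-and-restart once the history gets long. It assumes a local OpenAI-compatible endpoint (text-generation-webui with --api or LM Studio both expose one); the URL, character text, and the chat()/say() helpers are made up for illustration, and some backends will also want a "model" field in the request.

```python
import requests

API = "http://127.0.0.1:5000/v1/chat/completions"   # adjust to your backend
CHARACTER = (
    "You are Dolphin, an uncensored assistant. When replying, include your "
    "current status as [THOUGHTS:...,EMOTION:...,FEELINGS:...,MOOD:...]"
)

def chat(messages, max_tokens=300):
    # Plain OpenAI-style chat completion request against the local endpoint.
    r = requests.post(API, json={"messages": messages, "max_tokens": max_tokens})
    return r.json()["choices"][0]["message"]["content"]

history = [{"role": "system", "content": CHARACTER}]

def say(user_text, limit=40):
    global history
    history.append({"role": "user", "content": user_text})
    reply = chat(history)
    history.append({"role": "assistant", "content": reply})
    # Once the chat gets long, ask for a recap and restart with "PREVIOUSLY on...".
    if len(history) > limit:
        recap = chat(history + [{"role": "user",
                                 "content": "Summarize this roleplay so far in one paragraph."}])
        history = [{"role": "system", "content": CHARACTER},
                   {"role": "user", "content": "PREVIOUSLY on our chat: " + recap}]
    return reply
```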
 
How do I keep Dolphin-Mixtral chatting longer without the AI repeating itself? Or do I just need to go back, ask the AI to summarize, and start a new conversation with a "PREVIOUSLY on..."?

One thing that was neat (and probably didn't help with the chat size) was adding this to the character context in text-generation-webui:
Mixtral is a highly confident model, so the temperature can be turned up pretty high.
Regenerate the same thing a few times; if it comes out almost word-for-word identical, your temperature is probably too low.

It will also act up a lot worse than other models if the instruction template is wrong.
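
A minimal llama-cpp-python sketch for playing with those knobs, assuming you have a Dolphin-Mixtral GGUF locally (the filename below is just an example) and using ChatML, which is the instruction template the Dolphin models expect:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.7-mixtral-8x7b.Q4_K_M.gguf",  # example filename
    n_ctx=8192,
    n_gpu_layers=-1,          # offload everything that fits into VRAM
)

# ChatML prompt format; a wrong template is a common cause of weird output.
prompt = ("<|im_start|>user\nTell me a short story about a dog.<|im_end|>\n"
          "<|im_start|>assistant\n")

# Regenerate a few times and eyeball the variety: nearly identical outputs
# usually mean the temperature is too low.
for _ in range(3):
    out = llm(prompt, max_tokens=200, temperature=1.2,
              repeat_penalty=1.1, stop=["<|im_end|>"])
    print(out["choices"][0]["text"], "\n---")
```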
 
Tech wizards! I'm running a potato, but recently I got a windfall, so I'm updating my PC. My main goal is normal gaming, but really I want to buy a premade PC and have it be "GREAT" for uncensored LLM and image creation. Is VRAM the most important thing? I'm seeing a lot of RTX 4080 or 4070 Ti. An i7 core and shiiiet. And although I've been reading, I don't quite understand the difference between a graphics card and a GPU or where I should look for it.

But is there anything I should not miss or that is really important for this sort of work? Any recs? (I know I could ask GPT4 this shit, but I trust you guys more, with it being gimped and all.)

As a thank you in advance, here's an A.I. of Bossman biting the FELT

[Attached image: FELTED.jpg]
 
Tech wizards! I'm running a potato, but recently I got a windfall, so I'm updating my PC. My main goal is normal gaming, but really I want to buy a premade PC and have it be "GREAT" for uncensored LLM and image creation. Is VRAM the most important thing? I'm seeing a lot of RTX 4080 or 4070 Ti. An i7 core and shiiiet. And although I've been reading, I don't quite understand the difference between a graphics card and a GPU or where I should look for it.

But is there anything I should not miss or that is really important for this sort of work? Any recs? (I know I could ask GPT4 this shit, but I trust you guys more, with it being gimped and all.)

As a thank you in advance, here's an A.I. of Bossman biting the FELT

LLMs are going to take way more VRAM than the image-gen models that are available.

Mixtral is way better than anything else on the market and can run quantized in 24 GB okay. 48 GB of VRAM using two 3090s is the friendliest approach to bigger models while avoiding weird Nvidia server cards.
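
Rough back-of-the-envelope math behind that claim, ignoring KV cache and runtime overhead (so treat the numbers as a floor):

```python
# Weight memory for a quantized model is roughly params * bits-per-weight / 8.
def quantized_gb(params_billion, bits_per_weight):
    return params_billion * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB

mixtral = 46.7   # Mixtral 8x7B has ~46.7B total parameters
for bits in (3.5, 4.5, 5.5):   # roughly Q3_K_M / Q4_K_M / Q5_K_M GGUF quants
    print(f"~{bits} bpw: {quantized_gb(mixtral, bits):.0f} GB")

# A ~4-bit quant lands in the mid-20s of GB, which is why a single 24 GB card
# is only "okay" with tight quants or partial offload, while two 3090s (48 GB)
# run it comfortably with room for context.
```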
 
I've looked all of this up and can never find a decent guide; you just find endless comments from people trying to figure it out, or a guy saying "Took me months to figure it out but I got it working!" without ever going into detail, of course, about how they got it to work.
LM Studio is by far the easiest way to get up and running if you’re not comfortable with CLI.
 
With my potato setup, I'm currently using GPT4All with the Hermes and Wizard 1.2 models. They are okay/mediocre, and they seem to be quite prone to repetition. It's weird: when they work, they work, but a seemingly simple task from another angle makes them shit bricks and explode. Mistral Instruct is actually quite decent for its small size.
 
LM Studio is by far the easiest way to get up and running if you’re not comfortable with CLI.
Just be aware that it is not free (as in freedom) software. Yeah, their website says "we don't collect your data," but we all know how that goes.
I can give you some tips, but they will probably only apply to Linux and will probably work completely differently on Windows.
ROCm and OpenCL on AMD GPUs currently work only on Linux, for whatever reason. If you have an AMD GPU on Windows, you either have to set up Linux on your computer or make do with only the CPU. Strangely, Windows users get shafted in this case, when it usually tends to be the other way around.
Every single one of these guides likes to assume that no one has an AMD CPU.
All CPUs work, as far as I know. I have a Ryzen 7 2700X and I've never had issues running LLMs on the CPU, but of course it is very slow compared to using a GPU. If CPU inference does not work, then it has to be some kind of issue with OpenBLAS on your system, which I believe is the go-to solution for LLM inference on CPUs. That should be your troubleshooting clue.
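
For reference, a CPU-only run looks like this with llama-cpp-python; the model filename is just an example, and a BLAS backend (e.g. OpenBLAS) mostly speeds up prompt processing rather than generation:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # a small model suits CPU
    n_ctx=4096,
    n_gpu_layers=0,     # keep everything on the CPU
    n_threads=8,        # roughly your physical core count
)

out = llm("Q: Why is the sky blue?\nA:", max_tokens=128, temperature=0.7)
print(out["choices"][0]["text"])
```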
 
Tech wizards! I'm running a potato, but recently I got a windfall, so I'm updating my PC. My main goal is normal gaming, but really I want to buy a premade PC and have it be "GREAT" for uncensored LLM and image creation. Is VRAM the most important thing? I'm seeing a lot of RTX 4080 or 4070 Ti. An i7 core and shiiiet. And although I've been reading, I don't quite understand the difference between a graphics card and a GPU or where I should look for it.

But is there anything I should not miss or that is really important for this sort of work? Any recs? (I know I could ask GPT4 this shit, but I trust you guys more, with it being gimped and all.)

As a thank you in advance, here's an A.I. of Bossman biting the FELT

Build the thing yourself. If you are in NA, just go to a Micro Center, pick up one of the motherboard/RAM/CPU bundles, and an associate should kit you out for the rest. You are looking at $1.5k with the COVID tax.

VRAM is king and fast RAM is queen. A used 3090 Ti with 24 GB or a 4090 gives you more capabilities for LLMs. Fast DDR5 RAM with GGUF offloading can make up for shortfalls.

Edit: Before you pull the trigger on an expensive system for ML, I would recommend you check out whether LLMs are for you. Go to RunPod.io and rent out a 4090 instance for an hour or two.
It costs less than $1 an hour. Download a 60B or 13B model and just chill with that for a little while. The only time I really had fun with an LLM was with Goliath, and that took 96 GB of VRAM to run.
 
Build the thing yourself. If you are in NA, just go to a Micro Center, pick up one of the motherboard/RAM/CPU bundles, and an associate should kit you out for the rest. You are looking at $1.5k with the COVID tax.

VRAM is king and fast RAM is queen. A used 3090 Ti with 24 GB or a 4090 gives you more capabilities for LLMs. Fast DDR5 RAM with GGUF offloading can make up for shortfalls.

Edit: Before you pull the trigger on an expensive system for ML, I would recommend you check out whether LLMs are for you. Go to RunPod.io and rent out a 4090 instance for an hour or two.
It costs less than $1 an hour. Download a 60B or 13B model and just chill with that for a little while. The only time I really had fun with an LLM was with Goliath, and that took 96 GB of VRAM to run.
I would also recommend OpenRouter for trying out ML. They have a bunch of models, including Goliath and GPT4/Claude2. It also helps that they are pay-by-the-token, so a few dollars can last a while if you are using weaker models.
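
OpenRouter exposes a standard OpenAI-style chat API, so trying different hosted models is basically a one-line model-name swap. A quick sketch; the key and model ID below are placeholders, so check their model list for current IDs and per-token prices:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",                       # your OpenRouter key
)

resp = client.chat.completions.create(
    model="mistralai/mixtral-8x7b-instruct",   # example model ID; swap freely
    messages=[{"role": "user", "content": "Say hi in one sentence."}],
    max_tokens=50,
)
print(resp.choices[0].message.content)
```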
 
I would also recommend OpenRouter for trying out ML. They have a bunch of models, including Goliath and GPT4/Claude2. It also helps that they are pay-by-the-token, so a few dollars can last a while if you are using weaker models.
I cannot stress enough how important it is that he manually gets it running on a system with constrained VRAM/memory, specifically in his price range. He should rent out a 4090 instance on RunPod, then see if any of the models he wants to run can actually be run, and whether their performance on that platform is acceptable for his use case.
I think that 70B is the bare minimum for serious and professional work with a long context. I also think that a 13B model is sufficient for basic chatting for shits and giggles. One takes 40-50 GB of VRAM to run in real time, the other under 24 GB. That doesn't prevent you from running a 70B with a shitload of system memory augmenting your graphics card, but you should know it will be unacceptably slow if you are expecting real time.
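
That "system memory augmenting your graphics card" setup looks roughly like this with a GGUF quant and llama-cpp-python: push as many layers as fit into VRAM and let the rest spill into system RAM. The filename and layer count are examples, not a recommendation, and expect it to be slow:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="llama2-70b-chat.Q4_K_M.gguf",  # ~40 GB file, as a ballpark
    n_ctx=4096,
    n_gpu_layers=35,    # maybe half the layers on a 24 GB card; the rest stays in RAM
)

out = llm("Write one sentence about winter.", max_tokens=64)
print(out["choices"][0]["text"])
```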
 
Nvidia has launched "Chat with RTX," a demo app that allows you to create a personalized AI chatbot using your own content like documents, notes, videos, etc. It utilizes retrieval-augmented generation (RAG), TensorRT-LLM, and RTX acceleration for quick and contextually relevant responses. The app runs locally on Windows RTX PCs or workstations, ensuring fast and secure results. It's available for free download, requiring a GeForce RTX 30 series GPU with 8 GB VRAM, 16 GB RAM, and Windows 11.
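
For anyone curious what the RAG part actually means, here is a toy sketch of the idea (not Nvidia's code): embed your documents, pull the closest ones for a question, and stuff them into the prompt. It uses sentence-transformers for the embeddings and leaves the final generation step to whatever local backend you run:

```python
from sentence_transformers import SentenceTransformer, util

docs = [
    "Meeting notes: the kennel project ships in March.",
    "Recipe: chili needs two hours on low heat.",
    "GPU notes: the 3090 has 24 GB of VRAM.",
]

# Embed the documents once, up front.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, convert_to_tensor=True)

def retrieve(question, k=2):
    # Return the k documents most similar to the question.
    q_vec = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_vec, doc_vecs, top_k=k)[0]
    return [docs[h["corpus_id"]] for h in hits]

question = "How much VRAM does my card have?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)   # feed this to whatever local LLM you run
```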
 