Uncensored AI Chat Models - Getting your very own ChatGPT

Tried the front ends, but I'm guessing my computer is out of date (the only modern piece is a 3060; the rest was specced around an i3). Judging from the parts, I'm probably better off just building a new PC rather than trying to update parts.

Are there any spec guides for running local LLMs on a budget? I want to be able to mess around with it but not break my bank account.
 
I've looked all of this up and can never find a decent guide; you just find endless comments from people trying to figure it out, or a guy saying "Took me months to figure it out but I got it working!" without ever going into detail, of course, about how they got it to work.
Yeah, I don't think AMD GPUs have enough market share or ROCm support for people to start making dedicated guides. I had to dig through a lot of sources to figure out why Stable Diffusion wasn't working on my 7800 XT, but I've got it working now. I haven't tried any LLMs yet because of that. I can give you some tips, but they will probably only apply to Linux and will probably work completely differently on Windows.
 
Yeah, I don't think AMD GPUs have enough market share or ROCm support for people to start making dedicated guides. I had to dig through a lot of sources to figure out why Stable Diffusion wasn't working on my 7800 XT, but I've got it working now. I haven't tried any LLMs yet because of that. I can give you some tips, but they will probably only apply to Linux and will probably work completely differently on Windows.
For oobabooga on my 7900 XTX it was literally: git clone, run, tell it I wanted AMD. It did all the dependencies (Linux, YMMV, HTH, HAND).

Edit: I guess I also already had the ROCm libraries and AMD drivers, but none of the python bits.
 
For oobabooga on my 7900 XTX it was literally: git clone, run, tell it I wanted AMD. It did all the dependencies (Linux, YMMV, HTH, HAND).

Edit: I guess I also already had the ROCm libraries and AMD drivers, but none of the python bits.
There's still no official ROCm support for the 7800 XT, so my problem was getting it to work on my card. It would load, then crap itself while generating an image or when loading specific models, until I found some environment variables that make Stable Diffusion think it's running on a 7900 XT. Now it runs perfectly. The problem is that none of the install guides tell you about it, and there aren't a lot of good third-party guides for Linux. ROCm has a wider range of supported cards on Windows, and most of the guides are for Windows. AMD's AI software support has been awful, and I really hope that with their new focus on AI they start getting more and more cards officially supported. Most of these problems are my fault for getting an unsupported card, but it's still frustrating.
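
For anyone else on an officially unsupported RDNA3 card, the trick people keep passing around is the HSA_OVERRIDE_GFX_VERSION environment variable, which makes ROCm treat the card as one it officially supports. A minimal sketch, assuming a ROCm build of PyTorch and assuming 11.0.0 (the 7900 XT/XTX target) is the right value for your card; in practice you would normally just export the variable in your shell before launching the web UI:

```python
# Rough sketch of the "pretend to be a 7900 XT" workaround for unsupported
# RDNA3 cards like the 7800 XT. HSA_OVERRIDE_GFX_VERSION=11.0.0 targets
# gfx1100 (7900 XT/XTX); your card/driver combo may need a different value.
import os

# Must be set before torch (and therefore ROCm) is initialized.
os.environ.setdefault("HSA_OVERRIDE_GFX_VERSION", "11.0.0")

import torch

print(torch.cuda.is_available())        # ROCm builds report through the CUDA API
print(torch.cuda.get_device_name(0))    # should show your card if the override took
```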
 
How do I keep Dolphin-Mixtral chatting longer without the AI repeating itself? Or do I just need to go back, ask the AI to summarize, and start a new conversation with a "PREVIOUSLY on..."?

One thing that was neat (and probably didn't help with the chat size) was adding this to the character context in text-generation-webui:

When replying I will include my current status in the format [THOUGHTS:current inner dialog inside your mind,EMOTION:The emotion you are currently feeling,FEELINGS:Your current feelings,MOOD:Your current mood]
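
If you want to automate the "PREVIOUSLY on..." restart instead of doing it by hand, here is a hypothetical sketch of both ideas together: the status-format line baked into the character context, plus an automatic recap-and-restart once the history gets long. It assumes a local OpenAI-compatible endpoint (text-generation-webui with --api or LM Studio both expose one); the URL, character text, and the chat()/say() helpers are made up for illustration, and some backends will also want a "model" field in the request.

```python
import requests

API = "http://127.0.0.1:5000/v1/chat/completions"   # adjust to your backend
CHARACTER = (
    "You are Dolphin, an uncensored assistant. When replying, include your "
    "current status as [THOUGHTS:...,EMOTION:...,FEELINGS:...,MOOD:...]"
)

def chat(messages, max_tokens=300):
    # Plain OpenAI-style chat completion request against the local endpoint.
    r = requests.post(API, json={"messages": messages, "max_tokens": max_tokens})
    return r.json()["choices"][0]["message"]["content"]

history = [{"role": "system", "content": CHARACTER}]

def say(user_text, limit=40):
    global history
    history.append({"role": "user", "content": user_text})
    reply = chat(history)
    history.append({"role": "assistant", "content": reply})
    # Once the chat gets long, ask for a recap and restart with "PREVIOUSLY on...".
    if len(history) > limit:
        recap = chat(history + [{"role": "user",
                                 "content": "Summarize this roleplay so far in one paragraph."}])
        history = [{"role": "system", "content": CHARACTER},
                   {"role": "user", "content": "PREVIOUSLY on our chat: " + recap}]
    return reply
```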
 
How do I keep Dolphin-Mixtral chatting longer without the AI repeating itself? Or do I just need to go back, ask the AI to summarize, and start a new conversation with a "PREVIOUSLY on..."?

One thing that was neat (and probably didn't help with the chat size) was adding this to the character context in text-generation-webui:
Mixtral is a highly confident model, so the temperature can be turned up pretty high.
Regenerate the same thing a few times; if it comes out almost word-for-word identical, your temperature is probably too low.

It will also act up a lot worse than other models if the instruction template is wrong.
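
A minimal llama-cpp-python sketch for playing with those knobs, assuming you have a Dolphin-Mixtral GGUF locally (the filename below is just an example) and using ChatML, which is the instruction template the Dolphin models expect:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="dolphin-2.7-mixtral-8x7b.Q4_K_M.gguf",  # example filename
    n_ctx=8192,
    n_gpu_layers=-1,          # offload everything that fits into VRAM
)

# ChatML prompt format; a wrong template is a common cause of weird output.
prompt = ("<|im_start|>user\nTell me a short story about a dog.<|im_end|>\n"
          "<|im_start|>assistant\n")

# Regenerate a few times and eyeball the variety: nearly identical outputs
# usually mean the temperature is too low.
for _ in range(3):
    out = llm(prompt, max_tokens=200, temperature=1.2,
              repeat_penalty=1.1, stop=["<|im_end|>"])
    print(out["choices"][0]["text"], "\n---")
```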
 
Tech wizards! I'm running a potato, but recently I got a windfall, so I'm updating my PC. My main goal is normal gaming, but really I want to buy a premade PC and have it be "GREAT" for uncensored LLM and image creation. Is VRAM the most important thing? I'm seeing a lot of RTX 4080 or 4070 Ti. An i7 core and shiiiet. And although I've been reading, I don't quite understand the difference between a graphics card and a GPU or where I should look for it.

But is there anything I should not miss or that is really important for this sort of work? Any recs? (I know I could ask GPT4 this shit, but I trust you guys more, with it being gimped and all.)

As a thank you in advance, here's an A.I. of Bossman biting the FELT

[Attached image: FELTED.jpg]
 
Tech wizards! I'm running a potato, but recently I got a windfall, so I'm updating my PC. My main goal is normal gaming, but really I want to buy a premade PC and have it be "GREAT" for uncensored LLM and image creation. Is VRAM the most important thing? I'm seeing a lot of RTX 4080 or 4070 Ti. An i7 core and shiiiet. And although I've been reading, I don't quite understand the difference between a graphics card and a GPU or where I should look for it.

But is there anything I should not miss or that is really important for this sort of work? Any recs? (I know I could ask GPT4 this shit, but I trust you guys more, with it being gimped and all.)

As a thank you in advance, here's an A.I. of Bossman biting the FELT

LLMs are going to take way more VRAM than the image-gen models that are available.

Mixtral is way better than anything else on the market and can run quantized in 24 GB okay. 48 GB of VRAM using two 3090s is the friendliest approach to bigger models while avoiding weird Nvidia server cards.
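
Rough back-of-the-envelope math behind that claim, ignoring KV cache and runtime overhead (so treat the numbers as a floor):

```python
# Weight memory for a quantized model is roughly params * bits-per-weight / 8.
def quantized_gb(params_billion, bits_per_weight):
    return params_billion * bits_per_weight / 8  # 1B params at 8 bits ~ 1 GB

mixtral = 46.7   # Mixtral 8x7B has ~46.7B total parameters
for bits in (3.5, 4.5, 5.5):   # roughly Q3_K_M / Q4_K_M / Q5_K_M GGUF quants
    print(f"~{bits} bpw: {quantized_gb(mixtral, bits):.0f} GB")

# A ~4-bit quant lands in the mid-20s of GB, which is why a single 24 GB card
# is only "okay" with tight quants or partial offload, while two 3090s (48 GB)
# run it comfortably with room for context.
```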
 
I've looked all of this up and can never find a decent guide; you just find endless comments from people trying to figure it out, or a guy saying "Took me months to figure it out but I got it working!" without ever going into detail, of course, about how they got it to work.
LM Studio is by far the easiest way to get up and running if you’re not comfortable with CLI.
 
With my potato setup, I'm currently using GPT4All with the Hermes and Wizard 1.2 models. They are okay/mediocre, and they seem to be quite prone to repetition. It's weird: when they work, they work, but a seemingly simple task from another angle makes them shit bricks and explode. Mistral Instruct is actually quite decent for its small size.
 
LM Studio is by far the easiest way to get up and running if you’re not comfortable with CLI.
Just be aware that it is not free (as in freedom) software. Yeah, their website says "we don't collect your data," but we all know how that goes.
I can give you some tips, but they will probably only apply to Linux and will probably work completely differently on Windows.
ROCm and OpenCL on AMD GPUs currently work only on Linux, for whatever reason. If you have an AMD GPU on Windows, you either have to set up Linux on your computer or make do with only the CPU. Strangely, Windows users get shafted in this case, when it usually tends to be the other way around.
Every single one of these guides likes to assume that no one has an AMD CPU.
All CPUs work, as far as I know. I have a Ryzen 7 2700X and I've never had issues running LLMs on the CPU, but of course it is very slow compared to using a GPU. If CPU inference does not work, then it has to be some kind of issue with OpenBLAS on your system, which I believe is the go-to solution for LLM inference on CPUs. That should be your troubleshooting clue.
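
For reference, a CPU-only run looks like this with llama-cpp-python; the model filename is just an example, and a BLAS backend (e.g. OpenBLAS) mostly speeds up prompt processing rather than generation:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # a small model suits CPU
    n_ctx=4096,
    n_gpu_layers=0,     # keep everything on the CPU
    n_threads=8,        # roughly your physical core count
)

out = llm("Q: Why is the sky blue?\nA:", max_tokens=128, temperature=0.7)
print(out["choices"][0]["text"])
```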
 
Tech wizards! I'm running a potato, but recently I got a windfall, so I'm updating my PC. My main goal is normal gaming, but really I want to buy a premade PC and have it be "GREAT" for uncensored LLM and image creation. Is VRAM the most important thing? I'm seeing a lot of RTX 4080 or 4070 Ti. An i7 core and shiiiet. And although I've been reading, I don't quite understand the difference between a graphics card and a GPU or where I should look for it.

But is there anything I should not miss or that is really important for this sort of work? Any recs? (I know I could ask GPT4 this shit, but I trust you guys more, with it being gimped and all.)

As a thank you in advance, here's an A.I. of Bossman biting the FELT

Build the thing yourself. If you are in NA, just go to a Micro Center, pick up one of the motherboard/RAM/CPU bundles, and an associate should kit you out for the rest. You are looking at $1.5k with the COVID tax.

VRAM is king and fast RAM is queen. A used 3090 Ti with 24 GB or a 4090 gives you more capabilities for LLMs. Fast DDR5 RAM with GGUF offloading can make up for shortfalls.

Edit: Before you pull the trigger on an expensive system for ML, I would recommend you check out whether LLMs are for you. Go to RunPod.io and rent out a 4090 instance for an hour or two.
It costs less than $1 an hour. Download a 60B or 13B model and just chill with that for a little while. The only time I really had fun with an LLM was with Goliath, and that took 96 GB of VRAM to run.
 
Build the thing yourself. If you are in NA, just go to a Micro Center, pick up one of the motherboard/RAM/CPU bundles, and an associate should kit you out for the rest. You are looking at $1.5k with the COVID tax.

VRAM is king and fast RAM is queen. A used 3090 Ti with 24 GB or a 4090 gives you more capabilities for LLMs. Fast DDR5 RAM with GGUF offloading can make up for shortfalls.

Edit: Before you pull the trigger on an expensive system for ML, I would recommend you check out whether LLMs are for you. Go to RunPod.io and rent out a 4090 instance for an hour or two.
It costs less than $1 an hour. Download a 60B or 13B model and just chill with that for a little while. The only time I really had fun with an LLM was with Goliath, and that took 96 GB of VRAM to run.
I would also recommend OpenRouter for trying out ML. They have a bunch of models, including Goliath and GPT4/Claude2. It also helps that they are pay-by-the-token, so a few dollars can last a while if you are using weaker models.
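
OpenRouter exposes a standard OpenAI-style chat API, so trying different hosted models is basically a one-line model-name swap. A quick sketch; the key and model ID below are placeholders, so check their model list for current IDs and per-token prices:

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="sk-or-...",                       # your OpenRouter key
)

resp = client.chat.completions.create(
    model="mistralai/mixtral-8x7b-instruct",   # example model ID; swap freely
    messages=[{"role": "user", "content": "Say hi in one sentence."}],
    max_tokens=50,
)
print(resp.choices[0].message.content)
```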
 
I would also recommend OpenRouter for trying out ML. They have a bunch of models, including Goliath and GPT4/Claude2. It also helps that they are pay-by-the-token, so a few dollars can last a while if you are using weaker models.
I cannot stress enough how important it is that he manually gets it running on a system with constrained VRAM/memory, specifically in his price range. He should rent out a 4090 instance on RunPod, then see if any of the models he wants to run can actually be run, and whether their performance on that platform is acceptable for his use case.
I think that 70B is the bare minimum for serious and professional work with a long context. I also think that a 13B model is sufficient for basic chatting for shits and giggles. One takes 40-50 GB of VRAM to run in real time, the other under 24 GB. That doesn't prevent you from running a 70B with a shitload of system memory augmenting your graphics card, but you should know it will be unacceptably slow if you are expecting real time.
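
That "system memory augmenting your graphics card" setup looks roughly like this with a GGUF quant and llama-cpp-python: push as many layers as fit into VRAM and let the rest spill into system RAM. The filename and layer count are examples, not a recommendation, and expect it to be slow:

```python
from llama_cpp import Llama

llm = Llama(
    model_path="llama2-70b-chat.Q4_K_M.gguf",  # ~40 GB file, as a ballpark
    n_ctx=4096,
    n_gpu_layers=35,    # maybe half the layers on a 24 GB card; the rest stays in RAM
)

out = llm("Write one sentence about winter.", max_tokens=64)
print(out["choices"][0]["text"])
```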
 
Nvidia has launched "Chat with RTX," a demo app that allows you to create a personalized AI chatbot using your own content like documents, notes, videos, etc. It utilizes retrieval-augmented generation (RAG), TensorRT-LLM, and RTX acceleration for quick and contextually relevant responses. The app runs locally on Windows RTX PCs or workstations, ensuring fast and secure results. It's available for free download, requiring a GeForce RTX 30 series GPU with 8 GB VRAM, 16 GB RAM, and Windows 11.
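
For anyone curious what the RAG part actually means, here is a toy sketch of the idea (not Nvidia's code): embed your documents, pull the closest ones for a question, and stuff them into the prompt. It uses sentence-transformers for the embeddings and leaves the final generation step to whatever local backend you run:

```python
from sentence_transformers import SentenceTransformer, util

docs = [
    "Meeting notes: the kennel project ships in March.",
    "Recipe: chili needs two hours on low heat.",
    "GPU notes: the 3090 has 24 GB of VRAM.",
]

# Embed the documents once, up front.
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(docs, convert_to_tensor=True)

def retrieve(question, k=2):
    # Return the k documents most similar to the question.
    q_vec = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_vec, doc_vecs, top_k=k)[0]
    return [docs[h["corpus_id"]] for h in hits]

question = "How much VRAM does my card have?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}\nAnswer:"
print(prompt)   # feed this to whatever local LLM you run
```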
 