Give it a week or two and someone will train a Pepe LoRA. I have been playing around a bit with training LoRAs for Flux, and it's actually less bad than I was expecting. It seems like you can quantise to int8 and train LoRAs on a 3090/4090 with minimal quality loss. I'm getting around 3 it/sec training on a 4090, so it takes an hour or two, but it's definitely doable (rough sketch below).

"Flux is like the opposite of whatever Bing is called now, it can do real world people but not fictional ones."
[Attachment 6319293]
edit: I take it back, this is "perfect"
[Attachment 6319298]
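For anyone wondering what the int8 + LoRA setup roughly looks like, here is a minimal sketch with diffusers and peft. The checkpoint id, target module names and hyperparameters are assumptions, and the actual quantisation step and training loop are left out.

import torch
from diffusers import FluxTransformer2DModel
from peft import LoraConfig

# load only the Flux transformer; the int8 quantisation step mentioned above would
# happen here (e.g. via a bitsandbytes quantisation config) and is omitted for brevity
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)
transformer.requires_grad_(False)  # freeze the base weights; only the LoRA gets trained

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # attention projections (assumed names)
)
transformer.add_adapter(lora_config)

trainable = [p for p in transformer.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
print(sum(p.numel() for p in trainable), "trainable LoRA parameters")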
Default Flux still thinks that Pepe is just a synonym for frog. This is the thing that's gonna take all of our jobs?
It's like outsourcing, only a retarded out-of-touch executive that doesn't do actual work thinks it can replace actual Americans.
Congratulations! The good news is once you've got it working, you can usually just re-do it quickly if you ever need to. The most complex thing I have to do with my set-up now is occasional manual ROCm updates, and that's only because I had the temerity to get an AMD card.

"After spending two hours in the missing DLL, version downgrade and dependency hell known as Python, I finally got ComfyUI and Flux working."
Here's my thread tax:
[Attachment 6328402]
I tried to wrangle ROCm for hours to get Flash Attention installed for my 6800 XT on Mint 21.1 but eventually gave up. I even tried in Docker and the build still fails with an HTTP 404 error or something to that effect, and that was after 4 hours of waiting for Docker pulls, building, and reinstalling PyTorch+ROCm with pip. I also tried building llama.cpp for ROCm and it inexplicably just uses the CPU for inference.
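For anyone hitting the same wall, a quick sanity check from inside the venv is whether you actually ended up with a ROCm build of PyTorch and whether flash-attn even imports. This is just a generic diagnostic sketch, not something posted in the thread.

import torch

print(torch.__version__)
print(torch.version.hip)          # None means this is a CUDA/CPU build, not a ROCm build
print(torch.cuda.is_available())  # ROCm GPUs are exposed through the CUDA API

try:
    import flash_attn
    print("flash_attn", flash_attn.__version__)
except ImportError as err:
    print("flash_attn is not installed:", err)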
"An interesting thing to attempt is to let an AI rewrite the prompt. I discounted automatic prompting with SD largely because in my experimenting, it simply did not lead to good results if you didn't feed the AI all the right keywords, and at that point, you might as well write the prompt yourself. It seems to work well with Flux though. If you consider that the images were probably categorized by AI in a conversational way as theorized in this thread earlier, it makes sense that another AI would find the 'right language' (perhaps GPTisms?) to get exactly what you asked for."

Ok, I wrote all of this out and it's so long and autistic I'm just going to spoil it. It's a few insights into the architecture of Flux and why I think this style of conditioning works.
Prompting Flux is very different from prompting SD and all the other models that came out (and perhaps MJ and Dall-E, never used those): for optimal results, instead of looking for the right keywords (which often simply do not exist), it makes much more sense to just describe what you want. I know this has already been said on this very page, this is just for the sake of completeness (a small sketch of the idea follows the excerpts below).
"Betker et al. (2023) demonstrated that synthetically generated captions can greatly improve text-to-image models trained at scale. This is due to the oftentimes simplistic nature of the human-generated captions that come with large-scale image datasets, which overly focus on the image subject and usually omit details describing the background or composition of the scene, or, if applicable, displayed text (Betker et al., 2023). We follow their approach and use an off-the-shelf, state-of-the-art vision-language model, CogVLM (Wang et al., 2023), to create synthetic annotations for our large-scale image dataset. As synthetic captions may cause a text-to-image model to forget about certain concepts not present in the VLM’s knowledge corpus, we use a ratio of 50 % original and 50 % synthetic captions."
"To test our synthetic captions at scale, we train DALL-E 3, a new state of the art text to image generator. To
train this model, we use a mixture of 95% synthetic captions and 5% ground truth captions."
"The above experiments suggest that we can maximize the performance of our models by training on a very high percentage of synthetic captions. However, doing so causes the models to naturally adapt to the distribution of long, highly-descriptive captions emitted by our captioner. Generative models are known to produce poor results when sampled out of their training distribution. Thus [...] we will need to exclusively sample from them with highly descriptive captions. [...] Models like GPT-4 have become exceptionally good at tasks that require imagination [...] It stands to reason that they might also be good at coming up with plausible details in an image description.
Given a prompt [...] we found that GPT-4 will readily "upsample" any caption into a highly descriptive one."
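To make the "let an LLM upsample the prompt" idea concrete, here is a minimal sketch. The rewriter model, the Flux checkpoint and the instruction wording are all assumptions; swap in whatever you actually use.

import torch
from diffusers import FluxPipeline
from transformers import pipeline

# any instruct model will do as the prompt "upsampler"; Qwen here is only an example
rewriter = pipeline("text-generation", model="Qwen/Qwen2.5-7B-Instruct",
                    torch_dtype=torch.bfloat16, device_map="auto")

short_prompt = "pepe the frog reading a newspaper in a diner"
instruction = ("Rewrite this image prompt as one long, highly descriptive caption, "
               "covering subject, background, composition and lighting: " + short_prompt)
long_prompt = rewriter(instruction, max_new_tokens=200,
                       return_full_text=False)[0]["generated_text"]

# FLUX.1-schnell runs in a handful of steps with guidance disabled
pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell",
                                    torch_dtype=torch.bfloat16).to("cuda")
image = pipe(long_prompt, num_inference_steps=4, guidance_scale=0.0).images[0]
image.save("pepe_diner.png")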
"I tried to wrangle ROCm for hours to get Flash Attention installed for my 6800 XT on Mint 21.1 but eventually gave up. [...]"

Hmmmm.
I had some difficulty getting ROCm running on Mint 21.3 with a 7800 XT to use llama.cpp/Stable Diffusion, but I figured out what I was doing wrong. Follow the instructions @Overly Serious posted first. Mint had a package that didn't install for me, so after getting ROCm installed, run:
sudo apt install rocm-hip-sdk
export HSA_OVERRIDE_GFX_VERSION=10.3.0
export HCC_AMDGPU_TARGET=gfx1030
export ROCM_PATH=/opt/rocm  # or wherever your ROCm install actually lives
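Once those are set, a quick way to confirm the override actually took (a rough sketch, assuming the ROCm build of PyTorch is installed in whatever venv ComfyUI/A1111 uses):

import torch

if not torch.cuda.is_available():
    raise SystemExit("PyTorch cannot see a ROCm device")
print(torch.cuda.get_device_name(0))   # should print the Radeon card
x = torch.randn(2048, 2048, device="cuda")
print((x @ x).sum().item())            # small matmul; a wrong gfx override usually fails right here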
His target market is all the dumb niggercattle who don't know what a filesystem is, let alone how to set up SD.

"That nomad indie hacker-whatever Pieter Levels claims he's making about $100k a month from just two shitty SD apps, photoAI and interiorAI. He's so lazy both have the same look and UX despite being for 2 completely different markets. But seriously, who is paying for this crap? It's not cheap either."
Tried some interior design app with nearly 5 stars on the app store and this shit just generated a random overstylized picture full of artifacts, completely different from any room in my house. Who the fuck likes this crap? Are all reviews fake now?
I just wanted to vent mostly because I spent 8 hours pulling my hair out. I will try that, thank you.
I mean part of me wants to offer to help because I might be able to, but the realities of trying to help debug something anonymously over forum messaging are tricky. I mostly just followed along with the instructions for installing it from their repos here:
I'll give that a try. I did get inference working in PyTorch, but specifically installing Flash Attention so I could speed up TTS inferencing was the issue; it wouldn't cooperate with building from source or with installing the wheel from the ROCm Flash Attention release on GitHub. My plan is to try again in the future, but if it's still a pain I'm going to buy a 3090 or something similar, which I assume is easier to get working reliably. I thought about renting a server specifically for ML, but it's expensive and I already have a capable system for what I want to do.

"This should make ROCm run on an RX6000 card."
"I just wanted to vent mostly because I spent 8 hours pulling my hair out."

Oh, we hear you! We've been there!
I upgrade very rarely, and when I bought my 7900 XT I knew much less about this stuff, at least in hands-on terms - theory I was fine with. I thought to myself: "It's got 20GB of VRAM and anything Nvidia will cost much more for the same amount. I can accept AI performance being a bit worse in exchange for the extra VRAM." Woefully uninformed decision. I'd have saved myself many hours and an OS install (you need Linux for current ROCm) if I'd just gone Nvidia. And to rub a little salt in the wound, the price has plummeted from what I paid near release to around £620. It's probably one of the most ill-informed tech purchases I've ever made, and I'm usually pretty cautious and discriminating about this stuff.
I'm glad I've finally figured out how to get it working on my setup, but I'm real wary of updating from Mint 21.3 to 22. It took me a couple weeks of on-and-off troubleshooting before I figured out what was going wrong. Even now, I had to work out what was causing random crashes in the Automatic1111 UI before I found this launch script after digging through the ROCm GitHub issues page. I'm glad it's working, and I can generate 2048x2048 and bigger images with Tiled VAE, but I want to make sure it never breaks again and not have to deal with it. My fault for going AMD, but I was using it mostly for gaming before I got back into SD, and the 7800 XT is a good card for that, especially on Linux.

"I recently spent a whole day trying out different distros and tearing my hair out with ROCm. In the end, none of my Linux/SD setups worked right."