Stable Diffusion, NovelAI, Machine Learning Art - AI art generation discussion and image dump

OpenAI’s new AI image generator pushes the limits in detail and prompt fidelity

With better response to details and text, DALL-E 3 hopes to make prompt engineering obsolete.

dalle_2_hero-800x450.jpg

Looks like a major escalation.

avocado_1.jpg
An illustration of an avocado sitting in a therapist's chair, saying 'I just feel so empty inside' with a pit-sized hole in its center. The therapist, a spoon, scribbles notes.
salami-1440x823.jpg
A vast landscape made entirely of various meats spreads out before the viewer. tender, succulent hills of roast beef, chicken drumstick trees, bacon rivers, and ham boulders create a surreal, yet appetizing scene. the sky is adorned with pepperoni sun and salami clouds.
coffeeshop-1440x823.jpg
A minimap diorama of a cafe adorned with indoor plants. Wooden beams crisscross above, and a cold brew station stands out with tiny bottles and glasses.
crab-1440x823.jpg
Close-up photograph of a hermit crab nestled in wet sand, with sea foam nearby and the details of its shell and texture of the sand accentuated.
paperart-1440x823.jpg
A paper craft art depicting a girl giving her cat a gentle hug. Both sit amidst potted plants, with the cat purring contentedly while the girl smiles. The scene is adorned with handcrafted paper flowers and leaves.
pixel_coittower-1440x823.jpg
Pixel art scene of Coit Tower standing tall on Telegraph Hill, with a panoramic view of the city below and birds flying around.
potatoking-1440x823.jpg
Tiny potato kings wearing majestic crowns, sitting on thrones, overseeing their vast potato kingdom filled with potato subjects and potato castles.
heart-1440x823.jpg
An illustration of a human heart made of translucent glass, standing on a pedestal amidst a stormy sea. Rays of sunlight pierce the clouds, illuminating the heart, revealing a tiny universe within. The quote 'Find the universe within you' is etched in bold letters across the horizon.
plategirl-1440x823.jpg
A middle-aged woman of Asian descent, her dark hair streaked with silver, appears fractured and splintered, intricately embedded within a sea of broken porcelain. The porcelain glistens with splatter paint patterns in a harmonious blend of glossy and matte blues, greens, oranges, and reds, capturing her dance in a surreal juxtaposition of movement and stillness. Her skin tone, a light hue like the porcelain, adds an almost mystical quality to her form.
teacup_tempest.jpg
A 3D render of a coffee mug placed on a window sill during a stormy day. The storm outside the window is reflected in the coffee, with miniature lightning bolts and turbulent waves seen inside the mug. The room is dimly lit, adding to the dramatic atmosphere.

dalle_comparison.jpg
A comparison of "An expressive oil painting of a basketball player dunking, depicted as an explosion of a nebula" as generated by DALL-E 2 (left) and DALL-E 3 (right).

You can see some problems with DALL-E 3 following instructions in the meat landscape and potato king prompts.
 
New audio AI dropped, infinite lo-fi hip hop AI hell is now very plausible:


Check "Prompt: lofi hip hop beat melodic chillhop 85 BPM" (it's a .M4A)


Trance, Ibiza, Beach, Sun, 4 AM, Progressive, Synthesizer, 909, Dramatic Chords, Choir, Euphoric, Nostalgic, Dynamic, Flowing
View attachment 5337886

Synthpop, Big Reverbed Synthesizer Pad Chords, Driving Gated Drum Machine, Atmospheric, Moody, Nostalgic, Cool, Club, Striped-back, Pop Instrumental, 100 BPM
View attachment 5337887

Feels like it's got a ways to go with certain genres, but interesting to see where it goes. I tried
Code:
JRPG orchestral battle music, piano, energetic, trumpets, drums, triangle
and got this:
 
As a nod to these controversies, OpenAI says that DALL-E 3 is designed to decline requests that ask for an image in the style of a living artist. OpenAI also provides a form where creators can opt out of having their images used to train future models. It seems unlikely that these measures will satisfy artists who typically think AI training should be opt-in only rather than included in image data sets by default.
It'll be interesting to see how people work on creating possible workarounds/jailbreaks of this, if possible. I don't really believe OpenAI will be stripping out any part of the dataset that could help improve the model's efficacy, so these steps feel more like an attempt at legal CYA measures.
 
If DALL-E is closed source, there's no point. SDXL is amazing for LoRAs, and LoRAs sell Stable Diffusion.
If a normie can do more useful work with an OpenAI subscription than messing around with open source models, then there's definitely a point. It also gives an idea of where things are heading for competing models.
 
It'll be interesting to see how people work on creating possible workarounds/jailbreaks of this, if possible. I don't really believe OpenAI will be stripping out any part of the dataset that could help improve the model's efficacy, so these steps feel more like an attempt at legal CYA measures.
OpenAI has proven again and again that they will neuter and ruin their own products for the sake of “safety”. The article even mentions they have more safeguards in place for DALL-E 3.
 
OpenAI has proven again and again that they will neuter and ruin their own products for the sake of “safety”. The article even mentions they have more safeguards in place for DALL-E 3.
And here's the thing, I think the vast amount of image generation is done by coomers, furries and people copying styles to resell. So mostly consumers.
DALL-E 3 isn't a viable product to maximize profits if it can't hit those market segments.
Plus, those market segments provide massive leaps forward in the tech in the span of months or weeks. The efficiency gains in training time between Dreambooth, Hypernetworks and LoRAs are insane. That's just 6 months for completely different iterative approaches to improve on fine-tuning accuracy.
 
And yet another contender has arrived: https://mistral.ai Frenchmen and ex-Llama people released a 7b model. Notably, in benchmarks it beats the 13b and code 34b Llama 2 models. The general consensus seems to be that it's smart for a 7b model. It was trained on a context size of 8k (vs. 4k for Llama 2), and the team announces 13b and 34b for this fall. I feel this might be one to look out for.

There was also a Chinese model released, Qwen 14b. Nothing special really, except that it really likes to play up Xi Jinping.
 
Feels like it's got a ways to go with certain genres, but interesting to see where it goes. I tried
Code:
JRPG orchestral battle music, piano, energetic, trumpets, drums, triangle
and got this:
View attachment 5353880
I could see that as the BGM of the realm of a mad god; the sky is a discordant mishmash of jagged colors which dart crazily from one end to the other, while abominations of nature scream and flail about - and it's impossible to tell whether they even intend to hurt you or not.
 
Normie Twitter has discovered controlnet and is having fun hiding politically incorrect messages in pastoral scenes. Eg. here are two examples I found:

F7AsQ3OaEAAyxP1.jpeg F61a0wnWEAA1PPo.jpeg

And here's a sneed castle I made using the same method.
13065-2079610094-a castle on a hill with a river running through it and boats in the water bel...png 13066-2079610095-a castle on a hill with a river running through it and boats in the water bel...png

And here's the thing, I think the vast amount of image generation is done by coomers, furries and people copying styles to resell. So mostly consumers.
This is 100% accurate. Look at the most popular models and LoRAs on Civitai. It's a mix of NSFW, furry shit, and borderline copyright infringement (recognizable characters, people, etc). Also, making DALL-E 3 unable to mimic styles of particular artists is going to be a problem for anyone who wants to use assets in a game, book, or some other context where they need stylistic consistency.
 
I could see that as the BGM of the realm of a mad god; the sky is a discordant mishmash of jagged colors which dart crazily from one end to the other, while abominations of nature scream and flail about - and it's impossible to tell whether they even intend to hurt you or not.
Stable Audio isn't perfect, but it's a lot better than that crap that appeared a few months ago. What was it called... I think it was Riffusion. It appears to be on a trajectory that could eventually take music generation to where image generation is now. And of course, lazy fucks can use GPT to create verbose text descriptions to be fed into the music generators.
 
This is 100% accurate. Look at the most popular models and LoRAs on Civitai. It's a mix of NSFW, furry shit, and borderline copyright infringement (recognizable characters, people, etc).
Holy shit you were not kidding. I've never used Civitai because I don't release my models, but wow that's a lot of porn. Also found this:

Screenshot_20230928-013718.jpg
 
In theory, couldn't you just ask whatever text AI to convert your description to a prompt, making D-3 immediately obsolete?
 
In theory, couldn't you just ask whatever text AI to convert your description to a prompt, making D-3 immediately obsolete?
It's possible they are doing that, taking the user's prompt, applying a text->text model to turn the natural language prompt into a more machine interpretable prompt, and then using that as the input for their text->image model. Still though, that does not explain their ability to insert text directly into the image and compose multiple objects/concepts in such a reliable way.
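To make that guess concrete, here is a minimal sketch of such a two-stage setup. It is an assumption about how a pipeline like this could be wired, not a description of what DALL-E 3 actually does. The names `expand_prompt`, `generate`, `toy_rewrite` and `toy_render` are hypothetical; the toy functions are plain string stand-ins where a real text->text model and a real text->image model would go.

```python
from typing import Callable


def expand_prompt(user_prompt: str, rewrite: Callable[[str], str]) -> str:
    """Stage 1: pass the natural-language prompt through a text->text step.

    Falls back to the original prompt if the rewriter returns nothing usable.
    """
    expanded = rewrite(user_prompt).strip()
    return expanded or user_prompt


def generate(user_prompt: str,
             rewrite: Callable[[str], str],
             render: Callable[[str], str]) -> str:
    """Stage 2: feed the rewritten prompt to the text->image step."""
    return render(expand_prompt(user_prompt, rewrite))


# Hypothetical stand-ins so the sketch runs without any model.
def toy_rewrite(prompt: str) -> str:
    # A real rewriter would be a language model; this just appends detail tags.
    return f"{prompt}, highly detailed, soft lighting"


def toy_render(prompt: str) -> str:
    # A real renderer would return image data; this returns a placeholder string.
    return f"<image for: {prompt}>"


if __name__ == "__main__":
    print(generate("a castle on a hill", toy_rewrite, toy_render))
```

Swapping the toy functions for real model calls would not change the structure, which is the point: the rewriting step sits entirely outside the image model.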
 
In theory, couldn't you just ask whatever text AI to convert your description to a prompt, making D-3 immediately obsolete?
It's possible they are doing that, taking the user's prompt, applying a text->text model to turn the natural language prompt into a more machine interpretable prompt, and then using that as the input for their text->image model. Still though, that does not explain their ability to insert text directly into the image and compose multiple objects/concepts in such a reliable way.
Not sure what's happening internally in DALL-E 3, but they say it's "built on ChatGPT" and they will be encouraging the use of ChatGPT to make prompts:
DALL·E 3 is built natively on ChatGPT, which lets you use ChatGPT as a brainstorming partner and refiner of your prompts. Just ask ChatGPT what you want to see in anything from a simple sentence to a detailed paragraph.

When prompted with an idea, ChatGPT will automatically generate tailored, detailed prompts for DALL·E 3 that bring your idea to life. If you like a particular image, but it’s not quite right, you can ask ChatGPT to make tweaks with just a few words.

DALL·E 3 will be available to ChatGPT Plus and Enterprise customers in early October.
 
1696139409039-0.png 1696139409039-1.png 1696139409039-2.png 1696139409039-3.png
I know it's a humorous scene to behold, but when I first saw this I was legitimately taken aback and realized we pretty much are on the cusp of AI capable of generating anything with convincing results. Fucking wild it's managed to get a grasp on this sorta "low-poly, in-game, analog TV set" look.
 