ChatGPT - If Stack Overflow and Reddit had a child

  • 🐕 I am attempting to get the site runnning as fast as possible. If you are experiencing slow page load times, please report it.
Just noticed this on shitter:

Researchers recreated DeepSeek's core technology for just $30! A research team at the University of California, Berkeley, has reportedly recreated the core technology behind China’s advanced DeepSeek AI for just $30. This remarkably low-cost replication highlights a growing trend in AI development—while major tech companies produce impressive models, “garage” open source approaches significantly reduce the cost of building them. Under the leadership of Ph.D. candidate Jiayi Pan, the team successfully replicated DeepSeek R1-Zero’s reinforcement learning (RL) capabilities using a compact language model with just 3 billion parameters. Despite its smaller scale, the model demonstrated self-verification and search capabilities, allowing it to refine its own responses iteratively—key features of DeepSeek’s advanced AI.
 
I'm not sure what most of this means or whether this 3B parameter is as smart or close to Deepseek's nearly 700B parameter model, but I guess as long as it's free and open to use it'll help actually give more people power and freedom choosing how to use AI in their lives.
It means you don't need 500 gorillons of USD to train a reasoning model. 3B model is extremely scaled down, but if they have really managed to train it for 30 bucks, suddenly Deepseek's claims regarding their model's training cost aren't necessarily a mongoloid fantasies
 
I'm not sure what most of this means or whether this 3B parameter is as smart or close to Deepseek's nearly 700B parameter model, but I guess as long as it's free and open to use it'll help actually give more people power and freedom choosing how to use AI in their lives.

While larger models aren't necessarily better, any model with seven billion parameters or less is dumb. For comparison, 3B models can fit on a smartphone. You want at least 13 billion parameters.
 
  • Informative
Reactions: Colon capital V
but if they have really managed to train it for 30 bucks, suddenly Deepseek's claims regarding their model's training cost aren't necessarily a mongoloid fantasies
at this point it really does make you wonder whats stopping multiple groups from having their own version of their 700B model and adjusting it a bit and having true free speech machines. Kiwifarms AI?

seriously whats stopping Null or someone else from just downloading a stupid 14b model and letting users on the farms chat with it?
 
Last edited:
  • Feels
Reactions: Fabio Socialist
at this point it really does make you wonder whats stopping multiple groups from having their own version of their 700B model and adjusting it a bit and having true free speech machines. Kiwifarms AI?

seriously whats stopping Null or someone else from just downloading a stupid 14b model and letting users on the farms chat with it?
It will get spammed to shit and they cost too much to run


Something's coming from OAI today. I wonder what it is.
 
Last edited:
  • Like
Reactions: The Spoils of War
o3/o3-mini most likely. If you watch this sphere this always happens, somebody releases a great model and then everyone rushes to dump whatever they have.

I'm honestly not sure what all this excitement is even really about. It's so early in the AI race, expect the guy at the front to change more often in the coming months and years, IMO. It doesn't mean much. If this is anything like the home/personal computer revolution (and there are really a lot of parallels) ten years down the road, most of these companies will be defunct and only the people who were really interested in all this will remember their names. Hell, there might be one day "retro AI enthusiasts" dissecting/manipulating/modernizing all these models with advanced tools we can't even imagine now. It's how the cookie crumbles.

OpenAI will probably still be utter dicks and not releasing anything into the public domain tho.
 
OpenAI will probably still be utter dicks and not releasing anything into the public domain tho.
someone brought it up on a subreddit, but its ironic that a non-profit only releases closed source and a fucking hedgefund gave us the biggest/most advanced open source model (so far, at this time)

i don't even know why someone would want to put AI on a computer the size of a credit card, but at minimum it keeps the rest of these fuckers on their toes. if Deepseek didn't do this, it would probably be even more years down the line before someone accomplished this. even the people downplaying this have to admit that because of this any dipshit with 6k to spent on a computer can put on their own computer a model that rivals what the fucks at OpenAI wouldn't let be released to the public last year.

It basically lights a fire under the asses of these companies and forces them to answer "why should i use your service when i can use Tommy dipshit's model for free without as much censorship and get an answer at least somewhat close to what you gave me" and just based on the download figures there's at least a half million people running this on their own shit and could very easily cut into private companies obnoxious $/per token madness. If this shit came out 2 months ago no way in hell would OpenAI have idiots paying $200/month for their bullshit. samething with Poe offering $100/month plans or Anthropic and their $20 a month only 5% of messages not refused madness.

I guarantee OpenAI wasn't going to release their o3 faggotry until December. this was their "in case of emergency" move so that they're still the big name in the technology and get the contracts.

Especially when it comes to AI this decade, people need to remember "this is the most expensive and slowest it will ever be"

We're not going to hit where we are with GPUs where even when we switch to a new generation the output per core/wattage is negligible. for a long fucking time.
 
1738361236299.png

Deepseek claims CIA is taking their operations down.


Honestly I think Keffals could do it.
 
View attachment 6926933

This is some damn good prose from DeepSeek v3.
I've been wondering how to use these things in videogames in a reasonable way for about 1-2 years now.

For general reasoning about game logic, they do not perform well. I tried to daisychain older llama models into navigating a MUD like enviroment and they did not do a great job in general. It's much easier and more effective to do this with conventional programming, and it's a lot more efficent resource-wise to boot. They're also not being good at injecting random stuff into things because of the strong biases most models have. The smarter they are, the truer this actually seemed to get. R1 (which is sort of the parent of V3) is sort of the odd one out in that regard that it's both very smart and very "creative", able to come to wildly different conclusions in the same setting if you just prompt it to do so. I think deepseek actually trained it for roleplaying and creativity targetly, or at least didn't filter the datasets that go into those directions as hard.

Controling NPC dialog - kind of, some game mods already tried that in CK III and Skyrim and I think Fallout 4?! (at least the three I know of) I feel the problem here is that they'd quickly lose track and/or could easily talked by the player into immersion-breaking and weird outcomes.

I feel they would work really well in Solo RPG settings, as long as you keep track of the rules and just have the LLM write fluff or wordbuilding. I'd trust R1 and V3 to pull that off without getting too boring. But of course, that's something that needs a lot of your input and isn't for everyone.
 
Back