ChatGPT - If Stack Overflow and Reddit had a child

Can you tell me which of the R1 models you used and how you got it to run? There's quite a lengthy list to pick from, and I was hoping to run one of them in GGUF format in Kobold. I tried my luck with Distill-Llama-8B and it'd endlessly go off on tangents, thinking about the story instead of continuing it.

You have a misunderstanding there - these models were basically "trained" on R1's synthetic outputs to become smarter. Underlying, they're still llamas/qwens etc. It improved them by quite a bit, but at the end of the day, an 8b model is still an 8b model and will be a bit stupid. The best output I saw from the distilled models came from the 70b llama model. The 32b and 14b qwen models are also very good for their size.

The output you saw from me is from DeepSeek's API, i.e. the actual R1 model that generated the outputs the others were trained on, a 685b parameter model, queried via some Emacs Lisp glue code I wrote. You can download this model too, but, well, it's a bit hard to run.
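(If anyone wants to replicate it without Emacs, a rough Python sketch of the same glue looks something like the below. It assumes DeepSeek's OpenAI-compatible chat endpoint and the "deepseek-reasoner" model name; check their docs before relying on either.)

```python
# Minimal sketch: query DeepSeek's (OpenAI-compatible) chat completions API.
# Assumes the "deepseek-reasoner" model name and an API key in DEEPSEEK_API_KEY.
import os
import requests

def ask_r1(prompt: str) -> str:
    resp = requests.post(
        "https://api.deepseek.com/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['DEEPSEEK_API_KEY']}"},
        json={
            "model": "deepseek-reasoner",
            "messages": [{"role": "user", "content": prompt}],
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

print(ask_r1("Continue this story: ..."))
```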

Their API is very cheap: $2.19 per million output tokens and $0.55 per million input tokens on a cache miss. (On a cache hit, i.e. if you resend parts of an identical prompt, input is $0.14 per million.)

I played around with it all day yesterday and spent about 11 cents.

For comparison, GPT-4o (which is a lot worse) costs $2.50/million input tokens and $10/million output tokens.

o1, which is closer to R1 intelligence-wise, costs $15/million input tokens and $60/million output tokens, and AFAIK you need a $200 sub to OpenAI to even access it. (Somebody correct me if I'm wrong.)

It's impossible to overstate how much cheaper DeepSeek's models are.
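Napkin math with the prices above, just to show the gap (the token counts are made up for illustration):

```python
# Rough cost comparison using the per-million-token prices quoted above.
# Token counts are invented purely for illustration.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "deepseek-r1 (cache miss)": (0.55, 2.19),
    "gpt-4o": (2.50, 10.00),
    "o1": (15.00, 60.00),
}

input_tokens, output_tokens = 200_000, 50_000  # e.g. a long day of chatting

for model, (in_price, out_price) in PRICES.items():
    cost = input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price
    print(f"{model}: ${cost:.2f}")
```

With those made-up numbers you get roughly $0.22 for R1, $1.00 for GPT-4o, and $6.00 for o1.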
 
Someone threw a picture of what they claim is Deepseek running on a Raspberry Pi into a chat channel I read. Are the Deepseek models that revolutionary that they've cut the hardware requirements from a cutting edge GPU like a 4090 to a fucking Pi? I instantly suspected there was some asspull, like the Pi just doing API calls to the home servers, but everything I'm reading seems to indicate that the models are indeed more efficient than the old ones. Still, that kind of hardware cost reduction seems farfetched. What's the truth here?
 
Someone threw a picture of what they claim is Deepseek running on a Raspberry Pi into a chat channel I read. Are the Deepseek models that revolutionary that they've cut the hardware requirements from a cutting edge GPU like a 4090 to a fucking Pi? I instantly suspected there was some asspull, like the Pi just doing API calls to the home servers, but everything I'm reading seems to indicate that the models are indeed more efficient than the old ones. Still, that kind of hardware cost reduction seems farfetched. What's the truth here?
See right above your post. They released R1 plus a bunch of finetunes of other models (llama, qwen) that they tuned on R1's output and then, confusingly, also called R1. Those are just finetunes, not the actual R1 model, which is around 685b parameters and would most definitely not fit on a Pi. I'm sure they won't make that naming mistake again.
 

Claude can generate shapes now. I feel like an ape discovering fire.
 

Claude can generate shapes now. I feel like an ape discovering fire.
That's cool. About two months ago I asked it to show me an image of a 4-bit step adder and subtractor, which it then basically exploded before giving me an Actually Indian result. I guess they changed the parameter config, because when I try now it won't even attempt it. There's probably a layer of inferred abstraction they removed.
 
Are the Deepseek models that revolutionary that they've cut the hardware requirements from a cutting edge GPU like a 4090 to a fucking Pi?
You don't need a 4090, I'm running deepseek-r1:14b on an old 3060 with 12gb of VRAM with good results. It may be possible to run one of the more compressed models on a Raspberry Pi. You can definitely run it on those new Jetson Orin Nano devices.
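(If anyone wants to hit it from a script instead of the terminal, ollama exposes a local REST API. Rough sketch below, assuming ollama is serving on its default port 11434 and you've already pulled the deepseek-r1:14b tag.)

```python
# Minimal sketch: query a locally running ollama instance (default port 11434).
# Assumes `ollama pull deepseek-r1:14b` has already been run.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "deepseek-r1:14b",
        "prompt": "Explain what a distilled model is in one paragraph.",
        "stream": False,
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.json()["response"])
```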
 
You don't need a 4090, I'm running deepseek-r1:14b on an old 3060 with 12gb of VRAM with good results. It may be possible to run one of the more compressed models on a Raspberry Pi. You can definitely run it on those new Jetson Orin Nano devices.
The big fanfare I've seen is that DeepSeek is so efficient that it can be run locally on what would be average workplace hardware, I'm talking laptops with an iGPU or some office workstation using a GPU like a 1060, because that shit only gets upgraded about once a decade or so. If this is true it effectively kills the LLM market, because there is no real way to monetize it for future growth.
 
The big fanfare I've seen is that DeepSeek is so efficient that it can be run locally on what would be average workplace hardware, I'm talking laptops with an iGPU or some office workstation using a GPU like a 1060, because that shit only gets upgraded about once a decade or so. If this is true it effectively kills the LLM market, because there is no real way to monetize it for future growth.
Haven't tried it on older devices, but I may pull out my old (and practically obsolete) Jetson Nano from a few gens ago and see if I can get it running on that.
 
Thread should be renamed "LLMs General", and OpenAI should be slaughtered.

AMD announced their "Strix Halo" mega APUs at CES, which will appear on the market in the coming months. Gaming-oriented models are paired with a minimum of 32 GB of LPDDR5X, but "AI workstations" will come with up to 128 GB. AMD's software can configure up to 75% of that (96 GB) as "VRAM", allowing it to easily run 70B-size models.

One of the claims AMD made is that Strix Halo can outperform the RTX 4090 while using a fraction of the power:

AMD’s beastly ‘Strix Halo’ Ryzen AI Max+ debuts with radical new memory tech to feed RDNA 3.5 graphics and Zen 5 CPU cores (archive)
[AMD slide: Ryzen AI Max+ 395 vs. desktop RTX 4090 LLM test setups]
AMD also shared some rather impressive results showing a Llama 70B Nemotron LLM AI model running on both the Ryzen AI Max+ 395 with 128GB of total system RAM (32GB for the CPU, 96GB allocated to the GPU) and a desktop Nvidia GeForce RTX 4090 with 24GB of VRAM (details of the setups in the slide below). AMD says the AI Max+ 395 delivers up to 2.2X the tokens/second performance of the desktop RTX 4090 card, but the company didn’t share time-to-first-token benchmarks.

Perhaps more importantly, AMD claims to do this at an 87% lower TDP than the 450W RTX 4090, with the AI Max+ running at a mere 55W. That implies that systems built on this platform will have exceptional power efficiency metrics in AI workloads.

Strix Halo's big iGPU is only around a 7600 XT. If this kind of result is typical, it's only because it has the equivalent of 96 GB of VRAM compared to a measly 24 GB for the 4090, so it can handle larger models better. To get more memory bandwidth, Strix Halo is configured with quad-channel (256-bit) LPDDR5X for around 256 GB/s. It also has 32 MiB of Infinity Cache, which may not help much here (it's mainly there for gaming).
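That ~256 GB/s figure is just bus width times transfer rate. Quick sketch, assuming LPDDR5X-8000:

```python
# Back-of-the-envelope memory bandwidth for Strix Halo's quad-channel LPDDR5X.
# Assumes LPDDR5X-8000 (8000 MT/s); a 256-bit bus moves 32 bytes per transfer.
bus_width_bits = 256
transfer_rate_mts = 8000  # mega-transfers per second
bandwidth_gbs = (bus_width_bits / 8) * transfer_rate_mts / 1000
print(f"{bandwidth_gbs:.0f} GB/s")  # ~256 GB/s
```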

I haven't seen many AI benchmarks for the RTX 5090 yet, but if it's only around a +30-50% improvement, Strix Halo could outperform that too. But only when limited by VRAM. If a 30B model can fit in the 5090's 32 GB, it will smoke everything out there.

Nvidia has an answer for this: the Project Digits "AI supercomputer". That is a mini PC with a 20-core ARM CPU and larger Blackwell GPU that includes... 128 GB of LPDDR5X. It's $3,000, but I expect 128 GB Strix Halo systems to cost around that much too. Tiny Attention Whore Corp scorned the box.

If Strix Halo is successful, AMD could follow it up with a Zen 6 "Medusa Halo". Hopefully with even greater memory support, e.g. 192-256 GB. There's also nothing stopping AMD from adding 3D V-Cache (like a 9800X3D) because these parts use desktop CPU chiplets.

Also, take a look at this. It could be marketing bullshit, but it might actually be a weird trick to download more VRAM:
CNX Software: Phison’s aiDAPTIV+ AI solution leverages SSDs to expand GPU memory for LLM training
On the other hand, the aiDAPTIV+ middleware manages data flow between the GPU memory and the SSDs. It analyzes the AI model's needs, identifies less active data segments, and transfers them to the SSDs, effectively creating a vast virtual memory pool for the GPU. This technique allows for the training of much larger models than would be possible with the GPU's onboard memory.

Worked fine for me with rapidsave.com
I use that site and don't get audio unless I pick the (default) highest resolution copy.
 
You don't need a 4090, I'm running deepseek-r1:14b on an old 3060 with 12gb of VRAM with good results.
That's the distill model: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
They used DeepSeek R1 to generate training data to finetune Qwen-14b, which is how they achieved that.
The full-sized model is at https://huggingface.co/deepseek-ai/DeepSeek-R1 (685b parameters according to HF).

https://huggingface.co/ibm-granite/granite-3.1-3b-a800m-instruct is probably more suitable for a pure-CPU environment. It should be small enough to use on an RPi if you load it as a 4-bit transformers model.
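Rough sketch of what that 4-bit load looks like with transformers + bitsandbytes. Note this assumes a CUDA GPU is present; on an actual Pi you'd more realistically use a GGUF quant with llama.cpp instead:

```python
# Sketch: load granite-3.1-3b-a800m-instruct in 4-bit via transformers + bitsandbytes.
# Assumes a CUDA GPU; on a Raspberry Pi a GGUF quant (llama.cpp) is the more realistic route.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "ibm-granite/granite-3.1-3b-a800m-instruct"
quant = BitsAndBytesConfig(load_in_4bit=True)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant, device_map="auto"
)

inputs = tokenizer("What is a mixture-of-experts model?", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```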

Edit: typo
 
That's the distill model: https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
They used DeepSeek R1 to generate training data to finetune Qwen-14b, which is how they achieved that.
The full-sized model is at https://huggingface.co/deepseek-ai/DeepSeek-R1 (685b parameters according to HF).

https://huggingface.co/ibm-granite/granite-3.1-3b-a800m-instruct is probably more suitable for a pure-CPU environment. It should be small enough to use on an RPi if you load it as a 4-bit transformers model.

Edit: typo
Yes, it's not possible to run the full R1 model even on a 4090, just the finetuned Qwen or Llama models.

Need a few of these bad boys for the full R1 model: https://www.ebay.com/itm/386203547539

Edit so as not to double-post: Has anyone got Browser Use and WebUI to work with the R1-finetuned models to make a homemade AI Operator service? It works with ollama and qwen2.5:7b, but the R1 models crap out with a 400 error.
 
Yes, it's not possible to run the full R1 model even on a 4090, just the finetuned Qwen or Llama models.

Need a few of these bad boys for the full R1 model: https://www.ebay.com/itm/386203547539
I believe it needs something in the ballpark of 512-768 GB (I heard around 300 GB earlier, is that a lower-precision mode?):
https://bojan.substack.com/p/the-first-few-impressions-of-deepseek (archive)
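The spread makes sense with napkin math: roughly one byte per parameter at 8-bit and half that at 4-bit, before KV cache and other overhead. That would put the ~300 GB figure around a 4-bit quant and the 512-768 GB range nearer the 8-bit weights plus overhead:

```python
# Napkin math: weight memory for a 685B-parameter model at different precisions.
# Ignores KV cache and activation overhead, which push real requirements higher.
params = 685e9
for name, bytes_per_param in [("FP16", 2), ("FP8/INT8", 1), ("4-bit", 0.5)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB")
```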

Not possible with only one chip on the market right now, but definitely feasible within a few years. HBM stacks can get larger and more numerous to go from the current 288 GB memory on package to closer to 1 TB. AMD could iterate on the Strix Halo formula and allow the use of LPDDR6/7 LPCAMMs to let the user buy as much memory as they can possibly use. True 3D DRAM (archive) (with dozens/hundreds of layers like V-NAND) could be coming by around the early 2030s, increasing density and hopefully lowering cost per bit. If that takes off then 1 TB could become commonplace even for consumers in a decade.

State-of-the-art model sizes could increase more, but at least this next step will be taken.

* Cantor Fitzgerald On NVIDIA: The DeepSeek “Announcement Is Actually Very Bullish With AGI Seemingly Closer To Reality And Jevons Paradox Almost Certainly Leading To The AI Industry Wanting More Compute, Not Less” (archive)
* Has DeepSeek Ended The AI Run Led By NVIDIA? A Deep Dive Into What’s Next For The Magnificent Seven (archive)

There has been a lot of noise about Nvidia and their stock dropped 15% today, but I believe they will be fine as the shovel/pickaxe seller in the short term, and it's really OpenAI and others that have to watch out.

For anyone who hasn't heard of it, Jevons paradox is the idea that as a resource gets cheaper or more efficient to use, total consumption of it tends to rise rather than fall. So dropping the token cost by 95-98% isn't going to cut demand for AI accelerators and GPUs by anything like that much. It will make it easier to provide higher-quality cloud/search AI without burning billions of dollars.
 
There has been a lot of noise about Nvidia and their stock dropped 15% today
Even worse, it's 17% now. I agree that Nvidia isn't going anywhere; they're going to keep profiting from this as if nothing changed.

OpenAI on the other hand is in full panic mode. Everyone is shitting on them while their researcher jeets are malding and coping about "muh privacy" and "CCP backdoors."

Has Sam Altman made any statements yet? I fully expect him to misread the room and ask for more taxpayer money and H1Bs so they can be more "competitive" to China.

Edit: Some reactions from malding nerds.
[screenshots]

And obligatory shitting on them:
[screenshot]
 
Just a horrible basic bitch take from whoever Steven Heidel is.

No Viet Cong ever called me nigger, and no Chink ever arrested me for cyberterrorism or misgendering. And that's before you get into the fact that you can run this stuff locally.

Meanwhile, one of OpenAI's latest products is Operator, which hands over control of your computer to AI. And I would even consider using AutoHotkey on Roids on a cheap, throwaway computer, but I could still see that going terribly wrong.
 
Just a horrible basic bitch take from whoever Steven Heidel is.

No Viet Cong ever called me nigger, and no Chink ever arrested me for cyberterrorism or misgendering. And that's before you get into the fact that you can run this stuff locally.

Meanwhile, one of OpenAI's latest products is Operator, which hands over control of your computer to AI. And I would even consider using AutoHotkey on Roids on a cheap, throwaway computer, but I could still see that going terribly wrong.
His profile is just... *chef's kiss*
[screenshot]

Edit: Bros, they just keep coming.
[screenshot]

https://archive.ph/wip/iesoB

 
No Viet Cong ever called me nigger, and no Chink ever arrested me for cyberterrorism or misgendering. And that's before you get into the fact that you can run this stuff locally.

I hate OpenAI too, but that doesn't mean I'll start sucking Chinese cock because they've released models that can do decent creative writing. I think I'll wait until they've been retrained by people in free countries to have less of a communist bias.
 