Can you tell me which of the R1 models you used and how you got it to run? There's quite a lengthy list to pick from, and I was hoping to run one of them in GGUF format in Kobold. Tried my luck with Distill-Llama-8B and it'd endlessly go on tangents trying to think about the story instead of continuing it.
You have a misunderstanding there - these models were basically "trained" on R1's synthetic outputs to become smarter. Underlying, they're still Llamas/Qwens etc. It improved them by quite a bit, but at the end of the day an 8b model is still an 8b model and will be a bit stupid. I saw the best output of the distilled models coming from the 70b Llama model. The 32b and 14b Qwen models are also very good for their size.
The output you saw from me is from DeepSeek's API: the actual R1 model that generated the outputs the others were trained on, a 685b parameter model, used via some Emacs Lisp glue code I wrote to query the API. You can download this model too, but, well, it's a bit hard to run.
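For anyone who wants to script against it without Emacs: DeepSeek's API is OpenAI-compatible, so a plain HTTP POST works. A minimal Python sketch, with no third-party libraries; the endpoint and the `deepseek-reasoner` model name are from their public docs, and the key/prompt are obviously placeholders:

```python
import json
import urllib.request

# DeepSeek's OpenAI-compatible chat completions endpoint
API_URL = "https://api.deepseek.com/chat/completions"

def build_request(prompt, api_key, model="deepseek-reasoner"):
    """Build the HTTP request for a single-turn chat completion."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )

# To actually call it (needs a real key):
# resp = urllib.request.urlopen(build_request("Continue the story...", "sk-..."))
# print(json.loads(resp.read())["choices"][0]["message"]["content"])
```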
Their API is very cheap: $2.19 for a million output tokens and $0.55 for a million input tokens on a cache miss. (On a cache hit, i.e. if you resend parts of an identical prompt, input is $0.14.)
I played around with it all day yesterday and spent about 11 cents.
For comparison, GPT-4o (which is a lot worse) costs $2.50/million input tokens and $10/million output tokens.
o1, which is closer to R1 in intelligence, costs $15/million input tokens and $60/million output tokens, and AFAIK you need a $200 sub to OpenAI to even access it (somebody correct me if I'm wrong).
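To make the price gap concrete, here's a quick back-of-envelope calculation in Python. The per-million prices are the ones quoted above; the token counts are made up purely for illustration:

```python
# Per-million-token prices (USD) quoted above
PRICES = {
    "deepseek-r1": {"in": 0.55, "out": 2.19},   # cache-miss input price
    "gpt-4o":      {"in": 2.50, "out": 10.00},
    "o1":          {"in": 15.00, "out": 60.00},
}

def cost(model, input_tokens, output_tokens):
    """Dollar cost for a given token usage."""
    p = PRICES[model]
    return (input_tokens * p["in"] + output_tokens * p["out"]) / 1_000_000

# Hypothetical workload: 100k input tokens, 20k output tokens
for m in PRICES:
    print(f"{m}: ${cost(m, 100_000, 20_000):.2f}")
```

On that hypothetical workload, R1 comes out to about $0.10, GPT-4o to $0.45, and o1 to $2.70, so o1 is roughly 27x the price.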
It's impossible to overstate how much cheaper DeepSeek's models are.