LLMstudio vs TinyZero

| | LLMstudio | TinyZero |
|---|---|---|
| Mentions | 2 | 9 |
| Stars | 325 | 11,594 |
| Growth | 5.8% | 6.9% |
| Activity | 9.5 | 9.3 |
| Latest commit | 2 days ago | 21 days ago |
| Language | Python | Python |
| License | Mozilla Public License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
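The page doesn't publish the exact formula, but conceptually the Activity score is a recency-weighted commit count. A minimal sketch of one way such a weighting could work (the exponential decay and the 30-day half-life are assumptions for illustration, not the site's actual method):

```python
from datetime import datetime, timezone

def activity_score(commit_dates, half_life_days=30.0):
    """Recency-weighted commit count: each commit contributes
    0.5 ** (age_in_days / half_life_days), so recent commits weigh more
    than older ones. Purely illustrative; not the site's real formula."""
    now = datetime.now(timezone.utc)
    return sum(
        0.5 ** (((now - d).total_seconds() / 86400) / half_life_days)
        for d in commit_dates
    )
```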
LLMstudio
-
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via RL
> And of course if you ask it anything related to the CCP it will suddenly turn into a Pinocchio simulator.
Smh this isn't a "gotcha!". Guys, it's open source, you can run it on your own hardware[^2]. Additionally, you can liberate[^3] it or use an uncensored version[^0] on your own hardware. If you don't want to host it yourself, you can run it at https://nani.ooo/chat (Select "NaniSeek Uncensored"[^1]) or https://venice.ai/chat (select "DeepSeek R1")
[^0]: https://huggingface.co/mradermacher/deepseek-r1-qwen-2.5-32B...
[^1]: https://huggingface.co/NaniDAO/deepseek-r1-qwen-2.5-32B-abla...
[^2]: https://github.com/TensorOpsAI/LLMStudio
[^3]: https://www.lesswrong.com/posts/jGuXSZgv6qfdhMCuJ/refusal-in...
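For the "run it on your own hardware" route, here is a minimal sketch using Hugging Face transformers with the official DeepSeek-R1 Qwen-32B distill; the abliterated/uncensored checkpoints linked in the footnotes can be swapped in as the model ID. It assumes the transformers and accelerate packages and enough GPU memory (quantization is left out):

```python
# Minimal local-inference sketch (assumes `transformers` + `accelerate` are
# installed and there is sufficient GPU memory; swap model_id for an
# abliterated/uncensored variant if desired).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-32B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto"
)

messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```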
-
Mixtral: Mixture of Experts
LM Studio (that they linked) is definitely not open source, and doesn't even offer a pricing model for business use.
LLMstudio is, but I suspect that was a typo in their comment. https://github.com/TensorOpsAI/LLMStudio
TinyZero
-
LIMO: Less Is More for Reasoning
Yes, the authors explicitly highlight those two points in the abstract as the elicitation threshold for complex reasoning: an extremely complete pre-trained foundation model, and a set of extremely high-quality post-training examples.
To your question on finetuning on the initial 10 million pool: intuitively, it would require a tremendous amount of finetuning data to move the needle. You really won't be able to move the gradients much with just 817 examples; that initial pool is effectively enforcing pretty rigid regularization.
There is now increasing interest in showing that small data combined with inference-time scaling provides significant yield. A couple of recent examples:
* TinyZero: https://github.com/Jiayi-Pan/TinyZero
-
Mini-R1: Reproduce DeepSeek R1 "Aha Moment"
They do mention it here
> Note: This blog is inspired by Jiayi Pan [1] who initially explored the idea and proofed it with a small model.
But I agree, that attribution could be more substantial.
> Note: This blog is inspired by Jiayi Pan [1] who also reproduced the "Aha Moment" with their TinyZero [2] model.
[1] https://x.com/jiayi_pirate/status/1882839370505621655 (1.1M views btw)
[2] https://github.com/Jiayi-Pan/TinyZero
A lot of people are busy reproing R1 right now. I think this is the spark.
-
Berkeley Researchers Replicate DeepSeek R1's Core Tech for Just $30: A Small Mod
-
Berkeley Researchers Replicate DeepSeek R1's Core Tech for Just $30
This is blogspam of https://github.com/Jiayi-Pan/TinyZero and https://nitter.lucabased.xyz/jiayi_pirate/status/18828393705.... This also doesn't mention that it's for one specific domain (playing Countdown).
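Because the replication is scoped to one verifiable domain (Countdown), the reward can be computed with a simple rule-based check rather than a learned reward model. Below is a minimal sketch of such a check; the `<answer>` tag format and the all-or-nothing scoring are assumptions for illustration, not TinyZero's exact implementation:

```python
import re

def countdown_reward(completion: str, numbers: list[int], target: int) -> float:
    """Score one completion on the Countdown task: 1.0 if the equation inside
    <answer>...</answer> uses exactly the given numbers and evaluates to the
    target, else 0.0. Illustrative only; TinyZero's actual reward (tag format,
    partial credit for well-formed output) may differ."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if not match:
        return 0.0
    equation = match.group(1).strip()
    # Allow only digits, whitespace, parentheses, and basic arithmetic operators.
    if not re.fullmatch(r"[\d\s+\-*/().]*", equation):
        return 0.0
    # Each provided number must be used exactly once.
    used = [int(n) for n in re.findall(r"\d+", equation)]
    if sorted(used) != sorted(numbers):
        return 0.0
    try:
        value = eval(equation, {"__builtins__": {}}, {})
    except Exception:
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0

# e.g. numbers [25, 50, 3] with target 78: "25 + 50 + 3" earns reward 1.0
print(countdown_reward("<think>...</think><answer>25 + 50 + 3</answer>", [25, 50, 3], 78))
```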
-
Explainer: What's R1 and Everything Else?
This is indeed a massive exaggeration, I'm pretty sure the $30 experiment is this one: https://threadreaderapp.com/thread/1882839370505621655.html (github: https://github.com/Jiayi-Pan/TinyZero).
And while it is true that this experiment shows you can reproduce the concept of direct reinforcement learning on an existing LLM, in a way that makes it develop reasoning in the same fashion DeepSeek-R1 did, this is very far from a re-creation of R1!
-
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via RL
>I wonder if this was a deliberate move by PRC or really our own fault in falling for the fallacy that more is always better.
Well, let’s see …hmmm… are we discussing this on a platform run by people who made insane money flipping zero-value companies to greater fools during the dotcom bubble, only to pivot to doing the same thing to big tech during the FANG era, or on one for discussing hard ML research among the no-nonsense math elite from some of the world’s top universities?
More seriously, we don’t have to even speculate about any of this because the methods from DeepSeek’s work are already being reproduced:
https://github.com/Jiayi-Pan/TinyZero
What are some alternatives?
ollama - Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.
DeepSeek-R1
gateway - The only fully local production-grade Super SDK that provides a simple, unified, and powerful interface for calling more than 200 LLMs.
open-r1 - Fully open reproduction of DeepSeek-R1
r2md - Convert an entire code repository (local or remote) to a single markdown or pdf file
DeepSeek-LLM - DeepSeek LLM: Let there be answers