open-r1 vs TinyZero

| | open-r1 | TinyZero |
|---|---|---|
| Mentions | 4 | 9 |
| Stars | 22,359 | 11,174 |
| Growth | 99.5% | 54.1% |
| Activity | 9.4 | 9.4 |
| Last commit | 7 days ago | 5 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
-
LIMO: Less Is More for Reasoning
Yes, the authors explicitly highlight those two points in the abstract as the elicitation threshold for complex reasoning: a foundation model with extremely complete pre-training, and a set of extremely high-quality examples for post-training.
To your question on fine-tuning on the initial 10-million-example pool: intuitively, it would take a tremendous amount of fine-tuning data to move the needle. You really won't be able to move the gradients much with just 817 examples; that initial pool is effectively enforcing pretty rigid regularization.
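A back-of-the-envelope sketch of that scale gap (the batch size and epoch count below are illustrative assumptions, not numbers from the paper):

```python
# Rough scale comparison: how many gradient steps each dataset yields.
# batch_size and epochs are illustrative assumptions, not from the LIMO paper.
batch_size = 32
epochs = 3

for name, n_examples in [("curated set", 817), ("initial pool", 10_000_000)]:
    steps = n_examples * epochs // batch_size
    print(f"{name}: {n_examples:>10,} examples -> ~{steps:>7,} gradient steps")
```

At a few dozen gradient steps, the 817-example set can only nudge the weights, which is the regularization effect described above.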
There is now increasing interest in showing that small datasets combined with inference-time scaling yield significant gains. A couple of recent examples:
* TinyZero: https://github.com/Jiayi-Pan/TinyZero (see the reward sketch after this list)
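For context, TinyZero runs RL on the Countdown game: the model must combine a set of given numbers with arithmetic to hit a target, and correctness is checked by a rule-based verifier rather than a learned reward model. A minimal sketch of such a verifier (the function name, answer-tag format, and partial-credit scores are illustrative assumptions, not TinyZero's actual code):

```python
import re

def countdown_reward(completion: str, numbers: list[int], target: int) -> float:
    """Rule-based reward sketch for the Countdown task: the model must
    combine the given numbers with + - * / to reach the target."""
    match = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    if match is None:
        return 0.0  # no parseable answer
    expr = match.group(1).strip()
    # Each given number must be used exactly once.
    used = [int(n) for n in re.findall(r"\d+", expr)]
    if sorted(used) != sorted(numbers):
        return 0.0
    # Only digits, arithmetic operators, parentheses, and whitespace allowed.
    if not re.fullmatch(r"[\d+\-*/() \t]+", expr):
        return 0.0
    try:
        value = eval(expr)  # charset restricted above, so this is contained
    except Exception:
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.1  # partial credit for format

print(countdown_reward("<answer>(6*5)-5</answer>", [5, 5, 6], 25))  # -> 1.0
```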
-
Mini-R1: Reproduce DeepSeek R1 "Aha Moment"
They do mention it here:
> Note: This blog is inspired by Jiayi Pan [1] who initially explored the idea and proofed it with a small model.
But I agree that the attribution could be more substantial. Something like:
> Note: This blog is inspired by Jiayi Pan [1] who also reproduced the "Aha Moment" with their TinyZero [2] model.
[1] https://x.com/jiayi_pirate/status/1882839370505621655 (1.1M views btw)
[2] https://github.com/Jiayi-Pan/TinyZero
A lot of people are busy reproing R1 right now. I think this is the spark.
-
Berkeley Researchers Replicate DeepSeek R1's Core Tech for Just $30
This is blogspam of https://github.com/Jiayi-Pan/TinyZero and https://nitter.lucabased.xyz/jiayi_pirate/status/18828393705.... The article also doesn't mention that the result is for one specific domain (playing Countdown).
-
Explainer: What's R1 and Everything Else?
This is indeed a massive exaggeration, I'm pretty sure the $30 experiment is this one: https://threadreaderapp.com/thread/1882839370505621655.html (github: https://github.com/Jiayi-Pan/TinyZero).
And while it's true that this experiment shows you can reproduce the concept of applying reinforcement learning directly to an existing LLM, in a way that makes it develop reasoning the same way DeepSeek-R1 did, it is very far from a re-creation of R1!
-
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via RL
>I wonder if this was a deliberate move by PRC or really our own fault in falling for the fallacy that more is always better.
Well, let's see … hmmm … are we discussing this on a platform run by people who made insane money flipping zero-value companies to greater fools during the dotcom bubble, only to pivot to doing the same thing to big tech during the FANG era, or on one for discussing hard ML research among the no-nonsense math elite from some of the world's top universities?
More seriously, we don’t have to even speculate about any of this because the methods from DeepSeek’s work are already being reproduced:
https://github.com/Jiayi-Pan/TinyZero
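The core piece being reproduced is R1-Zero-style training: sample a group of completions per prompt, score each with a rule-based reward, and compute group-relative advantages (GRPO, from DeepSeek's papers), which removes the need for a separate value network. A minimal sketch of that advantage step, assuming the group's rewards are already computed (illustrative, not any particular repo's code):

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Group-relative advantages as in GRPO: normalize each completion's
    reward against the mean/std of its own group (no value network needed)."""
    mean = rewards.mean(axis=-1, keepdims=True)
    std = rewards.std(axis=-1, keepdims=True)
    return (rewards - mean) / (std + eps)

# One prompt, a group of 4 sampled completions scored by a rule-based verifier:
rewards = np.array([[1.0, 0.1, 0.0, 1.0]])
print(grpo_advantages(rewards))  # correct answers get positive advantage
```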
What are some alternatives?
* DeepSeek-R1
* DeepSeek-V3
* DeepSeek-LLM - DeepSeek LLM: Let there be answers
