| | DeepSeek-V3 | DeepSeek-LLM |
|---|---|---|
| Mentions | 14 | 29 |
| Stars | 96,026 | 4,462 |
| Growth | 6.6% | 0.0% |
| Activity | 8.3 | 7.2 |
| Last commit | 18 days ago | about 1 year ago |
| Language | Python | Makefile |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
DeepSeek-V3
- DeepSeek V3-0324 vs. Claude 3.7 Sonnet Base: Which AI Codes Better?
- Deepseek API Complete Guide: Mastering the DeepSeek API for Developers
What distinguishes DeepSeek-V3 is its training efficiency—completed using only 2.664M H800 GPU hours on 14.8 trillion tokens, making it remarkably cost-effective for its size. Technical specifications are available on the GitHub page for DeepSeek-V3.
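To put those figures in perspective, a quick back-of-the-envelope division (using only the numbers quoted above) gives the implied per-GPU-hour training throughput:

```python
# Implied pre-training throughput from the quoted figures.
tokens = 14.8e12       # training tokens
gpu_hours = 2.664e6    # H800 GPU hours for pre-training
per_hour = tokens / gpu_hours
print(f"{per_hour / 1e6:.2f}M tokens per GPU-hour")  # → 5.56M tokens per GPU-hour
```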
- Analyzing DeepSeek API Instability: What API Gateways Can and Can't Do
DeepSeek, known for its high-performance AI models like R1 and V3, has been a game-changer in the AI landscape. However, recent reports have highlighted issues with API instability, affecting developers and users who rely on these services. Understanding the root causes of this instability is essential for addressing and mitigating these issues.
- DeepSeek not as disruptive as claimed, firm has 50k GPUs and spent $1.6B
It is not FOSS. The LLM industry has repurposed "open source" to mean "you can run the model yourself." They've released the model, but it does not meet the 'four freedoms' standard: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE...
- Build your next AI Tech Startup with DeepSeek
Typically, training part of an AI model meant updating the whole thing, even if some parts didn't contribute anything, which led to a massive waste of resources. To solve this, they introduced Auxiliary-Loss-Free Load Balancing, which works by introducing a bias factor to prevent overloading one chip while under-utilizing another (Source). This resulted in only ~5% of the model's parameters being activated per token, and a training cost around 91% cheaper than GPT-4's (GPT-4 cost $63 million to train (Source), while V3 cost $5.576 million (Source)).
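As a rough illustrative sketch (not DeepSeek's actual implementation), bias-based load balancing for a mixture-of-experts router can be thought of like this: a per-expert bias is added to the routing scores only when *choosing* experts, and the bias is periodically nudged down for overloaded experts and up for idle ones. The function names below are made up for illustration.

```python
from collections import Counter

def route_tokens(scores, bias, top_k=2):
    """Pick top-k experts per token by bias-adjusted score.

    scores: list of per-token lists of raw expert affinities.
    bias:   per-expert balancing offsets.
    The bias only decides *which* experts win; gating weights
    would still come from the raw scores.
    """
    chosen = []
    for row in scores:
        ranked = sorted(range(len(row)),
                        key=lambda e: row[e] + bias[e], reverse=True)
        chosen.append(ranked[:top_k])
    return chosen

def update_bias(bias, chosen, step=0.01):
    """Nudge bias: overloaded experts down, under-used ones up."""
    load = Counter(e for row in chosen for e in row)
    target = sum(load.values()) / len(bias)  # ideal load per expert
    def sign(x):
        return (x > 0) - (x < 0)
    return [b - step * sign(load.get(e, 0) - target)
            for e, b in enumerate(bias)]
```

Re-routing with the updated bias slightly disfavors overloaded experts on near-ties, so load evens out over time without adding an auxiliary loss term to the training objective.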
- Is DeepSeek’s Influence Overblown?
According to the official paper, DeepSeek cost only $5.6 million to train, with impressive results. This is a remarkable achievement for a large language model (LLM). In comparison, OpenAI's CEO Sam Altman admitted that training GPT-4 cost over $100 million, without saying how much more. Some AI specialists suspect that DeepSeek's training expense is underreported. Nevertheless, the hidden gem is not how much it cost to train but how drastically it improved runtime requirements.
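For context, the headline figure is straightforward arithmetic from the report's own stated numbers (the $2 per GPU-hour H800 rental rate is the report's assumption, and the total includes context extension and post-training on top of pre-training):

```python
# Reproducing the ~$5.6M figure from the DeepSeek-V3 report's stated numbers.
pretrain_hours = 2.664e6   # H800 GPU hours for pre-training
total_hours = 2.788e6      # total, incl. context extension and post-training
rate_usd = 2.0             # assumed rental cost per H800 GPU hour
cost = total_hours * rate_usd
print(f"${cost / 1e6:.3f}M")  # → $5.576M
```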
- Maybe you missed this file when looking at DeepSeek?
- DeepSeek proves the future of LLMs is open-source
> If the magic values are some kind of microcode or firmware, or something else that is executed in some way, then no, it is not really open source.
To my understanding, the contents of a .safetensors file are purely numerical weights - used by the model defined in MIT-licensed code[0] and described in a technical report[1]. The weights are arguably only "executed" to the same extent the kernel weights of a Gaussian blur filter would be, though there is a large difference in scale and effect.
[0]: https://github.com/deepseek-ai/DeepSeek-V3/blob/main/inferen...
[1]: https://arxiv.org/html/2412.19437v1
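The Gaussian-blur analogy in the comment above can be made concrete with a toy sketch: a blur "kernel" is just a list of numbers that generic code multiplies and sums, and model weights play the same role at a vastly larger scale.

```python
# A blur kernel is data, not a program: generic code consumes it.
kernel = [0.25, 0.5, 0.25]  # 1-D Gaussian-ish blur weights

def blur(signal, kernel):
    """Convolve signal with kernel (valid mode, no padding)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal) - k + 1)]

print(blur([0, 0, 4, 0, 0], kernel))  # → [1.0, 2.0, 1.0]
```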
- DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via RL
- AI and Startup Moats
But the cost is _definitely_ falling. For a recent example, see DeepSeek V3[1]. It's a model that's competitive with GPT-4 and Claude Sonnet, but cost ~$6 million to train.
This is ridiculously cheaper than what we had before. Inference is basically getting 10x cheaper per year!
We're spending more because bigger models are worth the investment. But the "price per unit of [intelligence/quality]" is getting lower and _fast_.
Saying that models are getting more expensive is confusing the absolute value spent with the value for money.
- [1] https://github.com/deepseek-ai/DeepSeek-V3/tree/main
DeepSeek-LLM
- Say Goodbye to Color Guessing: Enable HSL Previews in Tailwind + VSCode
Tested and verified on ChatGPT and DeepSeek
- DeepSeek V3-0324: A "Minor" Upgrade That Feels Major
DeepSeek AI has just rolled out a new version of its popular model, DeepSeek V3-0324. While the official announcement downplays it as a "minor update," real-world testing suggests otherwise. Users have reported noticeable improvements in logical reasoning, programming capabilities, and problem-solving—making this upgrade feel anything but small.
- Deepseek V3-0324
- Too many AIs
- The Great Programmer Purge: How AI Is Taking Over the Tech Workforce
DeepSeek
- How to create a RustDLL binding for NodeJs
First off, I researched this with some help from DeepSeek. It's easier to find answers that way than by Googling.
- Rant: state of generative AI in code generation.
And the following Shell code with DeepSeek R1:
- Yet another Go client for Deepseek API
30-second demo: chat.deepseek.com in a browser on the left vs. go-deepseek in a terminal on the right.
- Build your next AI Tech Startup with DeepSeek
You can use DeepSeek V3 and R1 for free on their official website.
- DeepSeek gives Europe's tech firms a chance to catch up
What are some alternatives?
DeepSeek-R1
open-r1 - Fully open reproduction of DeepSeek-R1
TinyZero - Clean, minimal, accessible reproduction of DeepSeek R1-Zero
sglang - SGLang is a fast serving framework for large language models and vision language models.
ollama - Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 3, Mistral Small 3.1 and other large language models.