We present our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.
However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
we introduce DeepSeek-R1, which incorporates cold-start data before RL.
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
Post-Training: Large-Scale Reinforcement Learning on the Base Model
- We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area.
- We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models.
Distillation: Smaller Models Can Be Powerful Too
- We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open-source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future.
- Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community. A minimal fine-tuning sketch follows this list.
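To make the idea of distillation via teacher-generated data concrete, the sketch below supervised-fine-tunes a small open model on reasoning traces in the style produced by DeepSeek-R1, using a plain causal-LM objective. The student model name, the two toy samples, and all hyperparameters are placeholders for illustration only, not the recipe used for the released checkpoints.

```python
# Minimal supervised fine-tuning sketch on teacher-generated reasoning traces.
# Everything here (student model, toy samples, hyperparameters) is a placeholder.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen2.5-1.5B"  # hypothetical small student model
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model).to(device)

# Each sample pairs a prompt with a reasoning trace plus final answer written by the teacher.
samples = [
    {"prompt": "What is 12 * 13?",
     "response": "<think>12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.</think>\n156"},
    {"prompt": "Is 97 prime?",
     "response": "<think>97 is not divisible by 2, 3, 5, or 7, and 11^2 > 97.</think>\nYes, 97 is prime."},
]

def collate(batch):
    texts = [s["prompt"] + "\n" + s["response"] + tokenizer.eos_token for s in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=1024)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(samples, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:
    batch = {k: v.to(device) for k, v in batch.items()}
    loss = model(**batch).loss  # plain next-token cross-entropy on the traces
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```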
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base.
For more details regarding the model architecture, please refer to the DeepSeek-V3 repository.
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
We slightly change their configs and tokenizers. Please use our settings to run these models.
For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of 0.6, a top-p value of 0.95, and generate 64 responses per query to estimate pass@1.
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 |
|---|---|---|---|---|---|---|---|
| | Architecture | – | – | MoE | – | – | MoE |
| | # Activated Params | – | – | 37B | – | – | 37B |
| | # Total Params | – | – | 671B | – | – | 671B |
| English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | 91.8 | 90.8 |
| | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | – | 92.9 |
| | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | – | 84.0 |
| | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | 92.2 |
| | IF-Eval (Prompt Strict) | 86.5 | 84.3 | 86.1 | 84.8 | – | 83.3 |
| | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | 75.7 | 71.5 |
| | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | 47.0 | 30.1 |
| | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | – | 82.5 |
| | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | – | 87.6 |
| | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | – | 92.3 |
| Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | – | 53.8 | 63.4 | 65.9 |
| | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | 96.6 | 96.3 |
| | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | 2061 | 2029 |
| | SWE Verified (Resolved) | 50.8 | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 |
| | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | 61.7 | 53.3 |
| Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | 79.8 |
| | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | 97.3 |
| | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | – | 78.8 |
| Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | – | 92.8 |
| | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | – | 91.8 |
| | C-SimpleQA (Correct) | 55.4 | 58.7 | 68.0 | 40.3 | – | 63.7 |
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating |
|---|---|---|---|---|---|---|
| GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 |
| Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 |
| o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | 1820 |
| QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 |
| DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 |
| DeepSeek-R1-Distill-Qwen-32B | 72.6 | 83.3 | 94.3 | 62.1 | 57.2 | 1691 |
| DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 |
| DeepSeek-R1-Distill-Llama-70B | 70.0 | 86.7 | 94.5 | 65.2 | 57.5 | 1633 |
You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button "DeepThink".
We also provide an OpenAI-compatible API at DeepSeek Platform: platform.deepseek.com
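For programmatic access, here is a minimal sketch using the official openai Python client against the OpenAI-compatible endpoint. The base URL (https://api.deepseek.com), the model name (deepseek-reasoner), and the DEEPSEEK_API_KEY environment variable are assumptions; please verify them against the DeepSeek Platform documentation.

```python
# Minimal sketch: calling the OpenAI-compatible API of DeepSeek Platform.
# Base URL and model name are assumptions; check the platform docs for current values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # placeholder: your platform API key
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)
print(response.choices[0].message.content)
```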
Please visit the DeepSeek-V3 repo for more information about running DeepSeek-R1 locally.
DeepSeek-R1-Distill models can be used in the same manner as Qwen or Llama models.
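As a concrete example, the snippet below loads one of the distilled checkpoints with the Hugging Face transformers library and generates a response. The prompt and sampling settings are illustrative only; the temperature follows the 0.5-0.7 recommendation given further below.

```python
# Minimal sketch: running a distilled checkpoint with Hugging Face Transformers.
# Prompt and sampling settings are illustrative, not a prescribed configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is the sum of the first 50 odd numbers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```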
For instance, you can easily start a service using vLLM:

```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
```
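Once the vLLM server is running, it exposes an OpenAI-compatible endpoint (port 8000 by default), so the same openai client can query the local model. The host, port, prompt, and sampling values below are illustrative.

```python
# Minimal sketch: querying the locally served distilled model through vLLM's
# OpenAI-compatible endpoint (default port 8000; adjust host/port if changed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # key is not checked locally

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=4096,
)
print(response.choices[0].message.content)
```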
You can also easily start a service using SGLang:

```bash
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2
```
NOTE: We recommend setting an appropriate temperature (between 0.5 and 0.7) when running these models, otherwise you may encounter issues with endless repetition or incoherent output.
This code repository and the model weights are licensed under the MIT License.
DeepSeek-R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
- DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen-2.5 series, which are originally licensed under the Apache 2.0 License, and are now fine-tuned with 800k samples curated with DeepSeek-R1.
- DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the llama3.1 license.
- DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the llama3.3 license.
If you have any questions, please raise an issue or contact us at service@deepseek.com.