We present our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1.
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable performance on reasoning.
With RL, DeepSeek-R1-Zero naturally emerged with numerous powerful and interesting reasoning behaviors.
However, DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing. To address these issues and further enhance reasoning performance,
we introduce DeepSeek-R1, which incorporates cold-start data before RL.
DeepSeek-R1 achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen. DeepSeek-R1-Distill-Qwen-32B outperforms OpenAI-o1-mini across various benchmarks, achieving new state-of-the-art results for dense models.
Post-Training: Large-Scale Reinforcement Learning on the Base Model
- We directly apply reinforcement learning (RL) to the base model without relying on supervised fine-tuning (SFT) as a preliminary step. This approach allows the model to explore chain-of-thought (CoT) for solving complex problems, resulting in the development of DeepSeek-R1-Zero. DeepSeek-R1-Zero demonstrates capabilities such as self-verification, reflection, and generating long CoTs, marking a significant milestone for the research community. Notably, it is the first open research to validate that reasoning capabilities of LLMs can be incentivized purely through RL, without the need for SFT. This breakthrough paves the way for future advancements in this area.
- We introduce our pipeline to develop DeepSeek-R1. The pipeline incorporates two RL stages aimed at discovering improved reasoning patterns and aligning with human preferences, as well as two SFT stages that serve as the seed for the model's reasoning and non-reasoning capabilities. We believe the pipeline will benefit the industry by creating better models.
Distillation: Smaller Models Can Be Powerful Too
- We demonstrate that the reasoning patterns of larger models can be distilled into smaller models, resulting in better performance compared to the reasoning patterns discovered through RL on small models. The open-source DeepSeek-R1, as well as its API, will benefit the research community to distill better smaller models in the future.
- Using the reasoning data generated by DeepSeek-R1, we fine-tuned several dense models that are widely used in the research community. The evaluation results demonstrate that the distilled smaller dense models perform exceptionally well on benchmarks. We open-source distilled 1.5B, 7B, 8B, 14B, 32B, and 70B checkpoints based on Qwen2.5 and Llama3 series to the community. A minimal fine-tuning sketch follows this list.
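To make the idea of distillation via teacher-generated data concrete, the sketch below supervised-fine-tunes a small open model on reasoning traces in the style produced by DeepSeek-R1, using a plain causal-LM objective. The student model name, the two toy samples, and all hyperparameters are placeholders for illustration only, not the recipe used for the released checkpoints.

```python
# Minimal supervised fine-tuning sketch on teacher-generated reasoning traces.
# Everything here (student model, toy samples, hyperparameters) is a placeholder.
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model = "Qwen/Qwen2.5-1.5B"  # hypothetical small student model
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(base_model)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model).to(device)

# Each sample pairs a prompt with a reasoning trace plus final answer written by the teacher.
samples = [
    {"prompt": "What is 12 * 13?",
     "response": "<think>12 * 13 = 12 * 10 + 12 * 3 = 120 + 36 = 156.</think>\n156"},
    {"prompt": "Is 97 prime?",
     "response": "<think>97 is not divisible by 2, 3, 5, or 7, and 11^2 > 97.</think>\nYes, 97 is prime."},
]

def collate(batch):
    texts = [s["prompt"] + "\n" + s["response"] + tokenizer.eos_token for s in batch]
    enc = tokenizer(texts, return_tensors="pt", padding=True, truncation=True, max_length=1024)
    labels = enc["input_ids"].clone()
    labels[enc["attention_mask"] == 0] = -100  # ignore padding in the loss
    enc["labels"] = labels
    return enc

loader = DataLoader(samples, batch_size=2, shuffle=True, collate_fn=collate)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for batch in loader:
    batch = {k: v.to(device) for k, v in batch.items()}
    loss = model(**batch).loss  # plain next-token cross-entropy on the traces
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```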
DeepSeek-R1-Zero & DeepSeek-R1 are trained based on DeepSeek-V3-Base.
For more details regarding the model architecture, please refer to the DeepSeek-V3 repository.
DeepSeek-R1-Distill models are fine-tuned based on open-source models, using samples generated by DeepSeek-R1.
We slightly change their configs and tokenizers. Please use our settings to run these models.
For all our models, the maximum generation length is set to 32,768 tokens. For benchmarks requiring sampling, we use a temperature of 0.6, a top-p value of 0.95, and generate 64 responses per query to estimate pass@1.
| Category | Benchmark (Metric) | Claude-3.5-Sonnet-1022 | GPT-4o 0513 | DeepSeek V3 | OpenAI o1-mini | OpenAI o1-1217 | DeepSeek R1 |
|---|---|---|---|---|---|---|---|
| | Architecture | – | – | MoE | – | – | MoE |
| | # Activated Params | – | – | 37B | – | – | 37B |
| | # Total Params | – | – | 671B | – | – | 671B |
| English | MMLU (Pass@1) | 88.3 | 87.2 | 88.5 | 85.2 | 91.8 | 90.8 |
| | MMLU-Redux (EM) | 88.9 | 88.0 | 89.1 | 86.7 | – | 92.9 |
| | MMLU-Pro (EM) | 78.0 | 72.6 | 75.9 | 80.3 | – | 84.0 |
| | DROP (3-shot F1) | 88.3 | 83.7 | 91.6 | 83.9 | 90.2 | 92.2 |
| | IF-Eval (Prompt Strict) | 86.5 | 84.3 | 86.1 | 84.8 | – | 83.3 |
| | GPQA-Diamond (Pass@1) | 65.0 | 49.9 | 59.1 | 60.0 | 75.7 | 71.5 |
| | SimpleQA (Correct) | 28.4 | 38.2 | 24.9 | 7.0 | 47.0 | 30.1 |
| | FRAMES (Acc.) | 72.5 | 80.5 | 73.3 | 76.9 | – | 82.5 |
| | AlpacaEval2.0 (LC-winrate) | 52.0 | 51.1 | 70.0 | 57.8 | – | 87.6 |
| | ArenaHard (GPT-4-1106) | 85.2 | 80.4 | 85.5 | 92.0 | – | 92.3 |
| Code | LiveCodeBench (Pass@1-COT) | 33.8 | 34.2 | – | 53.8 | 63.4 | 65.9 |
| | Codeforces (Percentile) | 20.3 | 23.6 | 58.7 | 93.4 | 96.6 | 96.3 |
| | Codeforces (Rating) | 717 | 759 | 1134 | 1820 | 2061 | 2029 |
| | SWE Verified (Resolved) | 50.8 | 38.8 | 42.0 | 41.6 | 48.9 | 49.2 |
| | Aider-Polyglot (Acc.) | 45.3 | 16.0 | 49.6 | 32.9 | 61.7 | 53.3 |
| Math | AIME 2024 (Pass@1) | 16.0 | 9.3 | 39.2 | 63.6 | 79.2 | 79.8 |
| | MATH-500 (Pass@1) | 78.3 | 74.6 | 90.2 | 90.0 | 96.4 | 97.3 |
| | CNMO 2024 (Pass@1) | 13.1 | 10.8 | 43.2 | 67.6 | – | 78.8 |
| Chinese | CLUEWSC (EM) | 85.4 | 87.9 | 90.9 | 89.9 | – | 92.8 |
| | C-Eval (EM) | 76.7 | 76.0 | 86.5 | 68.9 | – | 91.8 |
| | C-SimpleQA (Correct) | 55.4 | 58.7 | 68.0 | 40.3 | – | 63.7 |
| Model | AIME 2024 pass@1 | AIME 2024 cons@64 | MATH-500 pass@1 | GPQA Diamond pass@1 | LiveCodeBench pass@1 | CodeForces rating |
|---|---|---|---|---|---|---|
| GPT-4o-0513 | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 |
| Claude-3.5-Sonnet-1022 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 |
| o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | 1820 |
| QwQ-32B-Preview | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 |
| DeepSeek-R1-Distill-Qwen-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 |
| DeepSeek-R1-Distill-Qwen-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 |
| DeepSeek-R1-Distill-Qwen-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 |
| DeepSeek-R1-Distill-Qwen-32B | 72.6 | 83.3 | 94.3 | 62.1 | 57.2 | 1691 |
| DeepSeek-R1-Distill-Llama-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 |
| DeepSeek-R1-Distill-Llama-70B | 70.0 | 86.7 | 94.5 | 65.2 | 57.5 | 1633 |
You can chat with DeepSeek-R1 on DeepSeek's official website: chat.deepseek.com, and switch on the button "DeepThink".
We also provide an OpenAI-compatible API at DeepSeek Platform: platform.deepseek.com
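For programmatic access, here is a minimal sketch using the official openai Python client against the OpenAI-compatible endpoint. The base URL (https://api.deepseek.com), the model name (deepseek-reasoner), and the DEEPSEEK_API_KEY environment variable are assumptions; please verify them against the DeepSeek Platform documentation.

```python
# Minimal sketch: calling the OpenAI-compatible API of DeepSeek Platform.
# Base URL and model name are assumptions; check the platform docs for current values.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.deepseek.com",
    api_key=os.environ["DEEPSEEK_API_KEY"],  # placeholder: your platform API key
)

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[{"role": "user", "content": "How many primes are there below 100?"}],
)
print(response.choices[0].message.content)
```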
Please visit the DeepSeek-V3 repo for more information about running DeepSeek-R1 locally.
DeepSeek-R1-Distill models can be used in the same manner as Qwen or Llama models.
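As a concrete example, the snippet below loads one of the distilled checkpoints with the Hugging Face transformers library and generates a response. The prompt and sampling settings are illustrative only; the temperature follows the 0.5-0.7 recommendation given further below.

```python
# Minimal sketch: running a distilled checkpoint with Hugging Face Transformers.
# Prompt and sampling settings are illustrative, not a prescribed configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "What is the sum of the first 50 odd numbers?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```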
For instance, you can easily start a service using vLLM:

```bash
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager
```
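Once the vLLM server is running, it exposes an OpenAI-compatible endpoint (port 8000 by default), so the same openai client can query the local model. The host, port, prompt, and sampling values below are illustrative.

```python
# Minimal sketch: querying the locally served distilled model through vLLM's
# OpenAI-compatible endpoint (default port 8000; adjust host/port if changed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # key is not checked locally

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    messages=[{"role": "user", "content": "Prove that sqrt(2) is irrational."}],
    temperature=0.6,
    top_p=0.95,
    max_tokens=4096,
)
print(response.choices[0].message.content)
```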
You can also easily start a service using SGLang:

```bash
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2
```
NOTE: We recommend setting an appropriate temperature (between 0.5 and 0.7) when running these models, otherwise you may encounter issues with endless repetition or incoherent output.
This code repository and the model weights are licensed under the MIT License.
DeepSeek-R1 series support commercial use, allow for any modifications and derivative works, including, but not limited to, distillation for training other LLMs. Please note that:
- DeepSeek-R1-Distill-Qwen-1.5B, DeepSeek-R1-Distill-Qwen-7B, DeepSeek-R1-Distill-Qwen-14B and DeepSeek-R1-Distill-Qwen-32B are derived from the Qwen-2.5 series, which are originally licensed under the Apache 2.0 License, and are now fine-tuned with 800k samples curated with DeepSeek-R1.
- DeepSeek-R1-Distill-Llama-8B is derived from Llama3.1-8B-Base and is originally licensed under the llama3.1 license.
- DeepSeek-R1-Distill-Llama-70B is derived from Llama3.3-70B-Instruct and is originally licensed under the llama3.3 license.
If you have any questions, please raise an issue or contact us at service@deepseek.com.