TinyZero is a reproduction of DeepSeek R1 Zero. We built upon veRL.
Through RL, the 3B base LM develops self-verification and search abilities all on its own.
You can experience the Aha moment yourself for < $30.
Twitter thread: https://x.com/jiayi_pirate/status/1882839370505621655
Full experiment log: https://wandb.ai/jiayipan/TinyZero
conda create -n zero python=3.9
# install torch [or you can skip this step and let vllm install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2, or 0.3.1
pip3 install ray
# verl
pip install -e .
# flash attention 2
pip3 install flash-attn --no-build-isolation
# quality of life
pip install wandb IPython matplotlib
Data Preparation
python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
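For context, the Countdown task gives the model a set of numbers and a target, and asks it to combine all the numbers with +, -, *, and / to reach the target. The brute-force solver below is an illustrative sketch (not part of this repo; the function name `solve` is hypothetical) showing what a valid solution looks like:

```python
import itertools
import operator

# The four arithmetic operations paired with their printable symbols.
OPS = [(operator.add, "+"), (operator.sub, "-"),
       (operator.mul, "*"), (operator.truediv, "/")]

def solve(numbers, target):
    """Brute-force Countdown solver (illustrative sketch).

    `numbers` is a list of (value, expression-string) pairs; returns an
    expression reaching `target` using every number once, or None.
    """
    if len(numbers) == 1:
        value, expr = numbers[0]
        return expr if abs(value - target) < 1e-6 else None
    # Pick an ordered pair of numbers, combine them, and recurse.
    for i, j in itertools.permutations(range(len(numbers)), 2):
        (a, ea), (b, eb) = numbers[i], numbers[j]
        rest = [numbers[k] for k in range(len(numbers)) if k not in (i, j)]
        for fn, sym in OPS:
            if sym == "/" and b == 0:  # avoid division by zero
                continue
            result = solve(rest + [(fn(a, b), f"({ea}{sym}{eb})")], target)
            if result is not None:
                return result
    return None

expr = solve([(n, str(n)) for n in (25, 8, 3)], 49)
```

Here `expr` is an arithmetic expression such that evaluating it yields 49; the RL reward in this setting only needs to check the model's proposed equation, which is much cheaper than solving.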
Single GPU
Works for models <= 1.5B. For Qwen2.5-0.5B base, we observe that it fails to learn reasoning.
export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
3B+ model
In this case, the base model is able to develop sophisticated reasoning skills.
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
We also experiment with Qwen2.5-3B Instruct.
Data Preparation
To follow the chat template, we need to reprocess the data:
conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
Training
export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS
bash ./scripts/train_tiny_zero.sh
@misc{tinyzero,
author = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan},
title = {TinyZero},
howpublished = {https://github.com/Jiayi-Pan/TinyZero},
note = {Accessed: 2025-01-24},
year = {2025}
}