GitHub – Jiayi-Pan/TinyZero
TinyZero is a reproduction of DeepSeek R1 Zero. We built upon veRL.

Through RL, the 3B base LM develops self-verification and search abilities all on its own

You can experience the Aha moment yourself for < $30

Twitter thread: https://x.com/jiayi_pirate/status/1882839370505621655

Full experiment log: https://wandb.ai/jiayipan/TinyZero

conda create -n zero python=3.9
# install torch [or you can skip this step and let vllm install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
pip3 install ray

# verl
pip install -e .

# flash attention 2
pip3 install flash-attn --no-build-isolation
# quality of life
pip install wandb IPython matplotlib

Data Preparation

python ./examples/data_preprocess/countdown.py --local_dir {path_to_your_dataset}
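In the Countdown task this script prepares, the model is given a list of source numbers and a target, and must produce an arithmetic expression that uses each number exactly once and evaluates to the target. As a rough illustration of how a candidate answer can be checked (a hypothetical verifier sketch, not the repo's actual reward code):

```python
import re

def countdown_reward(expr: str, numbers: list, target: int) -> float:
    """Return 1.0 if expr uses each given number exactly once and equals target.

    Hypothetical sketch of a Countdown-style reward check, not TinyZero's
    actual reward function.
    """
    # Reject anything that is not digits, whitespace, or arithmetic operators.
    if not re.fullmatch(r"[\d\s+\-*/()]+", expr):
        return 0.0
    # The multiset of number literals in expr must match the given numbers.
    used = sorted(int(tok) for tok in re.findall(r"\d+", expr))
    if used != sorted(numbers):
        return 0.0
    try:
        # Safe here: the regex above restricts expr to arithmetic tokens only.
        value = eval(expr, {"__builtins__": {}}, {})
    except (SyntaxError, ZeroDivisionError):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0

# e.g. with numbers [25, 3, 4] and target 79: "25*3 + 4" is a valid solution.
```

A rule-based check like this is what makes the task cheap to use as an RL reward: no learned reward model is needed.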

Single GPU
Works for models <= 1.5B. For Qwen2.5-0.5B base, we find it fails to learn reasoning.

export N_GPUS=1
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=1
export EXPERIMENT_NAME=countdown-qwen2.5-0.5b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

3B+ model
In this case, the base model is able to develop sophisticated reasoning skills.

export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

We experiment with Qwen2.5-3B Instruct too.
Data Preparation
To follow the chat template, we need to reprocess the data:

conda activate zero
python examples/data_preprocess/countdown.py --template_type=qwen-instruct --local_dir={path_to_your_dataset}
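The reason for reprocessing is that instruct models expect each prompt wrapped in Qwen's ChatML-style chat format rather than raw text. A minimal sketch of that format (a hypothetical helper for illustration; in practice the tokenizer's `apply_chat_template` in `transformers` produces it):

```python
# Sketch of Qwen's ChatML-style chat format (hypothetical helper, shown only
# to illustrate why instruct-model data needs reprocessing).
def to_qwen_chat(user_msg: str,
                 system_msg: str = "You are a helpful assistant.") -> str:
    return (
        f"<|im_start|>system\n{system_msg}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        "<|im_start|>assistant\n"  # the model's response is generated from here
    )
```

Feeding an instruct model prompts without these turn markers degrades it back toward base-model behavior, which is why the dataset must be rebuilt before training.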

Training

export N_GPUS=2
export BASE_MODEL={path_to_your_model}
export DATA_DIR={path_to_your_dataset}
export ROLLOUT_TP_SIZE=2
export EXPERIMENT_NAME=countdown-qwen2.5-3b-instruct
export VLLM_ATTENTION_BACKEND=XFORMERS

bash ./scripts/train_tiny_zero.sh

  • We run our experiments based on veRL.
  • We use the Qwen2.5 series base model.

@misc{tinyzero,
author       = {Jiayi Pan and Junjie Zhang and Xingyao Wang and Lifan Yuan},
title        = {TinyZero},
howpublished = {https://github.com/Jiayi-Pan/TinyZero},
note         = {Accessed: 2025-01-24},
year         = {2025}
}
