Megatron Training Pipeline πŸš€οƒ

This repository is based on alibaba/Pai-Megatron-Patch for secondary development, thanks to the original author’s work. This project is used to support Alignment such as SFT and DPO etc. Currently, the operation flow of SFT and DPO is as follows.

Environment Setup πŸ› οΈοƒ

Prerequisites

  • Miniconda/Anaconda

  • Git

  • NVIDIA GPU with CUDA support

Environment Configuration πŸ”§οƒ

bash scripts/train/megatron/env_install.sh

You can also refer to the docker images(dsw-registry.cn-wulanchabu.cr.aliyuncs.com/pai/pai-megatron-patch:24.07) provided in the Pai-Megatron-Patch repository.

SFT/DPO Training Pipeline πŸ”„οƒ

1. Model Weight Conversion (HF β†’ Megatron) πŸ”„οƒ

bash scripts/train/megatron/convert/qwen2_5/convert_hf_to_mcore.sh
  • config in convert_hf_to_mcore.sh

MODEL_SIZE=${MODEL_SIZE:-"3B"}
TP=${TP:-"2"} # Tensor Parallelism
PP=${PP:-"2"} # Pipeline Parallelism
PRECISION=${PRECISION:-"bf16"}
USE_TE=${USE_TE:-"true"} # Default
MG2HF=${MG2HF:-"false"} # Default
HF_CKPT_PATH=${HF_CKPT_PATH:-"Qwen/Qwen2.5-3B-Instruct"} # HF model path
TARGET_CKPT_PATH="./mg_models/Qwen2.5-${MODEL_SIZE}-hf-to-mcore-te-tp${TP}-pp${PP}" # Output Megatron model path

2. SFT/DPO Data Preprocessing πŸ“Šοƒ

bash scripts/train/megatron/preprocess/sft_conv.sh(dpo_conv.sh)
  • config in sft_conv.sh/dpo_conv.sh

INPUT_JSON=${INPUT_JSON:-"./data/dummy_sft.json"} # Input JSON path
DATA_TYPE=${DATA_TYPE:-"conversations"} # Key for conversation data
TOKENIZER=${TOKENIZER:-"Qwen2Tokenizer"} # Tokenizer class
SEQ_LEN=${SEQ_LEN:-4096} # Sequence length
OUTPUT_DATA_PREFIX=${OUTPUT_DATA_PREFIX:-"./data/dummy_sft_mg"} # Output tokenized data path
HF_MODEL_PATH=${HF_MODEL_PATH:-"Qwen/Qwen2.5-3B-Instruct"} # HF model path
EXTRA_VOCAB_SIZE=${EXTRA_VOCAB_SIZE:-293} # Extra vocab size (1.5B: 293, 3B: 293, 7B: 421, 14B: 421, 32B: 421, 72B: 421)
TEMPLATE=${TEMPLATE:-"chatml-idsys"} # Chat Template 

required sft data format:

[
    {
        "conversations": [
            {
                "from": "human",
                "value": "xxx"
            },
            {
                "from": "gpt",
                "value": "xxx"
            }
        ]
    }
]

required dpo data format:

[
    {
      "chosen":[
        {
          "value":"xxx",
          "from":"human"
        },
        {
          "value":"xxx",
          "from":"gpt"
        }
      ],
      "rejected":[
        {
          "value":"xxx",
          "from":"human"
        },
        {
          "value":"xxx",
          "from":"gpt"
        }
      ],
    },
]

3. Training Configuration πŸŽ―οƒ

bash scripts/train/megatron/train/qwen2_5/sft_conv.sh(dpo_conv.sh)
  • Key configurations in sft_conv.sh/dpo_conv.sh:

# Paths
DATASET_PATH=${DATASET_PATH:-"./data/dummy_sft_mg_conversations_maxlen_4096"} # Path prefixed of tokenized data
VALID_DATASET_PATH=${VALID_DATASET_PATH:-"./data/dummy_sft_mg_conversations_maxlen_4096"} # Same as DATASET_PATH for validation
PRETRAIN_CHECKPOINT_PATH=${PRETRAIN_CHECKPOINT_PATH:-"./mg_models/Qwen2.5-3B-hf-to-mcore-te-tp2-pp2"} # Converted Megatron model path
OUTPUT_BASEPATH=${OUTPUT_BASEPATH:-"./checkpoints/sft/"} # Output path for training checkpoints

# Training Hyperparameters
MODEL_SIZE=${MODEL_SIZE:-"3B"}
BATCH_SIZE=${BATCH_SIZE:-4}
GLOBAL_BATCH_SIZE=${GLOBAL_BATCH_SIZE:-16}
LR=${LR:-5e-6}
MIN_LR=${MIN_LR:-0.0}
SEQ_LEN=${SEQ_LEN:-4096}
PAD_LEN=${PAD_LEN:-4096}
EPOCHS=${EPOCHS:-10}

# Parallelism Configuration (Must match with convert_hf_to_mcore.sh)
TP=${TP:-2}
PP=${PP:-2}

4. Model Weight Conversion (Megatron β†’ HF) πŸ”„οƒ

bash scripts/train/megatron/convert/qwen2_5/convert_mcore_to_hf.sh(convert_mcore_to_hf_dpo.sh)
  • config in convert_mcore_to_hf.sh/convert_mcore_to_hf_dpo.sh

MODEL_SIZE=${MODEL_SIZE:-"3B"}
TP=${TP:-"2"} # Tensor Parallelism
PP=${PP:-"2"} # Pipeline Parallelism
MG_MODEL_PATH=${MG_MODEL_PATH:-"./checkpoint/sft/checkpoint"} # Checkpoint Path
HF_CKPT_PATH=${HF_CKPT_PATH:-"Qwen/Qwen2.5-3B-Instruct"} # HF model path
PRECISION=${PRECISION:-"fp32"} # Default
USE_TE=${USE_TE:-"true"} # Default
MG2HF=${MG2HF:-"true"} # Default
TARGET_CKPT_PATH="./hf_models_from_mg/Qwen2.5-${MODEL_SIZE}-hf-to-mcore-te-tp${TP}-pp${PP}" # Output HF model path

Troubleshooting πŸ”§οƒ

Apex Compilation Issues

  1. Check if required GCC/G++ are installed in conda

  2. Verify if installed PyTorch matches environment CUDA version

Note βš οΈοƒ

  • Always ensure CUDA and PyTorch versions are compatible

  • Keep consistent sequence lengths across preprocessing and training

  • Adjust model size configurations according to your hardware capabilities

Supported Models πŸ“šοƒ

  • Qwen2.5 series (1.5B, 3B, 7B, 14B, 32B, 72B)