How to Train

Complete guide to training code generation models with halo-forge

This guide walks you through training a code generation model from scratch using RAFT. Start with the quick start for immediate results, then explore advanced sections for optimization.

TL;DR - Quick Start (10 minutes)

Already have the toolbox built? Run training immediately:

# 1. Enter the toolbox
toolbox enter halo-forge

# 2. Install halo-forge
cd ~/projects/halo-forge && pip install -e .

# 3. Run smoke test
halo-forge test --level smoke

# 4. Start RAFT training (quick validation)
halo-forge raft train \
    --model Qwen/Qwen2.5-Coder-0.5B \
    --prompts data/rlvr/mbpp_train_prompts.jsonl \
    --verifier mbpp \
    --cycles 2 \
    --output models/quick_test

That’s it. Training will begin and produce checkpoints as it progresses.


Prerequisites Checklist

Before training, ensure you have:

Hardware

Component   Minimum             Recommended
GPU         24GB VRAM           48GB+ (Strix Halo)
RAM         32GB                64GB+
Storage     50GB SSD            200GB NVMe
Network     Stable connection   Fast for model downloads

Software

Platform   Requirements
Fedora     Fedora 42+, podman, toolbox
Ubuntu     Ubuntu 22.04+, Docker
Kernel     6.16+ (for gfx1151 without extra kernel parameters)

Part 1: Setup

Option A: Fedora with podman toolbox

# Clone repository
git clone https://github.com/professor-moody/halo-forge.git
cd halo-forge/toolbox

# Build toolbox
./build.sh --no-cache

# Create and enter
toolbox create halo-forge --image localhost/halo-forge:latest
toolbox enter halo-forge

# Install package
cd ~/projects/halo-forge
pip install -e .

Option B: Ubuntu with Docker (Experimental)

Note: Ubuntu/Docker support is experimental. Fedora toolbox is recommended for production.

# Clone repository
git clone https://github.com/professor-moody/halo-forge.git
cd halo-forge/toolbox

# Build Docker image
./build-ubuntu.sh --no-cache

# (If GPU not visible) Add udev rules
sudo tee /etc/udev/rules.d/99-amd-kfd.rules >/dev/null <<'EOF'
SUBSYSTEM=="kfd", GROUP="render", MODE="0666"
SUBSYSTEM=="drm", KERNEL=="card[0-9]*", GROUP="render", MODE="0666"
EOF
sudo udevadm control --reload-rules && sudo udevadm trigger

# Run container
docker run -it --device=/dev/kfd --device=/dev/dri \
  --security-opt seccomp=unconfined \
  -v ~/projects:/workspace \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  halo-forge:ubuntu

# Inside container
cd /workspace/halo-forge
pip install -e .

Verify Setup

# Quick validation (5 seconds, no GPU)
halo-forge test --level smoke

# Standard validation (2-3 minutes, loads model)
halo-forge test --level standard

# Full validation (5 minutes, includes training step)
halo-forge test --level full

Expected output:

============================================================
halo-forge Standard Test
Model: Qwen/Qwen2.5-Coder-0.5B
============================================================

  [OK] Import modules (0.0s)
  [OK] Compiler available (0.0s)
  [OK] GPU available (0.0s)
  [OK] Model loading (1.2s)
  [OK] Code generation (21.6s)
  [OK] Code verification (0.3s)

============================================================
Test Results: 6/6 passed
============================================================

Part 2: Data Preparation

Option A: Use Built-in Sample Data (Quick Start)

halo-forge includes ready-to-use sample datasets for immediate testing. No download required:

Python Datasets (RLVR):

Dataset           File                                 Examples   Use For
MBPP Training     data/rlvr/mbpp_train_prompts.jsonl   374        RAFT training
MBPP Full         data/rlvr/mbpp_train_full.jsonl      374        SFT training
MBPP Validation   data/rlvr/mbpp_validation.jsonl      50         Benchmarking
HumanEval         data/rlvr/humaneval_full.jsonl       164        Evaluation

C++ Datasets (Competitive Programming):

Dataset              File                                            Examples   Use For
CodeForces C++       data/samples/codeforces_cpp_500.jsonl           500        Raw prompt/response
CodeForces SFT       data/samples/codeforces_cpp_500_sft.jsonl       500        SFT training
CodeForces Prompts   data/samples/codeforces_cpp_500_prompts.jsonl   500        RAFT training

Windows Systems Programming (Curriculum Learning):

Dataset      File                                                           Examples   Use For
Full RLVR    datasets/windows_curriculum/windows_systems_full_rlvr.jsonl    361        RAFT with MinGW/MSVC
Full SFT     datasets/windows_curriculum/windows_systems_full_sft.jsonl     361        SFT training
Tier Order   datasets/windows_curriculum/curriculum_order_full.json         -          Curriculum scheduling

This dataset covers Windows API programming across four difficulty tiers:

  • Tier 1 (84 examples): Foundations - basic APIs, file I/O, registry
  • Tier 2 (128 examples): Core APIs - processes, threads, memory, IPC
  • Tier 3 (72 examples): Intermediate - PE parsing, security, services
  • Tier 4 (77 examples): Advanced - native APIs, internals, evasion

Quick test - Windows with MinGW (no Windows machine needed):

# Install MinGW cross-compiler
sudo dnf install mingw64-gcc-c++  # Fedora
# or: sudo apt install mingw-w64  # Ubuntu

# Benchmark with MinGW
halo-forge benchmark run \
  --model Qwen/Qwen2.5-Coder-0.5B \
  --prompts datasets/windows_curriculum/windows_systems_full_rlvr.jsonl \
  --verifier mingw \
  --samples 10 \
  --output results/windows/baseline.json

# RAFT training with MinGW
halo-forge raft train \
  --model Qwen/Qwen2.5-Coder-0.5B \
  --prompts datasets/windows_curriculum/windows_systems_full_rlvr.jsonl \
  --verifier mingw \
  --cycles 3 \
  --output models/windows_raft

Note: MinGW can only verify compilation, not execution. For full verification (compile + run + output check), use MSVC with a Windows build server. See docs/WINDOWS_SETUP.md for setup.
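
Before a long run, it is worth confirming the cross-compiler is actually on your PATH. A quick check (x86_64-w64-mingw32-g++ is the standard mingw-w64 C++ cross-compiler binary name on both Fedora and Ubuntu):

import shutil

# Prints the compiler path, or a warning if the package is not installed.
print(shutil.which("x86_64-w64-mingw32-g++") or "MinGW g++ not found on PATH")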

Quick test - Python with MBPP:

# Start RAFT training immediately
halo-forge raft train \
  --model Qwen/Qwen2.5-Coder-0.5B \
  --prompts data/rlvr/mbpp_train_prompts.jsonl \
  --verifier mbpp \
  --cycles 2 \
  --output models/quick_test_python

Quick test - C++ with CodeForces:

# SFT on CodeForces C++
halo-forge sft train \
  --data data/samples/codeforces_cpp_500_sft.jsonl \
  --model Qwen/Qwen2.5-Coder-0.5B \
  --output models/quick_sft_cpp \
  --epochs 1

# Then RAFT with GCC verification
halo-forge raft train \
  --checkpoint models/quick_sft_cpp/final_model \
  --prompts data/samples/codeforces_cpp_500_prompts.jsonl \
  --verifier gcc \
  --cycles 2 \
  --output models/quick_raft_cpp

Validate Your Data

Before training, validate your dataset format:

# Check format and get statistics
halo-forge data validate data/samples/codeforces_cpp_500_sft.jsonl

# With preview of examples
halo-forge data validate data/my_dataset.jsonl --preview

Expected output:

============================================================
DATASET VALIDATION REPORT
============================================================

Status: ✓ VALID
Format: sft

Examples:
  Total:   500
  Valid:   500
  Invalid: 0
...

Option B: Download Public Datasets

# List available datasets
halo-forge data prepare --list

# Download CodeForces C++ examples
halo-forge data prepare \
  --dataset codeforces_cpp \
  --output data/codeforces.jsonl

# Download MBPP Python examples
halo-forge data prepare \
  --dataset mbpp \
  --output data/mbpp.jsonl

Option C: Generate with LLM

# List available topics
halo-forge data generate --list

# Generate with DeepSeek (requires API key)
export DEEPSEEK_API_KEY=your_key
halo-forge data generate \
  --topic python_algorithms \
  --backend deepseek \
  --output data/generated.jsonl

# Generate with local Ollama (free)
halo-forge data generate \
  --topic rust_basics \
  --backend ollama \
  --model codellama:13b \
  --output data/rust.jsonl

Create Prompts File

For RAFT training, you need a JSONL file with prompts:

# Extract prompts from training data
cat data/train.jsonl | python3 -c "
import json, sys
for line in sys.stdin:
    d = json.loads(line)
    prompt = d.get('prompt', d.get('text', ''))[:500]
    if prompt:
        print(json.dumps({'prompt': prompt}))
" > data/prompts.jsonl

Or create manually:

{"prompt": "Write a Python function to calculate factorial"}
{"prompt": "Implement binary search in Python"}
{"prompt": "Write a function to check if a string is palindrome"}

Part 3: SFT Training

SFT (Supervised Fine-Tuning) establishes a baseline before RAFT. It is optional if you start from a pre-trained coder model, but highly recommended for domain-specific training: it teaches the model your specific code style, patterns, and requirements before RAFT refinement.

Example: Complete SFT Run with CodeForces

Here’s a complete example using CodeForces C++ data — a real competitive programming dataset:

# Step 1: Download CodeForces C++ dataset (~4000 examples)
halo-forge data prepare \
  --dataset codeforces_cpp \
  --output data/codeforces_cpp.jsonl

# Step 2: Run SFT training
halo-forge sft train \
  --data data/codeforces_cpp.jsonl \
  --model Qwen/Qwen2.5-Coder-7B \
  --output models/sft_codeforces \
  --epochs 2

# Step 3: Extract prompts for RAFT
cat data/codeforces_cpp.jsonl | python3 -c "
import json, sys
for line in sys.stdin:
    d = json.loads(line)
    if 'prompt' in d:
        print(json.dumps({'prompt': d['prompt'][:2000]}))
" > data/codeforces_prompts.jsonl

# Step 4: Run RAFT with GCC verification
halo-forge raft train \
  --checkpoint models/sft_codeforces/final_model \
  --prompts data/codeforces_prompts.jsonl \
  --verifier gcc \
  --cycles 5 \
  --output models/raft_codeforces

Available Public Datasets

Dataset             Command                       Language   Examples   Description
codeforces_cpp      --dataset codeforces_cpp      C++        ~4000      Competitive programming
codeforces_python   --dataset codeforces_python   Python     ~1000      Competitive programming
codeforces_rust     --dataset codeforces_rust     Rust       ~500       Competitive programming
mbpp                --dataset mbpp                Python     ~500       Basic programming
humaneval           --dataset humaneval           Python     164        Evaluation benchmark

Generate Custom Data with LLM

For domain-specific training, generate examples using LLMs:

# Available topics
halo-forge data generate --list

# Generate Rust async examples (requires DEEPSEEK_API_KEY)
export DEEPSEEK_API_KEY=your_key
halo-forge data generate \
  --topic rust_async \
  --backend deepseek \
  --output data/rust_async.jsonl

# Generate C++ algorithms
halo-forge data generate \
  --topic cpp_algorithms \
  --backend deepseek \
  --output data/cpp_algo.jsonl

# Generate with local Ollama (free, no API key)
halo-forge data generate \
  --topic python_testing \
  --backend ollama \
  --model codellama:13b \
  --output data/python_tests.jsonl

Topic            Language   What It Generates
rust_async       Rust       Async/await with tokio
python_testing   Python     pytest examples
cpp_algorithms   C++        Algorithm implementations
go_concurrency   Go         Goroutines, channels

Basic SFT

halo-forge sft train \
  --data data/train.jsonl \
  --output models/sft \
  --epochs 3

SFT with Configuration

Create configs/sft.yaml:

model:
  name: Qwen/Qwen2.5-Coder-7B
  trust_remote_code: true

data:
  train_file: data/train.jsonl
  max_seq_length: 2048

lora:
  r: 16
  alpha: 32
  dropout: 0.05

training:
  output_dir: models/sft
  num_train_epochs: 3
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 16
  learning_rate: 2e-4
  bf16: true
  gradient_checkpointing: true
  
  # Critical for Strix Halo
  dataloader_num_workers: 0
  dataloader_pin_memory: false

Then run training with the config:

halo-forge sft train --config configs/sft.yaml

Part 4: RAFT Training

RAFT (Reward-rAnked Fine-Tuning) improves the model through repeated cycles of generation, verification, filtering, and fine-tuning.

Understanding the RAFT Cycle

┌─────────────────────────────────────────────────────┐
│               RAFT TRAINING CYCLE                    │
├─────────────────────────────────────────────────────┤
│                                                      │
│  GENERATE ──► VERIFY ──► FILTER ──► TRAIN           │
│      │           │          │          │            │
│      ▼           ▼          ▼          ▼            │
│  8 samples   Compile    Keep top    Fine-tune       │
│  per prompt  + Test     by reward   on winners      │
│                                                      │
│  ◄─────────── REPEAT 5-6 TIMES ────────────────►    │
└─────────────────────────────────────────────────────┘
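
In code terms, one cycle reduces to the sketch below. This is an illustration of the algorithm, not the halo-forge internals: the generate, verify, and fine_tune callables are supplied by you, and the defaults mirror the hyperparameter table in Part 6.

# Illustrative sketch of a single RAFT cycle (not the halo-forge implementation).
def raft_cycle(generate, verify, fine_tune, prompts,
               samples_per_prompt=8, reward_threshold=0.5, keep_top_percent=0.5):
    # GENERATE: several candidate solutions per prompt
    samples = [(p, s) for p in prompts
               for s in generate(p, n=samples_per_prompt)]
    # VERIFY: score each candidate (e.g. compile/run), yielding a numeric reward
    scored = [(p, s, verify(s)) for p, s in samples]
    # FILTER: drop low-reward samples, then keep the top fraction
    passing = sorted((x for x in scored if x[2] >= reward_threshold),
                     key=lambda x: x[2], reverse=True)
    kept = passing[:int(len(passing) * keep_top_percent)]
    # TRAIN: fine-tune on the surviving (prompt, solution) pairs
    return fine_tune([(p, s) for p, s, _ in kept])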

Basic RAFT Training

halo-forge raft train \
  --model Qwen/Qwen2.5-Coder-7B \
  --prompts data/rlvr/mbpp_train_prompts.jsonl \
  --verifier mbpp \
  --cycles 5 \
  --output models/raft

RAFT with Custom Checkpoint

# Start from your SFT model
halo-forge raft train \
  --checkpoint models/sft/final_model \
  --prompts data/prompts.jsonl \
  --verifier gcc \
  --cycles 5 \
  --output models/raft

Choosing a Verifier

Verifier    Language   Target       Compile   Run   Requires
gcc         C/C++      Linux ELF    Yes       Yes   gcc/g++
clang       C/C++      Linux ELF    Yes       Yes   clang/clang++
mingw       C/C++      Windows PE   Yes       No    mingw-w64
msvc        C/C++      Windows PE   Yes       Yes   Windows build server
rust        Rust       Native       Yes       Yes   cargo
go          Go         Native       Yes       Yes   go
humaneval   Python     N/A          N/A       Yes   (built-in)
mbpp        Python     N/A          N/A       Yes   (built-in)

All compilation verifiers support binary_cache_dir to save compiled binaries for later analysis.
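
Conceptually, a compile verifier shells out to the toolchain and maps the exit status to a reward. A minimal standalone sketch of the idea (not the halo-forge implementation; the cache_dir argument mimics the role of binary_cache_dir and must already exist if provided):

import os
import subprocess
import tempfile
from typing import Optional

def gcc_compiles(code: str, cache_dir: Optional[str] = None) -> bool:
    """Return True if code compiles as C++; keep the binary if cache_dir is set."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "main.cpp")
        out = os.path.join(cache_dir or tmp, "a.out")
        with open(src, "w") as f:
            f.write(code)
        result = subprocess.run(["g++", "-O2", "-o", out, src],
                                capture_output=True, timeout=60)
        return result.returncode == 0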

Monitoring Progress

Watch the training output:

RAFT CYCLE 1/5
==============
Generating samples... 374 prompts × 8 samples
Verifying 2992 samples...
  Passed: 1023 (34.2%)
  Failed: 1969

Filtering samples...
  Kept: 512 samples (top 50% above threshold)

Training on filtered samples...
  Loss: 0.856 → 0.342

Saving checkpoint to models/raft/cycle_1_final/

Key Metrics to Watch:

  • Pass rate: Higher is better - indicates model improvement
  • Loss decrease: Should trend downward across cycles
  • Kept samples: More samples = more training signal

TensorBoard Monitoring

Training automatically logs to TensorBoard. View training curves in real-time:

# In a separate terminal (inside toolbox)
tensorboard --logdir models/raft --port 6006

# If remote, forward the port
ssh -L 6006:localhost:6006 user@your-host

Open http://localhost:6006 in your browser to see:

  • Loss curves — Training loss per step
  • Learning rate — LR schedule over time
  • GPU metrics — Memory and utilization (if available)

TensorBoard logs are saved to:

  • SFT: models/sft/logs/
  • RAFT: models/raft/cycle_N/logs/

When to Stop

Monitor pass rate across cycles:

Cycle 1: 34.2% pass rate
Cycle 2: 42.1% pass rate  (+7.9%)
Cycle 3: 48.5% pass rate  (+6.4%)
Cycle 4: 51.2% pass rate  (+2.7%)
Cycle 5: 52.1% pass rate  (+0.9%)  ← Diminishing returns
Cycle 6: 51.8% pass rate  (-0.3%)  ← Stop here

General guidance:

  • Stop when improvement drops below 2% per cycle (see the sketch below)
  • Stop if pass rate decreases
  • In our testing, 5-6 cycles often worked well
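
The first two rules are easy to automate. A sketch of the stop check, fed one pass rate per completed cycle (in percent, oldest first):

def should_stop(pass_rates, min_gain=2.0):
    # Stop when the latest gain is under min_gain points, or negative.
    if len(pass_rates) < 2:
        return False
    return pass_rates[-1] - pass_rates[-2] < min_gain

rates = [34.2, 42.1, 48.5, 51.2, 52.1]
print(should_stop(rates))  # True: the +0.9 gain is below the 2-point threshold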

Part 5: Benchmarking

Run Benchmark

halo-forge benchmark run \
  --model models/raft/cycle_5_final \
  --prompts data/rlvr/mbpp_validation.jsonl \
  --verifier mbpp \
  --samples 10 \
  --k 1,5,10 \
  --output results/benchmark.json
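
pass@k is conventionally computed with the unbiased estimator from the HumanEval paper: with n samples per prompt of which c pass, pass@k = 1 - C(n-c, k) / C(n, k), averaged over prompts. A minimal implementation:

from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k draws from n samples (c correct) passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 4, 1))  # 0.4 for 10 samples with 4 passing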

Compare Models

# Benchmark baseline
halo-forge benchmark run \
  --model Qwen/Qwen2.5-Coder-7B \
  --prompts data/rlvr/mbpp_validation.jsonl \
  --output results/baseline.json

# Benchmark RAFT model
halo-forge benchmark run \
  --model models/raft/cycle_5_final \
  --prompts data/rlvr/mbpp_validation.jsonl \
  --output results/raft.json

# Compare
python3 -c "
import json
for name in ['baseline', 'raft']:
    with open(f'results/{name}.json') as f:
        data = json.load(f)
        print(f'{name}: pass@1={data[\"pass_at_k\"][\"1\"]:.1%}')
"

Full Benchmark Suite

# Run full benchmark with before/after comparison
halo-forge benchmark full \
  --model Qwen/Qwen2.5-Coder-7B \
  --prompts data/rlvr/mbpp_train_prompts.jsonl \
  --verifier mbpp \
  --cycles 5 \
  --output results/full_benchmark

Part 6: Advanced Topics

Filtering Strategies

Control which samples are used for training:

# Keep top 50% of samples above 0.5 reward (default)
halo-forge raft train --keep-percent 0.5 --reward-threshold 0.5 ...

# Selective: top 20% only (large datasets)
halo-forge raft train --keep-percent 0.2 --reward-threshold 0.5 ...

# Inclusive: keep all passing (small datasets)
halo-forge raft train --keep-percent 1.0 --reward-threshold 0.3 ...
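
The two flags compose into a two-stage filter: a hard reward threshold first, then rank-based truncation. A standalone sketch of that logic (illustrative, not halo-forge internals):

def filter_samples(scored, reward_threshold=0.5, keep_percent=0.5):
    """scored: list of (sample, reward) pairs. Returns the surviving pairs."""
    # Stage 1: discard anything below the reward threshold
    passing = [x for x in scored if x[1] >= reward_threshold]
    # Stage 2: keep only the top fraction by reward
    passing.sort(key=lambda x: x[1], reverse=True)
    return passing[:int(len(passing) * keep_percent)]

print(filter_samples([("a", 0.9), ("b", 0.4), ("c", 0.7)]))  # [('a', 0.9)]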

Curriculum Learning

Increase difficulty over cycles:

# configs/curriculum.yaml
curriculum_strategy: progressive

cycles:
  - verifier: gcc
    reward_threshold: 0.3
  - verifier: gcc
    reward_threshold: 0.5
  - verifier: gcc
    run_after_compile: true
    reward_threshold: 0.7
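
Conceptually, a progressive curriculum just applies per-cycle settings in order. A sketch that walks a config like the one above (run_raft_cycle is a hypothetical placeholder for your training entry point):

import yaml  # requires PyYAML

with open("configs/curriculum.yaml") as f:
    config = yaml.safe_load(f)

for i, cycle_cfg in enumerate(config["cycles"], 1):
    print(f"cycle {i}: verifier={cycle_cfg['verifier']}, "
          f"threshold={cycle_cfg['reward_threshold']}, "
          f"run_after_compile={cycle_cfg.get('run_after_compile', False)}")
    # run_raft_cycle(**cycle_cfg)  # hypothetical: wire into your training loop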

Custom Verifier

from halo_forge.rlvr.verifiers import Verifier, VerifyResult

class MyVerifier(Verifier):
    def verify(self, code: str) -> VerifyResult:
        # Example check: does the code parse as valid Python?
        # Replace this with your own verification logic.
        try:
            compile(code, "<string>", "exec")
            success = True
        except SyntaxError:
            success = False
        return VerifyResult(
            success=success,
            reward=1.0 if success else 0.0,
            details="Custom verification"
        )
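
In-process usage is just instantiation plus a verify call (assuming the base class needs no constructor arguments; registering a custom verifier with the CLI is not covered here):

verifier = MyVerifier()
result = verifier.verify("def add(a, b):\n    return a + b")
print(result.success, result.reward)  # True 1.0, since the snippet parses cleanly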

Hyperparameter Tuning

Parameter            Default   Tuning Notes
samples_per_prompt   8         More = better diversity, slower
temperature          0.7       Higher = more diverse, lower quality
reward_threshold     0.5       Higher = stricter filtering
keep_top_percent     0.5       Lower = more selective
learning_rate        5e-5      Lower if unstable
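
As a sense check on scale: with the default samples_per_prompt of 8 and the 374 MBPP prompts, each cycle generates 374 × 8 = 2992 candidates, which matches the cycle log shown in Part 4.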

Memory Optimization (Strix Halo)

For unified memory systems:

training:
  batch_size: 2
  gradient_accumulation: 16
  bf16: true  # NOT 4-bit (slower on Strix Halo)
  gradient_checkpointing: true
  
  # Critical for unified memory
  dataloader_num_workers: 0
  dataloader_pin_memory: false
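
With these settings the effective batch size is still 2 × 16 = 32 (per-device batch × gradient accumulation), so gradient quality is preserved while peak memory stays low.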

Troubleshooting

Low Pass Rate

Symptoms: <20% pass rate, many syntax errors

Solutions:

  1. Check prompt quality - are they asking for complete code?
  2. Lower temperature for more consistent output
  3. Add few-shot examples to prompts
  4. Run SFT first to establish baseline

Training Loss Increasing

Symptoms: Loss goes up after cycle 4-5

Solutions:

  1. Stop training - you’ve peaked
  2. Lower learning rate
  3. Increase reward_threshold for stricter filtering
  4. Try learning rate decay

GPU Hang

Symptoms: Training freezes, GPU unresponsive

Solutions:

  1. Ensure dataloader_num_workers: 0
  2. Ensure dataloader_pin_memory: false
  3. Add export HSA_ENABLE_SDMA=0

Out of Memory

Symptoms: CUDA/ROCm OOM errors

Solutions:

  1. Reduce batch_size
  2. Enable gradient_checkpointing
  3. Use smaller model
  4. Reduce max_seq_length

Automatic Resume

RAFT automatically caches progress. If a run crashes:

# Just re-run the same command
halo-forge raft train --cycles 5 --output models/raft

# Output:
# Cycle 1 already complete, skipping...
# Cycle 2 already complete, skipping...
# Loading cached samples... (resumes cycle 3)

See Troubleshooting for more solutions.


Next Steps