Documentation
Complete documentation for the halo forge RLVR training framework
What is halo forge?
halo forge is an RLVR (Reinforcement Learning from Verifiable Rewards) framework that uses compiler feedback as reward signals for iterative model refinement.
The Problem
| Approach | Limitation |
|---|---|
| SFT only | Distribution mismatch — model outputs differ from training data |
| RLHF | Expensive human labeling, inconsistent judgments |
| Self-evaluation | Models hallucinate correctness, signals can be gamed |
The Approach
A compiler provides deterministic feedback: the same code always yields the same result, giving an objective, reproducible signal about code correctness.
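As a rough illustration of turning a compiler's exit status into a reward, here is a minimal sketch. It is not halo forge's verifier API: the function name `compile_reward`, its parameters, and the binary 1.0/0.0 scoring are assumptions made for this example. So that the snippet runs without a C toolchain, it uses Python's own `py_compile` module as a stand-in compiler.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def compile_reward(source: str, command: list[str], suffix: str) -> float:
    """Hypothetical helper: 1.0 if `command <file>` exits with status 0, else 0.0."""
    with tempfile.TemporaryDirectory() as tmp:
        # Write the candidate program to a throwaway file.
        path = Path(tmp) / f"candidate{suffix}"
        path.write_text(source)
        try:
            result = subprocess.run(
                [*command, str(path)],
                capture_output=True,  # keep compiler diagnostics off the console
                cwd=tmp,
                timeout=30,
            )
        except subprocess.TimeoutExpired:
            # Treat a hung compiler as a failed verification.
            return 0.0
    return 1.0 if result.returncode == 0 else 0.0


# Stand-in "compiler" (an assumption for portability): Python's byte-compiler.
PY_COMPILE = [sys.executable, "-m", "py_compile"]

good = compile_reward("x = 1 + 2\n", PY_COMPILE, ".py")
bad = compile_reward("x = (1 + \n", PY_COMPILE, ".py")  # unclosed parenthesis
print(good, bad)
```

With GCC installed, the same helper could be called as `compile_reward(c_source, ["gcc", "-fsyntax-only"], ".c")`. Because the reward depends only on the source text and the compiler, repeated calls on the same input return the same value.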
Architecture
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────┐
│ Data │ ─► │ SFT │ ─► │ RAFT │ ─► │ Benchmark │
└──────────┘ └──────────┘ └──────────┘ └───────────┘
- Data — Gather training examples from public datasets or LLM generation
- SFT — Supervised fine-tuning to establish baseline capability
- RAFT — Iterative verification loop: generate → verify → filter → train
- Benchmark — Evaluate with pass@k metrics
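The RAFT stage's generate → verify → filter → train loop can be sketched as follows. This is a toy under stated assumptions, not the framework's implementation: `raft_cycle`, `ToyModel`, the `threshold` parameter, and the way "training" nudges a success probability are all hypothetical stand-ins for a real model, verifier, and trainer.

```python
import random


def raft_cycle(prompts, generate, verify, train, samples_per_prompt=4, threshold=1.0):
    """Run one hypothetical RAFT cycle and return the samples that were kept."""
    kept = []
    for prompt in prompts:
        for _ in range(samples_per_prompt):
            completion = generate(prompt)        # generate
            reward = verify(prompt, completion)  # verify
            if reward >= threshold:              # filter
                kept.append((prompt, completion))
    train(kept)                                  # train on verified samples only
    return kept


class ToyModel:
    """Stand-in model: emits "ok" with probability p_ok, otherwise "bad"."""

    def __init__(self, p_ok, seed=0):
        self.p_ok = p_ok
        self.rng = random.Random(seed)
        self.training_set = []

    def generate(self, prompt):
        return "ok" if self.rng.random() < self.p_ok else "bad"

    def train(self, examples):
        # Pretend that training on verified samples makes success more likely.
        self.training_set.extend(examples)
        if examples:
            self.p_ok = min(1.0, self.p_ok + 0.1)


def toy_verify(prompt, completion):
    """Stand-in verifier: a real one would compile or test the completion."""
    return 1.0 if completion == "ok" else 0.0


model = ToyModel(p_ok=0.3)
history = []
for cycle in range(3):
    kept = raft_cycle(["p1", "p2", "p3"], model.generate, toy_verify, model.train)
    history.append(len(kept))
print(history, model.p_ok)
```

The key design point is that only samples passing the verifier reach the training step, so the model is fine-tuned on its own verified outputs.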
What to Expect
RAFT training typically progresses as follows:
| Cycle | What Happens |
|---|---|
| 1-2 | Largest gains as model learns basic patterns |
| 3-4 | Continued improvement at slower rate |
| 5-6 | Diminishing returns; monitor for plateau |
| 7+ | May see degradation; consider stopping earlier |
Results vary significantly based on model, dataset, hardware, and domain. Run benchmarks to measure improvement on your specific use case.
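One way to compare cycles and decide where to stop is to compute pass@k for each checkpoint and keep the best one. The sketch below uses the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for n samples of which c passed; whether halo forge computes pass@k exactly this way is an assumption. The helper `best_cycle` is hypothetical, and the pass counts are placeholder inputs, not measured results.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate given n samples, of which c passed verification."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures: every size-k subset contains a passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def best_cycle(scores: list[float]) -> int:
    """Hypothetical helper: 1-based index of the earliest cycle with the top score."""
    return max(range(len(scores)), key=scores.__getitem__) + 1


# Placeholder pass counts per cycle (10 samples each); illustrative only.
passes_per_cycle = [2, 4, 5, 5, 4]
scores = [pass_at_k(10, c, 1) for c in passes_per_cycle]
chosen = best_cycle(scores)
print(scores, chosen)
```

Picking the earliest cycle among ties favors the checkpoint that needed the least training, which matches the advice to stop once returns diminish.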
Quick Navigation
Getting Started
- Quick Start — Get running in 30 minutes
- Toolbox Setup — Build the container environment
- Hardware Notes — Strix Halo configuration
Training Pipeline
- How to Train — Complete step-by-step guide (start here!)
- Full Pipeline — Complete training workflow
- Data Generation — Prepare training data
- SFT Training — Supervised fine-tuning
- RAFT Training — Reward-ranked fine-tuning
- Benchmarking — Evaluate with pass@k
- Production Runs — Production training commands
Verifiers
- Verifier Overview — Choose your verification strategy
- Compile Verifiers — GCC, Clang, MinGW, MSVC
- Test Verifiers — pytest, unittest
- Execution Verifiers — Test case verification
- Multi-Language — Auto-detect language
- Custom Verifiers — Build your own
Reference
- Command Index — Every command and flag
- Configuration — Config file reference
- Web UI — Dashboard for training and monitoring
- Windows Setup — MSVC build server
- Troubleshooting — Common issues
Background
- Theory & Research — Research foundations
- Graduated Rewards — Partial credit system
- Learning Rate Strategies — LR recommendations
Experimental
Features under active development and testing:
- Experimental Features — VLM, Audio, Reasoning, Agentic, Inference
Meta
- Changelog — Version history
- Contributing — How to contribute
Command Index
Complete index of all halo-forge commands and flags
Configuration
Complete configuration reference
Full Pipeline
Complete guide to training a code generation model
Quick Start
Get halo forge running in under 30 minutes
Theory & Research
RLVR paradigm and research foundations
Data Generation
Preparing training data for SFT and RAFT
SFT Training
Supervised fine-tuning to establish baseline capability
Toolbox Setup
Build and configure the halo forge container environment
Troubleshooting
Common issues and solutions
Graduated Rewards
Why partial credit matters for RLVR training
Hardware Notes
Configuration for AMD Strix Halo
RAFT Training
Reward-Ranked Fine-Tuning with compiler verification
Learning Rate Strategies
Experimental learning rate recommendations for RAFT training
Windows Build Server
Configure a Windows machine for MSVC verification
Benchmarking
Evaluate model performance with pass@k metrics
Web UI
Dashboard for training, benchmarking, and monitoring
Model Support
Supported models for halo-forge training
Production Training Runs
Step-by-step commands for training all model sizes on the Windows Systems Programming dataset
Code Datasets
Experimental Features
Features under active development: VLM, Audio, Reasoning, Agentic, Inference
Changelog
All notable changes to halo forge
Contributing
How to contribute to halo forge
How to Train
Complete guide to training code generation models with halo forge
Verifiers
Pluggable verification system for RLVR training