Documentation

Complete documentation for the halo-forge RLVR training framework

What is halo-forge?

halo-forge is an RLVR (Reinforcement Learning from Verifiable Rewards) framework that uses compiler feedback as reward signals for iterative model refinement.

The Problem

| Approach | Limitation |
|---|---|
| SFT only | Distribution mismatch — model outputs differ from training data |
| RLHF | Expensive human labeling, inconsistent judgments |
| Self-evaluation | Models hallucinate correctness; signals can be gamed |

The Approach

A compiler provides deterministic feedback — objective, reproducible results about code correctness.
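
For example, a compile check reduces to a single pass/fail reward. The sketch below (Python, with illustrative names only — this is not halo-forge's actual API) shows the idea, assuming `gcc` is available on the machine:

```python
# Minimal sketch of a compile-based reward: write the candidate program to a
# temp file, invoke the compiler, and return 1.0 on success, 0.0 otherwise.
# Illustrative only; halo-forge's real verifier interface may differ.
import subprocess
import tempfile
from pathlib import Path


def compile_reward(code: str, compiler: str = "gcc") -> float:
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "candidate.c"
        src.write_text(code)
        try:
            result = subprocess.run(
                [compiler, "-c", str(src), "-o", str(Path(tmp) / "candidate.o")],
                capture_output=True,
                timeout=30,
            )
        except subprocess.TimeoutExpired:
            return 0.0
    return 1.0 if result.returncode == 0 else 0.0
```

Because the score comes from the toolchain rather than a judge model, the same candidate always receives the same reward, which is what makes the signal objective, reproducible, and hard to game.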

Architecture

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌───────────┐
│   Data   │ →  │   SFT    │ →  │   RAFT   │ →  │ Benchmark │
└──────────┘    └──────────┘    └──────────┘    └───────────┘
  1. Data — Gather training examples from public datasets or LLM generation
  2. SFT — Supervised fine-tuning to establish baseline capability
  3. RAFT — Iterative verification loop: generate → verify → filter → train (sketched after this list)
  4. Benchmark — Evaluate with pass@k metrics
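
The RAFT stage is the core of the loop. A minimal sketch of one cycle is shown below; the three callables (`generate_fn`, `reward_fn`, `train_fn`) stand in for whatever sampling, verification, and fine-tuning code you plug in and are not part of halo-forge's API:

```python
# One RAFT cycle: generate candidates, verify each with the reward function,
# keep only passing samples, and fine-tune on them. Helper callables are
# placeholders supplied by the caller, not halo-forge's actual interfaces.
from typing import Callable, Iterable


def raft_cycle(
    prompts: Iterable[str],
    generate_fn: Callable[[str, int], list[str]],  # prompt, n -> candidate programs
    reward_fn: Callable[[str], float],             # program -> verifier score
    train_fn: Callable[[list[dict]], None],        # filtered samples -> model update
    samples_per_prompt: int = 8,
    threshold: float = 1.0,
) -> list[dict]:
    kept = []
    for prompt in prompts:
        for code in generate_fn(prompt, samples_per_prompt):       # generate
            if reward_fn(code) >= threshold:                       # verify + filter
                kept.append({"prompt": prompt, "completion": code})
    train_fn(kept)                                                 # train
    return kept
```

Each cycle trains only on samples the verifier accepted, so the model's own outputs become new supervised data once they pass the compiler.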

Example Results

Results from test runs on Qwen2.5-Coder-7B with 569 C/C++ prompts. Your results will vary based on model, dataset, hardware, and configuration.

| Stage | Compile Rate | pass@1 |
|---|---|---|
| SFT Baseline | 15.2% | 18.7% |
| Cycle 1 | 28.4% | 35.2% |
| Cycle 3 | 39.7% | 48.2% |
| Cycle 6 | 46.7% | 55.3% |

Both compile rate and pass@1 improved across the 6 RAFT cycles in our testing.
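
pass@1 above is the k = 1 case of pass@k. If halo-forge follows the usual convention, the metric is computed with the standard unbiased estimator: sample n completions per prompt and count the c that pass, sketched here:

```python
# Standard unbiased pass@k estimator: probability that at least one of k
# samples passes, given that c of the n drawn samples passed. This is the
# common formula, not necessarily halo-forge's exact implementation.
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```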

Quick Navigation

Getting Started

Training Pipeline

Verifiers

Reference

Background

Meta