Documentation

Complete documentation for the halo forge RLVR training framework

What is halo forge?

halo forge is an RLVR (Reinforcement Learning from Verifiable Rewards) framework that uses compiler feedback as reward signals for iterative model refinement.

The Problem

| Approach | Limitation |
|---|---|
| SFT only | Distribution mismatch: model outputs differ from training data |
| RLHF | Expensive human labeling, inconsistent judgments |
| Self-evaluation | Models hallucinate correctness, signals can be gamed |

The Approach

A compiler provides deterministic feedback — objective, reproducible results about code correctness.
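
As a rough illustration of compiler feedback used as a reward, the sketch below uses Python's built-in `compile()` as a stand-in for a real compiler toolchain. The function names `verify` and `reward` are hypothetical and are not taken from the halo forge API; the point is only that the same input always yields the same verdict.

```python
def verify(source: str) -> tuple[bool, str]:
    """Return (passed, message) for a candidate program.

    Assumption: Python's built-in compile() stands in for a real
    compiler; an actual verifier would invoke an external toolchain.
    """
    try:
        compile(source, "<candidate>", "exec")
    except (SyntaxError, ValueError) as err:
        # The diagnostic is deterministic for a given input.
        return False, f"{type(err).__name__}: {err}"
    return True, "ok"


def reward(source: str) -> float:
    """Binary reward: 1.0 if the candidate compiles, otherwise 0.0."""
    passed, _ = verify(source)
    return 1.0 if passed else 0.0
```

A binary reward is the simplest choice here; a graduated scheme would map partial success (for example, compiles but fails tests) to intermediate values.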

Architecture

┌──────────┐    ┌──────────┐    ┌──────────┐    ┌───────────┐
│   Data   │ ─► │   SFT    │ ─► │   RAFT   │ ─► │ Benchmark │
└──────────┘    └──────────┘    └──────────┘    └───────────┘
  1. Data — Gather training examples from public datasets or LLM generation
  2. SFT — Supervised fine-tuning to establish baseline capability
  3. RAFT — Iterative verification loop: generate → verify → filter → train
  4. Benchmark — Evaluate with pass@k metrics
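
The generate → verify → filter → train loop in step 3 can be sketched as a single function. This is a minimal outline under stated assumptions, not the halo forge implementation: `raft_cycle`, its parameters, and the threshold default are all hypothetical, and the model, verifier, and trainer are passed in as plain callables.

```python
from typing import Callable, List, Tuple

Sample = Tuple[str, str]  # (prompt, completion)


def raft_cycle(
    prompts: List[str],
    generate: Callable[[str, int], List[str]],
    verify: Callable[[str], float],
    train: Callable[[List[Sample]], None],
    samples_per_prompt: int = 4,
    reward_threshold: float = 0.5,
) -> List[Sample]:
    """Run one hypothetical RAFT cycle and return the samples kept for training."""
    kept: List[Sample] = []
    for prompt in prompts:
        # generate: draw several candidate completions per prompt
        for completion in generate(prompt, samples_per_prompt):
            # verify + filter: keep only candidates the verifier rewards
            if verify(completion) >= reward_threshold:
                kept.append((prompt, completion))
    # train: fine-tune on the verified samples (skipped if none passed)
    if kept:
        train(kept)
    return kept
```

Repeating this function over several cycles, each time generating from the newly trained model, gives the iterative refinement described above.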

What to Expect

RAFT training typically shows:

| Cycle | What Happens |
|---|---|
| 1-2 | Largest gains as model learns basic patterns |
| 3-4 | Continued improvement at slower rate |
| 5-6 | Diminishing returns; monitor for plateau |
| 7+ | May see degradation; consider stopping earlier |
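
One way to watch for the plateau or degradation described above is a simple patience rule on per-cycle benchmark scores. The sketch below is an assumption about how this could be done; `should_stop`, `min_gain`, and `patience` are hypothetical names, not halo forge options.

```python
def should_stop(scores: list[float], min_gain: float = 0.01, patience: int = 2) -> bool:
    """Return True when the last `patience` cycles failed to beat the
    best earlier score by at least `min_gain`.

    `scores` holds one benchmark score per completed cycle, oldest first.
    """
    if len(scores) <= patience:
        return False  # not enough history to judge
    best_before = max(scores[:-patience])
    best_recent = max(scores[-patience:])
    return best_recent - best_before < min_gain
```

Because the rule compares against the best earlier score, it also triggers when recent cycles score lower than an earlier peak, which covers the degradation case.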

Results vary significantly based on model, dataset, hardware, and domain. Run benchmarks to measure improvement on your specific use case.
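
For measuring improvement, pass@k is commonly computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are generated per problem and c of them pass. The sketch below implements that formula; whether halo forge computes pass@k exactly this way is an assumption, and `pass_at_k` is a hypothetical name.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate the probability that at least one of k samples passes,
    given that c of n generated samples passed verification."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples failed,
    # so the estimate is 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)
```

The per-problem estimates are then averaged across the benchmark to give the reported pass@k.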

Quick Navigation

Sections: Getting Started, Training Pipeline, Verifiers, Reference, Background, Experimental (features under active development and testing), Meta

Command Index

Complete index of all halo-forge commands and flags

Configuration

Complete configuration reference

Full Pipeline

Complete guide to training a code generation model

Quick Start

Get halo forge running in under 30 minutes

Theory & Research

RLVR paradigm and research foundations

Data Generation

Preparing training data for SFT and RAFT

SFT Training

Supervised fine-tuning to establish baseline capability

Toolbox Setup

Build and configure the halo forge container environment

Troubleshooting

Common issues and solutions

Graduated Rewards

Why partial credit matters for RLVR training

Hardware Notes

Configuration for AMD Strix Halo

RAFT Training

Reward-Ranked Fine-Tuning with compiler verification

Learning Rate Strategies

Experimental learning rate recommendations for RAFT training

Windows Build Server

Configure a Windows machine for MSVC verification

Benchmarking

Evaluate model performance with pass@k metrics

Web UI

Dashboard for training, benchmarking, and monitoring

Model Support

Supported models for halo-forge training

Production Training Runs

Step-by-step commands for training all model sizes on the Windows Systems Programming dataset

Public Frontend

Public-facing training, monitor, results, and readiness surface

Code Datasets

Experimental Features

Features under active development: VLM, Audio, Reasoning, Agentic, Inference

Changelog

All notable changes to halo forge

Contributing

How to contribute to halo forge

How to Train

Complete guide to training code generation models with halo forge

Verifiers

Pluggable verification system for RLVR training