Math & Reasoning Training

Status: Experimental (v1.0.0)

Train language models on mathematical reasoning tasks using verified answer checking with SymPy.

Overview

The reasoning module enables RLVR (reinforcement learning with verifiable rewards) training for mathematical problem-solving. Instead of relying on simple string matching, it checks and scores answers using:

  1. Numeric comparison - Direct float comparison with tolerance
  2. Symbolic equivalence - SymPy-based algebraic comparison
  3. Partial credit - Reward for showing reasoning steps
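The first two strategies can be illustrated with a minimal sketch. This is not the halo-forge MathVerifier implementation: the function names `numeric_match`, `symbolic_match`, and `answers_match`, as well as the tolerance values, are assumptions chosen for illustration. Only standard `math` and SymPy calls are used.

```python
import math


def numeric_match(predicted: str, expected: str, rel_tol: float = 1e-6) -> bool:
    """Compare two answers as floats, within a tolerance (values are assumptions)."""
    try:
        return math.isclose(float(predicted), float(expected),
                            rel_tol=rel_tol, abs_tol=1e-9)
    except (TypeError, ValueError):
        # One of the answers is not a plain number, e.g. "x + 1".
        return False


def symbolic_match(predicted: str, expected: str) -> bool:
    """Treat two expressions as equal if their difference simplifies to zero."""
    # Imported lazily so the numeric path still works without SymPy installed.
    import sympy

    try:
        # Note: sympify evaluates strings, so untrusted input needs sandboxing.
        diff = sympy.simplify(sympy.sympify(predicted) - sympy.sympify(expected))
    except (sympy.SympifyError, TypeError):
        return False
    return getattr(diff, "is_zero", None) is True


def answers_match(predicted: str, expected: str) -> bool:
    """Try the cheap numeric check first, then fall back to SymPy."""
    return numeric_match(predicted, expected) or symbolic_match(predicted, expected)


if __name__ == "__main__":
    print(answers_match("42", "42.0"))
    print(answers_match("(x + 1)**2", "x**2 + 2*x + 1"))
```

Running the numeric check first keeps the common case (a plain number, as in GSM8K) fast and leaves the slower symbolic comparison for algebraic answers.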

Supported Tasks

Task                Verifier        Datasets
Grade School Math   MathVerifier    GSM8K
Competition Math    MathVerifier    MATH, AIME

Quick Start

List Available Datasets

halo-forge reasoning datasets
halo-forge sft datasets  # SFT datasets

Full Pipeline (SFT → RAFT → Benchmark)

# Stage 1: SFT with MetaMathQA
halo-forge reasoning sft \
  --dataset metamath \
  --model Qwen/Qwen2.5-3B-Instruct \
  --max-samples 50000 \
  --output models/reasoning_sft

# Stage 2: RAFT with GSM8K
halo-forge reasoning train \
  --model models/reasoning_sft \
  --dataset gsm8k \
  --cycles 4 \
  --output models/reasoning_raft

# Stage 3: Benchmark
halo-forge reasoning benchmark \
  --model models/reasoning_raft \
  --dataset gsm8k \
  --limit 100

Quick RAFT (Skip SFT)

halo-forge reasoning train \
  --model Qwen/Qwen2.5-7B-Instruct \
  --dataset gsm8k \
  --cycles 4 \
  --output models/reasoning_raft

How It Works

Model Completion
       │
       ▼
┌─────────────────────┐
│  AnswerExtractor    │  Extract final answer
│  - \boxed{}         │  from completion
│  - "The answer is"  │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  MathVerifier       │  Compare to expected
│  1. Numeric match   │  answer using multiple
│  2. Symbolic match  │  strategies
└─────────┬───────────┘
          │
          ▼
   VerifyResult
   - success: bool
   - reward: 0.0-1.0
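The extraction step in the diagram can be sketched as follows. This is an illustrative approximation, not the actual AnswerExtractor: the helper names `extract_boxed` and `extract_answer`, the regular expression, and the choice to prefer `\boxed{}` over the "The answer is" phrase are all assumptions.

```python
import re
from typing import Optional

# Matches "The answer is 42" (case-insensitive) up to the end of the line.
ANSWER_PHRASE = re.compile(r"the answer is[:\s]*([^\n]+)", re.IGNORECASE)


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    marker = r"\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # Braces never closed, so no usable answer.


def extract_answer(completion: str) -> Optional[str]:
    """Prefer \\boxed{...}; otherwise fall back to the last 'The answer is' phrase."""
    boxed = extract_boxed(completion)
    if boxed is not None:
        return boxed
    matches = ANSWER_PHRASE.findall(completion)
    if matches:
        # Drop a trailing period and any surrounding "$" math delimiters.
        return matches[-1].strip().rstrip(".").strip("$ ")
    return None


if __name__ == "__main__":
    print(extract_answer(r"So x = \boxed{\frac{1}{2}}."))
    print(extract_answer("Adding them up, the answer is 42."))
```

Brace counting is used instead of a single regular expression because answers such as `\frac{1}{2}` contain nested braces that a simple pattern would cut short.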

Reward Structure

Outcome               Reward   Description
Correct answer        1.0      Numeric or symbolic match
Wrong + showed work   0.2      Reasoning steps present
No answer + work      0.2      Partial credit
No answer, no work    0.1      Minimal credit
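The reward table can be expressed as a small function. This is a sketch under stated assumptions, not the module's code: `compute_reward` and its parameters are hypothetical names, and the table does not list a reward for a wrong answer with no work shown, so that case is exposed as a parameter whose default of 0.0 is a placeholder rather than a documented value.

```python
from typing import Optional


def compute_reward(
    is_correct: Optional[bool],
    showed_work: bool,
    wrong_no_work_reward: float = 0.0,  # Assumption: this case is not in the table.
) -> float:
    """Map a verification outcome to a reward.

    is_correct is None when no final answer could be extracted.
    """
    if is_correct:
        return 1.0  # Correct answer: numeric or symbolic match.
    if showed_work:
        return 0.2  # Wrong or missing answer, but reasoning steps are present.
    if is_correct is None:
        return 0.1  # No answer and no work: minimal credit.
    return wrong_no_work_reward  # Wrong answer, no work: unspecified in the table.


if __name__ == "__main__":
    print(compute_reward(True, showed_work=True))
    print(compute_reward(None, showed_work=False))
```

How "showed work" is detected is left out here, since the source does not describe the heuristic.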

Dependencies

The reasoning module requires SymPy for symbolic verification:

pip install "sympy>=1.12"

SymPy is already included in the halo-forge toolbox containers.

Next Steps