Math & Reasoning Training

Status: Experimental (v1.0.0)

Train language models on mathematical reasoning tasks, with answers checked numerically and symbolically using SymPy.

Overview

The reasoning module enables RLVR (reinforcement learning with verifiable rewards) training for mathematical problem-solving. Rather than relying on simple string matching, it verifies answers using:

Numeric comparison - Float comparison within a tolerance
Symbolic equivalence - SymPy-based algebraic comparison
Partial credit - A reduced reward when reasoning steps are shown but the answer is wrong or missing
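
The first two strategies can be illustrated with a minimal sketch. This is not the module's actual implementation: the function names (`numerically_equal`, `symbolically_equal`, `answers_match`) and the tolerance values are assumptions chosen for illustration, and only standard `math` and SymPy calls are used.

```python
import math


def numerically_equal(predicted: str, expected: str, rel_tol: float = 1e-6) -> bool:
    """Return True when both strings parse as floats that agree within a tolerance."""
    try:
        a = float(predicted.replace(",", ""))  # accept "1,234" style numbers
        b = float(expected.replace(",", ""))
    except ValueError:
        return False  # at least one side is not a plain number
    return math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-9)


def symbolically_equal(predicted: str, expected: str) -> bool:
    """Return True when SymPy simplifies the difference of the two expressions to zero."""
    import sympy  # imported here so the numeric check works without SymPy installed

    try:
        difference = sympy.sympify(predicted) - sympy.sympify(expected)
    except (sympy.SympifyError, TypeError):
        return False  # one side could not be parsed as an expression
    return sympy.simplify(difference) == 0


def answers_match(predicted: str, expected: str) -> bool:
    """Try the cheap numeric check first, then fall back to the symbolic check."""
    return numerically_equal(predicted, expected) or symbolically_equal(predicted, expected)
```

With this sketch, `answers_match("1,234.0", "1234")` succeeds on the numeric path, while `answers_match("(x+1)**2", "x**2 + 2*x + 1")` succeeds only on the symbolic path. Note that `sympy.sympify` evaluates its input string, so a production verifier would need to restrict or sanitize model output before parsing it.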

Supported Tasks

Task	Verifier	Datasets
Grade School Math	MathVerifier	GSM8K
Competition Math	MathVerifier	MATH, AIME

Quick Start

List Available Datasets

halo-forge reasoning datasets
halo-forge sft datasets  # SFT datasets

Full Pipeline (SFT → RAFT → Benchmark)

# Stage 1: SFT with MetaMathQA
halo-forge reasoning sft \
  --dataset metamath \
  --model Qwen/Qwen2.5-3B-Instruct \
  --max-samples 50000 \
  --output models/reasoning_sft

# Stage 2: RAFT with GSM8K
halo-forge reasoning train \
  --model models/reasoning_sft \
  --dataset gsm8k \
  --cycles 4 \
  --output models/reasoning_raft

# Stage 3: Benchmark
halo-forge reasoning benchmark \
  --model models/reasoning_raft \
  --dataset gsm8k \
  --limit 100

Quick RAFT (Skip SFT)

halo-forge reasoning train \
  --model Qwen/Qwen2.5-7B-Instruct \
  --dataset gsm8k \
  --cycles 4 \
  --output models/reasoning_raft

How It Works

Model Completion
       │
       ▼
┌─────────────────────┐
│  AnswerExtractor    │  Extract final answer
│  - \boxed{}         │  from completion
│  - "The answer is"  │
└─────────┬───────────┘
          │
          ▼
┌─────────────────────┐
│  MathVerifier       │  Compare to expected
│  1. Numeric match   │  answer using multiple
│  2. Symbolic match  │  strategies
└─────────┬───────────┘
          │
          ▼
   VerifyResult
   - success: bool
   - reward: 0.0-1.0
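
The extraction step in the diagram could look roughly like the following. This is a hedged sketch rather than the real `AnswerExtractor`: the function names, the regular expression, and the choice to prefer `\boxed{}` over the "The answer is" phrase are all assumptions.

```python
import re
from typing import Optional

# Hypothetical pattern: captures the text after "the answer is" up to a newline or "$".
ANSWER_PHRASE = re.compile(r"the answer is[:\s]*\$?([^\n$]+)", re.IGNORECASE)


def extract_boxed(text: str) -> Optional[str]:
    r"""Return the contents of the last \boxed{...}, allowing nested braces."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1  # we are already inside the opening brace of \boxed{
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i].strip()
    return None  # braces never balanced


def extract_answer(completion: str) -> Optional[str]:
    """Prefer a boxed answer, then fall back to the last 'The answer is ...' phrase."""
    boxed = extract_boxed(completion)
    if boxed is not None:
        return boxed
    matches = ANSWER_PHRASE.findall(completion)
    if matches:
        return matches[-1].strip().rstrip(".")
    return None  # no recognizable final answer
```

Brace counting is used for `\boxed{}` because a simple regular expression would stop at the first `}` and truncate answers such as `\boxed{\frac{1}{2}}`.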

Reward Structure

Outcome	Reward	Description
Correct answer	1.0	Numeric or symbolic match
Wrong + showed work	0.2	Reasoning steps present
No answer + work	0.2	Partial credit
No answer, no work	0.1	Minimal credit
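
As a rough illustration, the table can be expressed as a lookup. The names `REWARDS` and `lookup_reward` and the outcome labels are hypothetical, and how the module detects that work was shown is not covered here. The table lists no row for a wrong answer without shown work, so this sketch returns `None` for that case instead of guessing a value.

```python
from typing import Optional

# Keys are (outcome, showed_work); values are copied from the reward table.
REWARDS = {
    ("correct", True): 1.0,
    ("correct", False): 1.0,
    ("wrong", True): 0.2,
    ("missing", True): 0.2,   # no answer, but reasoning steps present
    ("missing", False): 0.1,  # no answer and no reasoning steps
}


def lookup_reward(outcome: str, showed_work: bool) -> Optional[float]:
    """Return the tabulated reward, or None for a combination the table does not list."""
    return REWARDS.get((outcome, showed_work))
```

For example, `lookup_reward("wrong", True)` returns `0.2`, and `lookup_reward("wrong", False)` returns `None`.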

Dependencies

The reasoning module requires SymPy for symbolic verification:

pip install "sympy>=1.12"