Verifiers

Pluggable verification system for RLVR training

Verifiers are the heart of RLVR — they provide the reward signal that guides training.

Built-in Verifiers

Compilation Verifiers

| Verifier | Language | Target | Compile | Run | Cross-Compile |
|---|---|---|---|---|---|
| GCCVerifier | C/C++ | Linux ELF | Yes | Yes | - |
| ClangVerifier | C/C++ | Linux ELF | Yes | Yes | - |
| MinGWVerifier | C/C++ | Windows PE | Yes | No | - |
| RemoteMSVCVerifier | C/C++ | Windows PE | Yes | Yes | Requires Windows server |
| RustVerifier | Rust | Native/Windows | Yes | Yes | x86_64-pc-windows-gnu |
| GoVerifier | Go | Native/Windows | Yes | Yes | GOOS=windows |
| DotNetVerifier | C# | Windows PE | Yes | No | win-x64 |
| PowerShellVerifier | PowerShell | Script | Syntax | No | - |

Test Verifiers

| Verifier | Language | Use Case |
|---|---|---|
| PytestVerifier | Python | Code with tests |
| UnittestVerifier | Python | unittest format |
| HumanEvalVerifier | Python | HumanEval benchmark |
| MBPPVerifier | Python | MBPP benchmark |
| SubprocessVerifier | Any | Custom commands |
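
The "Custom commands" use case can be pictured as: write the sample to a temporary file, run an arbitrary command against it, and treat a zero exit code as success. The sketch below illustrates that idea with the standard library only. The function name `verify_with_command` and the `{file}` placeholder are assumptions made for this example, not the `SubprocessVerifier` API.

```python
import os
import subprocess
import sys
import tempfile


def verify_with_command(code, command, suffix=".py", timeout=10.0):
    """Hypothetical helper: run `command` against `code` saved to a temp file.

    `command` is a list of arguments; the literal "{file}" is replaced by
    the path of the temporary file. Returns (success, combined_output).
    """
    with tempfile.TemporaryDirectory() as workdir:
        path = os.path.join(workdir, "sample" + suffix)
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(code)
        argv = [arg.replace("{file}", path) for arg in command]
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
        except subprocess.TimeoutExpired:
            return False, "timed out"
    # Exit code 0 is treated as success, anything else as failure.
    return proc.returncode == 0, (proc.stdout + proc.stderr).strip()


ok, output = verify_with_command("print('hello')", [sys.executable, "{file}"])
failed, _ = verify_with_command("raise SystemExit(2)", [sys.executable, "{file}"])
```

Passing the command as an argument list (rather than a shell string) avoids shell quoting issues when the temporary path contains spaces.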

Basic Usage

from halo_forge.rlvr.verifiers import GCCVerifier

verifier = GCCVerifier()
result = verifier.verify(code)

print(result.success)   # True/False
print(result.reward)    # 0.0 - 1.0
print(result.details)   # Human-readable message

Graduated Rewards

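Binary (pass/fail) rewards give a sparse training signal: a sample that almost works scores the same as one that is completely wrong. halo-forge uses graduated rewards that give partial credit instead: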

| Outcome | Reward | Signal |
|---|---|---|
| Syntax error | 0.0 | Completely wrong |
| Compiles with warnings | 0.3 | Close but imperfect |
| Compiles clean | 0.5 | Correct syntax |
| Runs without crash | 0.7 | Executable |
| Correct output | 1.0 | Fully correct |

from halo_forge.rlvr.verifiers import RewardLevel

# Get reward from compile result
reward = RewardLevel.from_compile_result(success=True, has_warnings=False)
# Returns 0.5

# Get reward from execution result
reward = RewardLevel.from_execution_result(
    compiles=True,
    runs=True,
    correct=False
)
# Returns 0.7
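
To make the reward table concrete, here is a minimal standalone sketch of the same mapping. It does not use halo-forge; `graduated_reward` is a hypothetical function, and the choice to let a later stage override a warning penalty (a program that runs scores 0.7 even if it compiled with warnings) is an assumption, since the table does not cover that combination.

```python
def graduated_reward(compiles, has_warnings=False, runs=False, correct=False):
    """Hypothetical mapping from verification outcome to the rewards in the table."""
    if not compiles:
        return 0.0          # syntax or compile error
    if runs and correct:
        return 1.0          # correct output
    if runs:
        return 0.7          # runs without crashing, output not correct
    # Compiled but did not run (or was not run).
    return 0.3 if has_warnings else 0.5


rewards = [
    graduated_reward(compiles=False),
    graduated_reward(compiles=True, has_warnings=True),
    graduated_reward(compiles=True),
    graduated_reward(compiles=True, runs=True),
    graduated_reward(compiles=True, runs=True, correct=True),
]
```

Each step up the ladder yields a strictly higher reward, so a sample that gets further through compilation and execution is always preferred over one that fails earlier.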

Batch Verification

Verify multiple samples in parallel:

verifier = GCCVerifier(max_workers=8)
codes = [code1, code2, code3, ...]

results = verifier.verify_batch(codes)  # Parallel execution

for result in results:
    print(f"{result.reward}: {result.details}")
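
One plausible way to picture parallel batch verification is a worker pool that maps a single-sample check over the batch. The sketch below uses `concurrent.futures.ThreadPoolExecutor` from the standard library; whether halo-forge uses threads or processes internally is not stated here, and `SketchResult`, `verify_one`, and this `verify_batch` are stand-ins written for the example. The toy check only asks whether a snippet parses as Python.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class SketchResult:
    # Stand-in for the result object (success / reward / details).
    success: bool
    reward: float
    details: str


def verify_one(code):
    """Toy check standing in for a real compile step: does the snippet parse?"""
    try:
        compile(code, "<sample>", "exec")  # parses only, does not execute
    except SyntaxError as exc:
        return SketchResult(False, 0.0, f"syntax error: {exc.msg}")
    return SketchResult(True, 0.5, "compiles clean")


def verify_batch(codes, max_workers=8):
    # Executor.map preserves input order, so results[i] belongs to codes[i].
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(verify_one, codes))


results = verify_batch(["x = 1", "def f(:", "print('ok')"])
```

Threads suit verifiers that spend most of their time waiting on an external compiler process; a CPU-bound check written in pure Python would benefit more from a process pool.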

With RAFT Training

from halo_forge.rlvr import RAFTTrainer
from halo_forge.rlvr.verifiers import GCCVerifier

verifier = GCCVerifier(max_workers=8)

trainer = RAFTTrainer(
    verifier=verifier,
    sft_checkpoint="models/sft/final_model"
)

trainer.run(prompts, num_cycles=5)

Verifier Architecture

                         Verifier (base class)
                                │
        ┌───────────────────────┼───────────────────────┐
        │                       │                       │
  CompileVerifier          TestVerifier           CustomVerifier
        │                       │
   ┌────┼────┬────┬────┐   ┌────┴────┐
   │    │    │    │    │   │         │
  GCC MinGW Clang Rust Go Pytest  Unittest
              │
         RemoteMSVC
         DotNet
         PowerShell
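
The `CustomVerifier` branch suggests that new verifiers are written by subclassing the base class. The real base-class interface is not shown in this document, so the sketch below defines its own hypothetical `SketchVerifier` with a `verify` method returning a result with `success`, `reward`, and `details`, then subclasses it. All names here are illustrative.

```python
import ast
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class SketchResult:
    success: bool
    reward: float
    details: str


class SketchVerifier(ABC):
    """Hypothetical base class; the real halo-forge interface may differ."""

    @abstractmethod
    def verify(self, code):
        ...

    def verify_batch(self, codes):
        # Sequential default; subclasses could parallelize.
        return [self.verify(code) for code in codes]


class RequiresFunctionVerifier(SketchVerifier):
    """Custom check: the sample must parse and define a given function."""

    def __init__(self, function_name):
        self.function_name = function_name

    def verify(self, code):
        try:
            tree = ast.parse(code)
        except SyntaxError as exc:
            return SketchResult(False, 0.0, f"syntax error: {exc.msg}")
        defined = any(
            isinstance(node, ast.FunctionDef) and node.name == self.function_name
            for node in ast.walk(tree)
        )
        if defined:
            return SketchResult(True, 1.0, f"defines {self.function_name}()")
        # Partial credit: valid syntax, but the required function is missing.
        return SketchResult(False, 0.5, f"parses, but {self.function_name}() is missing")


verifier = RequiresFunctionVerifier("solve")
results = verifier.verify_batch(["def solve():\n    return 1", "x = 1", "def ("])
```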

Chaining Verifiers

Run multiple verification stages:

from halo_forge.rlvr.verifiers import ChainedVerifier, GCCVerifier

verifier = ChainedVerifier([
    GCCVerifier(),                        # Stage 1: Compile
    GCCVerifier(run_after_compile=True),  # Stage 2: Run
])

result = verifier.verify(code)
# Stops at first failure, accumulates rewards
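
The stop-at-first-failure behaviour can be sketched without halo-forge as a loop over stage functions. How `ChainedVerifier` actually combines stage rewards is not specified here; this sketch assumes the chain reports the reward of the last stage that passed. `StageResult`, `run_chain`, and the two toy stages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    success: bool
    reward: float
    details: str


def run_chain(stages, code):
    """Run stages in order; stop at the first failure.

    Assumption: the reported reward is that of the last stage that passed.
    """
    best = StageResult(False, 0.0, "no stages ran")
    for stage in stages:
        result = stage(code)
        if not result.success:
            return StageResult(False, best.reward, result.details)
        best = result
    return best


def parses(code):
    # Stage 1: toy "compile" step, parse only.
    try:
        compile(code, "<sample>", "exec")
    except SyntaxError as exc:
        return StageResult(False, 0.0, f"syntax error: {exc.msg}")
    return StageResult(True, 0.5, "parses")


def runs(code):
    # Stage 2: toy "run" step. Never exec untrusted code outside a sandbox.
    try:
        exec(compile(code, "<sample>", "exec"), {})
    except Exception as exc:
        return StageResult(False, 0.0, f"crashed: {type(exc).__name__}")
    return StageResult(True, 0.7, "runs without crash")


passed = run_chain([parses, runs], "x = 1")
crashed = run_chain([parses, runs], "x = 1 / 0")
broken = run_chain([parses, runs], "def (")
```

Under this assumption a sample that compiles but crashes still keeps the 0.5 it earned at the first stage.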

Cleanup

Always clean up resources when you are done with a verifier:

verifier = GCCVerifier()

try:
    results = verifier.verify_batch(codes)
finally:
    verifier.cleanup()

# Or use a context manager, which calls cleanup() on exit
with GCCVerifier() as verifier:
    results = verifier.verify_batch(codes)
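
For readers writing their own verifiers, the two cleanup styles are typically tied together by having `__exit__` call `cleanup()`. The sketch below shows that pattern with a hypothetical `TempDirVerifier` that owns a scratch directory; it assumes nothing about what halo-forge verifiers actually allocate.

```python
import os
import shutil
import tempfile


class TempDirVerifier:
    """Hypothetical verifier that owns a scratch directory for build artifacts."""

    def __init__(self):
        self.workdir = tempfile.mkdtemp(prefix="verifier_")

    def cleanup(self):
        # Safe to call more than once: missing directories are ignored.
        shutil.rmtree(self.workdir, ignore_errors=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.cleanup()
        return False  # do not swallow exceptions raised inside the block


with TempDirVerifier() as verifier:
    path = verifier.workdir
    existed_inside = os.path.isdir(path)

removed_after = not os.path.isdir(path)
```

Because `__exit__` runs even when the body raises, the context-manager form gives the same guarantee as the `try`/`finally` version with less boilerplate.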