CLI Reference

Complete reference for all halo-forge CLI commands and options.

Command Overview

| Command | Description |
|---------|-------------|
| `halo-forge test` | Validate installation |
| `halo-forge info` | Show system information |
| `halo-forge data prepare` | Download public datasets |
| `halo-forge data generate` | Generate data with LLM |
| `halo-forge data validate` | Validate dataset format |
| `halo-forge sft train` | Run supervised fine-tuning |
| `halo-forge raft train` | Run RAFT training |
| `halo-forge benchmark run` | Evaluate model performance |
| `halo-forge benchmark full` | Complete benchmark suite |
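
These commands are typically run in the order listed. As a quick orientation, a minimal end-to-end run might look like the sketch below; the dataset, verifier, and paths are illustrative, and every flag is documented in the sections that follow.

# Illustrative pipeline: validate install, prepare data, SFT, RAFT, benchmark
halo-forge test --level smoke
halo-forge data prepare --dataset mbpp --output data/mbpp.jsonl
halo-forge sft train --data data/mbpp.jsonl --output models/sft
halo-forge raft train \
  --checkpoint models/sft/final_model \
  --prompts data/prompts.jsonl \
  --verifier mbpp \
  --cycles 5
halo-forge benchmark run \
  --model models/raft/cycle_5_final \
  --prompts data/test.jsonl \
  --verifier mbpp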

halo-forge test

Validate your installation at various levels.

halo-forge test [OPTIONS]

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--level` | string | `smoke` | Test level: `smoke`, `standard`, `full` |
| `--verbose` | flag | false | Show detailed output |
| `--model` | string | `Qwen/Qwen2.5-Coder-0.5B` | Model for generation tests |

Test Levels

| Level | Time | GPU | What It Tests |
|-------|------|-----|---------------|
| `smoke` | 5 s | No | Imports, compiler availability, verifier logic |
| `standard` | 2-3 min | Yes | Model loading, code generation, verification |
| `full` | 5 min | Yes | Complete mini-RAFT cycle with training step |

Examples

# Quick validation (no GPU needed)
halo-forge test --level smoke

# Standard test with GPU
halo-forge test --level standard

# Full test with verbose output
halo-forge test --level full --verbose

# Test with specific model
halo-forge test --level standard --model Qwen/Qwen2.5-Coder-3B

halo-forge info

Display system and environment information.

halo-forge info [OPTIONS]

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--gpu` | flag | false | Show GPU details |
| `--memory` | flag | false | Show memory statistics |
| `--all` | flag | false | Show all available info |

Examples

# Basic info
halo-forge info

# GPU details
halo-forge info --gpu

# All information
halo-forge info --all

Sample Output

halo-forge v0.2.0
─────────────────────────────────────────────
Python:     3.13.1
PyTorch:    2.6.0.dev20241201+rocm6.3
GPU:        AMD Radeon Graphics (gfx1151)
ROCm:       /opt/rocm-7.0
Memory:     128GB unified
─────────────────────────────────────────────

halo-forge data prepare

Download and format public datasets for training.

halo-forge data prepare [OPTIONS]

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--dataset` | string | Required | Dataset name to download |
| `--output` | path | Required | Output JSONL file path |
| `--template` | string | `qwen` | Chat template format |
| `--system-prompt` | string | - | Custom system prompt |
| `--limit` | int | - | Limit number of examples |
| `--list` | flag | - | List available datasets |

Available Datasets

| Dataset | Language | Examples | Description |
|---------|----------|----------|-------------|
| `codeforces_cpp` | C++ | ~5000 | Competitive programming |
| `mbpp` | Python | 974 | Mostly Basic Programming Problems |
| `humaneval` | Python | 164 | HumanEval benchmark |
| `apps_intro` | Python | ~5000 | APPS introductory problems |

Examples

# List available datasets
halo-forge data prepare --list

# Download CodeForces C++
halo-forge data prepare \
  --dataset codeforces_cpp \
  --output data/codeforces.jsonl

# Download with limit
halo-forge data prepare \
  --dataset mbpp \
  --output data/mbpp.jsonl \
  --limit 500

# Custom template
halo-forge data prepare \
  --dataset humaneval \
  --output data/humaneval.jsonl \
  --template llama

halo-forge data generate

Generate training data using LLM backends.

halo-forge data generate [OPTIONS]

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--topic` | string | Required | Topic specification to use |
| `--backend` | string | Required | LLM backend: `ollama`, `deepseek`, `anthropic`, `openai` |
| `--model` | string | - | Model name (backend-specific) |
| `--output` | path | Required | Output JSONL file path |
| `--template` | string | `qwen` | Chat template format |
| `--list` | flag | - | List available topics |

Available Topics

| Topic | Language | Description |
|-------|----------|-------------|
| `python_algorithms` | Python | Algorithm implementations |
| `python_testing` | Python | Test-driven development |
| `rust_basics` | Rust | Rust fundamentals |
| `rust_async` | Rust | Async/await patterns |
| `cpp_systems` | C++ | Systems programming |
| `go_concurrency` | Go | Goroutines and channels |

Backend Configuration

Ollama (local, free):

halo-forge data generate \
  --topic python_algorithms \
  --backend ollama \
  --model codellama:13b \
  --output data/generated.jsonl

DeepSeek (API, cheap):

export DEEPSEEK_API_KEY=your_key
halo-forge data generate \
  --topic rust_async \
  --backend deepseek \
  --output data/rust.jsonl

Anthropic (API):

export ANTHROPIC_API_KEY=your_key
halo-forge data generate \
  --topic cpp_systems \
  --backend anthropic \
  --model claude-sonnet-4-20250514 \
  --output data/cpp.jsonl
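
OpenAI (API):

The openai backend listed in the options above follows the same pattern; the model name below is only an illustration, not a project default.

export OPENAI_API_KEY=your_key
halo-forge data generate \
  --topic python_testing \
  --backend openai \
  --model gpt-4o-mini \
  --output data/python_testing.jsonl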

halo-forge data validate

Validate dataset format and get statistics before training.

halo-forge data validate <file> [OPTIONS]

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `file` | path | Required | Path to JSONL file to validate |
| `--preview`, `-p` | flag | false | Show preview of examples |

Supported Formats

| Format | Fields | Use Case |
|--------|--------|----------|
| `sft` | `{"text": "..."}` | SFT training (pre-formatted) |
| `prompt_response` | `{"prompt": "...", "response": "..."}` | Raw data for formatting |
| `prompts_only` | `{"prompt": "..."}` | RAFT training prompts |
| `messages` | `{"messages": [...]}` | Chat format |
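
For reference, a single line in each of these formats might look like the following; the contents are illustrative only.

{"text": "<full pre-formatted training example as one string>"}
{"prompt": "Write a function that reverses a string.", "response": "def reverse(s):\n    return s[::-1]"}
{"prompt": "Write a function that reverses a string."}
{"messages": [{"role": "user", "content": "Write a function..."}, {"role": "assistant", "content": "def reverse(s): ..."}]}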

Examples

# Basic validation
halo-forge data validate data/train.jsonl

# With preview of first 3 examples
halo-forge data validate data/train.jsonl --preview

Sample Output

============================================================
DATASET VALIDATION REPORT
============================================================

Status: ✓ VALID
Format: sft

Examples:
  Total:   500
  Valid:   500
  Invalid: 0

Fields Found:
  text:     500
  prompt:   0
  response: 0
  messages: 0

Length Statistics:
  Avg prompt:   1550 chars
  Avg response: 1446 chars
  Max prompt:   3602 chars
  Max response: 6653 chars

============================================================

halo-forge sft train

Run supervised fine-tuning on a dataset.

halo-forge sft train [OPTIONS]

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--config` | path | - | YAML configuration file |
| `--data` | path | - | Training data JSONL file |
| `--output` | path | `models/sft` | Output directory |
| `--model` | string | `Qwen/Qwen2.5-Coder-7B` | Base model |
| `--epochs` | int | 3 | Number of epochs |
| `--batch-size` | int | 2 | Per-device batch size |
| `--lr` | float | 2e-4 | Learning rate |
| `--resume` | path | - | Resume from checkpoint |

Examples

# Basic SFT
halo-forge sft train \
  --data data/train.jsonl \
  --output models/sft \
  --epochs 3

# With configuration file
halo-forge sft train --config configs/sft.yaml

# Resume training
halo-forge sft train \
  --config configs/sft.yaml \
  --resume models/sft/checkpoint-500

Configuration File

# configs/sft.yaml
model:
  name: Qwen/Qwen2.5-Coder-7B
  trust_remote_code: true
  attn_implementation: eager

data:
  train_file: data/train.jsonl
  validation_split: 0.05
  max_seq_length: 2048

lora:
  r: 16
  alpha: 32
  dropout: 0.05
  target_modules:
    - q_proj
    - k_proj
    - v_proj
    - o_proj

training:
  output_dir: models/sft
  num_train_epochs: 3
  per_device_train_batch_size: 2
  gradient_accumulation_steps: 16
  learning_rate: 2e-4
  warmup_ratio: 0.03
  bf16: true
  gradient_checkpointing: true
  dataloader_num_workers: 0
  dataloader_pin_memory: false
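
Note that with per_device_train_batch_size: 2 and gradient_accumulation_steps: 16, the effective batch size works out to 32 sequences per optimizer step.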

halo-forge raft train

Run RAFT (Reward-rAnked Fine-Tuning) training.

halo-forge raft train [OPTIONS]

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--config` | path | - | YAML configuration file |
| `--checkpoint` | path | - | Starting checkpoint (SFT model) |
| `--model` | string | `Qwen/Qwen2.5-Coder-7B` | Base model (if no checkpoint) |
| `--prompts` | path | Required | Training prompts JSONL |
| `--verifier` | string | `gcc` | Verifier type |
| `--cycles` | int | 5 | Number of RAFT cycles |
| `--samples-per-prompt` | int | 8 | Samples per prompt |
| `--reward-threshold` | float | 0.5 | Minimum reward for a sample to be kept |
| `--keep-percent` | float | 0.5 | Fraction of top-ranked samples to keep |
| `--temperature` | float | 0.7 | Generation temperature |
| `--output` | path | `models/raft` | Output directory |

Verifier Types

| Verifier | Language | Description |
|----------|----------|-------------|
| `gcc` | C/C++ | GCC compilation |
| `clang` | C/C++ | Clang compilation |
| `mingw` | C/C++ | Windows cross-compile |
| `msvc` | C/C++ | Remote MSVC (requires config) |
| `rust` | Rust | Cargo build |
| `go` | Go | Go build |
| `humaneval` | Python | HumanEval tests |
| `mbpp` | Python | MBPP tests |

Examples

# Basic RAFT with GCC
halo-forge raft train \
  --model Qwen/Qwen2.5-Coder-7B \
  --prompts data/prompts.jsonl \
  --verifier gcc \
  --cycles 5 \
  --output models/raft

# From SFT checkpoint
halo-forge raft train \
  --checkpoint models/sft/final_model \
  --prompts data/prompts.jsonl \
  --verifier gcc \
  --cycles 5

# Python with MBPP tests
halo-forge raft train \
  --model Qwen/Qwen2.5-Coder-7B \
  --prompts data/rlvr/mbpp_train_prompts.jsonl \
  --verifier mbpp \
  --cycles 5

# Selective filtering (large dataset)
halo-forge raft train \
  --model Qwen/Qwen2.5-Coder-7B \
  --prompts data/prompts.jsonl \
  --verifier gcc \
  --keep-percent 0.2 \
  --reward-threshold 0.5

# High exploration
halo-forge raft train \
  --model Qwen/Qwen2.5-Coder-7B \
  --prompts data/prompts.jsonl \
  --verifier gcc \
  --samples-per-prompt 16 \
  --temperature 0.9

Configuration File

# configs/raft.yaml
sft_checkpoint: models/sft/final_model
output_dir: models/raft
prompts: data/prompts.jsonl

raft:
  num_cycles: 5
  samples_per_prompt: 8
  reward_threshold: 0.5
  keep_top_percent: 0.5

generation:
  max_new_tokens: 1024
  temperature: 0.7
  top_p: 0.95
  batch_size: 4

training:
  epochs: 1
  batch_size: 2
  gradient_accumulation_steps: 16
  learning_rate: 5e-5
  dataloader_num_workers: 0
  dataloader_pin_memory: false

verifier:
  type: gcc
  run_after_compile: false

hardware:
  bf16: true
  gradient_checkpointing: true

halo-forge benchmark run

Evaluate model performance on a benchmark.

halo-forge benchmark run [OPTIONS]

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--model` | path | Required | Model to evaluate |
| `--prompts` | path | Required | Benchmark prompts |
| `--verifier` | string | `gcc` | Verifier for evaluation |
| `--samples` | int | 10 | Samples per problem |
| `--k` | string | `1,5,10` | k values for pass@k |
| `--temperature` | float | 0.7 | Generation temperature |
| `--output` | path | - | Output JSON file |

Examples

# Basic benchmark
halo-forge benchmark run \
  --model models/raft/cycle_5_final \
  --prompts data/test.jsonl \
  --verifier gcc

# With output file
halo-forge benchmark run \
  --model models/raft/cycle_5_final \
  --prompts data/rlvr/mbpp_validation.jsonl \
  --verifier mbpp \
  --samples 20 \
  --k 1,5,10,20 \
  --output results/benchmark.json

# Compare baseline vs trained
halo-forge benchmark run \
  --model Qwen/Qwen2.5-Coder-7B \
  --prompts data/test.jsonl \
  --output results/baseline.json

halo-forge benchmark run \
  --model models/raft/cycle_5_final \
  --prompts data/test.jsonl \
  --output results/trained.json

Output Format

{
  "model": "models/raft/cycle_5_final",
  "prompts": "data/test.jsonl",
  "num_problems": 100,
  "samples_per_problem": 10,
  "pass_rate": 0.523,
  "pass_at_k": {
    "1": 0.312,
    "5": 0.478,
    "10": 0.523
  },
  "generation_time": 1234.5,
  "verification_time": 45.2
}
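
pass_at_k reports the estimated probability that at least one of k samples for a problem passes verification. Because results are plain JSON, runs can be compared with standard tools; for example, assuming jq is installed:

# Print pass@k for the baseline and trained runs
jq '.pass_at_k' results/baseline.json results/trained.json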

halo-forge benchmark full

Run complete benchmark with before/after comparison.

halo-forge benchmark full [OPTIONS]

Options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `--model` | string | `Qwen/Qwen2.5-Coder-0.5B` | Base model |
| `--prompts` | path | - | Training prompts |
| `--verifier` | string | `gcc` | Verifier type |
| `--cycles` | int | 2 | RAFT cycles to run |
| `--suite` | string | `small` | Benchmark suite: `small`, `medium`, `all` |
| `--output` | path | `results/benchmark` | Output directory |

Examples

# Quick validation benchmark
halo-forge benchmark full \
  --model Qwen/Qwen2.5-Coder-0.5B \
  --cycles 2

# Full production benchmark
halo-forge benchmark full \
  --model Qwen/Qwen2.5-Coder-7B \
  --prompts data/rlvr/mbpp_train_prompts.jsonl \
  --verifier mbpp \
  --cycles 5 \
  --output results/production

# All model sizes
halo-forge benchmark full --suite all

Environment Variables

| Variable | Description |
|----------|-------------|
| `DEEPSEEK_API_KEY` | DeepSeek API key for data generation |
| `ANTHROPIC_API_KEY` | Anthropic API key for data generation |
| `OPENAI_API_KEY` | OpenAI API key for data generation |
| `HF_TOKEN` | HuggingFace token for private models |
| `HSA_ENABLE_SDMA` | Set to `0` to prevent GPU hangs |
| `PYTORCH_HIP_ALLOC_CONF` | PyTorch memory allocation config |
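
API keys are exported as shown in the data generate examples above. The GPU-related variables are set the same way before training; the allocator value below is one common setting, not necessarily the project's recommendation.

export HSA_ENABLE_SDMA=0
export PYTORCH_HIP_ALLOC_CONF=max_split_size_mb:512
halo-forge raft train --config configs/raft.yaml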

Exit Codes

| Code | Description |
|------|-------------|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | Configuration error |
| 4 | GPU not available |
| 5 | Verification failed |
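
A minimal sketch of using the exit code in shell scripts (the table above is the source of truth for the specific codes):

halo-forge data validate data/train.jsonl
status=$?
if [ "$status" -ne 0 ]; then
  echo "Validation failed with exit code $status" >&2
  exit "$status"
fi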