Command Index

Complete reference for all halo-forge commands, subcommands, and flags.


Command Hierarchy

halo-forge
├── config
│   └── validate
├── data
│   ├── prepare
│   ├── generate
│   └── validate
├── sft
│   └── train
├── raft
│   └── train
├── benchmark
│   ├── run
│   └── full
├── inference          [EXPERIMENTAL]
│   ├── optimize
│   ├── export
│   └── benchmark
├── vlm                [EXPERIMENTAL]
│   ├── train
│   ├── benchmark
│   └── datasets
├── audio              [EXPERIMENTAL]
│   ├── train
│   ├── benchmark
│   └── datasets
├── reasoning          [EXPERIMENTAL]
│   ├── train
│   ├── benchmark
│   └── datasets
├── agentic            [EXPERIMENTAL]
│   ├── train
│   ├── benchmark
│   └── datasets
├── info
└── test

Core Commands (Production Ready)

halo-forge config validate

Validate a configuration file.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| config | - | path | Yes | - | Path to config file |
| --type | -t | string | No | auto | Config type: raft, sft, auto |
| --verbose | -v | flag | No | false | Show config contents |

halo-forge config validate configs/raft_windows.yaml
halo-forge config validate configs/sft.yaml --type sft --verbose

halo-forge data prepare

Download and prepare public datasets.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --dataset | -d | string | No | - | Dataset name |
| --output | -o | path | No | - | Output file path |
| --template | - | string | No | qwen | Chat template |
| --system-prompt | - | string | No | - | Override system prompt |
| --list | - | flag | No | false | List available datasets |

halo-forge data prepare --list
halo-forge data prepare --dataset humaneval --output data/humaneval.jsonl
halo-forge data prepare --dataset mbpp --template qwen --system-prompt "You are a Python expert."

halo-forge data generate

Generate training data using LLM.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --topic | -t | string | No | - | Topic name |
| --backend | -b | string | No | deepseek | LLM backend |
| --model | - | string | No | - | Model name for backend |
| --output | -o | path | No | - | Output file path |
| --template | - | string | No | qwen | Chat template |
| --list | - | flag | No | false | List available topics |

halo-forge data generate --list
halo-forge data generate --topic windows_api --backend deepseek --output data/windows.jsonl

halo-forge data validate

Validate dataset format.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| file | - | path | Yes | - | Path to JSONL file |
| --preview | -p | flag | No | false | Show preview of examples |

halo-forge data validate data/training.jsonl
halo-forge data validate data/training.jsonl --preview

halo-forge sft train

Run supervised fine-tuning.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --config | -c | path | No | - | Config file path |
| --model | -m | string | No | Qwen/Qwen2.5-Coder-7B | Base model |
| --data | - | path | No | - | Training data file |
| --output | -o | path | No | models/sft | Output directory |
| --epochs | - | int | No | 3 | Number of epochs |
| --resume | - | path | No | - | Resume from checkpoint |

halo-forge sft train --model Qwen/Qwen2.5-Coder-3B --data data/sft.jsonl --output models/sft_3b
halo-forge sft train --config configs/sft.yaml --resume models/sft/checkpoint-500

halo-forge raft train

Run RAFT (Reward-Ranked Fine-Tuning).

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --config | -c | path | No | - | Config file path |
| --model | -m | string | No | Qwen/Qwen2.5-Coder-3B | Base model |
| --checkpoint | - | path | No | - | SFT checkpoint path |
| --prompts | -p | path | No | - | Prompts file |
| --output | -o | path | No | models/raft | Output directory |
| --cycles | - | int | No | 6 | Number of RAFT cycles |
| --verifier | - | string | No | gcc | Verifier type (see below) |
| --keep-percent | - | float | No | 0.5 | Keep top X% of passing samples |
| --reward-threshold | - | float | No | 0.5 | Minimum reward to pass |
| --curriculum | - | string | No | none | Curriculum strategy |
| --reward-shaping | - | string | No | fixed | Reward shaping strategy |
| --lr-decay | - | float | No | 0.85 | LR decay per cycle |
| --min-lr | - | float | No | 1e-6 | Minimum learning rate |
| --system-prompt | - | string | No | (Windows prompt) | System prompt |
| --host | - | string | No | - | MSVC verifier host |
| --user | - | string | No | - | MSVC verifier user |
| --ssh-key | - | path | No | - | MSVC verifier SSH key |

Verifier choices: gcc, mingw, msvc, rust, go, dotnet, powershell, auto

Curriculum choices: none, complexity, progressive, adaptive

Reward shaping choices: fixed, annealing, adaptive, warmup
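
The curriculum and reward-shaping strategies combine with the other RAFT flags; a hedged sketch using the choices listed above (the values are illustrative, not recommended settings):

# Illustrative only: adaptive curriculum with annealed reward shaping
halo-forge raft train \
  --prompts data/prompts.jsonl \
  --verifier gcc \
  --curriculum adaptive \
  --reward-shaping annealing \
  --reward-threshold 0.6 \
  --keep-percent 0.4 \
  --cycles 8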

# Basic RAFT training
halo-forge raft train \
  --model Qwen/Qwen2.5-Coder-3B \
  --prompts data/prompts.jsonl \
  --verifier mingw \
  --cycles 6 \
  --output models/raft_3b

# With SFT checkpoint and LR decay
halo-forge raft train \
  --checkpoint models/sft_3b/final \
  --prompts data/prompts.jsonl \
  --verifier auto \
  --lr-decay 0.85 \
  --cycles 6

# With MSVC verifier
halo-forge raft train \
  --prompts data/windows.jsonl \
  --verifier msvc \
  --host 10.0.0.152 \
  --user keys \
  --ssh-key ~/.ssh/win

halo-forge benchmark run

Run pass@k benchmark.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | path | Yes | - | Model path |
| --prompts | -p | path | Yes | - | Prompts file |
| --output | -o | path | No | - | Output file path |
| --samples | - | int | No | 10 | Samples per prompt |
| --k | - | string | No | 1,5,10 | k values (comma-separated) |
| --max-prompts | - | int | No | - | Max prompts to evaluate |
| --verifier | - | string | No | gcc | Verifier type |
| --base-model | - | string | No | Qwen/Qwen2.5-Coder-7B | Base model |
| --system-prompt | - | string | No | (Windows prompt) | System prompt |
| --host | - | string | No | - | MSVC host |
| --user | - | string | No | - | MSVC user |
| --ssh-key | - | path | No | - | MSVC SSH key |
| --cross-compile | - | flag | No | false | Windows cross-compile (rust/go) |
| --run-after-compile | - | flag | No | false | Run after compile |

halo-forge benchmark run \
  --model models/raft_3b/cycle_6 \
  --prompts data/test.jsonl \
  --verifier mingw \
  --samples 10 \
  --output results/benchmark.json
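
Custom k values, a prompt cap, and Windows cross-compilation can be combined; an illustrative sketch (the prompts path is a placeholder):

# Illustrative only: Rust verifier with cross-compile and custom k values
halo-forge benchmark run \
  --model models/raft_3b/cycle_6 \
  --prompts data/rust_test.jsonl \
  --verifier rust \
  --cross-compile \
  --k 1,5 \
  --max-prompts 50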

halo-forge benchmark full

Run comprehensive RAFT benchmark.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | string | No* | - | Model to benchmark |
| --suite | -s | string | No* | - | Predefined suite |
| --cycles | -c | int | No | 2 | RAFT cycles |
| --output | -o | path | No | results/benchmarks | Output directory |
| --quiet | -q | flag | No | false | Minimal output |

*Either --model or --suite is required.

Suite choices: all (0.5B, 1.5B, 3B), small (0.5B), medium (0.5B, 1.5B)

halo-forge benchmark full --model Qwen/Qwen2.5-Coder-0.5B --cycles 2
halo-forge benchmark full --suite all --output results/full_benchmark

halo-forge info

Show hardware and system information.

halo-forge info

halo-forge test

Run pipeline validation tests.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --level | -l | string | No | standard | Test level |
| --model | -m | string | No | Qwen/Qwen2.5-Coder-0.5B | Model for testing |
| --verbose | -v | flag | No | false | Verbose output |

Level choices: smoke (no GPU), standard (with GPU), full (with training)

halo-forge test --level smoke
halo-forge test --level standard --verbose
halo-forge test --level full --model Qwen/Qwen2.5-Coder-1.5B

Experimental Commands

These commands are in active development. APIs may change.

halo-forge inference optimize

Optimize model for deployment.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | path | Yes | - | Model path |
| --target-precision | - | string | No | int4 | Target precision |
| --target-latency | - | float | No | 50.0 | Target latency (ms) |
| --calibration-data | - | path | No | - | Calibration data JSONL |
| --output | -o | path | No | models/optimized | Output directory |

Precision choices: int4, int8, fp16

halo-forge inference optimize \
  --model models/raft_7b/cycle_6 \
  --target-precision int4 \
  --output models/optimized

halo-forge inference export

Export model to deployment format.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | path | Yes | - | Model path |
| --format | -f | string | Yes | - | Export format |
| --quantization | -q | string | No | Q4_K_M | GGUF quantization |
| --output | -o | path | Yes | - | Output path |

Format choices: gguf, onnx

GGUF quantization types: Q4_K_M, Q4_K_S, Q8_0, F16

halo-forge inference export \
  --model models/trained \
  --format gguf \
  --quantization Q4_K_M \
  --output models/model.gguf
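
For ONNX export the GGUF-specific --quantization flag is presumably omitted; a minimal sketch (the output path is illustrative):

# Illustrative only: ONNX export without GGUF quantization
halo-forge inference export \
  --model models/trained \
  --format onnx \
  --output models/model.onnx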

halo-forge inference benchmark

Benchmark inference latency.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | path | Yes | - | Model path |
| --prompts | -p | path | No | - | Test prompts JSONL |
| --num-prompts | - | int | No | 10 | Number of prompts |
| --max-tokens | - | int | No | 100 | Max tokens to generate |
| --warmup | - | int | No | 3 | Warmup iterations |
| --measure-memory | - | flag | No | false | Measure memory usage |

halo-forge inference benchmark \
  --model models/optimized \
  --num-prompts 50 \
  --measure-memory

halo-forge vlm train

Train VLM with RAFT.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | string | No | Qwen/Qwen2-VL-7B-Instruct | VLM model |
| --dataset | -d | string | Yes | - | Dataset name or JSONL |
| --output | -o | path | No | models/vlm_raft | Output directory |
| --cycles | - | int | No | 6 | RAFT cycles |
| --samples-per-prompt | - | int | No | 4 | Samples per prompt |
| --perception-weight | - | float | No | 0.3 | Perception weight |
| --reasoning-weight | - | float | No | 0.4 | Reasoning weight |
| --output-weight | - | float | No | 0.3 | Output weight |
| --lr-decay | - | float | No | 0.85 | LR decay per cycle |
| --temperature | - | float | No | 0.7 | Generation temperature |
| --limit | - | int | No | - | Limit dataset samples |

Dataset choices: textvqa, docvqa, chartqa, realworldqa, mathvista

halo-forge vlm train \
  --model Qwen/Qwen2-VL-7B-Instruct \
  --dataset textvqa \
  --cycles 6 \
  --output models/vlm_textvqa
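
The perception, reasoning, and output weights control how the composite reward is blended; a sketch with non-default weights (assuming the three weights are meant to sum to 1.0):

# Illustrative only: emphasize the reasoning component of the reward
halo-forge vlm train \
  --dataset chartqa \
  --perception-weight 0.2 \
  --reasoning-weight 0.6 \
  --output-weight 0.2 \
  --cycles 4 \
  --limit 1000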

halo-forge vlm benchmark

Benchmark VLM on dataset.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | path | Yes | - | VLM model path |
| --dataset | -d | string | No | textvqa | Dataset name |
| --split | - | string | No | validation | Dataset split |
| --limit | - | int | No | 100 | Limit samples |
| --output | -o | path | No | - | Output file |

halo-forge vlm benchmark \
  --model models/vlm_raft/cycle_6 \
  --dataset docvqa \
  --limit 200 \
  --output results/vlm_benchmark.json

halo-forge vlm datasets

List available VLM datasets.

halo-forge vlm datasets

halo-forge audio train

Train audio model with RAFT.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | string | No | openai/whisper-small | Audio model |
| --dataset | -d | string | Yes | - | Dataset name |
| --task | - | string | No | asr | Task: asr, tts, classification |
| --output | -o | path | No | models/audio_raft | Output directory |
| --cycles | - | int | No | 4 | RAFT cycles |
| --lr | - | float | No | 1e-5 | Learning rate |
| --lr-decay | - | float | No | 0.85 | LR decay per cycle |
| --limit | - | int | No | - | Limit dataset samples |

Dataset choices: librispeech, common_voice, audioset, speech_commands

halo-forge audio train \
  --model openai/whisper-small \
  --dataset librispeech \
  --task asr \
  --cycles 4 \
  --output models/audio_asr

halo-forge audio benchmark

Benchmark audio model on dataset.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | path | Yes | - | Audio model path |
| --dataset | -d | string | No | librispeech | Dataset name |
| --task | - | string | No | asr | Task type |
| --limit | - | int | No | 100 | Limit samples |
| --output | -o | path | No | - | Output file |

halo-forge audio benchmark \
  --model openai/whisper-small \
  --dataset librispeech \
  --limit 50 \
  --output results/audio_benchmark.json

halo-forge audio datasets

List available audio datasets.

halo-forge audio datasets

halo-forge reasoning train

Train on math/reasoning datasets with RAFT.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | string | No | Qwen/Qwen2.5-7B-Instruct | Base model |
| --dataset | -d | string | Yes | - | Dataset name |
| --output | -o | path | No | models/reasoning_raft | Output directory |
| --cycles | - | int | No | 4 | RAFT cycles |
| --lr | - | float | No | 1e-5 | Learning rate |
| --lr-decay | - | float | No | 0.85 | LR decay per cycle |
| --limit | - | int | No | - | Limit dataset samples |

Dataset choices: gsm8k, math, aime

halo-forge reasoning train \
  --model Qwen/Qwen2.5-7B-Instruct \
  --dataset gsm8k \
  --cycles 4 \
  --output models/reasoning_gsm8k

halo-forge reasoning benchmark

Benchmark on math/reasoning dataset.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | path | Yes | - | Model path |
| --dataset | -d | string | No | gsm8k | Dataset name |
| --limit | - | int | No | 100 | Limit samples |
| --output | -o | path | No | - | Output file |

halo-forge reasoning benchmark \
  --model Qwen/Qwen2.5-7B-Instruct \
  --dataset gsm8k \
  --limit 100 \
  --output results/reasoning_benchmark.json

halo-forge reasoning datasets

List available math/reasoning datasets.

halo-forge reasoning datasets

Agentic Commands (Experimental)

halo-forge agentic train

Train on tool calling datasets with RAFT.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | string | No | Qwen/Qwen2.5-7B-Instruct | Base model |
| --dataset | -d | string | No | xlam | Dataset: xlam, glaive |
| --output | -o | path | No | models/agentic_raft | Output directory |
| --cycles | - | int | No | 5 | RAFT cycles |
| --lr | - | float | No | 5e-5 | Learning rate |
| --lr-decay | - | float | No | 0.85 | LR decay per cycle |
| --limit | - | int | No | - | Limit dataset samples |
| --dry-run | - | flag | No | false | Validate config only |

halo-forge agentic train \
  --model Qwen/Qwen2.5-7B-Instruct \
  --dataset xlam \
  --cycles 5 \
  --output models/agentic_raft
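
The --dry-run flag validates the run configuration without starting training; for example:

# Check the agentic setup without launching a run
halo-forge agentic train --dataset glaive --limit 500 --dry-run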

halo-forge agentic benchmark

Benchmark tool calling accuracy.

| Flag | Short | Type | Required | Default | Description |
|------|-------|------|----------|---------|-------------|
| --model | -m | string | No | Qwen/Qwen2.5-7B-Instruct | Model to benchmark |
| --dataset | -d | string | No | xlam | Dataset: xlam, glaive |
| --limit | - | int | No | 100 | Limit samples |
| --output | -o | path | No | - | Output file |

halo-forge agentic benchmark \
  --model Qwen/Qwen2.5-7B-Instruct \
  --dataset xlam \
  --limit 100 \
  --output results/agentic_benchmark.json

halo-forge agentic datasets

List available tool calling datasets.

halo-forge agentic datasets

Output:

Available Agentic / Tool Calling Datasets
============================================================
  xlam         [Tool Calling] - 60k verified, 3,673 APIs
  glaive       [Tool Calling] - 113k samples, irrelevance
  toolbench    [Tool Calling] - 188k samples, 16k APIs
  hermes       [Tool Calling] - Format reference

Exit Codes

| Code | Description |
|------|-------------|
| 0 | Success |
| 1 | General error |
| 2 | Invalid arguments |
| 3 | Configuration error |
| 4 | GPU not available |
| 5 | Verification failed |
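
Exit codes can be checked in shell scripts to distinguish failure modes; a minimal sketch:

# Minimal sketch: branch on the halo-forge exit code
halo-forge config validate configs/raft_windows.yaml
case $? in
  0) echo "config OK" ;;
  3) echo "configuration error" >&2; exit 3 ;;
  *) echo "unexpected failure" >&2; exit 1 ;;
esac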

Environment Variables

| Variable | Description |
|----------|-------------|
| PYTORCH_HIP_ALLOC_CONF | ROCm memory configuration |
| HF_HOME | HuggingFace cache directory |
| CUDA_VISIBLE_DEVICES | GPU selection |
| HIP_VISIBLE_DEVICES | AMD GPU selection |
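
These variables are set in the shell before invoking halo-forge; the specific values below are illustrative assumptions, not required settings:

# Illustrative values only -- adjust for your setup
export HF_HOME=/data/hf_cache                            # HuggingFace cache directory
export HIP_VISIBLE_DEVICES=0                             # select the first AMD GPU
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True   # ROCm allocator tuning
halo-forge info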

Quick Reference

Most Common Commands

# Test installation
halo-forge test --level smoke

# Train with RAFT
halo-forge raft train --prompts data/prompts.jsonl --verifier mingw --cycles 6

# Benchmark model
halo-forge benchmark run --model models/raft/cycle_6 --prompts data/test.jsonl

# Show info
halo-forge info

Verifier Quick Reference

| Verifier | Language | Cross-compile | Requires |
|----------|----------|---------------|----------|
| gcc | C/C++ | No | gcc installed |
| mingw | C/C++ | Yes (Windows PE) | mingw-w64 |
| msvc | C/C++ | Yes (Windows) | SSH to Windows |
| rust | Rust | Yes (Windows) | rustc, cargo |
| go | Go | Yes (Windows) | go installed |
| dotnet | C# | Yes (Windows PE) | dotnet-sdk |
| powershell | PowerShell | No | pwsh |
| auto | Multi-lang | Varies | Depends on detected language |