Model Support

Supported models for halo-forge training

halo-forge supports various causal language models for code-generation training.

Qwen2.5-Coder Series (Default)

| Model | VRAM Usage | Notes |
|---|---|---|
| Qwen/Qwen2.5-Coder-0.5B | ~2 GB | Fast iteration, testing |
| Qwen/Qwen2.5-Coder-1.5B | ~4 GB | Good balance |
| Qwen/Qwen2.5-Coder-3B | ~8 GB | Recommended for quality |
| Qwen/Qwen2.5-Coder-7B | ~14 GB | Best quality, slower |

Qwen2.5-Coder is the default and best-tested model family.

```bash
# 3B model (recommended starting point)
halo-forge raft train --model Qwen/Qwen2.5-Coder-3B --prompts data/prompts.jsonl

# 7B model (higher quality)
halo-forge raft train --model Qwen/Qwen2.5-Coder-7B --prompts data/prompts.jsonl
```

Other Tested Models

| Family | Models | Notes |
|---|---|---|
| DeepSeek-Coder | 1.3B, 6.7B, 33B | Strong code generation |
| CodeLlama | 7B, 13B, 34B | Meta's code models |
| StarCoder2 | 3B, 7B, 15B | BigCode, multi-language |

Model Selection

By Use Case

| Use Case | Recommended |
|---|---|
| Quick testing | Qwen2.5-Coder-0.5B |
| Development | Qwen2.5-Coder-3B |
| Production | Qwen2.5-Coder-7B |
| Low VRAM (<8 GB) | Qwen2.5-Coder-1.5B |

By Available VRAM

| VRAM | Recommended Models |
|---|---|
| 8 GB | 0.5B - 3B |
| 16 GB | Up to 7B |
| 24 GB+ | 7B with larger batches |
| 48 GB+ | 13B - 15B |
| 80 GB+ | 33B+ |
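The recommendations above roughly follow a back-of-envelope rule for LoRA fine-tuning: model weights in bf16 take about 2 bytes per parameter, plus headroom for activations, adapter weights, and optimizer state. A minimal sketch (the function name and the 25% overhead factor are illustrative assumptions, not measured halo-forge constants; real usage also depends on batch size and sequence length):

```python
def estimate_lora_vram_gb(params_billion: float,
                          bytes_per_param: float = 2.0,
                          overhead: float = 0.25) -> float:
    """Rough VRAM estimate (GB) for LoRA fine-tuning.

    bf16 weights dominate (~2 bytes/param); `overhead` is an assumed
    cushion for activations, LoRA adapters, and optimizer state.
    """
    weights_gb = params_billion * bytes_per_param  # 1B params ~ 1 GB per byte/param
    return weights_gb * (1.0 + overhead)

print(estimate_lora_vram_gb(3))  # ~7.5 GB for a 3B model under these assumptions
```

Treat the result as a lower bound when picking hardware; the tables above bake in additional runtime overhead.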

Unified Memory (AMD APUs)

On AMD APUs like Strix Halo with unified memory:

| System RAM | Usable for GPU | Recommended |
|---|---|---|
| 32 GB | ~16 GB | 7B (tight) |
| 64 GB | ~32 GB | 7B with headroom |
| 128 GB | ~80 GB | 13B-33B |

LoRA Configuration

Default LoRA target modules:

```python
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
    "gate_proj", "up_proj", "down_proj",     # MLP
]
```

These work for Qwen and Llama-style models. Adjust for other architectures.
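For reference, the same list plugs into a PEFT `LoraConfig`. A sketch with illustrative hyperparameters (the `r`/`lora_alpha`/`lora_dropout` values shown are common defaults, not halo-forge's actual settings):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,               # adapter rank (illustrative value)
    lora_alpha=32,      # scaling factor (illustrative value)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
        "gate_proj", "up_proj", "down_proj",     # MLP
    ],
)
```

For a non-Llama-style architecture, replace `target_modules` with the linear-layer names that model actually uses.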

Adding New Models

  1. Ensure it’s a HuggingFace causal LM (AutoModelForCausalLM)
  2. Pass trust_remote_code=True if the model ships custom modeling code
  3. Verify LoRA target modules match the architecture
  4. Test with a small prompt set:
```bash
halo-forge raft train \
    --model YOUR_MODEL \
    --prompts data/test_prompts.jsonl \
    --cycles 1 \
    --output models/test
```
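Step 3 can be automated with a small helper before spending time on a training run. A sketch (`missing_lora_targets` is a hypothetical name; it duck-types against anything exposing `named_modules()`, such as a loaded `AutoModelForCausalLM`):

```python
DEFAULT_TARGETS = ("q_proj", "k_proj", "v_proj", "o_proj",
                   "gate_proj", "up_proj", "down_proj")

def missing_lora_targets(model, targets=DEFAULT_TARGETS):
    """Return the LoRA target names not found among the model's modules.

    `model` only needs a .named_modules() method yielding
    (dotted_name, module) pairs, as transformers models do.
    """
    leaf_names = {name.rsplit(".", 1)[-1] for name, _ in model.named_modules()}
    return sorted(set(targets) - leaf_names)
```

An empty result means the default `target_modules` list should apply cleanly; anything returned needs an architecture-specific override.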

Known Issues

  • DeepSeek-Coder V2: May require a specific transformers version
  • CodeLlama-34B: Needs multi-GPU or very large VRAM
  • Phi models: Different architecture, may need config adjustments

LiquidAI LFM Models

| Model | Status | Notes |
|---|---|---|
| LFM2-1.2B, LFM2.5-1.2B-Base | Supported | Standard CausalLM |
| LFM2.5-VL-1.6B | Unsupported | Custom architecture |
| LFM2.5-Audio-1.5B | Unsupported | Non-standard processor |

For LFM text models, use standard loading. For VLM/Audio benchmarks, use Qwen2-VL or Whisper instead.