Model Support

Supported models for halo-forge training

halo-forge supports various causal language models for code-generation training.

Qwen2.5-Coder Series (Default)

| Model | VRAM Usage | Notes |
|---|---|---|
| Qwen/Qwen2.5-Coder-0.5B | ~2 GB | Fast iteration, testing |
| Qwen/Qwen2.5-Coder-1.5B | ~4 GB | Good balance |
| Qwen/Qwen2.5-Coder-3B | ~8 GB | Recommended for quality |
| Qwen/Qwen2.5-Coder-7B | ~14 GB | Best quality, slower |

Qwen2.5-Coder is the default and best-tested model family.

```bash
# 3B model (recommended starting point)
halo-forge raft train --model Qwen/Qwen2.5-Coder-3B --prompts data/prompts.jsonl

# 7B model (higher quality)
halo-forge raft train --model Qwen/Qwen2.5-Coder-7B --prompts data/prompts.jsonl
```

Other Tested Models

| Family | Models | Notes |
|---|---|---|
| DeepSeek-Coder | 1.3B, 6.7B, 33B | Strong code generation |
| CodeLlama | 7B, 13B, 34B | Meta's code models |
| StarCoder2 | 3B, 7B, 15B | BigCode, multi-language |

Model Selection

By Use Case

| Use Case | Recommended |
|---|---|
| Quick testing | Qwen2.5-Coder-0.5B |
| Development | Qwen2.5-Coder-3B |
| Production | Qwen2.5-Coder-7B |
| Low VRAM (<8 GB) | Qwen2.5-Coder-1.5B |

By Available VRAM

| VRAM | Recommended Models |
|---|---|
| 8 GB | 0.5B - 3B |
| 16 GB | Up to 7B |
| 24 GB+ | 7B with larger batches |
| 48 GB+ | 13B - 15B |
| 80 GB+ | 33B+ |
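The recommendations above roughly follow a back-of-envelope rule for LoRA fine-tuning: model weights in bf16 take about 2 bytes per parameter, plus headroom for activations, adapter weights, and optimizer state. A minimal sketch (the function name and the 25% overhead factor are illustrative assumptions, not measured halo-forge constants; real usage also depends on batch size and sequence length):

```python
def estimate_lora_vram_gb(params_billion: float,
                          bytes_per_param: float = 2.0,
                          overhead: float = 0.25) -> float:
    """Rough VRAM estimate (GB) for LoRA fine-tuning.

    bf16 weights dominate (~2 bytes/param); `overhead` is an assumed
    cushion for activations, LoRA adapters, and optimizer state.
    """
    weights_gb = params_billion * bytes_per_param  # 1B params ~ 1 GB per byte/param
    return weights_gb * (1.0 + overhead)

print(estimate_lora_vram_gb(3))  # ~7.5 GB for a 3B model under these assumptions
```

Treat the result as a lower bound when picking hardware; the tables above bake in additional runtime overhead.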

Unified Memory (AMD APUs)

On AMD APUs like Strix Halo with unified memory:

| System RAM | Usable for GPU | Recommended |
|---|---|---|
| 32 GB | ~16 GB | 7B (tight) |
| 64 GB | ~32 GB | 7B with headroom |
| 128 GB | ~80 GB | 13B-33B |

LoRA Configuration

Default LoRA target modules:

```python
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
    "gate_proj", "up_proj", "down_proj",     # MLP
]
```

These work for Qwen and Llama-style models. Adjust for other architectures.
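For reference, the same list plugs into a PEFT `LoraConfig`. A sketch with illustrative hyperparameters (the `r`/`lora_alpha`/`lora_dropout` values shown are common defaults, not halo-forge's actual settings):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,               # adapter rank (illustrative value)
    lora_alpha=32,      # scaling factor (illustrative value)
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",  # Attention
        "gate_proj", "up_proj", "down_proj",     # MLP
    ],
)
```

For a non-Llama-style architecture, replace `target_modules` with the linear-layer names that model actually uses.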

Adding New Models

  1. Ensure it’s a HuggingFace causal LM (AutoModelForCausalLM)
  2. Pass trust_remote_code=True if the model ships custom modeling code
  3. Verify LoRA target modules match the architecture
  4. Test with a small prompt set:
```bash
halo-forge raft train \
    --model YOUR_MODEL \
    --prompts data/test_prompts.jsonl \
    --cycles 1 \
    --output models/test
```
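Step 3 can be automated with a small helper before spending time on a training run. A sketch (`missing_lora_targets` is a hypothetical name; it duck-types against anything exposing `named_modules()`, such as a loaded `AutoModelForCausalLM`):

```python
DEFAULT_TARGETS = ("q_proj", "k_proj", "v_proj", "o_proj",
                   "gate_proj", "up_proj", "down_proj")

def missing_lora_targets(model, targets=DEFAULT_TARGETS):
    """Return the LoRA target names not found among the model's modules.

    `model` only needs a .named_modules() method yielding
    (dotted_name, module) pairs, as transformers models do.
    """
    leaf_names = {name.rsplit(".", 1)[-1] for name, _ in model.named_modules()}
    return sorted(set(targets) - leaf_names)
```

An empty result means the default `target_modules` list should apply cleanly; anything returned needs an architecture-specific override.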

Known Issues

  • DeepSeek-Coder V2: May require a specific transformers version
  • CodeLlama-34B: Needs multi-GPU or very large VRAM
  • Phi models: Different architecture, may need config adjustments

LiquidAI LFM Models

| Model | Status | Notes |
|---|---|---|
| LFM2-1.2B, LFM2.5-1.2B-Base | Supported | Standard CausalLM |
| LFM2.5-VL-1.6B | Unsupported | Custom architecture |
| LFM2.5-Audio-1.5B | Unsupported | Non-standard processor |

For LFM text models, use standard loading. For VLM/Audio benchmarks, use Qwen2-VL or Whisper instead.