# Web UI

Dashboard for training, benchmarking, and monitoring.

The halo-forge web interface provides a modern dashboard for training, benchmarking, and monitoring LLM fine-tuning jobs.
Quick Start
# Launch the UI
halo-forge ui
# Custom host/port
halo-forge ui --host 0.0.0.0 --port 8888
# With auto-reload for development
halo-forge ui --reload
# Headless-safe (default behavior)
halo-forge ui --no-browser
# Explicit browser auto-open
halo-forge ui --open-browser
The UI will be available at http://127.0.0.1:8080 by default. Startup logs print canonical route URLs (`/`, `/training`, `/benchmark`, `/inference`).
## Pages Overview
### Dashboard (/)
The main landing page showing:
- Training Launcher: Primary start controls for SFT/RAFT/modality training
- GPU Status: Real-time GPU utilization
- Active Jobs: Currently running training/benchmark jobs
- Completed/Failed Counts: Job statistics
- Training History Chart: Loss curves from recent runs
- Benchmark Scores Chart: Pass@1 comparisons across models
- Recent Runs: Quick access to completed jobs
- Advanced Diagnostics Tools (Optional): status summary with a direct link to `/research-hub`
### Training (/training)
Configure and launch training jobs:
- Quickstart mode (default): minimal required inputs for first successful run
- Advanced mode: full tuning controls when needed
- Guided onboarding panel: `Start Here` summary of required fields and first-run defaults
- Preflight launch checks: structured input/path checks before spawn (errors, warnings, resolved paths, suggested fixes)
- Output scaffold: one-click `Create output scaffold` for missing output directories
- Setup advisory (non-blocking): diagnostics status is informational; invalid form inputs are the only launch blockers
- Diagnostics actions: available in Advanced Diagnostics Tools only
#### SFT (Supervised Fine-Tuning)
- Model selection (HuggingFace or local path)
- Dataset selection (Alpaca, MetaMath, GSM8K, xLAM)
- Training hyperparameters (epochs, batch size, learning rate)
- LoRA configuration (rank, alpha)
- Gradient checkpointing toggle
#### RAFT (Reward-Ranked Fine-Tuning)
- Preset configurations: Conservative, Aggressive, Custom
- Verifier selection (HumanEval, MBPP, LiveCodeBench, Math)
- RAFT-specific parameters (cycles, samples per prompt, temperature, keep percent)
- Reward threshold configuration
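The "keep percent" and "reward threshold" parameters work together: the threshold filters out low-reward samples, and the keep percent then retains only the best of what remains. A minimal sketch of that selection step (the `raft_filter` helper and its signature are illustrative assumptions, not halo-forge's actual API):

```python
def raft_filter(samples, reward_threshold=0.5, keep_percent=0.25):
    """Keep the top `keep_percent` of samples whose reward clears the threshold.

    `samples` is a list of (completion, reward) pairs. This is a sketch of
    reward-ranked selection, not halo-forge's real implementation.
    """
    # Drop samples below the reward threshold first.
    passing = [s for s in samples if s[1] >= reward_threshold]
    # Rank the survivors by reward, best first.
    passing.sort(key=lambda s: s[1], reverse=True)
    # Retain the top fraction (at least one sample if any passed).
    keep_n = max(1, int(len(passing) * keep_percent)) if passing else 0
    return passing[:keep_n]

samples = [("a", 0.9), ("b", 0.2), ("c", 0.7), ("d", 0.55)]
kept = raft_filter(samples, reward_threshold=0.5, keep_percent=0.5)
```

Here `"b"` is rejected by the threshold, and only the single best remaining sample survives the 50% cut of the three that passed.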
### Benchmark (/benchmark)
**Purpose:** Run standardized benchmarks for model comparison (not training).

**Note:** This page is for benchmark reporting — comparing your trained model to published results. For training verification, use the Verifiers page to test the native training verifiers.
| Type | Benchmarks | Models |
|---|---|---|
| Code | HumanEval, MBPP, LiveCodeBench | Qwen2.5-Coder, DeepSeek-Coder |
| VLM | TextVQA, DocVQA, ChartQA | Qwen2-VL, LLaVA, Phi-3-Vision |
| Audio | LibriSpeech, CommonVoice | Whisper (tiny to large-v3) |
| Reasoning | GSM8K, MATH | Qwen2.5-Instruct, Mistral-Instruct |
| Agentic | xLAM Function Calling | Qwen2.5-Instruct, Mistral-Instruct |
Uses community tools: VLM benchmarks use VLMEvalKit when available for standardized, comparable results.
Features:
- Model autocomplete with popular presets
- Sample limit slider
- Custom output directory
- One-click launch with redirect to Monitor
- Quickstart mode + optional Advanced mode
- Setup advisory (non-blocking), with advanced checks moved to Advanced Diagnostics Tools
### Monitor (/monitor)
Real-time job monitoring with:
- Live Duration Counter: Updates every second
- Job-Type-Aware Progress:
  - Training: epoch/cycle + step progress
  - Benchmark: `evaluated / total` progress
  - Inference/utility/diagnostics: explicit indeterminate progress when no true denominator exists
- Job-Type Panels:
  - Training: loss/update metrics
  - Benchmark: evaluation metrics (`pass@k`, `pass_rate`, output path)
  - Inference/Utility/Diagnostics: run-specific status and artifact fields
- Durable Log Continuity: streamed lines are persisted to `job.log_file_path`, so refresh/reopen keeps log history
- Stop Button: idempotent stop with safe terminal-state handling
- Failure recovery panel: concise actions for failed/stopped runs (`Fix input`, `Re-open launch form`, `Retry with same config`)
### Config (/config)
YAML configuration editor:
- Syntax highlighting
- Schema validation (checks for valid halo-forge config keys)
- Save to file
- Template presets
- All-module readiness banner for config contract status
### Verifiers (/verifiers)
**Purpose:** Test native training verifiers — these provide reward signals for RAFT, not benchmark scores.

**Note:** Verifiers are training infrastructure. They provide graduated rewards (0.0 to 1.0) for the RAFT training loop. For final model evaluation with comparable metrics, use the Benchmark page.
Available verifiers:
- HumanEval (Python): HumanEval test suite verification
- MBPP (Python basics): Mostly Basic Python Problems tests
- Execution (Multi-language): Compile + run verification
- Math (Numerical): Answer extraction and numeric comparison
- GSM8K (Grade-school math): Math reasoning verification
Interactive testing:
1. Select a verifier
2. Enter a code snippet
3. Click “Run Verification”
4. See the graduated reward result (not just pass/fail)
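A graduated reward distinguishes "compiles but fails tests" from "doesn't compile at all", which gives the RAFT loop a smoother training signal than binary pass/fail. A hypothetical scheme illustrating the idea (the `graduated_reward` function and its weighting are assumptions, not the exact formula halo-forge verifiers use):

```python
def graduated_reward(tests_passed: int, tests_total: int, compiled: bool = True) -> float:
    """Map verification outcomes to a reward in [0.0, 1.0].

    Illustrative scheme: a small credit for code that compiles, plus the
    fraction of tests passed scaled into the remaining range.
    """
    if not compiled:
        return 0.0          # nothing runnable, no reward
    if tests_total == 0:
        return 0.1          # compiles, but there is nothing to verify
    return 0.1 + 0.9 * (tests_passed / tests_total)
```

Under this sketch, code that compiles and passes half its tests earns 0.55 rather than a flat failure, so partially correct samples can still rank above broken ones.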
### Datasets (/datasets)
Browse available datasets:
- Public datasets from HuggingFace
- Local JSONL files
- Preview samples
- Filter by type/source
### Results (/results)
Run results table:
- Training, benchmark, inference, utility, and diagnostics outputs
- Multi-select for comparison
- Sort by any column
- Export to JSON/CSV
- Diagnostics runs are hidden by default behind the advanced toggle
### Inference (/inference)
Launch `inference optimize` and `inference benchmark` jobs from the UI:
- `inference optimize` launch contract (precision/latency/calibration)
- `inference benchmark` launch contract (prompts, tokens, warmup)
- Durable launch context + monitor/relaunch parity
- Quickstart mode + optional Advanced mode
- Setup advisory (non-blocking), with advanced checks moved to Advanced Diagnostics Tools
### Benchmark Advanced (/benchmark-advanced)
Batch orchestration for non-code benchmark runs:
- VLM/audio/reasoning/agentic batch launch
- Per-domain dataset selection
- Monitor handoff to first launched job
- All-module readiness banner for non-code benchmark contract status
### Advanced Diagnostics Tools (/research-hub)
Cross-module ops readiness visibility:
- Reads canonical ops readiness report when available
- Reads canonical all-module readiness report when available: `results/readiness/all_modules_readiness.v1.json`
- Reads canonical all-module qualification report when available: `results/readiness/all_module_qualification.v1.json`
- Reads canonical all-module bootstrap report when available: `results/readiness/all_module_bootstrap.v1.json`
- Reads canonical all-module live execution report when available: `results/readiness/all_module_live_execution.v1.json`
- Falls back to live contract checks when a report is missing or corrupt
- Shows actionable pass/warn/fail evidence by module
- Shows optional dataset burn-in provenance when available: `burnin_report_present`, `burnin_generated_at`, `burnin_status`
- Shows qualification lifecycle provenance when available: `qualification_report_present`, `qualification_generated_at`, `qualification_status`, `qualification_profile`
- Shows bootstrap evidence-generation provenance when available: `bootstrap_report_present`, `bootstrap_generated_at`, `bootstrap_status`, `bootstrap_profile`
- Shows live execution provenance when available: `live_report_present`, `live_generated_at`, `live_status`, `live_profile`
- Supports `Generate setup artifacts` action (tracked, non-blocking job in Monitor)
- Supports `Run system health check` action (tracked, non-blocking job in Monitor)
- Supports module-level `Generate Setup Artifacts (Advanced)` actions to bootstrap evidence roots on demand
- Supports module-level `Run System Health Check (Advanced)` actions for bounded per-module execution checks
- Supports `Run setup check` action (tracked, non-blocking job in Monitor)
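The report-then-fallback behavior above can be sketched in a few lines. This is an illustrative helper (the `load_readiness_report` name and the caller-supplied `live_check` callable are assumptions, not halo-forge's actual code):

```python
import json
from pathlib import Path


def load_readiness_report(path, live_check):
    """Load a canonical readiness report from disk.

    Falls back to a live contract check when the file is missing or
    corrupt, mirroring the behavior described above. `live_check` is any
    zero-argument callable returning a report-shaped dict.
    """
    try:
        return json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing or unparseable report: compute status live instead.
        return live_check()
```

The key design point is that a corrupt report is treated the same as an absent one, so the page always has something coherent to render.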
## Deep-Link Query Parameters
UI routes support deterministic preselection via query params:
- `/training?mode=<sft|raft|vlm|audio|reasoning|agentic>&ui_mode=<quickstart|advanced>&preset=<name>`
- `/benchmark?view=<code|non_code>&ui_mode=<quickstart|advanced>&preset=<name>`
- `/benchmark-advanced?domains=<csv>` (example: `domains=vlm,audio`)
- `/inference?mode=<optimize|benchmark>&ui_mode=<quickstart|advanced>&preset=<name>`
- `/ops-console?module=<config|data|info|plot>&execution_mode=<contract|live>`
- `/research-hub?module=<module_key>`
Unknown values are ignored safely and pages fall back to default selections.
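The "ignore unknown values safely" rule amounts to validating each parameter against an allow-list and substituting the default on any miss. A minimal sketch (function and constant names are hypothetical, not the actual halo-forge code):

```python
# Allow-lists for the /training route, taken from the parameter list above.
VALID_MODES = {"sft", "raft", "vlm", "audio", "reasoning", "agentic"}
VALID_UI_MODES = {"quickstart", "advanced"}


def resolve_training_preselection(params: dict) -> dict:
    """Resolve /training query params into a concrete page selection.

    Unknown or missing values fall back to defaults instead of raising,
    so a stale or hand-typed deep link still lands on a working page.
    """
    mode = params.get("mode", "")
    ui_mode = params.get("ui_mode", "")
    return {
        "mode": mode if mode in VALID_MODES else "sft",
        "ui_mode": ui_mode if ui_mode in VALID_UI_MODES else "quickstart",
    }
```

For example, `/training?mode=bogus` would resolve to the SFT quickstart view rather than an error page.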
## Readiness Semantics (Warn-and-Launch)
UI readiness is contract-based and does not block launches for missing historical evidence.
- PASS: Required contracts are healthy.
- WARN: Evidence is missing or stale; launch is still allowed.
- FAIL: Contract/preflight issue. Launch may be blocked when `launch_blocked=true`.
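The warn-and-launch decision above reduces to a small predicate. A sketch of the semantics (the `can_launch` helper is illustrative, not the actual implementation):

```python
def can_launch(status: str, launch_blocked: bool = False, form_valid: bool = True) -> bool:
    """Warn-and-launch semantics as described above.

    - Invalid form inputs always block.
    - FAIL blocks only when the report explicitly sets launch_blocked.
    - PASS and WARN both allow launch (WARN is informational only).
    """
    if not form_valid:
        return False
    if status == "FAIL" and launch_blocked:
        return False
    return True
```

The notable choice is that missing historical evidence (WARN) never prevents a run; only hard contract failures or bad form inputs do.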
Banner wording:
- `Evidence missing (non-blocking)` means files like a prior `training_summary.json` or benchmark outputs were not found yet.
- `Setup check not satisfied (advanced diagnostics)` means setup checks found issues; users can still correct form inputs and run training.
- `Qualification issue` means an explicit lifecycle check failed in qualification mode (separate from normal launch readiness).
- `Bootstrap issue` means evidence generation encountered a contract/probe failure.
- `Live probe issue` means bounded live probe execution failed for the module in the selected profile.
Advanced remediation actions are available in Advanced Diagnostics Tools:
- `Generate setup artifacts` for bounded bootstrap artifacts.
- `Run system health check` for bounded live command probes.
- `Run setup check` for bounded contract checks.
## Architecture

### Services Layer
The UI uses a services architecture that connects NiceGUI pages to halo-forge backends:
```
UI Pages (Dashboard, Training, Monitor, ...)
        │
        ▼
UI Services (TrainingService, BenchmarkService, HardwareMonitor, ...)
        │
        ▼
halo-forge Core (CLI Commands, Trainers, Verifiers)
```
### Event Bus
Real-time updates are powered by an event bus system:
- `JOB_CREATED`, `JOB_STARTED`, `JOB_COMPLETED`, `JOB_FAILED`, `JOB_STOPPED`
- `METRICS_UPDATE`: Loss, learning rate, step progress
- `LOG_LINE`: Streaming log output
- `GPU_UPDATE`: Real-time GPU utilization
- `CHECKPOINT_SAVED`: Checkpoint save notifications
Pages subscribe to events and update UI elements without polling.
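The subscribe-and-publish pattern above can be sketched as a minimal bus (an illustrative sketch of the concept, not the halo-forge `EventBus` implementation):

```python
from collections import defaultdict


class EventBus:
    """Minimal publish/subscribe bus: pages register handlers per event
    type; services publish events, and every handler fires synchronously."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type: str, handler) -> None:
        """Register a callable to run whenever `event_type` is published."""
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        """Deliver `payload` to every handler registered for `event_type`."""
        for handler in self._subscribers[event_type]:
            handler(payload)


# A page subscribes once; updates then arrive without polling.
bus = EventBus()
seen = []
bus.subscribe("METRICS_UPDATE", lambda p: seen.append(p["loss"]))
bus.publish("METRICS_UPDATE", {"loss": 0.42, "step": 10})
# seen is now [0.42]
```

This is why pages need no polling loop: a published `METRICS_UPDATE` pushes data straight into every subscribed UI element.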
### State Management

Job state is managed centrally in `ui/state.py`:
- Job creation and tracking
- Metrics history for charts
- Progress tracking (epoch, step, cycle)
## AMD Strix Halo Optimization
The UI automatically applies optimized environment variables for AMD Strix Halo:
```bash
HSA_OVERRIDE_GFX_VERSION=11.5.1
PYTORCH_ROCM_ARCH=gfx1151
HIP_VISIBLE_DEVICES=0
PYTORCH_HIP_ALLOC_CONF=backend:native,expandable_segments:True,...
HSA_ENABLE_SDMA=0
```
These are set when launching any training or benchmark subprocess.
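Applying the overrides amounts to merging them into the inherited environment before spawning the subprocess. A minimal sketch (the `launch_with_rocm_env` helper is hypothetical, and only a subset of the variables above is shown; the truncated allocator config line is omitted):

```python
import os
import subprocess

# Subset of the Strix Halo overrides listed above (illustrative).
STRIX_HALO_ENV = {
    "HSA_OVERRIDE_GFX_VERSION": "11.5.1",
    "PYTORCH_ROCM_ARCH": "gfx1151",
    "HIP_VISIBLE_DEVICES": "0",
    "HSA_ENABLE_SDMA": "0",
}


def launch_with_rocm_env(cmd: list) -> subprocess.Popen:
    """Spawn a training/benchmark subprocess with the ROCm overrides
    merged on top of the current environment."""
    env = {**os.environ, **STRIX_HALO_ENV}
    return subprocess.Popen(cmd, env=env)
```

Merging on top of `os.environ` (rather than replacing it) keeps `PATH`, virtualenv settings, and the rest of the inherited environment intact.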
## Customization

### Theme Colors

Colors are defined in `ui/theme.py`:
```python
COLORS = {
    "bg_primary": "#0f1318",
    "bg_secondary": "#161b22",
    "bg_card": "#1c2128",
    "primary": "#7C9885",    # Sage green
    "secondary": "#8BA888",
    "accent": "#9BC4A8",
    "success": "#7C9885",
    "running": "#4C9AFF",
    "error": "#F85149",
    # ...
}
```
### Adding New Pages
1. Create the page component in `ui/pages/`
2. Add the route in `ui/app.py`
3. Add a navigation item in `ui/components/sidebar.py`
## Feature Flags (Default-On Kill Switches)
The following pages are enabled by default:
- `/inference`
- `/benchmark-advanced`
- `/research-hub`
Disable any page with env vars:

```bash
HALO_UI_ENABLE_INFERENCE_PAGE=0
HALO_UI_ENABLE_BENCHMARK_ADVANCED_PAGE=0
HALO_UI_ENABLE_RESEARCH_HUB_PAGE=0
```

Accepted false values: `0`, `false`, `no`, `off`.
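A default-on kill switch can be read as: enabled unless the variable is set to one of the accepted false values. A sketch of that rule (the `page_enabled` helper is hypothetical, and treating the value case-insensitively is an assumption):

```python
import os

# The accepted false values listed above.
FALSE_VALUES = {"0", "false", "no", "off"}


def page_enabled(var_name: str) -> bool:
    """Default-on kill switch: the page is enabled unless the env var is
    explicitly set to an accepted false value. An unset variable, or any
    unrecognized value, leaves the page enabled."""
    return os.environ.get(var_name, "").strip().lower() not in FALSE_VALUES
```

This keeps the failure mode safe: a typo like `HALO_UI_ENABLE_INFERENCE_PAGE=flase` leaves the page on rather than silently disabling it.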
## Troubleshooting
### “gio: Operation not supported”

Use `halo-forge ui --no-browser` (default) in headless environments. Only use `--open-browser` when desktop browser integration is available.
### Burn-in provenance unavailable
If burn-in status is unavailable in Dashboard or Advanced Diagnostics Tools, generate the report:
```bash
python3 scripts/run_ops_dataset_burnin.py \
  --burnin-profile tiny-v1 \
  --write-report \
  --report-file results/readiness/ops_dataset_burnin.v1.json
```
### All-module readiness unavailable
If coding/non-coding readiness is unavailable in Dashboard or Advanced Diagnostics Tools, generate the canonical report:
```bash
python3 scripts/run_all_module_matrix.py \
  --fixture-pack v1 \
  --write-report \
  --report-file results/readiness/all_modules_readiness.v1.json
```
### All-module qualification unavailable
If qualification status is unavailable in Dashboard or Advanced Diagnostics Tools, generate the canonical qualification report:
```bash
python3 scripts/run_all_module_qualification.py \
  --qualification-profile fixture-v1 \
  --fixture-pack v1 \
  --write-report \
  --report-file results/readiness/all_module_qualification.v1.json
```
### All-module live execution unavailable
If live execution status is unavailable in Dashboard or Advanced Diagnostics Tools, generate the canonical live report:
```bash
python3 scripts/run_all_module_live_matrix.py \
  --live-profile live-smoke-v1 \
  --write-report \
  --report-file results/readiness/all_module_live_execution.v1.json
```
### Duration/Progress not updating

Ensure the training process is emitting progress to stdout. The MetricsParser looks for patterns like `Epoch X/Y`, `Step X/Y`, `loss: X.XXX`, `lr: X.XXe-XX`.
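Patterns of this shape can be matched with simple regular expressions. A sketch of such a parser (the `parse_line` helper and the exact regexes are hypothetical; the real MetricsParser may differ in detail):

```python
import re

# Hypothetical patterns matching the log formats listed above.
PATTERNS = {
    "epoch": re.compile(r"Epoch (\d+)/(\d+)"),
    "step": re.compile(r"Step (\d+)/(\d+)"),
    "loss": re.compile(r"loss: ([\d.]+)"),
    "lr": re.compile(r"lr: ([\d.eE+-]+)"),
}


def parse_line(line: str) -> dict:
    """Extract whichever of the known metrics appear in one stdout line.

    Progress patterns ("Epoch 2/3") yield (current, total) tuples; scalar
    patterns ("loss: 0.432") yield the matched string.
    """
    out = {}
    for name, pattern in PATTERNS.items():
        m = pattern.search(line)
        if m:
            out[name] = m.groups() if len(m.groups()) > 1 else m.group(1)
    return out

metrics = parse_line("Epoch 2/3 Step 150/500 loss: 0.432 lr: 2.00e-05")
```

If your trainer logs in a different format, none of the patterns match, which is exactly the "duration/progress not updating" symptom described above.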
### GPU not detected

Check that ROCm is properly installed and `rocm-smi` is accessible.