Audio Module Testing Guide

A step-by-step guide to testing the halo-forge audio training module.

Prerequisites

Required Dependencies

pip install torch transformers torchaudio
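
A quick sanity check that the required packages import cleanly (prints versions for reference):

python -c "import torch, transformers, torchaudio; print(torch.__version__, torchaudio.__version__)"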

Optional Dependencies

Dependency    Required For        Install
torchaudio    Audio processing    pip install torchaudio
librosa       Feature extraction  pip install librosa
jiwer         WER calculation     pip install jiwer
speechbrain   Speaker embeddings  pip install speechbrain
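
To see which optional dependencies are present without fully importing them, a small sketch using only the standard library:

import importlib.util

# Report each optional dependency as installed or missing
for name in ("torchaudio", "librosa", "jiwer", "speechbrain"):
    found = importlib.util.find_spec(name) is not None
    print(f"{name}: {'installed' if found else 'missing'}")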

Hardware Requirements

  • Minimum: 8 GB RAM, 4 GB VRAM (for whisper-tiny)
  • Recommended: 16 GB RAM, 8 GB VRAM (for whisper-small)
  • ROCm Support: AMD Radeon RX 7900 series or higher
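
To confirm what PyTorch can actually see (ROCm builds expose the GPU through the torch.cuda namespace), a quick check:

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No GPU visible to PyTorch")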

Unit Tests

Run the full audio test suite:

# From project root
cd /path/to/halo-forge

# Run all audio tests
pytest tests/test_audio.py -v

# Run specific test class
pytest tests/test_audio.py::TestASRChecker -v
pytest tests/test_audio.py::TestAudioRAFTTrainer -v

Expected Test Coverage

Test Class                      Tests  Description
TestAudioSample                 3      AudioSample dataclass
TestAudioProcessor              3      Audio loading and processing
TestASRChecker                  5      WER/CER verification
TestAudioClassificationChecker  4      Classification verification
TestTTSChecker                  3      TTS quality verification
TestAudioVerifier               3      Multi-task verifier
TestWhisperAdapter              2      Whisper model adapter
TestAudioRAFTTrainer            2      Trainer configuration
TestIntegration                 3      End-to-end tests
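
As a reference point, a new test in this suite would follow the same pattern as the manual checks later in this guide; a minimal sketch, assuming the ASRChecker API shown under Manual Testing Steps:

from halo_forge.audio.verifiers.asr import ASRChecker

class TestASRChecker:
    def test_perfect_match_has_zero_wer(self):
        # A verbatim match should yield 0% WER and a positive reward
        checker = ASRChecker(wer_threshold=0.3)
        result = checker.verify("hello world", "hello world")
        assert result.wer == 0.0
        assert result.reward > 0.0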

Dry Run Testing

Test commands without actual training:

# Test CLI parsing and config validation
halo-forge audio train \
  --model openai/whisper-small \
  --dataset librispeech \
  --cycles 3 \
  --output /tmp/audio-test \
  --dry-run

# Expected output:
# Dry run mode - validating configuration only
# 
# Dependencies:
#   ✓ torchaudio
#   ✓ jiwer
# 
# ✓ Dataset: librispeech
# 
# Configuration validated successfully.

Manual Testing Steps

1. Test Dataset Loading

# List available datasets
halo-forge audio datasets

# Expected output:
# Available Audio Datasets
# ============================================================
#   librispeech        [ASR           ] - Clean audiobook speech (960h)
#   common_voice       [ASR           ] - Crowdsourced multilingual (2000h+)
#   audioset           [Classification] - Sound event detection (5M clips)
#   speech_commands    [Classification] - Keyword spotting (105k)

Test loading in Python:

from halo_forge.audio.data import load_audio_dataset

# Load small subset
dataset = load_audio_dataset("librispeech", limit=10)
print(f"Loaded {len(dataset)} samples")

for sample in dataset:
    print(f"  Text: {sample.text[:50]}...")
    print(f"  Duration: {sample.duration:.2f}s")

2. Test ASR Verification

from halo_forge.audio.verifiers.asr import ASRChecker

checker = ASRChecker(wer_threshold=0.3)

# Test perfect match
result = checker.verify("hello world", "hello world")
print(f"Perfect match - WER: {result.wer:.1%}, Reward: {result.reward:.2f}")

# Test partial match
result = checker.verify("hello word", "hello world")
print(f"Partial match - WER: {result.wer:.1%}, Reward: {result.reward:.2f}")

# Test mismatch
result = checker.verify("foo bar", "hello world")
print(f"Mismatch - WER: {result.wer:.1%}, Reward: {result.reward:.2f}")

3. Test Classification Verification

from halo_forge.audio.verifiers.classification import AudioClassificationChecker

checker = AudioClassificationChecker()

# Correct classification
result = checker.verify("dog", "dog")
print(f"Correct - Reward: {result.reward}")

# Wrong classification
result = checker.verify("cat", "dog")
print(f"Wrong - Reward: {result.reward}")

# With label aliases
checker_alias = AudioClassificationChecker(
    label_aliases={"dog": ["canine", "puppy"]}
)
result = checker_alias.verify("canine", "dog")
print(f"Alias match - Reward: {result.reward}")

4. Test Audio Benchmark

# Benchmark Whisper on LibriSpeech
halo-forge audio benchmark \
  --model openai/whisper-tiny \
  --dataset librispeech \
  --limit 20

# Expected output:
# Audio Benchmark
# ============================================================
# Model: openai/whisper-tiny
# Dataset: librispeech
# Task: asr
# Limit: 20
# 
# Results:
#   Samples: 20
#   Success rate: XX.X%
#   Average reward: X.XXX
#   Average WER: XX.X%

5. Test Model Adapters

from halo_forge.audio.models import get_audio_adapter
import numpy as np

# Create adapter
adapter = get_audio_adapter("openai/whisper-tiny")

# Load model (this downloads the model)
adapter.load()

# Test transcription
audio = np.random.randn(16000) * 0.1  # 1 second of low-amplitude noise at 16 kHz
result = adapter.transcribe(audio)
print(f"Transcription: {result.text}")

Validation Checklist

After running tests, verify:

  • All unit tests pass (pytest tests/test_audio.py)
  • Dry run mode validates configuration
  • Dataset listing works (halo-forge audio datasets)
  • ASRChecker calculates WER correctly
  • ClassificationChecker matches labels
  • Model adapters load and transcribe
  • Benchmark runs on small dataset

Troubleshooting

“torchaudio not installed”

This is required for audio processing:

pip install torchaudio

For ROCm, ensure a compatible version:

pip install torchaudio --index-url https://download.pytorch.org/whl/rocm6.1

“jiwer not installed”

Required for accurate WER calculation:

pip install jiwer

Without it, a fallback Levenshtein-distance implementation is used.
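
For illustration, a minimal word-level version of that idea (the module's actual fallback may differ):

def word_error_rate(hypothesis: str, reference: str) -> float:
    hyp, ref = hypothesis.split(), reference.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution
            )
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(word_error_rate("hello word", "hello world"))  # 0.5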

Model Download Errors

Whisper models are downloaded from HuggingFace:

# Check HuggingFace login
huggingface-cli login

# Or set token
export HF_TOKEN=your_token_here

Out of Memory (OOM)

  • Use smaller models (whisper-tiny instead of whisper-small)
  • Reduce --limit for testing
  • Use CPU with --device cpu
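
For example, a low-memory smoke test combining all three suggestions (assuming the benchmark subcommand accepts --device):

halo-forge audio benchmark \
  --model openai/whisper-tiny \
  --dataset librispeech \
  --limit 5 \
  --device cpu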

Audio Loading Errors

Check audio file format:

import torchaudio

# Check supported formats
print(torchaudio.list_audio_backends())

# Try loading
waveform, sr = torchaudio.load("test.wav")
print(f"Sample rate: {sr}, Shape: {waveform.shape}")

ROCm-specific Issues

Set environment variables:

export TORCH_ROCM_AOTRITON_ENABLE_EXPERIMENTAL=1
export HSA_OVERRIDE_GFX_VERSION=11.5.1  # For Strix Halo
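
After setting these, verify that the ROCm build of PyTorch detects the GPU:

python -c "import torch; print(torch.version.hip, torch.cuda.is_available())"

torch.version.hip is None on non-ROCm builds, and torch.cuda.is_available() should report True once the GPU is visible.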

Next Steps