Multi-Language Verifier
Multi-Language Verifier
The MultiLanguageVerifier automatically detects the programming language from code patterns and routes to the appropriate language-specific verifier.
Overview
Instead of manually specifying a verifier for each language, the multi-language verifier:
- Analyzes the code for language-specific patterns
- Detects the programming language
- Routes to the appropriate verifier
- Returns the verification result
Supported Languages
| Language | Detection Patterns | Verifier Used |
|---|---|---|
| C++ | #include <iostream>, std::, cout << | GCCVerifier |
| C | #include <stdio.h>, printf( | GCCVerifier |
| Python | def , import , print( | PytestVerifier |
| Rust | fn main(), use std::, println! | RustVerifier |
| Go | package main, func main(), fmt. | GoVerifier |
| C# | using System, Console.WriteLine | DotNetVerifier |
| PowerShell | $var =, Write-Host, Get- | PowerShellVerifier |
Basic Usage
from halo_forge.rlvr.verifiers import MultiLanguageVerifier
verifier = MultiLanguageVerifier()
# C++ code - auto-detected
cpp_code = '''
#include <iostream>
int main() {
std::cout << "Hello" << std::endl;
return 0;
}
'''
result = verifier.verify(cpp_code)
print(f"Detected: {result.metadata['detected_language']}") # cpp
# Python code - auto-detected
python_code = '''
def hello():
print("Hello")
if __name__ == "__main__":
hello()
'''
result = verifier.verify(python_code)
print(f"Detected: {result.metadata['detected_language']}") # python
# Rust code - auto-detected
rust_code = '''
fn main() {
println!("Hello");
}
'''
result = verifier.verify(rust_code)
print(f"Detected: {result.metadata['detected_language']}") # rust
CLI Usage
Use --verifier auto to enable multi-language detection:
# Benchmark with auto-detection
halo-forge benchmark run \
--model Qwen/Qwen2.5-Coder-7B \
--prompts data/multi_lang_prompts.jsonl \
--verifier auto \
--samples 10 \
--output results/multi_lang.json
# RAFT training with auto-detection
halo-forge raft train \
--prompts data/mixed_prompts.jsonl \
--verifier auto \
--model Qwen/Qwen2.5-Coder-3B \
--cycles 6 \
--output models/multi_lang_raft
Explicit Language Override
You can override auto-detection:
verifier = MultiLanguageVerifier()
# Force Rust verification even if code looks like C++
result = verifier.verify(code, language='rust')
Configuration Options
verifier = MultiLanguageVerifier(
default_language='python', # Fallback if detection fails
max_workers=8, # Parallel verification workers
run_after_compile=False, # Run binaries after compile
binary_cache_dir='binaries/', # Cache compiled binaries
)
Language Detection Priority
Languages are checked in priority order:
- C++ (priority 10) - Most specific patterns
- Rust (priority 9) - Distinctive syntax
- Go (priority 9) - Distinctive syntax
- Python (priority 8) - Common patterns
- C# (priority 7) - .NET patterns
- PowerShell (priority 6) - Cmdlet patterns
- C (priority 5) - Basic C patterns
Higher priority languages are checked first. The first matching pattern wins.
Custom Language Configuration
Add or modify language detection:
from halo_forge.rlvr.verifiers import MultiLanguageVerifier, LanguageConfig
# Custom language config
my_configs = {
'typescript': LanguageConfig(
name='typescript',
patterns=[r'^import .* from', r': string', r': number'],
verifier_class='NodeVerifier', # Your custom verifier
priority=8
)
}
verifier = MultiLanguageVerifier(language_configs=my_configs)
Batch Verification
Verify multiple code samples with different languages:
codes = [cpp_code, python_code, rust_code, go_code]
results = verifier.verify_batch(codes)
for code, result in zip(codes, results):
print(f"Language: {result.metadata['detected_language']}")
print(f"Success: {result.success}")
Using with Mixed Datasets
For datasets containing multiple languages:
import json
from halo_forge.rlvr.verifiers import MultiLanguageVerifier
verifier = MultiLanguageVerifier()
# Load mixed-language prompts
with open('data/mixed_lang_prompts.jsonl') as f:
prompts = [json.loads(line) for line in f]
# The verifier handles each language automatically
for prompt in prompts:
result = verifier.verify(prompt['completion'])
print(f"{result.metadata['detected_language']}: {result.success}")
Supported Languages
Check what languages are available:
verifier = MultiLanguageVerifier()
print(verifier.supported_languages)
# ['cpp', 'c', 'python', 'rust', 'go', 'csharp', 'powershell']
Performance Considerations
- Lazy loading: Verifiers are created on-demand, not at initialization
- Caching: Once a language verifier is created, it’s reused
- Parallel: Batch verification uses thread pool
Comparison with Single-Language Verifiers
| Aspect | Single-Language | MultiLanguage |
|---|---|---|
| Setup | Specify verifier | Auto-detect |
| Performance | Slightly faster | Small overhead |
| Flexibility | One language | All languages |
| Use case | Homogeneous data | Mixed datasets |
Best Practices
- Use for mixed datasets - When prompts may generate different languages
- Set default_language - Provide fallback for ambiguous code
- Check detected_language - Verify detection is correct in metadata
- Consider explicit language - When you know the expected language
Alias
AutoVerifier is an alias for MultiLanguageVerifier:
from halo_forge.rlvr.verifiers import AutoVerifier
# Same as MultiLanguageVerifier
verifier = AutoVerifier()