API Documentation
Explore Velma-2 voice models - Deepfake Detection, Transcription, Emotion Detection, and more.
Velma-2 Models
Velma-2 handles transcription, emotion, accent, and engagement detection across 70+ languages. Choose the model that fits your use case.
| | Batch English Fast | Batch Multilingual | Streaming Multilingual |
|---|---|---|---|
| Description | High-throughput English batch processing with >200x real-time speed | Multilingual batch transcription in 70+ languages with full feature set | Real-time streaming transcription in 70+ languages via WebSocket |
| Endpoint | /api/velma-2-stt-batch-english-vfast | /api/velma-2-stt-batch | /api/velma-2-stt-streaming |
| Type | Batch | Batch | Streaming |
| API | REST API | REST API | WebSocket |
| Accepted Files | AAC, AIFF, FLAC, MOV, MP3, MP4, OGG, Opus, WAV, WebM | AAC, AIFF, FLAC, MOV, MP3, MP4, OGG, Opus, WAV, WebM | AAC, AIFF, FLAC, MOV, MP3, MP4, OGG, Opus, WAV, WebM |
| Pricing | $0.025 / hour | $0.03 / hour | $0.06 / hour |
| Built-in Features | |||
| Transcription | |||
| Auto Capitalization | |||
| Auto Punctuation | |||
| Language | English | 70+ languages | 70+ languages |
| Real-Time | |||
| Optional Features | |||
| Speaker Diarization | |||
| Emotion Detection | |||
| Accent Identification | |||
| PII/PHI Tagging | |||
| Deepfake Detection | |||
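As a rough aid for choosing between the models, the per-hour prices in the table can be turned into a cost estimate. The sketch below is illustrative only: the helper name `estimate_cost` is hypothetical, and it assumes billing scales linearly with audio duration, ignoring the free tier and any charges for optional features, none of which this page specifies.

```python
from decimal import Decimal

# Endpoint paths and per-hour prices (USD) copied from the table above.
PRICE_PER_HOUR = {
    "/api/velma-2-stt-batch-english-vfast": Decimal("0.025"),
    "/api/velma-2-stt-batch": Decimal("0.03"),
    "/api/velma-2-stt-streaming": Decimal("0.06"),
}


def estimate_cost(endpoint: str, audio_seconds: int) -> Decimal:
    """Estimate the USD cost of processing `audio_seconds` of audio.

    Assumption: cost is strictly proportional to duration. The real
    billing rules (rounding, free tier, add-ons) may differ.
    """
    rate = PRICE_PER_HOUR[endpoint]  # raises KeyError for unknown endpoints
    return rate * Decimal(audio_seconds) / Decimal(3600)


if __name__ == "__main__":
    # Compare the three models for 100 hours of audio.
    for endpoint in PRICE_PER_HOUR:
        print(endpoint, estimate_cost(endpoint, 100 * 3600))
```

Decimal arithmetic is used so that small per-hour rates do not pick up binary floating-point rounding noise.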
Authentication & Rate Limits
Authentication
REST endpoints accept either a Console Admin key or a Console Read-Only key via the X-API-Key header. Read-only keys work on GET requests; write requests (POST, PUT, PATCH, DELETE) require a Console Admin key. The WebSocket endpoint instead takes its key in an api_key query parameter and requires a Model key, a key type separate from console keys.
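The rules above can be sketched as a few small helpers that build the credentials without making any network calls. The function names are hypothetical, the host is a placeholder you must supply, and joining the host with the streaming path from the models table to form the WebSocket URL is an assumption rather than something this page confirms.

```python
from urllib.parse import urlencode

# Write methods that, per the text above, require a Console Admin key.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def required_console_key(method: str) -> str:
    """Return which console key type a REST request needs."""
    verb = method.upper()
    if verb in WRITE_METHODS:
        return "admin"
    if verb == "GET":
        return "admin or read-only"
    # Other methods are not covered by the documentation.
    raise ValueError(f"key requirements for {verb} are not documented")


def rest_headers(console_key: str) -> dict:
    """Build the header that carries a console key on REST calls."""
    return {"X-API-Key": console_key}


def websocket_url(host: str, model_key: str) -> str:
    """Build a streaming URL with the Model key as a query parameter.

    Assumption: the URL is wss://<host> plus the streaming endpoint path.
    """
    query = urlencode({"api_key": model_key})  # percent-encodes the key
    return f"wss://{host}/api/velma-2-stt-streaming?{query}"
```

Keeping the two key types in separate helpers makes it harder to send a console key to the WebSocket endpoint, or a Model key to a REST endpoint, by mistake.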
REST: X-API-Key: your_api_key_here
WebSocket: wss://...?api_key=your_api_key_here
Rate Limits & Billing
- Per-model concurrency and monthly usage quotas
- Credit-based billing with free tier included
- Usage tracked in real time via the Usage Dashboard