API Documentation

Explore the Velma-2 voice models: transcription, emotion detection, deepfake detection, and more.

Velma-2 Models

Velma-2 handles transcription, emotion, accent, and engagement detection across 70+ languages. Choose the model that fits your use case.

	Batch English Fast	Batch Multilingual	Streaming Multilingual
Description	High-throughput English batch processing with >200x real-time speed	Multilingual batch transcription in 70+ languages with full feature set	Real-time streaming transcription in 70+ languages via WebSocket
Endpoint	`/api/velma-2-stt-batch-english-vfast`	`/api/velma-2-stt-batch`	`/api/velma-2-stt-streaming`
Type	Batch	Batch	Streaming
API	REST API	REST API	WebSocket
Accepted Files	AAC, AIFF, FLAC, MOV, MP3, MP4, OGG, Opus, WAV, WebM	AAC, AIFF, FLAC, MOV, MP3, MP4, OGG, Opus, WAV, WebM	AAC, AIFF, FLAC, MOV, MP3, MP4, OGG, Opus, WAV, WebM
Pricing	$0.025 / hour	$0.03 / hour	$0.06 / hour
Built-in Features
Transcription
Auto Capitalization
Auto Punctuation
Language	English	70+ languages	70+ languages
Real-Time
Optional Features
Speaker Diarization
Emotion Detection
Accent Identification
PII/PHI Tagging
Deepfake Detection
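
The per-hour prices in the table can be turned into a rough cost estimate. The sketch below is illustrative only: `estimate_cost` is a hypothetical helper, the dictionary keys are simply the last path segment of each endpoint, and it assumes that billing scales linearly with audio duration and that optional features do not change the price. The page states none of these assumptions.

```python
from decimal import Decimal

# Per-hour prices in USD, copied from the Pricing row of the table above.
# Keys are the last path segment of each endpoint (a naming choice made here).
PRICE_PER_HOUR = {
    "velma-2-stt-batch-english-vfast": Decimal("0.025"),
    "velma-2-stt-batch": Decimal("0.03"),
    "velma-2-stt-streaming": Decimal("0.06"),
}


def estimate_cost(model: str, audio_seconds: float) -> Decimal:
    """Estimate the base cost in USD for a given amount of audio.

    Hypothetical helper: assumes cost is linear in audio duration and
    rounds to four decimal places; actual billing rules may differ.
    """
    hours = Decimal(str(audio_seconds)) / Decimal(3600)
    return (PRICE_PER_HOUR[model] * hours).quantize(Decimal("0.0001"))


if __name__ == "__main__":
    # Two hours of audio on the multilingual batch model.
    print(estimate_cost("velma-2-stt-batch", 2 * 3600))
```

Using `Decimal` rather than `float` keeps small per-hour prices exact when they are multiplied and rounded.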

Authentication & Rate Limits

Authentication

REST endpoints accept either a Console Admin key or a Console Read-Only key in the `X-API-Key` header. Read-Only keys are valid for GET requests only; write requests (POST, PUT, PATCH, DELETE) require a Console Admin key. The WebSocket endpoint does not use console keys: it expects a Model key, a separate key type, passed as the `api_key` query parameter.

REST: `X-API-Key: your_api_key_here`

WebSocket: `wss://...?api_key=your_api_key_here`
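
The two authentication styles can be sketched with a few small helpers. This is a minimal illustration, not an official client: the function names are hypothetical, the host is elided in the text above and must be supplied by the caller as `base_url`, and appending the streaming endpoint path to that host is an assumption.

```python
from urllib.parse import urlencode

# Write methods listed in the text above as requiring a Console Admin key.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def required_console_key(method: str) -> str:
    """Return the least-privileged console key type for a REST method."""
    name = method.upper()
    if name == "GET":
        return "read-only"  # an Admin key is also accepted for GET
    if name in WRITE_METHODS:
        return "admin"
    # The text does not cover other methods, so refuse to guess.
    raise ValueError(f"no key rule documented for method {method!r}")


def rest_headers(api_key: str) -> dict[str, str]:
    """Build the header that carries a console key on REST requests."""
    return {"X-API-Key": api_key}


def websocket_url(base_url: str, model_key: str) -> str:
    """Build a streaming URL with the Model key as a query parameter.

    base_url is caller-supplied (for example "wss://<host>"); joining it
    with the streaming endpoint path is an assumption of this sketch.
    """
    query = urlencode({"api_key": model_key})
    return f"{base_url.rstrip('/')}/api/velma-2-stt-streaming?{query}"


if __name__ == "__main__":
    print(required_console_key("post"))
    print(rest_headers("your_api_key_here"))
    print(websocket_url("wss://example.invalid", "your_api_key_here"))
```

`urlencode` percent-encodes the key, so characters that are unsafe in a query string do not corrupt the URL.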

Rate Limits & Billing

Per-model concurrency and monthly usage quotas
Credit-based billing, with a free tier included
Usage tracked in real time via the Usage Dashboard
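
Because concurrency is limited per model, a client may want to cap its own in-flight requests rather than rely on the server rejecting the excess. The sketch below shows one way to do that with `asyncio.Semaphore`. The limit of 2 and the `fake_request` coroutine are placeholders: the actual per-model limits are not stated here, and a real client would issue an API call where the sleep is.

```python
import asyncio


async def run_with_limit(jobs, max_concurrent: int):
    """Run zero-argument coroutine functions, at most max_concurrent at once.

    Results are returned in the same order as the jobs were given.
    """
    semaphore = asyncio.Semaphore(max_concurrent)

    async def guarded(job):
        # Wait for a free slot before starting the job.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))


async def demo():
    """Run six placeholder jobs with a cap of two and record the peak."""
    active = 0
    peak = 0

    async def fake_request(index: int) -> int:
        # Stand-in for a real API call; tracks how many run concurrently.
        nonlocal active, peak
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return index

    jobs = [lambda i=i: fake_request(i) for i in range(6)]
    results = await run_with_limit(jobs, max_concurrent=2)
    return results, peak


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Keeping one semaphore per model would mirror the per-model nature of the limits; a single shared semaphore is used here only to keep the example short.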