API Documentation

Explore the Velma-2 voice models: deepfake detection, transcription, emotion detection, and more.

Velma-2 Models

Velma-2 handles transcription plus emotion, accent, and engagement detection across 70+ languages. Choose the model that fits your use case.

Batch English Fast
  • Description: High-throughput English batch processing at more than 200x real-time speed
  • Endpoint: /api/velma-2-stt-batch-english-vfast
  • Type: Batch (REST API)
  • Language: English
  • Pricing: $0.025 / hour

Batch Multilingual
  • Description: Multilingual batch transcription in 70+ languages with the full feature set
  • Endpoint: /api/velma-2-stt-batch
  • Type: Batch (REST API)
  • Language: 70+ languages
  • Pricing: $0.03 / hour

Streaming Multilingual
  • Description: Real-time streaming transcription in 70+ languages via WebSocket
  • Endpoint: /api/velma-2-stt-streaming
  • Type: Streaming (WebSocket)
  • Language: 70+ languages
  • Pricing: $0.06 / hour

Accepted files (all models): AAC, AIFF, FLAC, MOV, MP3, MP4, OGG, Opus, WAV, WebM

Built-in features: Transcription, Auto Capitalization, Auto Punctuation, and Real-Time output (Streaming Multilingual only)

Optional features: Speaker Diarization, Emotion Detection, Accent Identification, PII/PHI Tagging, Deepfake Detection (Batch Multilingual offers the full feature set)
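To make the trade-offs concrete, here is a minimal Python sketch that encodes the endpoints, list prices, and accepted file types from the model table, picks a model, and estimates cost. The selection rule, the helper names (choose_model, estimate_cost_usd, is_accepted_file), and the assumption that cost scales linearly with audio duration are illustrative only and are not part of the Velma-2 API. The file check looks only at the extension, which is a rough proxy for the real format.

```python
from dataclasses import dataclass

# Extensions matching the "Accepted Files" list; a rough check based on the
# file name only (hypothetical helper, not part of the Velma-2 API).
ACCEPTED_EXTENSIONS = {
    "aac", "aiff", "flac", "mov", "mp3", "mp4", "ogg", "opus", "wav", "webm",
}


@dataclass(frozen=True)
class Velma2Model:
    name: str
    endpoint: str
    usd_per_hour: float  # list price from the model table


MODELS = {
    "batch-english-fast": Velma2Model(
        "Batch English Fast", "/api/velma-2-stt-batch-english-vfast", 0.025),
    "batch-multilingual": Velma2Model(
        "Batch Multilingual", "/api/velma-2-stt-batch", 0.03),
    "streaming-multilingual": Velma2Model(
        "Streaming Multilingual", "/api/velma-2-stt-streaming", 0.06),
}


def choose_model(real_time: bool, english_only_audio: bool,
                 needs_optional_features: bool = False) -> Velma2Model:
    """Hypothetical selection rule: real-time work needs the streaming model;
    otherwise prefer the cheaper English model unless the audio is
    multilingual or optional features are needed (Batch Multilingual offers
    the full feature set)."""
    if real_time:
        return MODELS["streaming-multilingual"]
    if english_only_audio and not needs_optional_features:
        return MODELS["batch-english-fast"]
    return MODELS["batch-multilingual"]


def estimate_cost_usd(model: Velma2Model, audio_seconds: float) -> float:
    """List-price estimate; assumes billing is linear in audio duration."""
    return model.usd_per_hour * audio_seconds / 3600


def is_accepted_file(filename: str) -> bool:
    """True if the file extension is in the accepted list (case-insensitive)."""
    if "." not in filename:
        return False
    return filename.rsplit(".", 1)[1].lower() in ACCEPTED_EXTENSIONS
```

For example, choose_model(real_time=False, english_only_audio=True) returns the Batch English Fast entry, and estimate_cost_usd(MODELS["batch-multilingual"], 7200) comes to about $0.06 for two hours of audio at list price.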

Authentication & Rate Limits

Authentication

REST endpoints accept either a Console Admin key or a Console Read-Only key in the X-API-Key header. Read-Only keys are limited to GET requests; write requests (POST, PUT, PATCH, DELETE) require a Console Admin key. The WebSocket endpoint instead takes its key in the api_key query parameter and requires a Model key, a key type separate from Console keys.

REST:      X-API-Key: your_api_key_here
WebSocket: wss://...?api_key=your_api_key_here
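The key rules above can be sketched in Python as follows. Only the X-API-Key header name, the api_key query parameter, and the split between GET and write methods come from this page; the helper names are hypothetical. Because the WebSocket host is elided above, the base URL is left for you to supply.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Write methods that, per the rules above, need a Console Admin key.
WRITE_METHODS = {"POST", "PUT", "PATCH", "DELETE"}


def rest_headers(api_key: str) -> dict:
    """Headers for a REST call; only X-API-Key is documented on this page."""
    return {"X-API-Key": api_key}


def required_console_key(method: str) -> str:
    """Console key type needed for a REST method (hypothetical helper)."""
    method = method.upper()
    if method == "GET":
        return "read-only or admin"
    if method in WRITE_METHODS:
        return "admin"
    raise ValueError(f"method not covered by the documented rules: {method}")


def websocket_url(base_url: str, model_key: str) -> str:
    """Append the Model key as api_key, keeping any existing query string.

    base_url is a placeholder you supply; the real host is not given here.
    """
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("api_key", model_key))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, websocket_url("wss://example.invalid/api/velma-2-stt-streaming", model_key) appends ?api_key=... to a placeholder host; substitute the real WebSocket address for example.invalid.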
Rate Limits & Billing
  • Per-model concurrency and monthly usage quotas
  • Credit-based billing with a free tier included
  • Usage tracked in real time via the Usage Dashboard
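Because concurrency quotas are per model, a client that sends many requests may want to cap how many are in flight for each model. A minimal asyncio sketch, assuming you pick the limits yourself to match your quota (this page does not state the numbers, and the code does not query them). PerModelLimiter and demo are hypothetical names, and fake_request stands in for a real Velma-2 call.

```python
import asyncio


class PerModelLimiter:
    """Client-side cap on in-flight requests per model endpoint.

    The limits are values you choose (hypothetical here) to stay within your
    per-model concurrency quota; nothing is read from the Velma-2 API.
    """

    def __init__(self, limits: dict[str, int]) -> None:
        # One semaphore per model; each allows at most n concurrent calls.
        self._semaphores = {model: asyncio.Semaphore(n) for model, n in limits.items()}

    async def run(self, model: str, request_fn, *args):
        # Wait for a free slot for this model, then await the request.
        async with self._semaphores[model]:
            return await request_fn(*args)


async def demo(total_requests: int = 6, limit: int = 2) -> int:
    """Fire several fake requests and return the peak number in flight."""
    model = "/api/velma-2-stt-batch"
    limiter = PerModelLimiter({model: limit})  # created inside the running loop
    in_flight = 0
    peak = 0

    async def fake_request(i: int) -> int:
        # Stand-in for a real HTTP call; it only tracks concurrency.
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return i

    results = await asyncio.gather(
        *(limiter.run(model, fake_request, i) for i in range(total_requests))
    )
    assert results == list(range(total_requests))
    return peak


if __name__ == "__main__":
    print("peak in flight:", asyncio.run(demo()))
```

Creating the limiter inside the running event loop, as demo does, keeps the semaphores tied to that loop on older Python versions. A client-side cap like this only helps you stay under a quota; the server-side limits still apply.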