Velma-2 Models
Velma-2 handles transcription plus emotion, accent, and engagement detection across 70+ languages. Choose the model that fits your use case.
Velma-2 Batch
Best-in-class multilingual batch transcription in 70+ languages with speaker diarization, emotion detection, accent identification, and PII/PHI tagging. Supports AAC, AIFF, FLAC, MP3, MP4, MOV, OGG, Opus, WAV, and WebM files up to 100 MB.
/api/velma-2-stt-batch
REST API
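Because Velma-2 Batch restricts both file format and size, it can help to check a file locally before uploading. The following is a minimal sketch, not part of any official SDK: the function name `check_batch_upload`, the mapping from format names to file extensions, and the reading of the 100 MB limit as 100 × 1024 × 1024 bytes are all assumptions.

```python
from pathlib import Path

# Extensions derived from the formats listed for Velma-2 Batch.
# Assumption: each format maps to the single extension shown here.
SUPPORTED_EXTENSIONS = {
    ".aac", ".aiff", ".flac", ".mp3", ".mp4", ".mov",
    ".ogg", ".opus", ".wav", ".webm",
}

# Assumption: the size limit is interpreted as 100 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 100 * 1024 * 1024


def check_batch_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(
            f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}"
        )
    return problems
```

The function takes a name and a size rather than opening the file, so it can be called with values from `os.stat` or from an upload form without touching the disk twice.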
Velma-2 Batch English Fast
English-only batch transcription with extremely high throughput and automatic capitalization and punctuation. Accepts Opus-format audio only. Ideal for high-scale batch processing.
/api/velma-2-stt-batch-english-vfast
REST API
Velma-2 Streaming
Real-time streaming transcription via WebSocket with full voice-native understanding: multilingual support, speaker diarization, emotion, accent, and PII/PHI tagging.
/api/velma-2-stt-streaming
WebSocket
Feature Comparison
| Feature | Velma-2 Batch | Velma-2 Batch English Fast | Velma-2 Streaming |
|---|---|---|---|
| Transcription | Yes | Yes | Yes |
| Speaker Diarization | Yes | | Yes |
| Emotion Detection | Yes | | Yes |
| Accent Identification | Yes | | Yes |
| PII/PHI Tagging | Yes | | Yes |
| Multilingual | Yes | No | Yes |
| Auto Capitalization | | Yes | |
| Auto Punctuation | | Yes | |
| Real-Time | No | No | Yes |
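Based on the model descriptions above, choosing an endpoint can be sketched as a small decision function. The endpoint paths come from this page; the function `pick_endpoint`, its parameter names, and the assumption that Velma-2 Batch English Fast does not provide diarization, emotion, accent, or PII/PHI output are illustrative only.

```python
# Endpoint paths as listed on this page.
ENDPOINTS = {
    "batch": "/api/velma-2-stt-batch",
    "batch_english_fast": "/api/velma-2-stt-batch-english-vfast",
    "streaming": "/api/velma-2-stt-streaming",
}


def pick_endpoint(
    real_time: bool,
    english_only: bool,
    opus_audio: bool,
    needs_voice_analysis: bool,
) -> str:
    """Pick an endpoint path from coarse requirements.

    needs_voice_analysis covers diarization, emotion, accent, and PII/PHI
    tagging. Assumption: the English Fast model does not offer these.
    """
    if real_time:
        # Only the streaming model is described as real-time.
        return ENDPOINTS["streaming"]
    if english_only and opus_audio and not needs_voice_analysis:
        # English Fast is English-only and accepts Opus audio only.
        return ENDPOINTS["batch_english_fast"]
    return ENDPOINTS["batch"]
```

For example, an English-language Opus archive that only needs plain transcripts maps to the English Fast endpoint, while the same archive with speaker diarization falls back to Velma-2 Batch.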
Authentication & Rate Limits
Authentication
REST endpoints require an API key via the X-API-Key header. The WebSocket endpoint uses an api_key query parameter.
REST: X-API-Key: your_api_key_here
WebSocket: wss://...?api_key=your_api_key_here
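The two authentication styles can be sketched with the Python standard library. The helper names `rest_headers` and `websocket_url` are hypothetical, and the host is deliberately left to the caller because this page elides it (`wss://...`).

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def rest_headers(api_key: str) -> dict[str, str]:
    """Headers for REST calls: the key travels in X-API-Key."""
    return {"X-API-Key": api_key}


def websocket_url(base_url: str, api_key: str) -> str:
    """Append api_key as a query parameter, keeping any existing query."""
    parts = urlsplit(base_url)
    query = urlencode({"api_key": api_key})  # percent-encodes the key
    if parts.query:
        query = parts.query + "&" + query
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, query, parts.fragment)
    )
```

Since the WebSocket key is part of the URL, it may end up in proxy or server logs; treating the full URL as a secret is a reasonable precaution.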
Rate Limits & Billing
- Per-model concurrency and monthly usage quotas
- Credit-based billing with free tier included
- Usage tracked in real time via the Usage Dashboard
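With per-model concurrency quotas, a client may want to cap its own in-flight requests rather than rely on the server rejecting the excess. A minimal sketch using `asyncio.Semaphore` follows; the helper `run_bounded` is hypothetical, and the `limit` value is a placeholder because this page does not state the actual quotas.

```python
import asyncio


async def run_bounded(jobs, limit):
    """Run zero-argument coroutine functions with at most `limit` in flight.

    `jobs` is an iterable of callables that each return a coroutine
    (for example, one transcription request per audio file).
    Results are returned in the same order as `jobs`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        # Wait for a free slot before starting the job.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

A caller would typically keep one such limit per model, since the quotas are described as per-model.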
Example Projects Coming Soon
We're preparing example projects to help you get started with the Velma-2 API. Check back soon for ready-to-use code samples and integration guides.