Build with Velma-2

Velma-2 transcribes speech in 70+ languages and detects emotion, accent, speaker identity, and engagement directly from the audio.

Get Started For Free

Best-in-Class Transcription

Transcription in 70+ languages with specialized vocabulary for medical, geographic, and political terms. Low cost per audio hour.

Voice-Native Understanding

Velma-2 reads emotion, sentiment, energy, and engagement directly from the audio signal, not from text.

Ensemble Listening

Combine transcription, emotion, and engagement signals in a single API call for full conversation analysis.

Built for Developers

REST and WebSocket APIs
Simple API key authentication
Credit-based pricing with free tier
Usage dashboard and billing controls

Example API call

# Upload audio.mp3 for batch transcription with speaker_diarization and emotion_signal enabled
curl -X POST \
  https://modulate-developer-apis.com/api/velma-2-stt-batch \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "upload_file=@audio.mp3" \
  -F "speaker_diarization=true" \
  -F "emotion_signal=true"
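
For scripted use, the same call could be wrapped in a small shell function. This is a sketch under stated assumptions, not an official client: the function name velma_batch and the VELMA_API_KEY and VELMA_DRY_RUN environment variables are hypothetical, and it assumes the endpoint reports errors through its HTTP status, which is what curl's --fail option relies on. Only the URL, the X-API-Key header, and the three form fields come from the example call.

```shell
#!/bin/sh
# Sketch of a wrapper around the batch endpoint from the example call.
# Hypothetical names (not part of the Velma-2 API):
#   velma_batch, VELMA_API_KEY, VELMA_DRY_RUN

VELMA_URL="https://modulate-developer-apis.com/api/velma-2-stt-batch"

velma_batch() {
  velma_audio_file="$1"

  # Refuse to run without a readable input file.
  if [ ! -r "$velma_audio_file" ]; then
    echo "velma_batch: cannot read '$velma_audio_file'" >&2
    return 1
  fi

  # Read the key from the environment instead of hard-coding it.
  if [ -z "${VELMA_API_KEY:-}" ]; then
    echo "velma_batch: VELMA_API_KEY is not set" >&2
    return 2
  fi

  # Same header and form fields as the example call, plus curl options
  # that return a non-zero exit status on HTTP errors.
  set -- --fail --silent --show-error \
    -X POST "$VELMA_URL" \
    -H "X-API-Key: $VELMA_API_KEY" \
    -F "upload_file=@$velma_audio_file" \
    -F "speaker_diarization=true" \
    -F "emotion_signal=true"

  # With VELMA_DRY_RUN=1, print one argument per line and send nothing.
  if [ "${VELMA_DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' curl "$@"
    return 0
  fi

  curl "$@"
}
```

After sourcing the file, running velma_batch audio.mp3 with VELMA_API_KEY exported sends the request. Setting VELMA_DRY_RUN=1 first prints the curl arguments, including the key, without contacting the API, which is useful for checking a script before spending credits.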

Learn More

Modulate.ai

Learn about Modulate's mission and technology

Ensemble Listening Models

Multi-model voice analysis for conversations

Velma Preview

Try Modulate's voice analysis in the browser

Ready to Build?

Create a free account and start making API calls in minutes.

Get Started