Build with Velma-2

Velma-2 transcribes speech in 70+ languages and detects emotion, accent, speaker identity, and engagement directly from the audio.

Best-in-Class Transcription

Transcription in 70+ languages, with specialized vocabulary for medical, geographic, and political terms, at a low cost per audio hour.

Voice-Native Understanding

Velma-2 reads emotion, sentiment, energy, and engagement directly from the audio signal, not from text.

Ensemble Listening

Combine transcription, emotion, and engagement signals in a single API call for full conversation analysis.

Built for Developers

  • REST and WebSocket APIs
  • Simple API key authentication
  • Credit-based pricing with a free tier
  • Usage dashboard and billing controls

Example API call

curl -X POST \
  https://modulate-developer-apis.com/api/velma-2-stt-batch \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "upload_file=@audio.mp3" \
  -F "speaker_diarization=true" \
  -F "emotion_signal=true"
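
For scripted use, the same request can be wrapped in a small shell function that checks its inputs before calling the API. This is a minimal sketch, not an official client: the function name `velma_transcribe` and the `VELMA_API_KEY` environment variable are hypothetical names chosen for illustration, and the sketch assumes the API reports failures through HTTP status codes, which is what curl's `--fail` flag relies on. The endpoint, header, and form fields are copied from the example above.

```shell
#!/bin/sh
# Endpoint taken from the example call above.
VELMA_URL="https://modulate-developer-apis.com/api/velma-2-stt-batch"

# velma_transcribe FILE
# Hypothetical helper: uploads FILE with speaker diarization and emotion
# signals enabled, and writes the raw response body to stdout.
# Reads the API key from VELMA_API_KEY (an assumed variable name).
velma_transcribe() {
  if [ ! -f "$1" ]; then
    echo "error: audio file not found: $1" >&2
    return 1
  fi
  if [ -z "${VELMA_API_KEY:-}" ]; then
    echo "error: VELMA_API_KEY is not set" >&2
    return 1
  fi
  # --fail makes curl exit non-zero on HTTP error statuses (assumes the
  # API uses them); --silent --show-error hides the progress meter but
  # still prints error messages.
  curl --fail --silent --show-error -X POST "$VELMA_URL" \
    -H "X-API-Key: ${VELMA_API_KEY}" \
    -F "upload_file=@$1" \
    -F "speaker_diarization=true" \
    -F "emotion_signal=true"
}
```

With the key exported, `velma_transcribe audio.mp3 > result.json` sends the same request as the curl example and saves the response. Keeping the key in an environment variable avoids pasting it into scripts or shell history.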

Ready to Build?

Create a free account and start making API calls in minutes.
