Velma-2 transcribes speech in 70+ languages and natively detects emotion, accent, speaker identity, and engagement directly from audio.
Transcribe speech in 70+ languages, with specialized vocabulary for medical, geographic, and political terms, at a low cost per audio hour.
Velma-2 reads emotion, sentiment, energy, and engagement directly from the audio signal, not from text.
Combine transcription, emotion, and engagement signals in a single API call for full conversation analysis.
Example API call
curl -X POST \
  "https://modulate-developer-apis.com/api/velma-2-stt-batch" \
  -H "X-API-Key: YOUR_API_KEY" \
  -F "upload_file=@audio.mp3" \
  -F "speaker_diarization=true" \
  -F "emotion_signal=true"
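For scripted use, the same request can be wrapped in a small shell function. This is a minimal sketch, not an official client: the function name `velma_transcribe`, the `VELMA_API_KEY` environment variable, and the default output file `response.json` are assumptions made for illustration. The endpoint, header, and form fields are the ones shown in the example call. The response format is not described here, so the sketch saves the raw response body without parsing it.

```shell
# Hypothetical helper: velma_transcribe AUDIO_FILE [OUTPUT_FILE]
# Sends the same request as the example call and saves the raw response.
velma_transcribe() {
  local audio_file="${1:-}"
  local out_file="${2:-response.json}"   # assumed default output name

  # Read the key from the environment (assumed variable name) so it
  # is not hard-coded in scripts or shell history.
  if [ -z "${VELMA_API_KEY:-}" ]; then
    echo "VELMA_API_KEY is not set" >&2
    return 2
  fi

  # Check the file locally before attempting the upload.
  if [ ! -f "$audio_file" ]; then
    echo "audio file not found: $audio_file" >&2
    return 3
  fi

  # --fail makes curl exit non-zero on HTTP errors (4xx/5xx);
  # -sS hides the progress meter but still prints error messages.
  curl --fail -sS -X POST \
    "https://modulate-developer-apis.com/api/velma-2-stt-batch" \
    -H "X-API-Key: ${VELMA_API_KEY}" \
    -F "upload_file=@${audio_file}" \
    -F "speaker_diarization=true" \
    -F "emotion_signal=true" \
    -o "$out_file"
}
```

With the key exported, running `velma_transcribe audio.mp3 result.json` would write the response body to `result.json`. A non-zero exit status indicates a missing key, a missing file, or a failed request.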