speech-to-text

elevenlabs/skills

Transcribe audio to text with Scribe v2 - supports 90+ languages, speaker diarization, and word-level timestamps.

Installs: 1
Install command
npx skills add https://github.com/elevenlabs/skills --skill speech-to-text
About this skill
Transcribe audio to text with Scribe v2 - supports 90+ languages, speaker diarization, and word-level timestamps.

Setup: See Installation Guide. For JavaScript, use @elevenlabs/* packages only.

Word-level timestamps include type classification and speaker identification.

Word types:
- word - An actual spoken word
- spacing - Whitespace between words (useful for precise timing)
- audio_event - Non-speech sounds the model detected (laughter, applause, music, etc.)

Identify WHO said WHAT - the model labels each word with a speaker ID, useful for meetings, interviews, or any multi-speaker audio.

Help the model recognize specific words it might otherwise mishear - product names, technical jargon, or unusual spellings (up to 100 terms).

Language detection is automatic, with an optional language hint.

Supported formats:
- Audio: MP3, WAV, M4A, FLAC, OGG, WebM, AAC, AIFF, Opus
- Video: MP4, AVI, MKV, MOV, WMV, FLV, WebM, MPEG, 3GPP
- Limits: Up to 3GB file size, 10 hours duration

Common errors:
- 401: Invalid API key
- ...

Monitor usage via the request-id response header.

For live transcription with ultra-low latency (~150ms), use the real-time API. The real-time API produces two types of transcripts. A "commit" tells the model to finalize the current segment. You can commit manually (e.g., when the user pauses) or use Voice Activity Detection (VAD) to auto-commit on silence. See real-time references for complete documentation.
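
The word types and speaker IDs described above suggest a simple post-processing step: merging the word list into per-speaker turns. The following is a minimal Python sketch, not the skill's own code. It assumes each word is a dict with the hypothetical keys text, type, speaker_id, start, and end, and the sample data is invented for illustration; check the field names against the actual API response before relying on them.

```python
from typing import Any, Dict, List

# Hypothetical sample shaped like the word list described above.
# Field names and values are assumptions made for illustration only.
SAMPLE_WORDS: List[Dict[str, Any]] = [
    {"text": "Hello", "type": "word", "speaker_id": "speaker_0", "start": 0.0, "end": 0.4},
    {"text": " ", "type": "spacing", "speaker_id": "speaker_0", "start": 0.4, "end": 0.5},
    {"text": "there", "type": "word", "speaker_id": "speaker_0", "start": 0.5, "end": 0.9},
    {"text": "(laughter)", "type": "audio_event", "speaker_id": "speaker_1", "start": 1.0, "end": 1.6},
    {"text": "Hi", "type": "word", "speaker_id": "speaker_1", "start": 1.7, "end": 1.9},
]


def group_speaker_turns(words: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Merge consecutive items from the same speaker into one turn."""
    turns: List[Dict[str, Any]] = []
    for item in words:
        # Non-speech sounds are left out of the spoken text.
        if item["type"] == "audio_event":
            continue
        same_speaker = bool(turns) and turns[-1]["speaker"] == item["speaker_id"]
        if item["type"] == "spacing":
            # Whitespace only extends an existing turn; it never starts one.
            if same_speaker:
                turns[-1]["text"] += item["text"]
            continue
        if same_speaker:
            turns[-1]["text"] += item["text"]
            turns[-1]["end"] = item["end"]
        else:
            turns.append({
                "speaker": item["speaker_id"],
                "text": item["text"],
                "start": item["start"],
                "end": item["end"],
            })
    # Drop any whitespace left dangling at the end of a turn.
    for turn in turns:
        turn["text"] = turn["text"].strip()
    return turns


if __name__ == "__main__":
    for turn in group_speaker_turns(SAMPLE_WORDS):
        print(f'{turn["speaker"]} [{turn["start"]:.1f}s-{turn["end"]:.1f}s]: {turn["text"]}')
```

Keeping spacing items in the text while taking start and end times only from word items means each turn's timing reflects actual speech rather than surrounding silence.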

Source description provided by the upstream skill listing. Community reviews and install context appear in the sections below.

Community Reviews

Latest reviews

No community reviews yet.

FAQ
What does speech-to-text do?

Transcribe audio to text with Scribe v2 - supports 90+ languages, speaker diarization, and word-level timestamps.

Is speech-to-text good?

speech-to-text does not have approved reviews yet, so SkillJury cannot publish a community verdict.

What agents does speech-to-text work with?

speech-to-text currently lists compatibility with codex, gemini-cli, opencode, kimi-cli, github-copilot, openclaw, claude-code.

What are alternatives to speech-to-text?

Skills in the same category include telegram-bot-builder, flutter-app-size, sharp-edges, iterative-retrieval.

How do I install speech-to-text?

npx skills add https://github.com/elevenlabs/skills --skill speech-to-text
