
speech-to-text

Transcribe audio and video to text with speaker identification, word-level timestamps, and 90+ language support.

SkillJury keeps community verdicts, source metadata, and external repository signals in separate lanes so ranking data never pretends to be a review.

SkillJury verdict: Pending (no approved reviews yet)

Would recommend: Pending (waiting on enough review volume)

Install signal: 3 (weekly or total install activity from catalog data)

Install command
npx skills add https://github.com/elevenlabs/skills --skill speech-to-text
SkillJury does not have enough approved reviews to publish a community verdict yet. Source metadata and repository proof are still available above.
SkillJury Signal Summary

As of Apr 30, 2026, speech-to-text has 3 weekly installs and 0 community reviews on SkillJury. Community votes currently stand at 0 upvotes and 0 downvotes. Source: elevenlabs/skills. Canonical URL: https://skills.sh/elevenlabs/skills/speech-to-text.

Security audits
Gen Agent Trust Hub: PASS
Socket: PASS
Snyk: WARN
About this skill
Transcribe audio and video to text with Scribe v2, which supports 90+ languages, speaker diarization, and word-level timestamps.

Setup: See the Installation Guide. For JavaScript, use @elevenlabs/* packages only.

Word-level timestamps include type classification and speaker identification.

Speaker diarization identifies who said what: the model labels each word with a speaker ID, which is useful for meetings, interviews, or any multi-speaker audio.

For call recordings, the batch API can label diarized speakers as agent and customer when detect_speaker_roles=true is set alongside diarize=true. This option is not compatible with use_multi_channel=true.

To help the model recognize specific words it might otherwise mishear (product names, technical jargon, or unusual spellings), supply up to 100 terms.

Language detection is automatic, with an optional language hint.

Audio: MP3, WAV, M4A, FLAC, OGG, WebM, AAC, AIFF, Opus
Video: MP4, AVI, MKV, MOV, WMV, FLV, WebM, MPEG, 3GPP
Limits: Up to 3GB file size, 10 hours duration

Word types:

Common errors:

Monitor usage via the request-id response header.

For live transcription with ultra-low latency (~150ms), use the real-time API. The real-time API produces two types of transcripts. A "commit" tells the model to finalize the current segment.
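The batch option constraints described above (detect_speaker_roles needs diarize, conflicts with use_multi_channel, and term lists are capped at 100) can be checked locally before a request is sent. The following is a minimal sketch, not the skill's or the SDK's own code: the helper name build_transcription_options, the keyterms and language_code keys, and the "scribe_v2" model identifier are assumptions made for illustration, and no network call is made.

```python
from typing import Any, Dict, List, Optional

MAX_KEYTERMS = 100  # "up to 100 terms", per the description above


def build_transcription_options(
    diarize: bool = False,
    detect_speaker_roles: bool = False,
    use_multi_channel: bool = False,
    keyterms: Optional[List[str]] = None,   # hypothetical parameter name
    language_code: Optional[str] = None,    # hypothetical name for the language hint
) -> Dict[str, Any]:
    """Validate batch options against the documented constraints.

    Hypothetical helper: it only assembles a dict and never contacts the API.
    """
    if detect_speaker_roles and not diarize:
        raise ValueError("detect_speaker_roles=true must be set alongside diarize=true")
    if detect_speaker_roles and use_multi_channel:
        raise ValueError("detect_speaker_roles is not compatible with use_multi_channel=true")

    terms = list(keyterms or [])
    if len(terms) > MAX_KEYTERMS:
        raise ValueError(f"at most {MAX_KEYTERMS} terms are allowed, got {len(terms)}")

    options: Dict[str, Any] = {
        "model_id": "scribe_v2",  # assumed identifier for Scribe v2
        "diarize": diarize,
        "detect_speaker_roles": detect_speaker_roles,
        "use_multi_channel": use_multi_channel,
    }
    if terms:
        options["keyterms"] = terms
    if language_code is not None:
        options["language_code"] = language_code
    return options


# Example: a call recording with agent/customer labeling and two custom terms.
call_options = build_transcription_options(
    diarize=True,
    detect_speaker_roles=True,
    keyterms=["ElevenLabs", "Scribe"],
)
```

Validating up front keeps an invalid combination from reaching the API at all; the resulting dict would still need to be mapped onto whatever parameter names the installed SDK actually exposes.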

Source description provided by the upstream listing. Community review signal and install context stay separate from this narrative layer.

Community reviews

Latest reviews

No community reviews yet. Be the first to review.

FAQ
What does speech-to-text do?

Transcribe audio and video to text with speaker identification, word-level timestamps, and 90+ language support.

Is speech-to-text good?

speech-to-text does not have approved reviews yet, so SkillJury cannot publish a community verdict.

Which AI agents support speech-to-text?

speech-to-text currently lists compatibility with Skills CLI.

Is speech-to-text safe to install?

speech-to-text has been scanned by security audit providers tracked on SkillJury. Check the security audits section on this page for detailed results from Socket.dev and Snyk.

What are alternatives to speech-to-text?

Skills in the same category include grimoire-morpho-blue, conversation-memory, second-brain-ingest, zai-tts.

How do I install speech-to-text?

Run the following command to install speech-to-text: npx skills add https://github.com/elevenlabs/skills --skill speech-to-text
