ai-multimodal

mrgoonie/claudekit-skills

Process audio, images, videos, documents, and generate images using Google Gemini's multimodal API. Unified interface for all multimedia content understanding and generation.

Installs: 196
Install command
npx skills add https://github.com/mrgoonie/claudekit-skills --skill ai-multimodal
Security audits
Gen Agent Trust Hub: PASS
Socket: PASS
Snyk: WARN
About this skill
Process audio, images, videos, documents, and generate images using Google Gemini's multimodal API. Unified interface for all multimedia content understanding and generation.

API Key Setup: Supports both Google AI Studio and Vertex AI. The skill checks for GEMINI_API_KEY in this order:

Get API key:
For Vertex AI:
Install SDK:

Transcribe Audio:
Analyze Image:
Process Video:
Extract from PDF:
Generate Image:
Optimize Media:
Convert Documents to Markdown:

For detailed implementation guidance, see:

Input Pricing:
Token Rates:
TTS Pricing:
Free Tier:
YouTube Limits:
Storage Limits:

Common errors and solutions:

All scripts support unified API key detection and error handling:

- gemini_batch_process.py: Batch process multiple media files
- media_optimizer.py: Prepare media for Gemini API
- document_converter.py: Convert documents to PDF

Run any script with --help for detailed usage.

- Transcription with timestamps (up to 9.5 hours)
- Audio summarization and analysis
- Speech understanding and speaker identification
- Music and environmental sound analysis
- Text-to-speech generation with controllable voice
- Image captioning and description
- Object detection with bounding boxes (2.0+)
- Pixel-level segmentation (2.5+)
- Visual question answering
- Multi-image comparison (up to 3,600 images)
- OCR and text extraction
- Scene detection and summarization
- Video Q&A with...
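The description above says the skill checks for GEMINI_API_KEY in a fixed order, but the listing does not reproduce that order. The following is only a minimal sketch of how such a lookup might work, under the assumption that the process environment is checked first and one or more .env files afterwards. The function names `find_api_key` and `read_dotenv_value`, the `.env` default path, and the lookup order itself are all hypothetical and are not taken from the skill's scripts.

```python
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional


def read_dotenv_value(path: Path, name: str) -> Optional[str]:
    """Return the value of `name` from a simple KEY=VALUE file, or None."""
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


def find_api_key(
    env: Mapping[str, str] = os.environ,
    dotenv_paths: Iterable[Path] = (Path(".env"),),
    name: str = "GEMINI_API_KEY",
) -> Optional[str]:
    """Look up the key using an ASSUMED order (not the skill's documented one)."""
    # Assumption 1: a value in the process environment wins.
    value = env.get(name)
    if value:
        return value
    # Assumption 2: otherwise fall back to each .env file in the given order.
    for path in dotenv_paths:
        value = read_dotenv_value(path, name)
        if value:
            return value
    return None
```

A caller might run `key = find_api_key()` once at startup and exit with a clear error message if the result is `None`. Passing `env` and `dotenv_paths` explicitly keeps the lookup easy to test without touching the real environment.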

Source description provided by the upstream skill listing. Community reviews and install context appear in the sections below.

Community Reviews

Latest reviews

No community reviews yet. Be the first to review.

FAQ
What does ai-multimodal do?

Process audio, images, videos, documents, and generate images using Google Gemini's multimodal API. Unified interface for all multimedia content understanding and generation.

Is ai-multimodal good?

ai-multimodal does not have approved reviews yet, so SkillJury cannot publish a community verdict.

Which agents does ai-multimodal work with?

ai-multimodal currently lists compatibility with codex, gemini-cli, opencode, cursor, github-copilot, and claude-code.

What are alternatives to ai-multimodal?

Skills in the same category include telegram-bot-builder, flutter-app-size, sharp-edges, and iterative-retrieval.

How do I install ai-multimodal?

npx skills add https://github.com/mrgoonie/claudekit-skills --skill ai-multimodal

Related skills

More from mrgoonie/claudekit-skills

Alternatives in Software Engineering