ai-multimodal
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (captioning, object detection, OCR, visual Q&A, segmentation), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens.
Signing
SignedSLSA L2- Signed by
- skills-hub.ai distributor
- Method
- Distributor-signed by skills-hub.aiCryptographically signed by the skills-hub.ai distributor key at publish time.
- Signed
Install this skill
Run this command in your terminal. No account required — it auto-detects your AI tool and installs the skill file.
npx @skills-hub-ai/cli install claudekit-ai-multimodalSetup by platform
Install
One-click setup for your editorRun in your project root
npx @skills-hub-ai/cli install claudekit-ai-multimodal --target claude-codeInstructions
Security
Reviews (0)
Frequently asked questions about ai-multimodal
What does the ai-multimodal skill do?
Process and generate multimedia content using Google Gemini API. Capabilities include analyze audio files (transcription with timestamps, summarization, speech understanding, music/sound analysis up to 9.5 hours), understand images (captioning, object detection, OCR, visual Q&A, segmentation), process videos (scene detection, Q&A, temporal analysis, YouTube URLs, up to 6 hours), extract from documents (PDF tables, forms, charts, diagrams, multi-page), generate images (text-to-image, editing, composition, refinement). Use when working with audio/video files, analyzing images or screenshots, processing PDF documents, extracting structured data from media, creating images from text prompts, or implementing multimodal AI features. Supports multiple models (Gemini 2.5/2.0) with context windows up to 2M tokens. It's a reusable SKILL.md instruction set that loads into your AI coding assistant on demand, no prompt engineering, no copy-pasting every session.
How do I install the ai-multimodal skill?
Run `npx @skills-hub-ai/cli install claudekit-ai-multimodal` from your terminal. The CLI writes the SKILL.md to the correct location for your AI tool (e.g. ~/.claude/skills/claudekit-ai-multimodal/ for Claude Code or ~/.cursor/skills/ for Cursor with --target cursor) and adds it to your project's .skills.json lockfile.
Which AI tools does ai-multimodal work with?
ai-multimodal runs in Claude Code. It follows the open Agent Skills standard (SKILL.md), so the same skill works in every supported tool without modification.
Is the ai-multimodal skill free?
Yes. Every skill on skills-hub.ai is free and open-source. There are no premium tiers, paywalls, or usage limits. You only pay for whatever AI assistant you're already using.
How do I use ai-multimodal after installing it?
In Claude Code, type `/claudekit-ai-multimodal` (or whatever slash command the skill registers) and the AI follows the skill's instructions immediately. You can also reference it by name in natural language, your AI loads the skill into context when relevant.
Can I share the ai-multimodal skill with my team?
Yes. Commit your project's .skills.json lockfile and teammates run `npx @skills-hub-ai/cli install` (no args) to install every skill at the exact version you pinned. Organization-scoped installs work via skills-hub.ai organizations.