Skip to main content

serving-llms-vllm

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism.

v1.0.0New
0

Signing

SignedSLSA L2
Signed by
skills-hub.ai distributor
Method
Distributor-signed by skills-hub.aiCryptographically signed by the skills-hub.ai distributor key at publish time.
Signed

Install this skill

Run this command in your terminal. No account required — it auto-detects your AI tool and installs the skill file.

npx @skills-hub-ai/cli install ai-research-serving-llms-vllm
Or download directly:
Browse all CLI commands →

Setup by platform

Claude Code

~/.claude/skills/<skill>/SKILL.md

Setup guide →

Install

One-click setup for your editor

Run in your project root

npx @skills-hub-ai/cli install ai-research-serving-llms-vllm --target claude-code

Instructions

This skill doesn’t include stateful context yet, instructions only. Learn about stateful skills.

Security

Loading security scan...

Reviews (0)

Frequently asked questions about serving-llms-vllm

What does the serving-llms-vllm skill do?

Serves LLMs with high throughput using vLLM's PagedAttention and continuous batching. Use when deploying production LLM APIs, optimizing inference latency/throughput, or serving models with limited GPU memory. Supports OpenAI-compatible endpoints, quantization (GPTQ/AWQ/FP8), and tensor parallelism. It's a reusable SKILL.md instruction set that loads into your AI coding assistant on demand, no prompt engineering, no copy-pasting every session.

How do I install the serving-llms-vllm skill?

Run `npx @skills-hub-ai/cli install ai-research-serving-llms-vllm` from your terminal. The CLI writes the SKILL.md to the correct location for your AI tool (e.g. ~/.claude/skills/ai-research-serving-llms-vllm/ for Claude Code or ~/.cursor/skills/ for Cursor with --target cursor) and adds it to your project's .skills.json lockfile.

Which AI tools does serving-llms-vllm work with?

serving-llms-vllm runs in Claude Code. It follows the open Agent Skills standard (SKILL.md), so the same skill works in every supported tool without modification.

Is the serving-llms-vllm skill free?

Yes. Every skill on skills-hub.ai is free and open-source. There are no premium tiers, paywalls, or usage limits. You only pay for whatever AI assistant you're already using.

How do I use serving-llms-vllm after installing it?

In Claude Code, type `/ai-research-serving-llms-vllm` (or whatever slash command the skill registers) and the AI follows the skill's instructions immediately. You can also reference it by name in natural language, your AI loads the skill into context when relevant.

Can I share the serving-llms-vllm skill with my team?

Yes. Commit your project's .skills.json lockfile and teammates run `npx @skills-hub-ai/cli install` (no args) to install every skill at the exact version you pinned. Organization-scoped installs work via skills-hub.ai organizations.