
serving-llms-vllm

davila7/claude-code-templates

vLLM achieves up to 24x higher throughput than HuggingFace Transformers through PagedAttention (a block-based KV cache) and continuous batching (interleaving prefill and decode requests in one running batch).
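The PagedAttention idea can be sketched in a few lines: instead of reserving one contiguous KV-cache region per sequence, tokens are stored in fixed-size blocks handed out from a shared pool, so waste is bounded by one partially filled block per sequence. This is a minimal illustrative sketch, not vLLM's implementation; the block size, class names, and free-list allocator here are assumptions (vLLM's default block size is 16 tokens).

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (small for illustration)

class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def alloc(self):
        return self.free.pop()

    def release(self, block):
        self.free.append(block)

class Sequence:
    """Maps a sequence's logical token positions to physical blocks."""
    def __init__(self, allocator):
        self.allocator = allocator
        self.block_table = []   # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # A new block is needed only when the last one is full, so memory
        # waste is bounded by one partially filled block per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.alloc())
        self.num_tokens += 1

    def free(self):
        for b in self.block_table:
            self.allocator.release(b)
        self.block_table = []

allocator = BlockAllocator(num_blocks=8)
seq = Sequence(allocator)
for _ in range(6):            # cache KV for 6 tokens
    seq.append_token()
print(len(seq.block_table))   # 2 blocks: ceil(6 / 4)
```

Because block tables indirect every lookup, blocks need not be contiguous, and freed blocks are immediately reusable by other sequences — the same trick virtual memory uses for pages.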

Installs: 196
Install command
npx skills add https://github.com/davila7/claude-code-templates --skill serving-llms-vllm
Security audits
Gen Agent Trust Hub: PASS
Socket: PASS
Snyk: WARN
Community Reviews

Latest reviews


No community reviews yet.

FAQ
What does serving-llms-vllm do?

vLLM achieves up to 24x higher throughput than HuggingFace Transformers through PagedAttention (a block-based KV cache) and continuous batching (interleaving prefill and decode requests in one running batch).
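Continuous batching, the second technique named above, can be sketched as a scheduler that admits waiting requests into the running batch as soon as a slot frees up, rather than draining the whole batch first. This is an illustrative toy (the request lengths, slot policy, and function names are assumptions, not vLLM's actual scheduler):

```python
from collections import deque

def continuous_batching(requests, max_batch=2):
    """Each request is (id, decode_steps). New requests join the running
    batch the moment a slot frees up, instead of waiting for the whole
    batch to drain as static batching would."""
    waiting = deque(requests)
    running = {}                       # request_id -> remaining steps
    steps = 0
    finished = []
    while waiting or running:
        # Admit waiting requests into any free slots (the "continuous" part).
        while waiting and len(running) < max_batch:
            rid, n = waiting.popleft()
            running[rid] = n
        # One decode step advances every running request together.
        steps += 1
        for rid in list(running):
            running[rid] -= 1
            if running[rid] == 0:
                finished.append(rid)
                del running[rid]       # slot is reused on the next step

    return steps, finished

steps, order = continuous_batching([("a", 3), ("b", 1), ("c", 2)])
print(steps)   # 3 steps; static batching of the same requests needs 5
```

With static batching, the short request "b" would hold its slot idle until "a" finished (3 steps), and "c" would run in a second batch (2 more steps); continuous batching reuses b's slot for c immediately.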

Is serving-llms-vllm good?

serving-llms-vllm does not have approved reviews yet, so SkillJury cannot publish a community verdict.

What agent does serving-llms-vllm work with?

serving-llms-vllm currently lists compatibility with codex, gemini-cli, opencode, cursor, github-copilot, claude-code.

What are alternatives to serving-llms-vllm?

Skills in the same category include telegram-bot-builder, flutter-app-size, sharp-edges, iterative-retrieval.

How do I install serving-llms-vllm?

npx skills add https://github.com/davila7/claude-code-templates --skill serving-llms-vllm

Related skills

More from davila7/claude-code-templates

Alternatives in Software Engineering