Skills-hub Blog · RFC

Stateful Skills: Extending SKILL.md with Memory

Static skills start every user from zero. Stateful skills compound. Today we're publishing a draft RFC for a backward-compatible extension that lets a skill carry the operational context its practitioners have accumulated.

By tinh2 · May 19, 2026

TL;DR

Static skills (SKILL.md, GPT JSON, Replit Apps) start every new user at zero. The patterns that experienced practitioners pick up from running the same skill 60 times never land in the published artifact.
Stateful skills add three optional sibling files (MEMORY.md, EXAMPLES.md, and CALIBRATION.md) so that a skill carries what worked, the examples that anchor the right behavior, and the domain-specific calibrations its users converge on.
Fully backward compatible. Open RFC. Read the normative version at /spec/stateful-skills; comments at github.com/tinh2/skills-hub/discussions.

The problem

Every skill marketplace today (Claude Skills, GPT Store, Replit Agent Market, MCP Hub, Hugging Face Spaces) ships prompt templates. A skill is a file. You install it. You run it. The file does not change.

But a practitioner who runs the same skill 60 times learns things. They learn that the test-generation skill flakes on async generators. They learn that the legal-summary skill needs an extra guardrail when the document is a UCC filing. They learn five killer examples that anchor the right output every time.

None of that ever lands in the skill file. It sits in a Notion doc on someone's laptop, or in their head. The next user who installs the skill starts from zero. The flywheel that should exist between a skill and the people running it does not exist, because the format does not have a place to put what they learn.

The proposal

A skill directory MAY now contain up to three optional siblings alongside SKILL.md:

my-skill/
├── SKILL.md          # the existing spec, instructions
├── MEMORY.md         # what worked, what didn't, edge cases
├── EXAMPLES.md       # curated input/output pairs
└── CALIBRATION.md    # domain-specific tweaks
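As a rough illustration of what backward compatibility means for a loader, the sketch below checks a skill directory for the three optional siblings. The names `find_stateful_files` and `is_stateful` are assumptions made for this post; they are not part of the spec or of the reference parser.

```python
from pathlib import Path

# Optional sibling files named in this RFC. SKILL.md stays the only required file.
OPTIONAL_SIBLINGS = ("MEMORY.md", "EXAMPLES.md", "CALIBRATION.md")


def find_stateful_files(skill_dir: str) -> dict[str, bool]:
    """Report which optional sibling files exist next to SKILL.md.

    Hypothetical helper for illustration. It raises if SKILL.md is missing,
    because without it the directory is not a skill at all.
    """
    root = Path(skill_dir)
    if not (root / "SKILL.md").is_file():
        raise FileNotFoundError(f"{root} has no SKILL.md")
    return {name: (root / name).is_file() for name in OPTIONAL_SIBLINGS}


def is_stateful(skill_dir: str) -> bool:
    """Treat a skill as stateful if at least one sibling file is present."""
    return any(find_stateful_files(skill_dir).values())
```

A directory that contains only SKILL.md passes through unchanged, which is the sense in which the extension is optional.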

MEMORY.md is the ledger of operational experience: dated entries grouped by what worked, what didn't, and edge cases found in production. Its frontmatter tracks version, contributor count, and last-updated date.
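To make that shape concrete, here is a minimal sketch of reading such a file. The frontmatter keys (`version`, `contributors`, `last_updated`), the section headings, and the sample entries are all placeholders invented for this post; the normative schema is in the spec. The parser handles only flat `key: value` lines, where a real implementation would likely use a YAML library.

```python
# Invented sample content; field names and entries are illustrative only.
SAMPLE_MEMORY = """\
---
version: 3
contributors: 12
last_updated: 2026-05-19
---
## What worked
- 2026-05-02: Splitting long inputs before summarizing.

## What didn't
- 2026-04-18: Asking for tests and docs in one pass.

## Edge cases
- 2026-05-11: Async generators produce flaky tests.
"""


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a '---' delimited frontmatter block from the markdown body."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole file as body


def group_entries(body: str) -> dict[str, list[str]]:
    """Collect '- ' bullet entries under each '## ' heading."""
    groups: dict[str, list[str]] = {}
    current = None
    for line in body.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            groups[current] = []
        elif line.startswith("- ") and current is not None:
            groups[current].append(line[2:].strip())
    return groups
```

Calling `split_frontmatter(SAMPLE_MEMORY)` yields the metadata as a dict of strings plus the body, and `group_entries` turns the body into one list of dated entries per heading.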

EXAMPLES.md is a curated list of input/output pairs. Each example is an H2 block with ### Input and ### Output subheadings, plus an optional ### Why this example commentary block.
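The block layout described above is regular enough to parse with a couple of regular expressions. The following is a sketch under that assumption, not the reference parser: `parse_examples`, the returned dict keys, and the sample text are all made up for illustration.

```python
import re

# Invented sample content following the H2 / H3 layout described above.
SAMPLE_EXAMPLES = """\
## Summarize a short clause

### Input
The lessee shall return the equipment within 30 days.

### Output
Equipment must be returned within 30 days.

### Why this example
Shows the expected level of compression.

## Empty document

### Input
(empty)

### Output
No content to summarize.
"""


def parse_examples(text: str) -> list[dict[str, str]]:
    """Parse H2 example blocks that contain ### Input and ### Output."""
    examples = []
    # Split on lines that open an H2; "### " lines do not match "^## ".
    blocks = re.split(r"(?m)^## ", text)
    for block in blocks[1:]:  # blocks[0] is any text before the first H2
        title, _, rest = block.partition("\n")
        # Split on H3 headings; the capture group keeps the heading names.
        parts = re.split(r"(?m)^### (.+)$", rest)
        sections = {
            parts[i].strip(): parts[i + 1].strip()
            for i in range(1, len(parts) - 1, 2)
        }
        if "Input" not in sections or "Output" not in sections:
            raise ValueError(f"example {title.strip()!r} needs Input and Output")
        example = {
            "title": title.strip(),
            "input": sections["Input"],
            "output": sections["Output"],
        }
        if "Why this example" in sections:  # commentary is optional
            example["why"] = sections["Why this example"]
        examples.append(example)
    return examples
```

Rejecting a block that lacks either required subheading is a design choice of this sketch; a lenient loader could skip the block instead.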

CALIBRATION.md captures domain-specific tweaks: preferred terminology, output format conventions (SOAP notes vs. IRAC vs. PRD), anti-patterns to avoid in the domain, and regulatory constraints (HIPAA, FCRA, FDA). It is the file that turns a generic summarizer into a medical-records summarizer.

Normative load order, versioning rules, and frontmatter schemas are in the spec at /spec/stateful-skills. This post is intentionally less formal; the spec is the source of truth.

Why now

Memory became a first-class architectural concern in 2026. Mem0, Letta, Zep, Cognee, Cloudflare Agent Memory, LinkedIn's Cognitive Memory Agent: every serious agent stack has a memory layer now. The SDKs are converging on the idea that an agent without memory is a chatbot.

And yet: no marketplace ties memory to the published artifact. Memory lives in your runtime. It belongs to whoever runs the skill, not to the skill itself. When you install a skill from a registry, you get a file. You do not get what the last hundred users learned while running it.

That is the format gap. The SDK ecosystem has solved memory at runtime. The artifact ecosystem has not. The bet behind this RFC is that the published artifact is where the leverage compounds: one file, indexed once, picked up by every runtime, every nightly sync, every search result. Memory at the artifact layer is community memory. Memory at the runtime layer is private.

Why open

Anthropic open-sourced SKILL.md in December 2025, and the format spread to Cursor, Codex, Windsurf, and every MCP-compatible client within a quarter. The lesson is straightforward: skills outlive runtimes, and a format that lives inside one vendor's wall is a format that dies when the vendor moves on.

So we are publishing this as an open RFC rather than a proprietary skills-hub.ai feature. If the proposal is good, it should be something Anthropic, Microsoft, Google, and the community can adopt without licensing anything from us. If the proposal is bad, it should be possible for someone to publish a counter-proposal that replaces it. Both outcomes are better than a private format that locks the memory layer to a single registry.

What's next

skills-hub.ai is shipping the reference implementation in this same sprint:

Parser support for the three sibling files in @skills-hub-ai/skill-parser. The 62 external sources we already sync nightly will pick up MEMORY/EXAMPLES/CALIBRATION files automatically on their next run, with zero per-publisher effort.
Detail-page rendering on /skills/[slug]: three collapsible sections, contribution counts, and a ?stateful=1 browse filter so practitioners can find skills that actually carry experience.
Cosign/SLSA attestation extended to memory contributions, so a verifying installer can prove no entry was tampered with after the skill was signed.
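The tamper check in the last item rests on a simple idea: record a digest of the files when the skill is signed, then recompute and compare at install time. The sketch below shows only that digest comparison with Python's standard library. It does not call Cosign or implement SLSA, and `digest_files` and `is_untampered` are hypothetical names for this post.

```python
import hashlib
import hmac


def digest_files(files: dict[str, bytes]) -> str:
    """Return one SHA-256 hex digest covering file names and contents.

    Files are hashed in sorted name order so the result is deterministic.
    """
    h = hashlib.sha256()
    for name in sorted(files):
        content = files[name]
        # Length-prefix each field so names and contents cannot blur together.
        for field in (name.encode("utf-8"), content):
            h.update(len(field).to_bytes(8, "big"))
            h.update(field)
    return h.hexdigest()


def is_untampered(files: dict[str, bytes], recorded_digest: str) -> bool:
    """Compare the current digest with the one recorded at signing time."""
    return hmac.compare_digest(digest_files(files), recorded_digest)
```

In a real attestation flow the recorded digest would itself be covered by a signature, so an attacker could not swap both the files and the digest.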

The normative spec lives at /spec/stateful-skills. Comments, objections, and proposed amendments are welcome at github.com/tinh2/skills-hub/discussions. If you ship a runtime or a registry and want to coordinate on adoption, the discussion thread is the best place to raise it.