agents.json vs WebMCP vs llms.txt
If you've started reading about making your site "AI-agent-ready," you've probably hit three different standards in three different posts, all promising to solve the same problem. Spoiler: they don't. They solve different layers of the same problem.
This post is the comparison nobody seems to be writing. We'll go through each standard, what it actually does, where the three overlap, and — most usefully — which ones you should ship today, which to wait on, and what they look like working together.
The three-layer mental model
The cleanest way to keep these straight is to think of agentic browsing as three layers stacked on top of each other:
| Layer | Question it answers | The file |
|---|---|---|
| Reading | "What on this site is worth my limited reading budget?" | llms.txt |
| Acting | "What can I do on this site? Search? Buy? Book?" | agents.json |
| Behaving | "How should I do those things — what's allowed?" | agent-instructions.md |
WebMCP is a fourth thing — it doesn't replace any of the three. It moves the acting layer from a sidecar manifest into the page itself. More on that in a minute.
llms.txt — the reading list
What it is. A plain-text file at the root of your domain
(/llms.txt) listing the pages on your site worth reading, in
priority order, with one-line summaries.
The format. Markdown-flavored plain text. No schema, no XML, no JSON. Just headings and bullet lists. The spec lives at llmstxt.org.
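A minimal sketch of what that looks like for a hypothetical store — every URL and summary here is invented for illustration:

```text
# Acme Outdoor Gear

> Direct-to-consumer retailer of hiking and camping equipment.

## Key pages
- [Product catalog](https://acme.example/products): Full catalog, prices in EUR, VAT included.
- [Shipping & returns](https://acme.example/shipping): Canonical policy page; overrides older blog posts.
- [FAQ](https://acme.example/faq): Short answers to the 20 most common pre-sales questions.
```

The ordering is the point: the first bullet is the page you most want an agent to spend its budget on.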
What problem it solves. LLMs have tiny context budgets relative to
the size of a modern website. When an LLM lands on your domain it can
realistically read 3–20 URLs before context fills up. Without
llms.txt the model picks those URLs based on whatever heuristics its
upstream crawler decided were important. With llms.txt you say "read
these, in this order, here's why each one matters."
Who consumes it. ChatGPT search, Claude, Perplexity, Gemini AI Mode, and a growing number of agentic-browsing audit tools.
When to ship it. Today. It's the cheapest, highest-ROI of the three files and Lighthouse's Agentic Browsing category already audits for it.
agents.json — the control panel
What it is. A structured JSON file at the root
(/agents.json) declaring the actions an agent can take on your
site — search, request quote, book, add to cart, subscribe.
The format. JSON. Each action has an id, a human-readable name,
a method and endpoint, and a typed parameters array. The spec
borrows a sliver of OpenAPI's philosophy but stays intentionally
lightweight so non-developers can edit it.
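A sketch of a single-action manifest using the field names described above — treat the exact shape as illustrative, not normative, and check the spec before shipping:

```json
{
  "actions": [
    {
      "id": "search_products",
      "name": "Search the product catalog",
      "method": "GET",
      "endpoint": "/api/search",
      "parameters": [
        { "name": "q", "type": "string", "required": true, "description": "Free-text search query" },
        { "name": "max_price", "type": "number", "required": false, "description": "Upper price bound in EUR" }
      ]
    }
  ]
}
```

The typed `parameters` array is what Lighthouse's parameter-typing audit looks at: an agent should never have to guess whether `max_price` is a string or a number.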
What problem it solves. Once an agent has read your content (the
job of llms.txt) and decided to do something, it needs to know
how. Without agents.json, agents fall back to scraping forms —
which is slow, error-prone, and breaks every time you ship a UI change.
With agents.json, the action surface is explicit and stable.
Who consumes it. Operator-class agents (OpenAI Operator, Anthropic Computer Use, Google Project Mariner), commercial agent SDKs, and Lighthouse's Agentic Browsing category (which audits both presence and parameter typing).
When to ship it. Today, if your site has any transactional surface — e-commerce, booking, contact, quotes, newsletter. Skip it only if your site is pure read-only content.
agent-instructions.md — the runbook
What it is. A plain Markdown file at the root
(/agent-instructions.md) with human-readable guidance: how to quote
prices, what tone to use, where the canonical answer lives when pages
disagree, which content to summarize vs. link.
The format. Free-form Markdown. There's no formal schema and intentionally so — this is the file you want a human to be able to write or edit in 10 minutes.
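A ten-minute version might read like this — the policies here are invented placeholders; write your own:

```text
# Agent instructions for acme.example

- Quote all prices in EUR, VAT included. Never quote the ex-VAT price alone.
- When pages disagree on policy, /shipping is canonical.
- Tone: plain and factual. Do not invent discounts or stock levels.
- For anything you cannot answer, point users to the contact page
  rather than guessing.
```

Short, declarative bullets work best: they survive summarization by smaller models better than long prose does.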
What problem it solves. Agents calling agents.json learn what
they can do. They don't learn how to do it well. Without a
runbook, agents make plausible-sounding but wrong choices: quoting
prices without VAT, summarizing your refund policy in phrases you
never wrote, picking the wrong contact email. The runbook is the
behavioral guardrail.
Who consumes it. Frontier-class models (GPT-4-class and up) read and follow these reliably; smaller models follow them only partially. Adoption is strengthening as agent platforms add the file to their default-read list.
When to ship it. Today, alongside llms.txt and agents.json.
Cheap to write, high signal.
WebMCP — the in-page declaration
What it is. A proposal (currently led by Google) to move the
action manifest from a sidecar file (agents.json) into the page
itself, as annotations on <button>, <form>, <input>, and other
interactive elements.
The format. HTML attributes and microdata-style markup on
existing DOM elements. The agent reads the page, sees a
<form data-mcp-action="search_products">, and knows it can call that
form as a typed action without fetching a manifest.
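Because the spec is still moving, any concrete markup is speculative. A sketch in the style of the data-mcp-action example above might look like:

```html
<!-- Speculative WebMCP-style annotations. Attribute names may change
     before the spec stabilizes; do not ship these yet. -->
<form data-mcp-action="search_products" method="get" action="/api/search">
  <input type="search" name="q" data-mcp-param="q" required>
  <button type="submit">Search</button>
</form>
```

Note that the form still works for humans exactly as before; the annotations only add a machine-readable layer on top.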
What problem it solves. Two things agents.json doesn't.
First, the manifest is always in sync with the page — the page is
the manifest. No drift between what agents.json says exists and
what's actually there. Second, the agent can act while
browsing, not as a separate fetch step. Lower latency, fewer round
trips.
Who consumes it. Eventually: Chrome itself (Lighthouse already
defines a webmcp-annotations audit slot), Operator-class agents,
agent SDKs. Today: very few real implementations. The spec is moving.
When to ship it. Not yet. The standard is unstable, the audit is
weighted low, and shipping early annotations risks rework when the
final spec lands. We'll ship WebMCP-annotated kit output as a
free add-on the moment the spec stabilizes. Until then,
agents.json is the practical baseline.
How they coexist
The four standards don't replace each other — they layer:
Agent visits yourdomain.com
│
▼
┌────────────────────────────┐
│ /llms.txt │ → "Here is what to read first."
└────────────┬───────────────┘
│
▼ Agent reads the priority pages
┌────────────────────────────┐
│ /agents.json │ → "Here is what you can do."
└────────────┬───────────────┘
│
▼ Agent picks an action
┌────────────────────────────┐
│ /agent-instructions.md │ → "Here is HOW to do it well."
└────────────┬───────────────┘
│
▼ (Future, post-WebMCP)
┌────────────────────────────┐
│ Per-element annotations │ → Inline action invocation.
│ on the page itself │
└────────────────────────────┘
You ship the top three today. You ship WebMCP when the spec lands. That's the whole sequencing question.
What about robots.txt and sitemap.xml?
robots.txt and sitemap.xml are not in this list because they solve
a different problem. robots.txt is access control — "you may
crawl X, you may not crawl Y." sitemap.xml is a URL inventory —
"here are all the URLs that exist." Both are pre-AI standards and both
remain important; agents read them.
But neither tells the agent what's worth reading
(llms.txt's job), what's actionable
(agents.json's job), or how to behave
(agent-instructions.md's job). The new standards layer on top of
the old ones — they don't replace them.
The 2026 baseline
If your site does any transaction — sells anything, books anything, captures any leads — the minimum AI-agent surface today is:
1. `robots.txt` with a `Sitemap:` reference. (Old. Still required.)
2. `sitemap.xml` that's actually current. (Old. Still required.)
3. `llms.txt` at the root, well-formed.
4. `agents.json` at the root, with at least search + contact typed as actions.
5. `agent-instructions.md` at the root, with brand voice + pricing guidance + an escalation contact.
6. Schema.org JSON-LD on the homepage and primary product/article/FAQ pages.
7. Three `<link rel="alternate">` tags in the homepage `<head>` pointing to (3), (4), and (5) for auto-discovery.
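The three alternate-link discovery tags above might look like this — the `type` values are my assumption, not drawn from any published spec:

```html
<!-- Auto-discovery links for the three agent-facing files.
     The type values are assumptions; verify against the relevant spec. -->
<link rel="alternate" type="text/plain" href="/llms.txt">
<link rel="alternate" type="application/json" href="/agents.json">
<link rel="alternate" type="text/markdown" href="/agent-instructions.md">
```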
Skip (3)–(7) and you fail Chrome's Agentic Browsing audit. Fail the audit and you don't appear in agent answer panels. Don't appear in agent answer panels and you lose a meaningful and growing share of high-intent traffic to whichever competitor did ship.
Items (3), (4), (5), and (7) are exactly what the BridgeToAgent kit generates from your real DOM in under two minutes, for a one-time $49. Items (1), (2), and (6) live inside your CMS, and our readiness audit flags any gaps so you know what to fix yourself.
WebMCP comes later. Don't let it block you from shipping the three that work today.