agents.json vs WebMCP vs llms.txt
If you've started reading about making your site "AI-agent-ready," you've probably hit three different standards in three different posts, all promising to solve the same problem. Spoiler: they don't. They solve different layers of the same problem.
This post is the comparison nobody seems to be writing. We'll go through each standard, what it actually does, where the three overlap, and — most usefully — which ones you should ship today, which to wait on, and what they look like working together.
The three-layer mental model
The cleanest way to keep these straight is to think of agentic browsing as three layers stacked on top of each other:
| Layer | Question it answers | The file |
|---|---|---|
| Reading | "What on this site is worth my limited reading budget?" | llms.txt |
| Acting | "What can I do on this site? Search? Buy? Book?" | agents.json |
| Behaving | "How should I do those things — what's allowed?" | agent-instructions.md |
WebMCP is a fourth thing — it doesn't replace any of the three. It moves the acting layer from a sidecar manifest into the page itself. More on that in a minute.
llms.txt — the reading list
What it is. A plain-text file at the root of your domain
(/llms.txt) listing the pages on your site worth reading, in
priority order, with one-line summaries.
The format. Markdown-flavored plain text. No schema, no XML, no JSON. Just headings and bullet lists. The spec lives at llmstxt.org.
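A minimal sketch of what that looks like for a hypothetical store — every URL and summary here is invented for illustration:

```text
# Acme Outdoor Gear

> Direct-to-consumer retailer of hiking and camping equipment.

## Key pages
- [Product catalog](https://acme.example/products): Full catalog, prices in EUR, VAT included.
- [Shipping & returns](https://acme.example/shipping): Canonical policy page; overrides older blog posts.
- [FAQ](https://acme.example/faq): Short answers to the 20 most common pre-sales questions.
```

The ordering is the point: the first bullet is the page you most want an agent to spend its budget on.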
What problem it solves. LLMs have tiny context budgets relative to
the size of a modern website. When an LLM lands on your domain it can
realistically read 3–20 URLs before context fills up. Without
llms.txt the model picks those URLs based on whatever heuristics its
upstream crawler decided were important. With llms.txt you say "read
these, in this order, here's why each one matters."
Who consumes it. ChatGPT search, Claude, Perplexity, Gemini AI Mode, and a growing number of agentic-browsing audit tools.
When to ship it. Today. It's the cheapest, highest-ROI of the three files and Lighthouse's Agentic Browsing category already audits for it.
agents.json — the control panel
What it is. A structured JSON file at the root
(/agents.json) declaring the actions an agent can take on your
site — search, request quote, book, add to cart, subscribe.
The format. JSON. Each action has an id, a human-readable name,
a method and endpoint, and a typed parameters array. The spec
borrows a sliver of OpenAPI's philosophy but stays intentionally
lightweight so non-developers can edit it.
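A sketch of a single-action manifest using the field names described above — treat the exact shape as illustrative, not normative, and check the spec before shipping:

```json
{
  "actions": [
    {
      "id": "search_products",
      "name": "Search the product catalog",
      "method": "GET",
      "endpoint": "/api/search",
      "parameters": [
        { "name": "q", "type": "string", "required": true, "description": "Free-text search query" },
        { "name": "max_price", "type": "number", "required": false, "description": "Upper price bound in EUR" }
      ]
    }
  ]
}
```

The typed `parameters` array is what Lighthouse's parameter-typing audit looks at: an agent should never have to guess whether `max_price` is a string or a number.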
What problem it solves. Once an agent has read your content (the
job of llms.txt) and decided to do something, it needs to know
how. Without agents.json, agents fall back to scraping forms —
which is slow, error-prone, and breaks every time you ship a UI change.
With agents.json, the action surface is explicit and stable.
Who consumes it. Operator-class agents (OpenAI Operator, Anthropic Computer Use, Google Project Mariner), commercial agent SDKs, and Lighthouse's Agentic Browsing category (which audits both presence and parameter typing).
When to ship it. Today, if your site has any transactional surface — e-commerce, booking, contact, quotes, newsletter. Skip it only if your site is pure read-only content.
agent-instructions.md — the runbook
What it is. A plain Markdown file at the root
(/agent-instructions.md) with human-readable guidance: how to quote
prices, what tone to use, where the canonical answer lives when pages
disagree, which content to summarize vs. link.
The format. Free-form Markdown. There's no formal schema and intentionally so — this is the file you want a human to be able to write or edit in 10 minutes.
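A ten-minute version might read like this — the policies here are invented placeholders; write your own:

```text
# Agent instructions for acme.example

- Quote all prices in EUR, VAT included. Never quote the ex-VAT price alone.
- When pages disagree on policy, /shipping is canonical.
- Tone: plain and factual. Do not invent discounts or stock levels.
- For anything you cannot answer, point users to the contact page
  rather than guessing.
```

Short, declarative bullets work best: they survive summarization by smaller models better than long prose does.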
What problem it solves. Agents calling agents.json learn what
they can do. They don't learn how to do it well. Without a
runbook, agents make plausible-sounding but wrong choices: quoting
prices without VAT, summarizing your refund policy in phrases you
never wrote, picking the wrong contact email. The runbook is the
behavioral guardrail.
Who consumes it. Frontier-class models (GPT-4-class and up) read and follow these reliably; smaller models follow them only partially. Adoption is strengthening as agent platforms add the file to their default-read list.
When to ship it. Today, alongside llms.txt and agents.json.
Cheap to write, high signal.
WebMCP — the in-page declaration
What it is. A proposal (currently led by Google) to move the
action manifest from a sidecar file (agents.json) into the page
itself, as annotations on <button>, <form>, <input>, and other
interactive elements.
The format. HTML attributes and microdata-style markup on
existing DOM elements. The agent reads the page, sees a
<form data-mcp-action="search_products">, and knows it can call that
form as a typed action without fetching a manifest.
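Because the spec is still moving, any concrete markup is speculative. A sketch in the style of the data-mcp-action example above might look like:

```html
<!-- Speculative WebMCP-style annotations. Attribute names may change
     before the spec stabilizes; do not ship these yet. -->
<form data-mcp-action="search_products" method="get" action="/api/search">
  <input type="search" name="q" data-mcp-param="q" required>
  <button type="submit">Search</button>
</form>
```

Note that the form still works for humans exactly as before; the annotations only add a machine-readable layer on top.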
What problem it solves. Two things agents.json doesn't.
First, the manifest is always in sync with the page — the page is
the manifest. No drift between what agents.json says exists and
what's actually there. Second, the agent can act while
browsing, not as a separate fetch step. Lower latency, fewer round
trips.
Who consumes it. Eventually: Chrome itself (Lighthouse already
defines a webmcp-annotations audit slot), Operator-class agents,
agent SDKs. Today: very few real implementations. The spec is moving.
When to ship it. Not yet. The standard is unstable, the audit is
weighted low, and shipping early annotations risks rework when the
final spec lands. We'll ship WebMCP-annotated kit output as a
free add-on the moment the spec stabilizes. Until then,
agents.json is the practical baseline.
How they coexist
The four standards don't replace each other — they layer:
Agent visits yourdomain.com
│
▼
┌────────────────────────────┐
│ /llms.txt │ → "Here is what to read first."
└────────────┬───────────────┘
│
▼ Agent reads the priority pages
┌────────────────────────────┐
│ /agents.json │ → "Here is what you can do."
└────────────┬───────────────┘
│
▼ Agent picks an action
┌────────────────────────────┐
│ /agent-instructions.md │ → "Here is HOW to do it well."
└────────────┬───────────────┘
│
▼ (Future, post-WebMCP)
┌────────────────────────────┐
│ Per-element annotations │ → Inline action invocation.
│ on the page itself │
└────────────────────────────┘
You ship the top three today. You ship WebMCP when the spec lands. That's the whole sequencing question.
What about robots.txt and sitemap.xml?
robots.txt and sitemap.xml are not in this list because they solve
a different problem. robots.txt is access control — "you may
crawl X, you may not crawl Y." sitemap.xml is a URL inventory —
"here are all the URLs that exist." Both are pre-AI standards and both
remain important; agents read them.
But neither tells the agent what's worth reading
(llms.txt's job), what's actionable
(agents.json's job), or how to behave
(agent-instructions.md's job). The new standards layer on top of
the old ones — they don't replace them.
The 2026 baseline
If your site does any transaction — sells anything, books anything, captures any leads — the minimum AI-agent surface today is:
1. `robots.txt` with a `Sitemap:` reference. (Old. Still required.)
2. `sitemap.xml` that's actually current. (Old. Still required.)
3. `llms.txt` at the root, well-formed.
4. `agents.json` at the root, with at least search + contact typed as actions.
5. `agent-instructions.md` at the root, with brand voice + pricing guidance + an escalation contact.
6. Schema.org JSON-LD on the homepage and primary product/article/FAQ pages.
7. Three `<link rel="alternate">` tags in the homepage `<head>` pointing to (3), (4), and (5) for auto-discovery.
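The three alternate-link discovery tags above might look like this — the `type` values are my assumption, not drawn from any published spec:

```html
<!-- Auto-discovery links for the three agent-facing files.
     The type values are assumptions; verify against the relevant spec. -->
<link rel="alternate" type="text/plain" href="/llms.txt">
<link rel="alternate" type="application/json" href="/agents.json">
<link rel="alternate" type="text/markdown" href="/agent-instructions.md">
```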
Skip (3)–(7) and you fail Chrome's Agentic Browsing audit. Fail the audit and you don't appear in agent answer panels. Don't appear in agent answer panels and you lose a meaningful and growing share of high-intent traffic to whichever competitor did ship.
Items (3), (4), (5), and (7) are exactly what the BridgeToAgent kit generates from your real DOM in under two minutes, for a one-time $49. Items (1), (2), and (6) live inside your CMS, and our readiness audit flags any gaps so you know what to fix yourself.
WebMCP comes later. Don't let it block you from shipping the three that work today.