BridgeToAgent
Explainer · 7 min read

What Cloudflare's four dimensions actually check — a working walkthrough

Cloudflare's isitagentready.com scores your site on Discoverability, Content, Bot Access Control, and Capabilities. The dimension names tell you the category. They don't tell you what's actually being checked. Here's the per-check walkthrough — what each one looks for, what open standard sits behind it, and where Cloudflare's scoring sits relative to that standard.

BridgeToAgent Editorial team

If you've run your URL through Cloudflare's isitagentready.com, you've seen the score break down across four dimensions: Discoverability, Content, Bot Access Control, and Capabilities. The names are clear enough. What's under each name is less obvious — Cloudflare ships a per-failure coding-agent prompt for the things you missed, but the per-check rationale (what spec, what RFC, what file format) sits one click away from the score view.

This post is the per-check walkthrough. For each of the four dimensions, here is what's actually being looked for, what open standard it traces back to, and where Cloudflare's scoring sits relative to that standard. The goal is to make the verdict legible enough that you can decide which failing checks to act on first and which ones are vendor-aligned rather than ecosystem-aligned.

This is the working-depth companion to the Cloudflare vs Lighthouse pillar — that post compares the two frameworks against each other; this one zooms in on Cloudflare alone.


Why four dimensions

Cloudflare's framing maps the agent-readiness problem onto four sequential surfaces an autonomous agent encounters when visiting your site for the first time:

  1. Discoverability — can the agent enumerate what's on the site?
  2. Content — once content is found, can the agent read it in a structured way?
  3. Bot Access Control — does the site say anything explicit about what AI systems may do with the content?
  4. Capabilities — what programmatic surfaces (APIs, actions, MCP servers) can the agent invoke?

Each dimension is scored independently. A site can be strong on Discoverability and Content (typical for content-heavy WordPress sites or well-built Shopify stores) while scoring zero on Bot Access Control (no per-bot rules) and Capabilities (no MCP server, no API catalog). The dimensions are not weighted equally and Cloudflare doesn't publish the intra-dimension weighting algorithm.


Dimension 1 — Discoverability

The check is whether an autonomous agent can enumerate your site's URL surface and find your important pages without rendering JavaScript or guessing.

What Cloudflare actually checks:

  • robots.txt exists and parses cleanly per RFC 9309 (the robots exclusion standard's formal IETF spec). Cloudflare verifies the file is present, returns 200, and is structurally valid.
  • sitemap.xml is referenced from robots.txt via a Sitemap: directive. Bare-fact check; the sitemap doesn't need to be huge, it just needs to be discoverable from robots.txt.
  • HTTP Link headers per RFC 8288 — Cloudflare looks for Link headers that expose key resources via the HTTP envelope rather than via HTML parsing. Sites that emit Link headers for canonical URLs, alternate-content URLs, or feed URLs score higher here.
  • Crawler-enumeration hygiene: Cloudflare runs heuristics on whether the URLs in your sitemap can be fetched, more or less completely, without rendering JavaScript. A heavily JS-routed SPA whose sitemap points at routes that 404 without hydration will score worse than a server-rendered site where every sitemap URL returns 200 on first byte.
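
To make the Link-header check concrete, here is a minimal sketch of what an RFC 8288 header value looks like when it exposes a canonical URL, a Markdown alternate, and a feed. The `format_link_header` helper and the example.com URLs are illustrative assumptions, not anything Cloudflare or the kit ships, and the sketch does not escape quotes inside parameter values.

```python
def format_link_header(links):
    """Build an RFC 8288 Link header value from (target, params) pairs."""
    parts = []
    for target, params in links:
        # Each link-value is <target> followed by ; name="value" parameters.
        rendered = "".join(f'; {name}="{value}"' for name, value in params.items())
        parts.append(f"<{target}>{rendered}")
    # Multiple link-values share one header, separated by commas.
    return ", ".join(parts)


# Hypothetical resources for an imaginary /about page.
header = format_link_header([
    ("https://example.com/about", {"rel": "canonical"}),
    ("https://example.com/about.md", {"rel": "alternate", "type": "text/markdown"}),
    ("https://example.com/feed.xml", {"rel": "alternate", "type": "application/rss+xml"}),
])
print(f"Link: {header}")
```

An agent that reads this header learns three URLs from the HTTP envelope alone, before it parses a single byte of HTML.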

Common failure modes:

  • robots.txt present but malformed (extra trailing characters, BOM, the wrong line endings on Windows-edited files)
  • Sitemap exists at /sitemap.xml but isn't referenced from robots.txt
  • Sitemap URLs that return 404 without JavaScript (SPA without server-side rendering)

Quick-fix path: Confirm robots.txt includes a Sitemap: line. Confirm three random sitemap URLs return 200 when fetched without a browser. If the sitemap is JS-dependent, that's a larger infrastructure decision (server-side rendering or static prerendering) and not a kit-level fix.
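
The first half of that quick-fix can be scripted. The sketch below pulls Sitemap: lines out of a robots.txt body; the `sitemap_urls` helper and the sample file are assumptions for illustration, and the parsing is deliberately simpler than a full RFC 9309 parser (it treats everything after a `#` as a comment).

```python
def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the URLs named by Sitemap: lines in a robots.txt body."""
    urls = []
    # Strip a leading byte-order mark, a common artifact of Windows editors.
    for raw_line in robots_txt.lstrip("\ufeff").splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt for illustration.
SAMPLE = """\
User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

print(sitemap_urls(SAMPLE))
```

An empty result on your own robots.txt means the sitemap is not discoverable from it, even if /sitemap.xml exists. Python's standard library also offers `urllib.robotparser.RobotFileParser.site_maps()` if you prefer not to hand-roll the parsing.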


Dimension 2 — Content

The check is whether the agent, having discovered URLs, can consume the content in a form designed for agents rather than humans.

What Cloudflare actually checks:

  • llms.txt present at the root, parseable against the llmstxt.org reference format. The check is stricter than "file exists": Cloudflare verifies a top-level H1, structured sections, and real URLs with descriptions. Hand-written files with malformed Markdown often pass the presence check and fail the format check.
  • Markdown-for-agents fallback URLs — the emerging convention of serving .md variants of HTML pages (so an agent can fetch /about.md to get the Markdown source of /about). Cloudflare scores higher when these fallback URLs exist and serve the right content-type.
  • Plain-text accessibility of primary content — heuristics on whether your home page's main content survives stripping JavaScript and CSS. A site whose homepage renders to empty <body> when JS is disabled scores worse on this check.
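
The difference between the presence check and the format check is easier to see in code. This is a rough sketch of a shape check for the three things named above (a top-level H1, H2 sections, and absolute links with descriptions). It is not Cloudflare's validator and not the llmstxt.org reference parser; the `check_llms_txt` name, the regular expression, and the sample file are all assumptions.

```python
import re

# A list item of the form "- [Title](https://url): description".
LINK_ITEM = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>https?://[^)\s]+)\)(?::\s*(?P<desc>.+))?$"
)


def check_llms_txt(text: str) -> list[str]:
    """Return a list of problems; an empty list means the basic shape looks right."""
    problems = []
    lines = [line.rstrip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith("# "):
        problems.append("first non-blank line is not an H1")
    if not any(line.startswith("## ") for line in lines):
        problems.append("no H2 sections")
    list_lines = [line for line in lines if line.startswith("- ")]
    if not list_lines:
        problems.append("no link list items")
    for line in list_lines:
        match = LINK_ITEM.match(line)
        if match is None:
            problems.append(f"malformed link item: {line}")
        elif not match.group("desc"):
            problems.append(f"link without description: {line}")
    return problems


# Hypothetical file contents for illustration.
GOOD = """\
# Example Store

> Handmade ceramics, shipped from Lisbon.

## Docs

- [Shipping policy](https://example.com/shipping.md): Rates and delivery times
"""

print(check_llms_txt(GOOD))
```

A file like `Example Store` followed by `- [Shipping](/shipping)` exists, so it passes a presence check, but this sketch reports three problems for it: no H1, no sections, and a relative link.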

Common failure modes:

  • llms.txt present but generated by a tool that ships hardcoded summary text instead of real content
  • Markdown fallback URLs not implemented (this is rare to find on any site in 2026 — it's the most forward-looking check Cloudflare scores)
  • Homepage entirely client-side-rendered with no SSR or prerendering

Quick-fix path: Ship a parseable llms.txt first. The BridgeToAgent kit generates one from a sitemap-driven crawl that validates against the llmstxt.org reference parser as a build gate. Markdown-fallback URLs are a larger project; they're not a high-weight check inside the Content dimension and can wait.


Dimension 3 — Bot Access Control

The check is whether your site declares — explicitly, in a machine-readable way — what AI systems may do with its content. This is the dimension Cloudflare invests the most in conceptually, because access control is the part of the agent-readiness problem closest to Cloudflare's product line (the edge network gates bot traffic regardless of what scoring frameworks say).

What Cloudflare actually checks:

  • Per-bot rules in robots.txt for the major AI crawlers: GPTBot (OpenAI), ClaudeBot and Claude-Web (Anthropic), PerplexityBot (Perplexity), Google-Extended (Google's control token for opting out of AI training; it is not a separate crawler), CCBot (Common Crawl, which feeds many training datasets), Applebot-Extended, Bytespider (ByteDance). The check is whether each bot has an explicit User-agent: <bot> block; even an empty Disallow: is treated as more explicit than absence.
  • Content Signals: the emerging convention (still being standardized) of declaring three permissions separately: training data, inference grounding, and search indexing. Sites that adopt these signals early score higher; the spec is moving, and Cloudflare scores against the version of the convention it stewards.
  • Web Bot Auth signals — emerging authentication standards for verifying bot identity (so a site can trust that a request claiming to be ClaudeBot is actually Anthropic-originated rather than spoofed). Adoption is early; checking is more about whether your site responds to challenges than whether it actively enforces.

Common failure modes:

  • robots.txt with a single global User-agent: * block and no per-bot rules — the most common pattern on the web today and the one Cloudflare scores worst on this dimension
  • Old-format AI bot blocks (using deprecated user-agent strings — the AI crawler ecosystem renames bots regularly)
  • Conflicting signals between robots.txt and meta robots tags
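
The first failure mode is easy to detect yourself. The sketch below lists which of the crawlers named above have no explicit User-agent line in a robots.txt body. The `bots_without_explicit_block` helper and the sample file are assumptions, and the bot list simply mirrors the one in this post; it is not an authoritative or current registry of user-agent tokens.

```python
# Tokens taken from the list earlier in this section.
AI_BOTS = [
    "GPTBot", "ClaudeBot", "Claude-Web", "PerplexityBot",
    "Google-Extended", "CCBot", "Applebot-Extended", "Bytespider",
]


def bots_without_explicit_block(robots_txt: str, bots=AI_BOTS) -> list[str]:
    """Return the bots that no User-agent line in robots.txt names explicitly."""
    named = set()
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "user-agent":
            named.add(value.strip().lower())  # tokens match case-insensitively
    return [bot for bot in bots if bot.lower() not in named]


# Hypothetical robots.txt: one global block plus one per-bot block.
SAMPLE = """\
User-agent: *
Disallow: /admin/

User-agent: GPTBot
Disallow: /
"""

print(bots_without_explicit_block(SAMPLE))
```

For the sample above, every bot except GPTBot comes back as missing, which is the pattern this dimension scores worst.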

Quick-fix path: Decide your publisher position on AI training, AI search, and AI inference grounding. Then encode that decision in per-bot rules in robots.txt. This is a publisher decision more than a technical fix — a kit can't write your AI policy for you. The Cloudflare vs Lighthouse pillar frames why this dimension is more relevant to enterprise / publisher use cases than to SMB transactional sites.
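
Once the policy decision is made, encoding it is mechanical. Here is a minimal sketch that turns an allow/block decision per crawler into robots.txt blocks. The `render_bot_rules` helper and the example policy are hypothetical; the policy shown is a placeholder, not a recommendation, and real policies often need path-level rules rather than a site-wide switch.

```python
def render_bot_rules(policy: dict[str, bool]) -> str:
    """Render robots.txt blocks; policy maps a user-agent token to allow (True) or block (False)."""
    blocks = []
    for bot, allowed in policy.items():
        # An empty Disallow permits everything; "Disallow: /" blocks the whole site.
        rule = "Disallow:" if allowed else "Disallow: /"
        blocks.append(f"User-agent: {bot}\n{rule}")
    return "\n\n".join(blocks) + "\n"


# Placeholder decisions for illustration only.
example_policy = {
    "GPTBot": False,
    "PerplexityBot": True,
}

print(render_bot_rules(example_policy))
```

Append the output to your existing robots.txt rather than replacing it, so the global User-agent: * block and the Sitemap: line stay in place.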


Dimension 4 — Capabilities

The check is what programmatic surfaces your site exposes to agents — APIs they can call, actions they can invoke, MCP servers they can connect to. This is the dimension most aligned with Cloudflare's own product stack.

What Cloudflare actually checks:

  • MCP Server Card at /.well-known/mcp/server-card.json — the Model Context Protocol discovery convention. Cloudflare scores presence and structural validity. The MCP spec is real, open, and Anthropic-stewarded; the server card discovery convention is Cloudflare-stewarded alongside the broader ecosystem.
  • API Catalog discoverability — whether your site exposes a machine-readable index of its public API endpoints (OpenAPI specs, GraphQL introspection, hypermedia API entry points). Sites without a public API score lower here by definition, which means the dimension penalizes content-only sites that don't have one and were never going to.
  • OAuth server discovery at /.well-known/oauth-authorization-server per RFC 8414. This checks whether an agent can authenticate against your site on behalf of a user. Most sites don't ship this and most sites don't need to.
  • Agent Skills declarations — Cloudflare-stewarded specification for declaring reusable agent actions. Adoption is early and concentrated in the Cloudflare-adjacent ecosystem.
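
Two of the checks above name concrete well-known paths, so you can see what an agent would probe. This sketch builds those URLs for a given origin and checks an OAuth metadata document for the two fields RFC 8414 marks as unconditionally required (`issuer` and `response_types_supported`). The helper names are assumptions, nothing here performs a network request, and it says nothing about how Cloudflare validates either document.

```python
from urllib.parse import urljoin

# Paths as named in the checks above.
WELL_KNOWN_PATHS = {
    "mcp_server_card": "/.well-known/mcp/server-card.json",
    "oauth_metadata": "/.well-known/oauth-authorization-server",
}

# Fields RFC 8414 requires in every authorization server metadata document.
REQUIRED_OAUTH_FIELDS = ("issuer", "response_types_supported")


def capability_urls(origin: str) -> dict[str, str]:
    """Map each discovery document to its absolute URL on the given origin."""
    return {name: urljoin(origin, path) for name, path in WELL_KNOWN_PATHS.items()}


def missing_oauth_fields(metadata: dict) -> list[str]:
    """Return the required RFC 8414 fields that are absent or empty."""
    return [field for field in REQUIRED_OAUTH_FIELDS if not metadata.get(field)]


print(capability_urls("https://example.com"))
print(missing_oauth_fields({"issuer": "https://example.com"}))
```

If neither URL returns anything on your site, that is the expected state for a content-only business, not a misconfiguration.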

Common failure modes:

  • No MCP server at all (true for ~99% of SMB sites in 2026)
  • agents.json present but not counted toward the MCP Server Card check, because it's a different file format (the two share intent but not file location or schema)
  • Public API exists but no OpenAPI spec or discovery convention

Quick-fix path: Most of this dimension is out of scope for SMB sites. If you don't run an MCP server, you don't have an MCP Server Card to ship. If you don't have a public API, you don't have an OpenAPI spec to expose. The realistic move is to accept a low score here, recognizing that it reflects the dimension's vendor alignment more than your site's actual agent-readiness for the buyers who will visit it. Ship agents.json for the partial Capabilities credit and the cross-framework win on Lighthouse.


The pattern across dimensions

A useful read of Cloudflare's score: dimensions 1 and 2 (Discoverability + Content) reward standards adoption that benefits any agent, regardless of which vendor stack the agent runs on. Dimensions 3 and 4 (Bot Access Control + Capabilities) reward standards adoption that's more weighted toward Cloudflare-stewarded conventions (Content Signals, MCP Server Cards, Agent Skills).

This isn't a criticism — Cloudflare is open about which parts of their score are vendor-aligned versus standards-aligned, and the per-failure prompts they ship help technical teams act on either. It is a useful framing for buyers who want to know whether a low score is "your site is missing something AI agents need" or "your site is missing something Cloudflare's product line cares about."

For SMBs without an existing MCP server, without a public API, and without a strong publisher position on AI training rights: dimensions 1 and 2 are the high-leverage place to act. Scoring lower on dimensions 3 and 4 is reasonable until you have a reason to invest in them.


What this means in practice

If you ran isitagentready.com and got a confusing score:

  1. Read the per-dimension breakdown, not the headline number. A 50-point score with strong Discoverability + Content and weak Bot Access Control + Capabilities is a perfectly reasonable score for most SMB sites. A 70-point score where Bot Access Control is the only thing carrying you isn't.

  2. Decide which dimensions are in your scope to fix. Discoverability + Content are file-presence and structure problems, solvable with the three kit files and a Sitemap: reference in robots.txt. Bot Access Control is a policy decision you encode in robots.txt. Capabilities depends on whether your business actually has a programmatic surface to expose.

  3. Cross-reference with Lighthouse Agentic Browsing. Six of the nine Lighthouse audits flip from fail to pass when you ship the three kit files; Cloudflare's Content dimension and partial Capabilities credit move simultaneously. The same handful of files is the highest-leverage move regardless of which framework you started with.

  4. Re-run both audits in 30 days to confirm the changes propagated. If only one of the two scores moved, the deploy probably missed something, most commonly the <link rel="alternate"> snippet in the homepage <head> that auto-discovery depends on.
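
If you suspect the <link rel="alternate"> snippet is what went missing, you can check the homepage markup without waiting for a re-audit. This is a small sketch using Python's standard `html.parser`; the `AlternateLinkFinder` class is an assumption, and the sample markup, including its href and type values, is a placeholder rather than the kit's actual snippet.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect (href, type) for every <link rel="alternate"> in a page."""

    def __init__(self):
        super().__init__()
        self.alternates = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated token list, compared case-insensitively.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "alternate" in rel_tokens and attributes.get("href"):
            self.alternates.append((attributes["href"], attributes.get("type")))


# Placeholder homepage markup for illustration.
HOMEPAGE = """<html><head>
<title>Example</title>
<link rel="stylesheet" href="/site.css">
<link rel="alternate" type="text/plain" href="/llms.txt">
</head><body></body></html>"""

finder = AlternateLinkFinder()
finder.feed(HOMEPAGE)
print(finder.alternates)
```

Feed it the raw HTML your server returns, not the DOM from a browser, so that a snippet injected only by client-side JavaScript shows up as missing.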

