We built a free `llms.txt` validator — open-source, no signup, MIT-licensed

Live today at bridgetoagent.com/tools/llms-txt-validator. Also on npm as @bridgetoagent-com/llms-txt-validator and on GitHub at github.com/bridgetoagent/llms-txt-validator. MIT license. No telemetry. No external dependencies beyond fetch.

Why

Last week we catalogued six failure modes that show up in roughly 80% of free-generator llms.txt output, including link rot, hallucinated URLs, malformed Markdown, missing sections, and generic placeholders. That post ended with "run a 30-second spot-check before you deploy." It should have ended with "run this tool." So we built it.

It's built for three audiences:

Anyone who used a free generator and wants to verify the file before shipping.
Developers writing llms.txt by hand who want a CI-runnable lint instead of eyeballing the spec.
The AI-agent and spec community, who want a reference implementation of the llmstxt.org spec: a parser they can fork, audit, or wrap.

What it checks

Three layers, ordered from the always-on structure checks to the opt-in reachability check:

1 · Structure (always on)

`# Title` heading is present and is the first non-blank content
No duplicate H1 headings
Optional `>` blockquote description captured immediately after the title
`##` section headings (H2) used for resource groups
H3+ headings flagged as info (uncommon in llms.txt)
Empty sections flagged
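To make the structure rules concrete, here is a minimal TypeScript sketch of how they could be checked line by line. It is not the package's source: the `checkStructure` function, the `Issue` type, and the rule that any line starting with `- ` counts as a bullet are assumptions for illustration; only the issue codes come from this post.

```typescript
// Sketch only: names, types, and matching rules are assumptions, not the package's source.
type Severity = "error" | "warning" | "info";

interface Issue {
  severity: Severity;
  code: string; // one of the stable issue codes listed in this post
  message: string;
  line: number; // 1-based
}

const HEADING = /^(#{1,6})\s+(.*)$/;

function checkStructure(source: string): { issues: Issue[]; description?: string } {
  const lines = source.split(/\r?\n/);
  const issues: Issue[] = [];
  const push = (severity: Severity, code: string, message: string, line: number) => {
    issues.push({ severity, code, message, line });
  };

  // 1-based line numbers of every H1 heading, in document order.
  const h1s: number[] = [];
  lines.forEach((l, i) => {
    const m = HEADING.exec(l);
    if (m && m[1].length === 1) h1s.push(i + 1);
  });
  if (h1s.length === 0) {
    push("error", "missing-title", "No `# Title` heading found", 1);
    return { issues };
  }

  // The first H1 must also be the first non-blank line; any later H1 is a duplicate.
  const firstContent = lines.findIndex((l) => l.trim() !== "") + 1;
  if (firstContent !== h1s[0]) push("error", "title-not-first", "Content appears before the title", firstContent);
  for (const n of h1s.slice(1)) push("warning", "duplicate-title", "Multiple H1 headings", n);

  // Optional "> description": the first non-blank line after the title, if it is a blockquote.
  const next = lines.slice(h1s[0]).find((l) => l.trim() !== "")?.trimStart();
  const description = next?.startsWith(">") ? next.slice(1).trim() : undefined;

  // Walk H2 sections counting "- " bullets; H3+ headings are noted but don't close a section.
  let section: { line: number; links: number } | null = null;
  const closeSection = () => {
    if (section && section.links === 0) push("warning", "empty-section", "Section has no link bullets", section.line);
  };
  lines.forEach((l, i) => {
    const m = HEADING.exec(l);
    if (!m) {
      if (section && l.trimStart().startsWith("- ")) section.links++;
      return;
    }
    const level = m[1].length;
    if (level >= 3) {
      push("info", "section-wrong-level", "H3+ heading where H2 is conventional", i + 1);
      return;
    }
    closeSection();
    section = level === 2 ? { line: i + 1, links: 0 } : null;
  });
  closeSection();

  return { issues, description };
}
```

Scanning for all H1s up front keeps the "first non-blank content" and "duplicate H1" rules independent of the section walk.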

2 · Link bullets

`- [text](url)` or `- [text](url): description` syntax; any other line inside a section's bullet list is flagged as `malformed-link`
Empty link text or URL
Relative URLs (`/docs/foo`): llms.txt is consumed by external agents, so URLs must be absolute
Fragment-only URLs (`#anchor`), which are meaningless to agents
`mailto:` URLs flagged as unusual
`http://` URLs flagged (prefer `https://` so agents that refuse insecure fetches don't drop the link)
Duplicate URLs as a warning, duplicate link text as info
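The link-bullet rules reduce to one regex plus a few prefix tests. The sketch below is hypothetical: `checkLinkBullets`, the regex, and the "no URL scheme means relative" rule are our assumptions, not the package's parser. It takes a section's bullet lines already split out with their line numbers.

```typescript
// Sketch only: the regex and function are assumptions, not the package's source.
type Severity = "error" | "warning" | "info";
interface Issue { severity: Severity; code: string; message: string; line: number }

// "- [text](url)" with an optional ": description" tail.
const LINK_BULLET = /^-\s+\[([^\]]*)\]\(([^)\s]*)\)(?::\s*(.*))?$/;
// In this sketch, anything with a scheme ("https:", "mailto:", ...) counts as absolute.
const HAS_SCHEME = /^[a-z][a-z0-9+.-]*:/i;

function checkLinkBullets(bullets: { raw: string; line: number }[]): Issue[] {
  const issues: Issue[] = [];
  const seenUrls = new Set<string>();
  const seenTexts = new Set<string>();
  const push = (severity: Severity, code: string, message: string, line: number) => {
    issues.push({ severity, code, message, line });
  };

  for (const { raw, line } of bullets) {
    const m = LINK_BULLET.exec(raw.trim());
    if (!m) {
      push("error", "malformed-link", "Bullet line is not a valid Markdown link", line);
      continue;
    }
    const [, text, url] = m;
    if (url === "") {
      push("error", "link-missing-url", "Link has no URL", line);
      continue;
    }
    if (text.trim() === "") push("warning", "link-empty-text", "Link has no display text", line);

    // URL-shape checks are mutually exclusive: report the first one that applies.
    const lower = url.toLowerCase();
    if (url.startsWith("#")) push("warning", "link-hash-only", "Fragment-only URL", line);
    else if (!HAS_SCHEME.test(url)) push("warning", "link-relative-url", "Relative URL; must be absolute", line);
    else if (lower.startsWith("mailto:")) push("info", "link-mailto", "mailto: URL is unusual", line);
    else if (lower.startsWith("http://")) push("warning", "link-non-https", "Prefer https:// over http://", line);

    // Duplicates: URL repeats are warnings, link-text repeats are info.
    if (seenUrls.has(url)) push("warning", "duplicate-url", "Same URL appears more than once", line);
    seenUrls.add(url);
    if (text !== "" && seenTexts.has(text)) push("info", "duplicate-link-text", "Same link text appears more than once", line);
    seenTexts.add(text);
  }
  return issues;
}
```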

3 · Reachability (opt-in)

Check the box in the hosted tool, or pass `--check-links` on the CLI, to actually request every link:

Bounded-concurrency HEAD requests (8 in parallel by default)
Falls back to GET when HEAD returns 405 or 501
`link-non-2xx` for 4xx/5xx responses
`link-unreachable` for network errors and timeouts (5-second default)
`link-slow` warning for links slower than the 3-second threshold

User-agent: `bridgetoagent-llms-txt-validator/0.1 (+https://github.com/bridgetoagent/llms-txt-validator)`. The validator identifies itself so server logs aren't anonymous.
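The reachability layer is essentially a worker pool around HEAD requests. This sketch assumes an injectable `fetcher` (so it can be tested without a network) and uses hypothetical `checkLinks`/`checkOne` names. The defaults (8 concurrent requests, 5-second timeout, 3-second slow threshold), the 405/501 GET fallback, and the user-agent string are the ones described above.

```typescript
// Sketch only: option names and result shape are assumptions; defaults mirror the list above.
type Fetcher = (
  url: string,
  init: { method: string; signal: AbortSignal; headers: Record<string, string> },
) => Promise<{ status: number }>;

interface LinkResult {
  url: string;
  code?: "link-unreachable" | "link-non-2xx" | "link-slow";
  status?: number;
  ms: number;
}

const USER_AGENT =
  "bridgetoagent-llms-txt-validator/0.1 (+https://github.com/bridgetoagent/llms-txt-validator)";

async function checkOne(url: string, fetcher: Fetcher, timeoutMs: number, slowMs: number): Promise<LinkResult> {
  const started = Date.now();
  // Each request gets its own timeout; aborting makes a real fetch() reject.
  const request = async (method: string) => {
    const ctrl = new AbortController();
    const timer = setTimeout(() => ctrl.abort(), timeoutMs);
    try {
      return await fetcher(url, { method, signal: ctrl.signal, headers: { "user-agent": USER_AGENT } });
    } finally {
      clearTimeout(timer);
    }
  };
  try {
    let res = await request("HEAD");
    // Some servers don't implement HEAD; retry those with GET.
    if (res.status === 405 || res.status === 501) res = await request("GET");
    const ms = Date.now() - started;
    if (res.status < 200 || res.status > 299) return { url, code: "link-non-2xx", status: res.status, ms };
    return ms > slowMs ? { url, code: "link-slow", status: res.status, ms } : { url, status: res.status, ms };
  } catch {
    // DNS failures, refused connections, and aborts (timeouts) all land here.
    return { url, code: "link-unreachable", ms: Date.now() - started };
  }
}

async function checkLinks(
  urls: string[],
  fetcher: Fetcher,
  { concurrency = 8, timeoutMs = 5000, slowMs = 3000 } = {},
): Promise<LinkResult[]> {
  const results: LinkResult[] = new Array(urls.length);
  let next = 0;
  // A fixed pool of workers pulls URLs off a shared index, so at most `concurrency` are in flight.
  const worker = async () => {
    while (next < urls.length) {
      const i = next++;
      results[i] = await checkOne(urls[i], fetcher, timeoutMs, slowMs);
    }
  };
  await Promise.all(Array.from({ length: Math.min(concurrency, urls.length) }, worker));
  return results;
}
```

In Node 18+ the global `fetch` fits the `Fetcher` type, so `checkLinks(urls, fetch)` would hit real URLs; a production version would also want to discard the body of fallback GET responses. A pool of N workers sharing one index keeps at most N requests in flight without a semaphore library, and JavaScript's single thread makes `next++` race-free.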

How to use it

Hosted

Paste, upload, or fetch by URL at /tools/llms-txt-validator. Results render inline. Nothing you submit outlives the validation request: we don't store the content you paste.

CLI

```shell
# Local file
npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt

# Fetch from URL
npx @bridgetoagent-com/llms-txt-validator https://example.com/llms.txt

# With reachability check
npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt --check-links

# Machine-readable for CI
npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt --json > report.json
```

Exit codes: 0 pass, 1 warnings only, 2 errors. Drop it into CI:

```yaml
- name: Validate llms.txt
  run: npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt --check-links
```
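The exit codes line up one-to-one with the library's three `report.status` values. Here is a minimal sketch of that mapping, assuming (our assumption; the post doesn't say) that info-level findings never change the status. `statusFor` and `EXIT_CODE` are hypothetical names.

```typescript
// Sketch only: assumes info-level findings never affect the status; only warnings and errors do.
type Severity = "error" | "warning" | "info";
type Status = "pass" | "pass_with_warnings" | "fail";

function statusFor(issues: { severity: Severity }[]): Status {
  if (issues.some((i) => i.severity === "error")) return "fail";
  if (issues.some((i) => i.severity === "warning")) return "pass_with_warnings";
  return "pass";
}

// The documented CLI convention: 0 pass, 1 warnings only, 2 errors.
const EXIT_CODE: Record<Status, 0 | 1 | 2> = { pass: 0, pass_with_warnings: 1, fail: 2 };
```

GitHub Actions fails a step on any non-zero exit code, so the step above also fails on a warnings-only report. If you only want errors to break the build, your wrapper has to treat exit code 1 as a pass.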

Library

```typescript
import { readFile } from "node:fs/promises";
import { validate } from "@bridgetoagent-com/llms-txt-validator";

// Top-level await needs an ES module (e.g. "type": "module" in package.json, or a .mjs/.mts file).
const source = await readFile("./llms.txt", "utf8");
const report = await validate(source, { checkReachability: true });

console.log(report.status);   // "pass" | "pass_with_warnings" | "fail"
console.log(report.issues);   // [{ severity, code, message, line, ... }]
console.log(report.parsed);   // parsed document tree
```

Use `parse(source)` if you only want the parsed document tree without validation, for example to write your own custom checks downstream.
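As an example of a downstream check, here is a house rule requiring every link to stay on your own domain. The `ParsedDoc` shape below is hypothetical; the real tree returned by `parse()` may be structured differently, so adapt the field names to what the package actually exports. `checkOwnDomain` and the `x-offsite-link` code are also ours.

```typescript
// Hypothetical tree shape: the real parse() output may differ; check the package README.
interface ParsedLink { text: string; url: string; description?: string; line: number }
interface ParsedSection { title: string; links: ParsedLink[] }
interface ParsedDoc { title: string; description?: string; sections: ParsedSection[] }

interface CustomIssue { severity: "error" | "warning" | "info"; code: string; message: string; line: number }

// Example house rule: every link must point at one of our own hosts.
function checkOwnDomain(doc: ParsedDoc, allowedHosts: string[]): CustomIssue[] {
  const issues: CustomIssue[] = [];
  for (const section of doc.sections) {
    for (const link of section.links) {
      let host: string;
      try {
        host = new URL(link.url).hostname;
      } catch {
        continue; // relative or unparseable URLs are left to the built-in link checks
      }
      if (!allowedHosts.includes(host)) {
        issues.push({
          severity: "warning",
          code: "x-offsite-link", // prefixed so it can't collide with the validator's stable codes
          message: `${link.url} is not on ${allowedHosts.join(", ")}`,
          line: link.line,
        });
      }
    }
  }
  return issues;
}
```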

How this differs from the homepage audit

The homepage audit (bridgetoagent.com) is a lead-gen tool. It captures your email, runs five audits in about 90 seconds, and routes you toward the $49 kit.

This validator is the opposite shape: no email, no signup, no time limit, no recommendation. You paste a file, you get a report. It does one thing — validate llms.txt content — and exposes the result as both UI and machine-readable JSON.

The two share validation logic. The same engine runs inside the kit's generator before every output ships, so customers don't get malformed files. We extracted it, wrapped it in a CLI and an npm package, hardened it against arbitrary input, and made it public.

Issue codes

Stable identifiers for every kind of finding. Pin against these in CI or downstream tooling; we don't rename a code once it has shipped.

| Code | Severity | Meaning |
|---|---|---|
| `missing-title` | error | No `# Title` heading found |
| `title-not-first` | error | Content appears before the title |
| `title-not-h1` | error | First heading is not H1 |
| `duplicate-title` | warning | Multiple H1 headings |
| `section-wrong-level` | info | H3+ heading where H2 is conventional |
| `empty-section` | warning | Section has no link bullets |
| `malformed-link` | error | Bullet line is not a valid Markdown link |
| `link-missing-url` | error | Link `[text]()` has no URL |
| `link-empty-text` | warning | Link `[](url)` has no display text |
| `link-relative-url` | warning | Root-relative URL; must be absolute |
| `link-hash-only` | warning | Fragment-only URL |
| `link-mailto` | info | `mailto:` URL (unusual) |
| `link-non-https` | warning | `http://` URL; prefer `https://` |
| `duplicate-url` | warning | Same URL appears more than once |
| `duplicate-link-text` | info | Same link text appears more than once |
| `link-unreachable` | error | Network error or timeout |
| `link-non-2xx` | error | HTTP 4xx or 5xx response |
| `link-slow` | warning | Response above slow threshold |
| `no-content-after-title` | warning | Title exists but nothing else |
| `trailing-whitespace` | info | Line has trailing whitespace |
| `tabs-instead-of-spaces` | info | Line contains tab characters |

The full list lives in the README.
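Because the codes are stable, a CI job can gate on specific codes rather than on overall severity. This sketch assumes the `--json` report carries the same `issues` shape as the library report (`severity`, `code`, `message`, `line`); `gate`, `blockingIssues`, and the `BLOCKING` and `IGNORED` sets are a hypothetical example policy, not a recommendation built into the validator.

```typescript
// Sketch only: assumes the --json report matches the library report's issue shape.
interface ReportIssue { severity: "error" | "warning" | "info"; code: string; message: string; line: number }
interface Report { status: "pass" | "pass_with_warnings" | "fail"; issues: ReportIssue[] }

// Example policy: these codes block the build even though some are only warnings...
const BLOCKING = new Set(["malformed-link", "link-non-2xx", "link-unreachable", "link-relative-url"]);
// ...and these are ignored entirely.
const IGNORED = new Set(["trailing-whitespace", "tabs-instead-of-spaces"]);

function blockingIssues(report: Report): ReportIssue[] {
  return report.issues.filter(
    (i) => !IGNORED.has(i.code) && (i.severity === "error" || BLOCKING.has(i.code)),
  );
}

// Takes the raw JSON text (e.g. the contents of report.json) and returns the exit code to use.
function gate(json: string): { exitCode: 0 | 1; blocking: ReportIssue[] } {
  const report = JSON.parse(json) as Report;
  const blocking = blockingIssues(report);
  for (const i of blocking) console.error(`line ${i.line}: [${i.code}] ${i.message}`);
  return { exitCode: blocking.length > 0 ? 1 : 0, blocking };
}
```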

What it deliberately doesn't do

A list of things people will ask for and our reasoning for not shipping them:

Score the content quality. We can verify parser-level conformance and link health. We can't verify that the descriptions are good or that the sections are useful without making a judgment call. That's a different tool, and it's one the BridgeToAgent kit provides because it can read your actual DOM to ground its answers.
Auto-fix the file. Validators report; fixers rewrite. They're different products with different risk profiles. We may add a `--fix` flag for safe rewrites (trailing whitespace, `http://` to `https://`) in a future version. Anything beyond that is the kit's territory.
Crawl the URLs. The reachability check verifies that URLs resolve; it doesn't verify that the linked content is what you said it is. We considered MIME-type assertions but decided they'd add noise without proportional signal, since llms.txt doesn't constrain what type of resource each link points at.

What's next

Two more tools from the same track ship in the coming weeks:

agents.json validator (target: week 6 of the Q3 campaign) — schema conformance for the agents.json spec, typed-parameter completeness check that maps directly to the Lighthouse agents-json-actions-typed audit, endpoint reachability.
Standalone Lighthouse Agentic scorer (target: week 9) — runs all 9 Lighthouse Agentic Browsing audits against a URL, gives per-audit pass/fail + score breakdown, no signup. Will use the validators above as primitives.

If you have a real-world llms.txt file we mis-parse, open an issue. That's the fastest way to make the validator better.

Related reading

WebMCP for store owners — what it is and what to do about it

UCP is Shopify-native — here's what the Agent Kit adds on top

The 2026 agent-traffic landscape — who actually drives requests at your site
