We built a free llms.txt validator — open-source, no signup, MIT-licensed
Ships today as both a hosted web tool at bridgetoagent.com/tools/llms-txt-validator and an MIT-licensed npm package (@bridgetoagent-com/llms-txt-validator). Checks parser conformance against the llmstxt.org reference spec, malformed link syntax, missing required sections, duplicate URLs, and optional link reachability. Same validation engine we run inside the BridgeToAgent kit's generator — extracted, hardened, and made public so anyone can use it before, during, or instead of buying the kit.
Live today at bridgetoagent.com/tools/llms-txt-validator. Also on npm as @bridgetoagent-com/llms-txt-validator and on GitHub at github.com/bridgetoagent/llms-txt-validator. MIT license. No telemetry. No external dependencies beyond fetch.
Why
Last week we catalogued six failure modes that show up in roughly 80% of free-generator llms.txt output, among them link rot, hallucinated URLs, malformed Markdown, missing sections, and generic placeholders. That post ended with "run a 30-second spot-check before you deploy." It should have ended with "run this tool." So we built it.
This is for three audiences:
- Anyone who used a free generator and wants to verify the file before shipping.
- Developers writing `llms.txt` by hand who want a CI-runnable lint instead of eyeballing the spec.
- AI agent / spec community who want a reference implementation of the llmstxt.org spec as a parser they can fork, audit, or wrap.
What it checks
Three layers, top to bottom by enforcement strictness:
1 · Structure (always on)
- `# Title` is present and is the first non-blank content
- No duplicate H1 headings
- Optional `> blockquote` description captured immediately after the title
- `## Section` headings used for resource groups (H2 level)
- H3+ headings flagged as info (uncommon in `llms.txt`)
- Empty sections flagged
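The structural rules above can be approximated in a few dozen lines. The following is a minimal sketch, not the validator's actual source: the function name `checkStructure` and the issue object shape are assumptions made for illustration, and it simplifies by counting any bullet under an H2 as a link.

```javascript
// Hypothetical sketch of the structural layer; not the package's implementation.
function checkStructure(source) {
  const issues = [];
  const lines = source.split(/\r?\n/);
  let h1Count = 0;
  let seenContent = false; // true once any non-blank line has been read
  let currentSection = null; // { line, links } for the H2 being scanned

  // Report the previous H2 if it collected no bullets.
  const closeSection = () => {
    if (currentSection && currentSection.links === 0) {
      issues.push({ severity: "warning", code: "empty-section", line: currentSection.line });
    }
  };

  lines.forEach((text, index) => {
    const line = index + 1;
    if (text.trim() === "") return;
    const heading = /^(#{1,6})\s+\S/.exec(text);
    const level = heading ? heading[1].length : 0;

    if (level === 1) {
      h1Count += 1;
      if (h1Count === 1 && seenContent) {
        issues.push({ severity: "error", code: "title-not-first", line });
      }
      if (h1Count > 1) {
        issues.push({ severity: "warning", code: "duplicate-title", line });
      }
    } else if (level === 2) {
      closeSection();
      currentSection = { line, links: 0 };
    } else if (level >= 3) {
      issues.push({ severity: "info", code: "section-wrong-level", line });
    } else if (currentSection && /^\s*[-*]\s+/.test(text)) {
      currentSection.links += 1; // simplification: every bullet counts as a link
    }
    seenContent = true;
  });

  closeSection();
  if (h1Count === 0) {
    issues.push({ severity: "error", code: "missing-title", line: 1 });
  }
  return issues;
}
```

A single pass with a small amount of state (title count, current section) is enough here because the format is line-oriented; a real parser would also need to handle the blockquote description and fenced content.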
2 · Link bullets
- `- [text](url)` or `- [text](url): description` syntax; anything else inside a section's bullet list is flagged as `malformed-link`
- Empty link text or URL
- Relative URLs (`/docs/foo`): `llms.txt` is consumed by external agents, so URLs must be absolute
- Fragment-only URLs (`#anchor`): meaningless to agents
- `mailto:` URLs flagged as unusual
- `http://` URLs flagged (prefer `https://` so agents that refuse insecure fetches don't drop the link)
- Duplicate URLs as a warning, duplicate link text as info
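To make the per-bullet rules concrete, here is a rough sketch of how a single bullet line could be classified. The regular expression and the function name `classifyLinkBullet` are illustrative assumptions, not the validator's real parser; duplicate detection is left out because it needs the whole file.

```javascript
// Hypothetical sketch: matches "- [text](url)" with an optional ": description" tail.
const LINK_BULLET = /^\s*-\s+\[([^\]]*)\]\(([^)\s]*)\)(?::\s*(.*))?\s*$/;

// Returns the issue codes that apply to one bullet line (empty array = clean).
function classifyLinkBullet(bullet) {
  const match = LINK_BULLET.exec(bullet);
  if (!match) return ["malformed-link"];

  const [, text, url] = match;
  const codes = [];
  if (text.trim() === "") codes.push("link-empty-text");
  if (url === "") {
    codes.push("link-missing-url");
    return codes; // nothing further to say about an absent URL
  }

  if (url.startsWith("#")) {
    codes.push("link-hash-only");
  } else if (url.startsWith("mailto:")) {
    codes.push("link-mailto");
  } else if (url.startsWith("http://")) {
    codes.push("link-non-https");
  } else if (!/^[a-z][a-z0-9+.-]*:/i.test(url)) {
    codes.push("link-relative-url"); // no scheme, so not absolute
  }
  return codes;
}

console.log(classifyLinkBullet("- [Guide](/docs/foo)"));
```

Returning a list of codes rather than a single verdict lets one bullet carry several findings, for example empty link text on an `http://` URL.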
3 · Reachability (opt-in)
Check the box, or pass --check-links on the CLI, to actually request every link:
- Bounded-concurrency HEAD requests (8 in parallel by default)
- Falls back to GET when HEAD returns 405 or 501
- `link-non-2xx` for 4xx/5xx responses
- `link-unreachable` for network errors and timeouts (5 second default)
- `link-slow` warning for links above the 3-second threshold
User-agent: bridgetoagent-llms-txt-validator/0.1 (+https://github.com/bridgetoagent/llms-txt-validator) — identifies itself so server logs aren't anonymous.
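The reachability behaviour described above (bounded concurrency, HEAD with a GET fallback, timeout and slow thresholds) can be sketched as a small worker pool. This is an illustration under stated assumptions, not the package's code: `checkLinks`, its option names, and the result shape are hypothetical, and it relies on the global `fetch` and `AbortSignal.timeout` available in Node 18+.

```javascript
// Hypothetical sketch of a bounded-concurrency link checker.
// Defaults mirror the numbers quoted in the post: 8 parallel, 5 s timeout, 3 s slow.
async function checkLinks(urls, options = {}) {
  const { concurrency = 8, timeoutMs = 5000, slowMs = 3000, fetchImpl = fetch } = options;
  const results = new Array(urls.length);
  let next = 0; // index of the next URL to hand to a worker

  async function probe(url) {
    const started = Date.now();
    try {
      let res = await fetchImpl(url, { method: "HEAD", signal: AbortSignal.timeout(timeoutMs) });
      // Some servers reject HEAD; retry with GET on 405 or 501.
      if (res.status === 405 || res.status === 501) {
        res = await fetchImpl(url, { method: "GET", signal: AbortSignal.timeout(timeoutMs) });
      }
      if (res.status >= 400) return { url, code: "link-non-2xx", status: res.status };
      const elapsedMs = Date.now() - started;
      return elapsedMs > slowMs ? { url, code: "link-slow", elapsedMs } : { url, code: null };
    } catch {
      // Network errors and aborted (timed-out) requests both land here.
      return { url, code: "link-unreachable" };
    }
  }

  // Each worker pulls the next unclaimed URL until the list is exhausted.
  async function worker() {
    while (next < urls.length) {
      const index = next++;
      results[index] = await probe(urls[index]);
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, urls.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Injecting `fetchImpl` keeps the sketch testable without network access. Results are written by index, so the output order matches the input order regardless of which request finishes first.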
How to use it
Hosted
Paste, upload, or fetch by URL at /tools/llms-txt-validator. Result renders inline. No data leaves the validation request — we don't store the content you paste.
CLI
```bash
# Local file
npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt

# Fetch from URL
npx @bridgetoagent-com/llms-txt-validator https://example.com/llms.txt

# With reachability check
npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt --check-links

# Machine-readable for CI
npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt --json > report.json
```
Exit codes: 0 pass, 1 warnings only, 2 errors. Drop it into CI:
```yaml
- name: Validate llms.txt
  run: npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt --check-links
```
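For readers wiring this into their own scripts, the exit-code convention can be pictured as a function of the worst severity found. This is a sketch of one plausible mapping, assuming info-level findings alone still count as a pass; `exitCodeFor` is a hypothetical helper, not something the package is known to export.

```javascript
// Hypothetical sketch: 0 = pass, 1 = warnings only, 2 = errors.
// Assumption: info-only findings do not change the exit code.
function exitCodeFor(issues) {
  if (issues.some((issue) => issue.severity === "error")) return 2;
  if (issues.some((issue) => issue.severity === "warning")) return 1;
  return 0;
}

console.log(exitCodeFor([{ severity: "warning", code: "duplicate-url" }]));
```

Most CI runners treat any non-zero exit code as a failed step, so a warnings-only result (exit code 1) would typically fail the job unless the step handles it explicitly.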
Library
```javascript
import fs from "node:fs/promises";
import { validate } from "@bridgetoagent-com/llms-txt-validator";

// Top-level await requires an ES module (.mjs or "type": "module").
const source = await fs.readFile("./llms.txt", "utf8");
const report = await validate(source, { checkReachability: true });

console.log(report.status); // "pass" | "pass_with_warnings" | "fail"
console.log(report.issues); // [{ severity, code, message, line, ... }]
console.log(report.parsed); // parsed document tree
```
Use parse(source) if you just want the parsed document tree without validation — useful if you want to write your own custom checks downstream.
How this differs from the homepage audit
The homepage audit (bridgetoagent.com) is a lead-gen tool. It captures your email, runs five audits against your site in 90 seconds, and routes you toward the $49 kit.
This validator is the opposite shape: no email, no signup, no time limit, no recommendation. You paste a file, you get a report. It does one thing — validate llms.txt content — and exposes the result as both UI and machine-readable JSON.
The two share validation logic. The same engine runs inside the kit's generator before every output ships, so customers don't get malformed files. We extracted it, added a CLI + npm package wrapper, hardened it for arbitrary input, and made it public.
Issue codes
Stable identifiers for every kind of finding. Pin on these in CI or downstream tooling — we don't rename them once they're shipped.
| Code | Severity | Meaning |
|---|---|---|
| `missing-title` | error | No `# Title` heading found |
| `title-not-first` | error | Content appears before the title |
| `title-not-h1` | error | First heading is not H1 |
| `duplicate-title` | warning | Multiple H1 headings |
| `section-wrong-level` | info | H3+ heading where H2 is conventional |
| `empty-section` | warning | Section has no link bullets |
| `malformed-link` | error | Bullet line is not a valid Markdown link |
| `link-missing-url` | error | Link `[text]()` has no URL |
| `link-empty-text` | warning | Link `[](url)` has no display text |
| `link-relative-url` | warning | Root-relative URL; must be absolute |
| `link-hash-only` | warning | Fragment-only URL |
| `link-mailto` | info | `mailto:` URL; unusual |
| `link-non-https` | warning | `http://` URL; prefer `https://` |
| `duplicate-url` | warning | Same URL appears more than once |
| `duplicate-link-text` | info | Same link text appears more than once |
| `link-unreachable` | error | Network error or timeout |
| `link-non-2xx` | error | HTTP 4xx or 5xx response |
| `link-slow` | warning | Response above slow threshold |
| `no-content-after-title` | warning | Title exists but nothing else |
| `trailing-whitespace` | info | Line has trailing whitespace |
| `tabs-instead-of-spaces` | info | Line contains tab characters |
The full list lives in the README.
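Because the codes are stable, a CI step can gate on a chosen subset instead of the overall status. The sketch below assumes the `--json` report has the same `issues` array shape as the library's `report.issues` (objects with `severity`, `code`, `message`, `line`); that equivalence and the helper name `blockingIssues` are assumptions for illustration.

```javascript
// Hypothetical sketch: keep only the issues whose code is on a pinned blocklist.
function blockingIssues(report, blockingCodes) {
  const pinned = new Set(blockingCodes);
  return report.issues.filter((issue) => pinned.has(issue.code));
}

// Example report literal using codes from the table above.
const sampleReport = {
  status: "fail",
  issues: [
    { severity: "error", code: "malformed-link", message: "Bullet is not a link", line: 12 },
    { severity: "info", code: "trailing-whitespace", message: "Trailing whitespace", line: 3 },
    { severity: "warning", code: "link-non-https", message: "Prefer https://", line: 9 },
  ],
};

const blockers = blockingIssues(sampleReport, ["malformed-link", "link-unreachable"]);
console.log(blockers.map((issue) => `${issue.code} (line ${issue.line})`));
```

Gating on codes rather than on severity means a team can tighten or relax individual rules without waiting for the validator's own severity levels to change.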
What it deliberately doesn't do
A list of things people will ask for and our reasoning for not shipping them:
- Score the content quality. We can verify parser-level conformance and link health. We can't verify that the descriptions are good or that the sections are useful without making a judgment call. That's a different job, one the BridgeToAgent kit takes on because it can read your real DOM to ground the answer.
- Auto-fix the file. Validators report. Fixers rewrite. They're different products with different risk profiles. We may add a `--fix` flag for safe rewrites (trailing whitespace, http→https) in a future version. Anything beyond that is the kit's territory.
- Crawl the URLs. Reachability check verifies the URLs resolve. It doesn't verify the linked content is what you said it is. We considered MIME-type assertions but decided they'd add noise without proportional signal: `llms.txt` doesn't constrain what type of resource each link points at.
What's next
Two more tools from the same track ship in the coming weeks:
- `agents.json` validator (target: week 6 of the Q3 campaign): schema conformance for the `agents.json` spec, typed-parameter completeness check that maps directly to the Lighthouse `agents-json-actions-typed` audit, endpoint reachability.
- Standalone Lighthouse Agentic scorer (target: week 9): runs all 9 Lighthouse Agentic Browsing audits against a URL, gives per-audit pass/fail + score breakdown, no signup. Will use the validators above as primitives.
If you have a real-world llms.txt file we misparse, open an issue. That's the fastest way to make the validator better.
Related reading
- Chrome just made Agentic Browsing default in Lighthouse: why this validator matters more right now. Lighthouse 13.3.0 (May 7, 2026) moved Agentic Browsing into the default config, so `llms-txt-well-formed` is now an audit every PageSpeed Insights run returns
- What free `llms.txt` generators don't tell you: the failure modes this validator catches
- `llms.txt` skepticism, half right: context on why the spec is worth taking seriously despite the critics
- `llms.txt` vs `robots.txt`: how the two files relate
- Every Lighthouse Agentic Browsing audit, every fix: the `llms-txt-well-formed` audit this validator helps you pass