BridgeToAgent
Explainer · 7 min read

We built a free llms.txt validator — open-source, no signup, MIT-licensed

Ships today as both a hosted web tool at bridgetoagent.com/tools/llms-txt-validator and an MIT-licensed npm package (@bridgetoagent-com/llms-txt-validator). Checks parser conformance against the llmstxt.org reference spec, malformed link syntax, missing required sections, duplicate URLs, and optional link reachability. Same validation engine we run inside the BridgeToAgent kit's generator — extracted, hardened, and made public so anyone can use it before, during, or instead of buying the kit.

BridgeToAgent Editorial team

Live today at bridgetoagent.com/tools/llms-txt-validator. Also on npm as @bridgetoagent-com/llms-txt-validator and on GitHub at github.com/bridgetoagent/llms-txt-validator. MIT license. No telemetry. No external dependencies beyond fetch.

Why

Last week we catalogued six failure modes that show up in roughly 80% of free-generator llms.txt output, among them link rot, hallucinated URLs, malformed Markdown, missing sections, and generic placeholders. That post ended with "run a 30-second spot-check before you deploy." It should have ended with "run this tool." So we built it.

This is for three audiences:

  1. Anyone who used a free generator and wants to verify the file before shipping.
  2. Developers writing llms.txt by hand who want a CI-runnable lint instead of eyeballing the spec.
  3. Members of the AI agent and spec community who want a reference implementation of the llmstxt.org spec as a parser they can fork, audit, or wrap.

What it checks

Three layers, top to bottom by enforcement strictness:

1 · Structure (always on)

  • # Title is present and is the first non-blank content
  • No duplicate H1 headings
  • Optional > blockquote description captured immediately after the title
  • ## Section headings used for resource groups (H2 level)
  • H3+ headings flagged as info (uncommon in llms.txt)
  • Empty sections flagged
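The structural rules above can be sketched in a few dozen lines. This is an illustrative approximation, not the validator's actual source: the function names and the `Issue` shape are assumptions, the issue codes are borrowed from the table later in this post, and blockquote capture is left out.

```typescript
// Illustrative sketch only; not the validator's actual source.
type Severity = "error" | "warning" | "info";

interface Issue {
  severity: Severity;
  code: string;
  line: number; // 1-based line number
}

// Returns 1-6 for a Markdown ATX heading ("# Title", "## Docs"), 0 otherwise.
function headingLevel(line: string): number {
  if (!/^#{1,6}\s+\S/.test(line)) return 0;
  return line.length - line.replace(/^#+/, "").length;
}

function checkStructure(source: string): Issue[] {
  const issues: Issue[] = [];
  let titleSeen = false;
  let contentBeforeTitle = false;
  let sectionLine = 0;    // line of the open H2 section, 0 when none is open
  let sectionBullets = 0; // bullets counted in the open section

  // Flag the open section if it collected no bullets, then reset.
  const closeSection = (): void => {
    if (sectionLine > 0 && sectionBullets === 0) {
      issues.push({ severity: "warning", code: "empty-section", line: sectionLine });
    }
    sectionLine = 0;
    sectionBullets = 0;
  };

  source.split(/\r?\n/).forEach((raw, index) => {
    const line = raw.trim();
    const lineNo = index + 1;
    if (line === "") return;

    const level = headingLevel(line);
    // Any non-blank line other than an H1 counts as content before the title.
    if (level !== 1 && !titleSeen) contentBeforeTitle = true;

    if (level === 0) {
      if (sectionLine > 0 && line.startsWith("- ")) sectionBullets += 1;
    } else if (level === 1) {
      if (titleSeen) {
        issues.push({ severity: "warning", code: "duplicate-title", line: lineNo });
      } else {
        titleSeen = true;
        if (contentBeforeTitle) {
          issues.push({ severity: "error", code: "title-not-first", line: lineNo });
        }
      }
    } else if (level === 2) {
      closeSection();
      sectionLine = lineNo;
    } else {
      issues.push({ severity: "info", code: "section-wrong-level", line: lineNo });
    }
  });

  closeSection();
  if (!titleSeen) issues.push({ severity: "error", code: "missing-title", line: 1 });
  return issues;
}
```

A single pass with a small amount of state is enough here because every rule depends only on what came before the current line.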

2 · Link bullets

  • - [text](url) or - [text](url): description syntax — anything else inside a section's bullet list flagged as malformed-link
  • Empty link text or URL
  • Relative URLs (/docs/foo): llms.txt is consumed by external agents, so URLs must be absolute
  • Fragment-only URLs (#anchor) — meaningless to agents
  • mailto: URLs flagged as unusual
  • http:// URLs flagged (prefer https:// so agents that refuse insecure fetches don't drop the link)
  • Duplicate URLs as a warning, duplicate link text as info
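To make the per-bullet rules concrete, here is a minimal sketch of how a single bullet line could be classified. The regular expression and the `classifyBullet` name are assumptions for illustration, not the package's implementation. The pattern is deliberately simple: it does not handle URLs containing parentheses, and duplicate detection (which needs the whole file) is omitted.

```typescript
// Illustrative sketch only; the regex and function name are hypothetical.
interface LinkFinding {
  severity: "error" | "warning" | "info";
  code: string;
}

// Matches "- [text](url)" with an optional ": description" tail.
const LINK_BULLET = /^-\s+\[([^\]]*)\]\(([^)\s]*)\)(?::\s*(.*))?$/;

// True when the URL starts with a scheme such as "https:" or "mailto:".
const HAS_SCHEME = /^[a-z][a-z0-9+.-]*:/i;

function classifyBullet(line: string): LinkFinding[] {
  const match = LINK_BULLET.exec(line.trim());
  if (match === null) return [{ severity: "error", code: "malformed-link" }];

  const text = (match[1] ?? "").trim();
  const url = (match[2] ?? "").trim();
  const findings: LinkFinding[] = [];

  if (text === "") findings.push({ severity: "warning", code: "link-empty-text" });
  if (url === "") {
    findings.push({ severity: "error", code: "link-missing-url" });
    return findings; // nothing further to say about an absent URL
  }

  if (url.startsWith("#")) {
    findings.push({ severity: "warning", code: "link-hash-only" });
  } else if (!HAS_SCHEME.test(url)) {
    findings.push({ severity: "warning", code: "link-relative-url" });
  } else if (url.toLowerCase().startsWith("mailto:")) {
    findings.push({ severity: "info", code: "link-mailto" });
  } else if (url.toLowerCase().startsWith("http://")) {
    findings.push({ severity: "warning", code: "link-non-https" });
  }
  return findings;
}
```

A well-formed bullet such as `- [Guide](https://example.com/guide): The guide` yields an empty array; each of the problem cases in the list above maps to exactly one code from the issue table.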

3 · Reachability (opt-in)

Check the box, or pass --check-links on the CLI, to actually request every link:

  • Bounded-concurrency HEAD requests (8 in parallel by default)
  • Falls back to GET when HEAD returns 405 or 501
  • link-non-2xx for 4xx/5xx responses
  • link-unreachable for network errors and timeouts (5 second default)
  • link-slow warning for links above the 3-second threshold

User-agent: bridgetoagent-llms-txt-validator/0.1 (+https://github.com/bridgetoagent/llms-txt-validator) — identifies itself so server logs aren't anonymous.
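The reachability behaviour described above (a bounded worker pool, HEAD first, GET on 405 or 501) follows a common pattern that can be sketched as below. This is not the package's code: `Fetcher`, `probe`, and `checkLinks` are hypothetical names, the fetcher is injected so the sketch runs without a network, and the timeout and slow-link threshold are left out for brevity.

```typescript
// Illustrative sketch only; names and shapes are assumptions.
interface ProbeResponse {
  status: number;
}

// Injected so the pool can be exercised without real network access.
type Fetcher = (url: string, method: "HEAD" | "GET") => Promise<ProbeResponse>;

interface LinkResult {
  url: string;
  code: "ok" | "link-non-2xx" | "link-unreachable";
}

async function probe(url: string, fetcher: Fetcher): Promise<LinkResult> {
  try {
    let res = await fetcher(url, "HEAD");
    // Some servers reject HEAD; retry once with GET in that case.
    if (res.status === 405 || res.status === 501) {
      res = await fetcher(url, "GET");
    }
    return { url, code: res.status >= 400 ? "link-non-2xx" : "ok" };
  } catch {
    // Network errors surface as a rejected promise.
    return { url, code: "link-unreachable" };
  }
}

async function checkLinks(
  urls: string[],
  fetcher: Fetcher,
  concurrency = 8,
): Promise<LinkResult[]> {
  const results: LinkResult[] = new Array<LinkResult>(urls.length);
  let next = 0; // index of the next URL to claim

  // Each worker claims URLs one at a time until the list is exhausted,
  // so at most `concurrency` requests are in flight at once.
  const worker = async (): Promise<void> => {
    while (next < urls.length) {
      const index = next++;
      const url = urls[index];
      if (url === undefined) continue;
      results[index] = await probe(url, fetcher);
    }
  };

  const workerCount = Math.min(concurrency, urls.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

On a runtime with a global `fetch`, the fetcher could be as small as `(url, method) => fetch(url, { method })`. Results keep the order of the input URLs, which makes it easy to map findings back to lines in the file.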

How to use it

Hosted

Paste, upload, or fetch by URL at /tools/llms-txt-validator. The result renders inline. The content you submit is used only for that validation request; we don't store it.

CLI

# Local file
npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt

# Fetch from URL
npx @bridgetoagent-com/llms-txt-validator https://example.com/llms.txt

# With reachability check
npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt --check-links

# Machine-readable for CI
npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt --json > report.json

Exit codes: 0 pass, 1 warnings only, 2 errors. Drop it into CI:

- name: Validate llms.txt
  run: npx @bridgetoagent-com/llms-txt-validator ./public/llms.txt --check-links
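One thing to keep in mind: most CI runners treat any non-zero exit status as a failed step, so with the exit codes above a warnings-only run (exit 1) would fail the build as well. If you prefer to fail only on errors, a small wrapper can apply that policy. The helper below is a hypothetical sketch, not part of the package; it only encodes the three documented exit codes.

```typescript
// Illustrative sketch only; `shouldFailBuild` is a hypothetical helper.
type Policy = "strict" | "allow-warnings";

// Documented exit codes: 0 pass, 1 warnings only, 2 errors.
function shouldFailBuild(exitCode: number, policy: Policy): boolean {
  if (exitCode === 0) return false;
  if (exitCode === 1) return policy === "strict";
  return true; // 2 (errors) or any unexpected code fails the build
}
```

A wrapper script would run the CLI, pass its exit status to `shouldFailBuild`, and exit non-zero only when the function returns `true`.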

Library

// Top-level await requires an ES module context.
import fs from "node:fs/promises";
import { validate } from "@bridgetoagent-com/llms-txt-validator";

// Read the file, then validate it with the opt-in reachability layer enabled.
const source = await fs.readFile("./llms.txt", "utf8");
const report = await validate(source, { checkReachability: true });

console.log(report.status);   // "pass" | "pass_with_warnings" | "fail"
console.log(report.issues);   // [{ severity, code, message, line, ... }]
console.log(report.parsed);   // parsed document tree

Use parse(source) if you just want the parsed document tree without validation — useful if you want to write your own custom checks downstream.

How this differs from the homepage audit

The homepage audit (bridgetoagent.com) is a lead-gen tool. It captures your email, runs five audits in 90 seconds, and routes you toward the $49 kit.

This validator is the opposite shape: no email, no signup, no time limit, no recommendation. You paste a file, you get a report. It does one thing — validate llms.txt content — and exposes the result as both UI and machine-readable JSON.

The two share validation logic. The same engine runs inside the kit's generator before every output ships, so customers don't get malformed files. We extracted it, added a CLI + npm package wrapper, hardened it for arbitrary input, and made it public.

Issue codes

Stable identifiers for every kind of finding. Pin to these in CI or downstream tooling; we don't rename them once they've shipped.

| Code | Severity | Meaning |
| --- | --- | --- |
| missing-title | error | No # Title heading found |
| title-not-first | error | Content appears before the title |
| title-not-h1 | error | First heading is not H1 |
| duplicate-title | warning | Multiple H1 headings |
| section-wrong-level | info | H3+ heading where H2 is conventional |
| empty-section | warning | Section has no link bullets |
| malformed-link | error | Bullet line is not a valid Markdown link |
| link-missing-url | error | Link [text]() has no URL |
| link-empty-text | warning | Link [](url) has no display text |
| link-relative-url | warning | Root-relative URL; must be absolute |
| link-hash-only | warning | Fragment-only URL |
| link-mailto | info | mailto: URL; unusual |
| link-non-https | warning | http:// URL; prefer https:// |
| duplicate-url | warning | Same URL appears more than once |
| duplicate-link-text | info | Same link text appears more than once |
| link-unreachable | error | Network error or timeout |
| link-non-2xx | error | HTTP 4xx or 5xx response |
| link-slow | warning | Response above slow threshold |
| no-content-after-title | warning | Title exists but nothing else |
| trailing-whitespace | info | Line has trailing whitespace |
| tabs-instead-of-spaces | info | Line contains tab characters |

The full list lives in the README.
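Because the codes are stable, downstream tooling can key its own policy on them. The sketch below shows one way to do that, assuming issues carry the `severity`, `code`, `message`, and `line` fields mentioned in the library example; the `Issue` interface and `blockingIssues` helper are hypothetical and not exported by the package.

```typescript
// Illustrative sketch only; the Issue shape is an assumption based on the
// fields named in this post, and `blockingIssues` is a hypothetical helper.
interface Issue {
  severity: "error" | "warning" | "info";
  code: string;
  message: string;
  line?: number;
}

// Returns every error, plus any lower-severity issue whose code the caller
// has chosen to treat as blocking.
function blockingIssues(issues: Issue[], alsoBlock: string[] = []): Issue[] {
  const extra = new Set(alsoBlock);
  return issues.filter((issue) => issue.severity === "error" || extra.has(issue.code));
}
```

For example, a team that never wants plain http:// links could call `blockingIssues(report.issues, ["link-non-https"])` and fail the build whenever the result is non-empty.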

What it deliberately doesn't do

A list of things people will ask for and our reasoning for not shipping them:

  • Score content quality. We can verify parser-level conformance and link health. We can't verify that the descriptions are good or that the sections are useful without making a judgment call. That's a job for a different tool, and the BridgeToAgent kit takes it on because it can read your real DOM to ground the answer.
  • Auto-fix the file. Validators report. Fixers rewrite. They're different products with different risk profiles. We may add a --fix flag for safe rewrites (trailing whitespace, http→https) in a future version. Anything beyond that is the kit's territory.
  • Crawl the URLs. The reachability check verifies that the URLs resolve. It doesn't verify that the linked content is what you said it is. We considered MIME-type assertions but decided they'd add noise without proportional signal, since llms.txt doesn't constrain what type of resource each link points at.

What's next

Two more tools from the same track ship in the coming weeks:

  • agents.json validator (target: week 6 of the Q3 campaign) — schema conformance for the agents.json spec, typed-parameter completeness check that maps directly to the Lighthouse agents-json-actions-typed audit, endpoint reachability.
  • Standalone Lighthouse Agentic scorer (target: week 9) — runs all 9 Lighthouse Agentic Browsing audits against a URL, gives per-audit pass/fail + score breakdown, no signup. Will use the validators above as primitives.

If you have a real-world llms.txt file that we misparse, open an issue; that's the fastest way to make the validator better.
