Inside Chrome 146 — what lighthouse --only-categories=agentic-browsing actually does
Running Lighthouse Agentic Browsing from the command line skips Chrome DevTools, scripts cleanly into CI, and emits JSON that you can pipe into dashboards. This post unpacks what the CLI flag does internally, the JSON output shape, the flags that matter (--throttling-method, --form-factor, --screenEmulation.disabled), and a reference CI workflow that turns the audit into a per-PR gate.
If you're shipping the kit files programmatically — generating them in a build pipeline, validating them in CI, gating PRs on the audit passing — the right tool is Lighthouse on the command line, not Chrome DevTools. This post is the technical deep-dive on what the CLI flag does, the JSON it emits, the flags that matter for stability, and the CI shape that turns the audit into an enforced gate.
This post is written for developers. If you're a small-business owner trying to verify your install by hand, the 5-minute install audit is the better starting point. If you're wiring Lighthouse into your build, keep reading.
Install
npm install -g lighthouse
# or
npx lighthouse --version
The Agentic Browsing category requires Lighthouse 12.4 or higher (matches Chrome 146's category-bundling). Older versions silently ignore the --only-categories=agentic-browsing flag and run nothing.
Confirm:
lighthouse --help | grep -A 2 only-categories
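Should list agentic-browsing alongside performance, accessibility, best-practices, and seo. The pwa category was removed in Lighthouse 12, so it should not appear on any version new enough to run this audit.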
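The version requirement can also be checked in a script before the audit runs. The following is a minimal sketch in Node, assuming the 12.4 minimum stated above and a plain dotted version string such as the one `lighthouse --version` prints. `meetsMinimum` is a hypothetical helper, not part of Lighthouse.

```javascript
// Hypothetical helper: compare dotted version strings numerically.
// Missing or non-numeric segments are treated as 0.
function meetsMinimum(version, minimum) {
  const parse = (v) => v.trim().split('.').map((part) => parseInt(part, 10) || 0);
  const have = parse(version);
  const need = parse(minimum);
  const length = Math.max(have.length, need.length);
  for (let i = 0; i < length; i += 1) {
    const a = have[i] ?? 0;
    const b = need[i] ?? 0;
    if (a !== b) return a > b;
  }
  return true; // every segment is equal
}

console.log(meetsMinimum('12.4.0', '12.4')); // true
console.log(meetsMinimum('12.3.9', '12.4')); // false
```

Failing fast on an old version is cheaper than discovering later that the category never ran.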
The basic invocation
lighthouse https://yourdomain.com \
--only-categories=agentic-browsing \
--output=json \
--output-path=./audit.json \
--chrome-flags="--headless"
What this does:
- Launches a headless Chrome instance.
- Navigates to the URL.
- Runs the Agentic Browsing category audits — all nine of them.
- Emits a JSON report at the output path.
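- Exits 0 when the run completes. A low score does not change the exit code; a non-zero exit means Lighthouse itself failed (Chrome could not launch, the page did not load). The Lighthouse CLI has no flag that fails the run on a score, so the threshold check has to be a separate step, as in the CI workflow later in this post.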
--chrome-flags="--headless" is the load-bearing flag in CI environments where there's no display server. Locally you can drop it to watch Chrome actually load the page.
The JSON output shape
Lighthouse's JSON output is structured around categories and audits:
{
  "categories": {
    "agentic-browsing": {
      "id": "agentic-browsing",
      "title": "Agentic Browsing",
      "score": 0.78,
      "auditRefs": [
        { "id": "llms-txt-present", "weight": 10 },
        { "id": "llms-txt-well-formed", "weight": 10 },
        { "id": "agents-json-present", "weight": 10 },
        { "id": "agents-json-actions-typed", "weight": 15 },
        { "id": "agent-runbook-present", "weight": 10 },
        { "id": "auto-discovery-links", "weight": 10 },
        { "id": "sitemap-discoverable", "weight": 10 },
        { "id": "schema-org-density", "weight": 20 },
        { "id": "webmcp-annotations", "weight": 5 }
      ]
    }
  },
  "audits": {
    "llms-txt-present": {
      "id": "llms-txt-present",
      "title": "...",
      "score": 1,
      "displayValue": "Present"
    },
    "agents-json-actions-typed": {
      "id": "agents-json-actions-typed",
      "score": 0,
      "details": {
        "items": [
          {
            "action": "search",
            "issue": "Parameter 'q' has no declared type"
          }
        ]
      }
    }
  }
}
Per-audit weights inside the category as of the experimental spec: schema-org-density weighs the most (20%), agents-json-actions-typed next (15%), the file-presence audits 10% each, webmcp-annotations lowest (5% because the spec is still moving). Weights shift as the category stabilizes — re-read the spec page before relying on weights for prioritization.
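To see how the weights combine, here is a minimal sketch that recomputes a category score as a weighted mean of its audit scores. It assumes the report shape shown above and skips audits whose score is missing or null, which is a simplifying assumption. `weightedCategoryScore` is a hypothetical helper and the sample scores are invented for illustration.

```javascript
// Hypothetical helper: weighted mean of audit scores for one category.
function weightedCategoryScore(report, categoryId) {
  const category = report.categories[categoryId];
  let weightedSum = 0;
  let totalWeight = 0;
  for (const ref of category.auditRefs) {
    const audit = report.audits[ref.id];
    // Assumption: audits without a numeric score do not count toward the mean.
    if (!audit || typeof audit.score !== 'number') continue;
    weightedSum += audit.score * ref.weight;
    totalWeight += ref.weight;
  }
  return totalWeight === 0 ? null : weightedSum / totalWeight;
}

// Invented sample: three of the nine audits, one of them failing.
const sample = {
  categories: {
    'agentic-browsing': {
      auditRefs: [
        { id: 'schema-org-density', weight: 20 },
        { id: 'agents-json-actions-typed', weight: 15 },
        { id: 'llms-txt-present', weight: 10 },
      ],
    },
  },
  audits: {
    'schema-org-density': { score: 1 },
    'agents-json-actions-typed': { score: 0 },
    'llms-txt-present': { score: 1 },
  },
};

console.log(weightedCategoryScore(sample, 'agentic-browsing'));
```

In this sample the failing audit carries 15 of the 45 weight points, so it removes a third of the score. With all nine audits present, the same failure would cost 15 points out of 100.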
The details.items array on failing audits is the load-bearing diagnostic — it tells you exactly which file/action/parameter triggered the failure. Pipe it into your CI logs.
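One way to get those diagnostics into CI logs is sketched below. It assumes the report shape shown earlier and treats any numeric score below 1 as worth logging, which is a choice made for this example rather than a Lighthouse rule. `failingAuditLines` is a hypothetical helper.

```javascript
// Hypothetical helper: one log line per detail item on each failing audit.
function failingAuditLines(report, categoryId) {
  const lines = [];
  for (const ref of report.categories[categoryId].auditRefs) {
    const audit = report.audits[ref.id];
    // Skip audits that are absent, unscored, or fully passing.
    if (!audit || typeof audit.score !== 'number' || audit.score >= 1) continue;
    const items = audit.details?.items ?? [];
    if (items.length === 0) {
      lines.push(`${ref.id}: failed (no detail items)`);
    }
    for (const item of items) {
      lines.push(`${ref.id}: ${JSON.stringify(item)}`);
    }
  }
  return lines;
}

// Sample built from the failing audit shown in the JSON above.
const sampleReport = {
  categories: {
    'agentic-browsing': {
      auditRefs: [
        { id: 'llms-txt-present', weight: 10 },
        { id: 'agents-json-actions-typed', weight: 15 },
      ],
    },
  },
  audits: {
    'llms-txt-present': { score: 1 },
    'agents-json-actions-typed': {
      score: 0,
      details: { items: [{ action: 'search', issue: "Parameter 'q' has no declared type" }] },
    },
  },
};

failingAuditLines(sampleReport, 'agentic-browsing').forEach((line) => console.log(line));
```

Serializing each item with `JSON.stringify` keeps the output greppable without assuming which fields a given audit puts in its items.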
The flags that matter
--throttling-method
Default is simulate (Lighthouse loads the page unthrottled and models a slower network in software). For agent-readiness audits, the kit files are tiny and network throttling doesn't change scores. Set --throttling-method=provided to skip throttling entirely and shave 5-10 seconds off the run. Note that --throttling-method=devtools does not skip anything: it applies real throttling through the browser, which is usually slower than simulate.
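--form-factor
Lighthouse defaults to mobile emulation. For Agentic Browsing audits this rarely matters (the audits check files and head tags, not viewport), but if your site renders different HTML to mobile vs desktop, run twice, once with the default mobile settings and once with --preset=desktop, and compare the per-audit scores. The older --emulated-form-factor flag was removed in Lighthouse 7. Its replacement, --form-factor, has to agree with the screen emulation settings, which is why the desktop preset is the simpler route.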
--screenEmulation.disabled
If your CI runner is non-graphical and you're seeing screen-emulation errors, pass --screenEmulation.disabled=true. The Agentic Browsing category doesn't rely on visual rendering — the audits inspect HTML structure and file existence, not pixels.
--max-wait-for-load
Default 45000ms. If your site has slow third-party scripts (chat widgets, analytics), Lighthouse waits for the load event before scoring. Set to a lower value (--max-wait-for-load=15000) to time-bound the audit in CI. Files served quickly will pass; pages with slow loads might get partial scores.
--save-assets
Useful for debugging: saves the page trace and DevTools log to disk alongside the JSON. Generates ~50MB of artifacts per run; skip in production CI.
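The flags above can be assembled in one place so every CI job uses the same settings. This is a minimal sketch, not a recommended configuration: `buildLighthouseArgs` and its option names are hypothetical, and the defaults (unthrottled run, 15-second load cap) simply mirror the suggestions in this section.

```javascript
// Hypothetical helper: build the CLI argument list from the flags discussed above.
function buildLighthouseArgs({
  url,
  outputPath = './audit.json',
  maxWaitMs = 15000,
  noSandbox = false,
  disableScreenEmulation = false,
}) {
  const chromeFlags = ['--headless'];
  if (noSandbox) chromeFlags.push('--no-sandbox');

  const args = [
    url,
    '--only-categories=agentic-browsing',
    '--output=json',
    `--output-path=${outputPath}`,
    '--throttling-method=provided', // no throttling applied by Lighthouse
    `--max-wait-for-load=${maxWaitMs}`,
    `--chrome-flags=${chromeFlags.join(' ')}`,
    '--quiet',
  ];
  if (disableScreenEmulation) args.push('--screenEmulation.disabled=true');
  return args;
}

const args = buildLighthouseArgs({ url: 'http://localhost:3000', noSandbox: true });
console.log(args.join(' '));

// To run it (requires lighthouse on the PATH):
// require('node:child_process').spawnSync('lighthouse', args, { stdio: 'inherit' });
```

Passing the list to `spawnSync` as an array avoids shell quoting problems with the space inside the `--chrome-flags` value.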
CI shape — gating PRs on the audit
A minimal GitHub Actions workflow that runs the audit on every PR and fails if the Agentic Browsing score drops below 80:
name: Lighthouse Agentic Browsing audit
on:
  pull_request:
    branches: [main]
jobs:
  lighthouse:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 20
      - name: Build site
        run: npm ci && npm run build
      - name: Start preview server
        run: npm run preview &
        env:
          PORT: 3000
      - name: Wait for server
        run: npx wait-on http://localhost:3000
      - name: Install Lighthouse
        run: npm install -g lighthouse
      - name: Run Agentic Browsing audit
        run: |
          lighthouse http://localhost:3000 \
            --only-categories=agentic-browsing \
            --output=json \
            --output-path=./audit.json \
            --chrome-flags="--headless --no-sandbox" \
            --quiet
      - name: Check score threshold
        run: |
          SCORE=$(node -e "
            const r = require('./audit.json');
            console.log(r.categories['agentic-browsing'].score * 100);
          ")
          echo "Agentic Browsing score: $SCORE"
          if (( $(echo "$SCORE < 80" | bc -l) )); then
            echo "Score below 80, failing PR"
            exit 1
          fi
      - name: Upload audit JSON
        uses: actions/upload-artifact@v4
        with:
          name: lighthouse-audit
          path: audit.json
This runs on every PR, builds the site, starts a preview server, runs Lighthouse against the local URL, parses the score from the JSON, and fails the PR if the score drops below 80. The audit JSON is uploaded as an artifact for review.
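The threshold step can also be written entirely in Node, which avoids the `bc` dependency and makes a missing category an explicit failure instead of a shell error. This is a sketch under the report shape shown earlier; `gate` is a hypothetical helper and the threshold of 80 is the one used in the workflow.

```javascript
// Hypothetical helper: decide pass/fail for one category against a minimum score.
function gate(report, categoryId, minimum) {
  const category = report.categories?.[categoryId];
  if (!category || typeof category.score !== 'number') {
    // The category did not run or did not produce a score.
    return { ok: false, message: `No numeric score for category "${categoryId}"` };
  }
  const percent = Math.round(category.score * 100);
  return {
    ok: percent >= minimum,
    message: `${categoryId} score ${percent} (minimum ${minimum})`,
  };
}

const example = gate({ categories: { 'agentic-browsing': { score: 0.78 } } }, 'agentic-browsing', 80);
console.log(example.message, example.ok ? 'PASS' : 'FAIL');

// In CI, read the real report and set the exit code:
// const report = JSON.parse(require('node:fs').readFileSync('./audit.json', 'utf8'));
// const result = gate(report, 'agentic-browsing', 80);
// console.log(result.message);
// process.exit(result.ok ? 0 : 1);
```

Rounding to a whole number before comparing keeps floating-point noise such as 80.00000000000001 from affecting the decision.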
Caveats:
- The audit needs the kit files to be served by the URL under test. If your build process generates agents.json/llms.txt/agent-instructions.md at runtime (not at build time), they won't be present in the preview server. Generate at build time so they're served as static files.
- Headless Chrome on Ubuntu needs --no-sandbox: the GitHub Actions Ubuntu runner doesn't allow sandboxed Chrome. The flag is GitHub-Actions-specific; remove it on self-hosted runners where the Chrome sandbox works.
- Score thresholds are noisy. Lighthouse scores vary ±2 points between runs even on identical content. Set the threshold conservatively (75 or 80) rather than tight (90+) unless you know your audits all pass at maximum scores.
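One common way to dampen run-to-run noise is to audit several times and gate on the median rather than on a single run. The sketch below assumes you have already parsed each run's JSON into an object. `median` and `medianCategoryScore` are hypothetical helpers and the three sample scores are invented.

```javascript
// Median of a non-empty list of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical helper: median 0-100 score for one category across several reports.
function medianCategoryScore(reports, categoryId) {
  const scores = reports
    .map((report) => report.categories?.[categoryId]?.score)
    .filter((score) => typeof score === 'number')
    .map((score) => Math.round(score * 100));
  if (scores.length === 0) {
    throw new Error(`No numeric scores found for category "${categoryId}"`);
  }
  return median(scores);
}

// Invented sample: three runs of the same build.
const runs = [0.79, 0.82, 0.81].map((score) => ({
  categories: { 'agentic-browsing': { score } },
}));
console.log(medianCategoryScore(runs, 'agentic-browsing')); // 81
```

Three runs is usually enough to stop a single slow run from failing a PR, at the cost of tripling the audit time.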
Comparing the CLI to the DevTools panel
| Dimension | DevTools | CLI |
|---|---|---|
| Setup | Built into Chrome | npm install -g lighthouse |
| Output | Visual report | JSON / HTML |
| Scripting | Manual | Scriptable into CI |
| Throttling | Simulated by default (changed in the panel settings) | Simulated by default (changed with --throttling-method) |
| Comparing runs | Manual | jq against multiple JSON files |
| CI integration | None | Native |
| First-time learning curve | Low | Medium |
The DevTools panel is right for ad-hoc audits during development. The CLI is right for build pipelines, PR gates, and tracking scores over time. Many teams use both — DevTools for "is this fix working" iteration, CLI for "the audit must stay green" enforcement.
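For comparing runs, a small script can do the same job as jq. This sketch diffs per-audit scores between two parsed reports, assuming the `audits` shape shown earlier. `diffAuditScores` is a hypothetical helper and the sample data is invented.

```javascript
// Hypothetical helper: list audits whose score differs between two reports.
function diffAuditScores(before, after) {
  const ids = new Set([...Object.keys(before.audits), ...Object.keys(after.audits)]);
  const changes = [];
  for (const id of [...ids].sort()) {
    // Audits missing from one side are reported with a null score.
    const from = before.audits[id]?.score ?? null;
    const to = after.audits[id]?.score ?? null;
    if (from !== to) changes.push({ id, from, to });
  }
  return changes;
}

// Invented sample: one audit fixed between the base branch and the PR.
const baseReport = { audits: { 'llms-txt-present': { score: 1 }, 'agents-json-actions-typed': { score: 0 } } };
const prReport = { audits: { 'llms-txt-present': { score: 1 }, 'agents-json-actions-typed': { score: 1 } } };

for (const change of diffAuditScores(baseReport, prReport)) {
  console.log(`${change.id}: ${change.from} -> ${change.to}`);
}
```

Running this against the base branch's report and the PR's report gives a compact summary of which audits moved and in which direction.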
Tracking scores over time
If you want a dashboard of your Agentic Browsing score over time, the easiest path is Lighthouse CI — Google's reference server for ingesting CLI output. Self-host it on Vercel or Render in an afternoon. Configure your CI workflow to upload audit JSON after every successful run, and the LHCI dashboard plots score deltas per-build.
Alternative: log scores to your existing observability stack (Grafana, Datadog, simple Postgres table) as a separate lighthouse_score metric per category. The data shape is straightforward: timestamp, branch, commit SHA, category, score, per-audit pass/fail. That is one row per category per run, so five rows if you run every category and one if you run only agentic-browsing.
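The row shape described above could be produced with something like the following sketch. It assumes the report carries a `fetchTime` timestamp (falling back to the current time if it does not) and that branch and commit come from your CI environment. `toMetricRows` is a hypothetical helper, and treating a score of 1 as "pass" is a simplification chosen for this example.

```javascript
// Hypothetical helper: flatten a report into one row per category.
function toMetricRows(report, { branch, commit }) {
  const timestamp = report.fetchTime ?? new Date().toISOString();
  return Object.values(report.categories).map((category) => ({
    timestamp,
    branch,
    commit,
    category: category.id,
    score: category.score,
    // Simplification: only a full score counts as a pass.
    audits: Object.fromEntries(
      category.auditRefs.map((ref) => [
        ref.id,
        (report.audits[ref.id]?.score ?? 0) >= 1 ? 'pass' : 'fail',
      ]),
    ),
  }));
}

// Invented sample report and CI metadata.
const rows = toMetricRows(
  {
    fetchTime: '2025-01-01T00:00:00.000Z',
    categories: {
      'agentic-browsing': {
        id: 'agentic-browsing',
        score: 0.78,
        auditRefs: [{ id: 'llms-txt-present', weight: 10 }, { id: 'agents-json-actions-typed', weight: 15 }],
      },
    },
    audits: { 'llms-txt-present': { score: 1 }, 'agents-json-actions-typed': { score: 0 } },
  },
  { branch: 'main', commit: 'abc1234' },
);
console.log(JSON.stringify(rows, null, 2));
```

Each row can be inserted into a table or sent to a metrics API as-is, with the `audits` map stored as a JSON column.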