Specs Without Practice: 20 of 66 Agent-Readiness Checks Have Nothing to Measure

In April 2026, a correlation study measured 66 per-check agent-readiness signals across 908 leading B2B SaaS brands, using three independent scanners. Twenty of those signals could not be analyzed for predictive power. Adoption sat below the variance threshold: fewer than 5% of brands shipped them. The most-discussed protocol family in the agent-readiness category, the one trade press and vendor pages cover most actively, clusters near zero across the brand sample.
This piece reads the 20-of-66 number as the headline of the study, not a footnote. The specs are real and well-designed. The practice has not arrived yet. The two are different problems, and only one of them is a measurement story. We measure it.
Key Takeaways
- 20 of 66 agent-readiness signals failed the 5% variance threshold in a 908-brand correlation study (Respectarium, 2026-04). The most-discussed protocol family — MCP Server Cards, A2A Agent Cards, OAuth Protected Resource declarations, AGENTS.md — has near-zero adoption.
- Only 2 of 50 evaluated signals graduated to PROMOTE_SCORED status: Cloudflare's aggregate readiness level and Respectarium's markdown-negotiation check. Effects are small to medium, with a Cohen's d ceiling of 0.65.
- Cloudflare Radar reports fewer than 15 of the top 200,000 sites have MCP Server Cards or API Catalogs at all. Two independent samples agree: agent-protocol adoption is in the low-single-digit percent range.
- The x402 agent-payments protocol's daily volume collapsed from $913,000 (October 2025) to $28,000 (March 2026) per CoinDesk. A directional indicator for what specs-without-practice looks like at the protocol level.
- The framing is ecosystem-nascent, not vendor failure. Specs are real and well-designed. Two-sided protocol adoption requires both publisher-side and consumer-side movement; neither side has crossed the deployment threshold yet.
- Quarterly re-runs of the same 11-script pipeline are scheduled. The visible thing to watch is which checks move from below 5% to above 5% adoption between quarters.
Why do 20 of 66 agent-readiness checks have nothing to measure?
Less than 5% of brands have shipped them. A correlation study of 908 leading B2B SaaS brands measured 66 per-check agent-readiness signals across three independent scanners. Twenty signals failed the variance threshold because adoption is too thin to study. The bleeding-edge protocol family clusters near zero. The specs exist. The practice has not arrived.
The full correlation study was published 2026-04-26 by Respectarium. It scanned brands with Cloudflare's isitagentready.com, Fern's open-source afdocs CLI, and Respectarium's own scanner, then tested 50 evaluable predictors against five LLM-visibility outcomes spanning Claude, GPT, and Gemini. The 20 zero-variance checks were filtered out before the regression even ran. The filter is mechanical: any check where one outcome (typically "fail" or "neutral") covers 95% or more of the sample is excluded because there is nothing to correlate against.
The 20 split into two buckets. Seventeen are below the 5% variance threshold because real-world adoption sits between 0.5% and 3.6% of the brand sample. Three are flat by measurement convention rather than thin adoption: Respectarium's llms-txt-exists, llms-txt-valid, and llms-txt-size return "neutral" rather than "fail" when the file is missing, so they hold a single value across all 908 brands. Both buckets get reported separately rather than collapsed.
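The filter is short enough to sketch. Here is a minimal TypeScript version, assuming verdicts arrive as one label per brand per check; the study's actual implementation lives in its public scripts and may differ in detail:

```ts
// Minimal sketch of the pre-registered variance filter: a check is excluded
// when its modal outcome covers 95% or more of the sample.
// Hypothetical shape; the study's real scripts may structure this differently.
type Verdict = "pass" | "fail" | "neutral";

function hasVariance(verdicts: Verdict[], threshold = 0.95): boolean {
  const counts = new Map<Verdict, number>();
  for (const v of verdicts) counts.set(v, (counts.get(v) ?? 0) + 1);
  const modalShare = Math.max(...counts.values()) / verdicts.length;
  return modalShare < threshold; // true => enough spread to correlate against
}

// Example: 900 of 908 brands fail a check => modal share ~0.991 => excluded.
const sample: Verdict[] = [
  ...Array<Verdict>(900).fill("fail"),
  ...Array<Verdict>(8).fill("pass"),
];
console.log(hasVariance(sample)); // false -> zero-variance bucket
```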
The named specs and their adoption rates across the 908-brand sample (per signals.csv and the per-scanner adoption breakdown):
| Spec | Where checked | Adoption % | What it is |
|---|---|---|---|
| OAuth Protected Resource | /.well-known/oauth-protected-resource | 0.5% - 1.5% | RFC 9728 metadata pointing at the auth server |
| MCP Server Card | /.well-known/mcp/server-card.json | 0.7% - 1.1% | JSON declaring an MCP-tool endpoint to clients |
| AGENTS.md | /AGENTS.md | 0.9% | Top-level convention file for coding agents |
| Agent Skills Discovery | /.well-known/agent-skills/index.json | 1.1% - 1.7% | Index of capabilities exposed as discrete skills |
| Web Bot Auth | HTTP Message Signatures Directory | 1.2% - 2.2% | Bot identity via signed requests |
| A2A Agent Card | /.well-known/agent-card.json | 1.2% - 2.4% | Card describing the service to other agents |
| Content Signals | robots.txt directives | 1.3% - 2.1% | Cloudflare / IETF AIPREF content-usage signals |
| API Catalog | /.well-known/api-catalog | 1.5% | RFC 9727 linkset pointing at OpenAPI specs |
| OAuth / OIDC Discovery | /.well-known/oauth-authorization-server | 3.4% | RFC 8414 endpoint discovery |
| llms.txt directive (Fern) | /llms.txt directive variant | 3.4% | One detection variant of the reading-list file |
| Tabbed-content serialization | content layer | 3.6% | Whether tabs are serialized for agents |
Adoption percentages are rates within the 908-brand sample. The brand population is itself a selection effect: every brand entered the dataset because at least one of Claude, GPT, or Gemini named it in a category-level "top brands" prompt. Findings describe behavior among already-LLM-mentioned brands. Adoption among non-LLM-mentioned brands is plausibly lower, not higher. The selection direction does not flatter the headline.
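For the concrete mechanics, a presence probe over the table's well-known paths looks roughly like this. The sketch is illustrative, not any of the three scanners' actual implementation; a real scanner would also validate response bodies against each spec, and would fall back to GET where servers reject HEAD:

```ts
// Illustrative presence probe over the well-known paths from the table above.
const WELL_KNOWN_PATHS = [
  "/.well-known/oauth-protected-resource",    // RFC 9728
  "/.well-known/mcp/server-card.json",        // MCP Server Card
  "/AGENTS.md",
  "/.well-known/agent-skills/index.json",
  "/.well-known/agent-card.json",             // A2A
  "/.well-known/api-catalog",                 // RFC 9727
  "/.well-known/oauth-authorization-server",  // RFC 8414
  "/llms.txt",
];

async function probe(origin: string): Promise<Record<string, boolean>> {
  const results: Record<string, boolean> = {};
  for (const path of WELL_KNOWN_PATHS) {
    try {
      const res = await fetch(new URL(path, origin), { method: "HEAD" });
      results[path] = res.ok;
    } catch {
      results[path] = false; // network error counts as absent
    }
  }
  return results;
}

probe("https://example.com").then(console.log);
```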
Two specs from the same study did clear the variance filter and graduated to PROMOTE_SCORED: Cloudflare's aggregate readiness level (0 to 5) and Respectarium's markdown-negotiation check. Specs that test for content-layer behavior already in production have variance to study. Specs that test for the new agent-protocol stack do not yet.
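Respectarium's exact check logic is not reproduced here. As a sketch, markdown negotiation amounts to asking for text/markdown and seeing whether the server honors it:

```ts
// Sketch of a markdown-negotiation check (assumed logic, not Respectarium's
// code): request a page with Accept: text/markdown and see whether the server
// serves markdown instead of falling back to HTML.
async function negotiatesMarkdown(url: string): Promise<boolean> {
  const res = await fetch(url, { headers: { Accept: "text/markdown" } });
  const contentType = res.headers.get("content-type") ?? "";
  return res.ok && contentType.includes("text/markdown");
}

negotiatesMarkdown("https://example.com/docs/page").then(console.log);
```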
Article 2 in this series reported that four signals that do have variance move Claude and GPT in opposite directions on brand selection, both directions clearing FDR significance. Where adoption exists, the LLMs structurally disagree. Where adoption does not exist, there is nothing for any LLM to disagree about. This piece is about the second set.
Which agent-readiness specs have not yet arrived in practice?
Eleven distinct specs sit in the zero-variance bucket once per-scanner duplicates are collapsed. Each is a real published artifact, checked by at least one mainstream agent-readiness scanner, and implemented by less than 4% of brands in the sample. The list groups into five protocol families.
Identity and auth-discovery layer. MCP Server Cards, A2A Agent Cards, RFC 9728 OAuth Protected Resource Metadata, Agent Skills Discovery, and the older RFC 8414 authorization-server discovery let agents discover what a service exposes and where to authenticate. Adoption: 0.5% to 3.4%. Cloudflare's own Radar data reports MCP Server Cards and API Catalogs together on fewer than 15 of the 200,000 most visited domains. Both samples show very thin adoption; the 908-brand sample reads higher, as the selection effect predicts.
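As one concrete artifact from this family, RFC 9728 metadata is a small JSON document. The typed sketch below shows the two core fields the RFC defines; the values are invented for illustration:

```ts
// RFC 9728 OAuth Protected Resource Metadata: the two core fields.
// Field names follow the RFC; the values here are invented for illustration.
interface ProtectedResourceMetadata {
  resource: string;                 // identifier of the protected resource
  authorization_servers?: string[]; // issuers of auth servers that can be used
}

const example: ProtectedResourceMetadata = {
  resource: "https://api.example.com",
  authorization_servers: ["https://auth.example.com"],
};
// Served at https://api.example.com/.well-known/oauth-protected-resource
console.log(JSON.stringify(example, null, 2));
```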
Bot-identity layer. Web Bot Auth and Cloudflare's Content Signals let a site declare what it allows agents to do and verify which bots it trusts. Adoption: 1% to 2%. Both are relatively new.
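Neither check needs much machinery to detect. A minimal detection sketch: the Signature and Signature-Input headers are RFC 9421's, while the Content-Signal robots.txt line syntax follows Cloudflare's published examples and is an assumption here.

```ts
// Sketch of what the bot-identity layer checks look for (assumed details flagged).

function looksLikeSignedBotRequest(headers: Headers): boolean {
  // Presence-only. Real Web Bot Auth verification resolves the bot's key
  // directory and validates the signature bytes per RFC 9421.
  return headers.has("signature") && headers.has("signature-input");
}

function extractContentSignals(robotsTxt: string): string[] {
  // "Content-Signal:" line syntax is assumed from Cloudflare's examples.
  return robotsTxt
    .split("\n")
    .filter((line) => line.toLowerCase().startsWith("content-signal:"))
    .map((line) => line.slice("content-signal:".length).trim());
}

// e.g. extractContentSignals("Content-Signal: search=yes, ai-train=no")
// => ["search=yes, ai-train=no"]
```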
Coding-agent layer. AGENTS.md is a top-level file that codifies project-specific instructions for coding agents. Adoption: 0.9% in the brand sample. Adoption is higher in open-source repositories targeting coding agents, which this sample does not measure. Among commercial brand surfaces, AGENTS.md has not landed.
Commerce-payments layer. Cloudflare's payments-protocol family (x402, mpp, ucp, acp, ap2) cleared the variance filter at 11% to 24% but was dropped for predictive nullity. The "variance" comes from Cloudflare emitting "neutral" for the roughly 90% of sites flagged non-commerce and "fail" for the roughly 10% of commerce sites that have not implemented the protocols. Among the commerce subset, adoption is near zero. The most visible of these protocols, x402, has collapsed in the wild per CoinDesk: $913,000 daily volume in October 2025, $28,000 in March 2026. Spec arrived. Practice moved away.
Reading-list layer. llms.txt is the most-discussed agent-readiness file. Fern's scanner finds it on 3.4% of brands. Respectarium's three llms.txt checks return "neutral" by design convention. The wider context is that the file's consumers (the LLMs themselves) largely do not fetch it. As Google Search Advocate John Mueller posted on Bluesky in June 2025, "FWIW no AI system currently uses llms.txt." Two-sided ecosystem nascency. Publishers have not shipped at scale. Consumers have not started reading.
The only mainstream brand-level spec ship logged in the prior 90 days at the time of measurement was Yoast's llms.txt support for Shopify on 2026-03-31. One ship in 90 days, against roughly fourteen vendor-side moves in the same window. Spec releases, scanner launches, rebrands. The asymmetry between announcement rate and deployment rate is the shape of the category right now.
Why is this nascency, not spec failure?
Specs without practice usually means one of two things. Either the specs are not designed for what the world wants, or the world has not caught up to specs that are fit for purpose. The data here favors the second reading.
The agent-protocol surface checked in this study was authored by serious organizations with serious review processes. RFC 9728 is IETF standards-track. Web Bot Auth originates from Cloudflare's bot-identity work. MCP Server Cards come from Anthropic's Model Context Protocol charter. A2A Agent Cards are part of Google's open A2A protocol. AGENTS.md has substantial coding-agent buy-in. Each has a viable consumer-side implementation. The specs are real.
What is missing is the deployment engine. Mainstream brands ship technical artifacts when the artifact unlocks measurable business value (schema markup, mobile-friendly rendering, sitemap registration). The agent-protocol stack does not yet meet that bar. An agent finding a service via its server card and calling it is not yet a path that drives revenue for the publisher. Without a clear feedback loop, the deployment cycle does not start.
x402 is the closest the category came to a forcing function in commerce. It was designed for agent micropayments. Coinbase backed it. Daily volume peaked above $900,000 in October 2025. By March 2026, volume was down roughly 97%. The infrastructure works. The demand did not show up. That is what spec-without-practice looks like at the protocol level: the bytes can flow, the wire format is correct, but no economic loop closes around the implementation.
Two-sided protocols compound the lag. A site shipping an MCP Server Card with no MCP-aware agents fetching it is shipping into a vacuum. An agent checking for /.well-known/agent-card.json against sites that do not publish one returns nothing useful. Both sides have to move roughly in step. The publisher side moves slower because there are more sites and longer release cycles. The consumer side has been moving faster but is still far from saturation. The result is the 1% to 3% adoption window the study measures, where neither side has crossed the threshold that makes shipping worth it for the next site.
This is also visible in scanner divergence. Different scanners measure different slices because the category has not hardened. When the 20 zero-variance checks are stripped out, the remaining 46 still split into protocol-weighted (Cloudflare) and content-weighted (Fern) views. No measurement consensus exists yet because the measurable surface is still forming. Scanners will agree more once practice arrives. They do not yet.
What does measuring "specs without practice" mean for strategy?
Two postures fit the data. One ships features assuming adoption is coming. One ships measurements showing adoption is not there yet. Both are valid stances. They answer different questions.
The first posture is what most agent-readiness vendor pages currently take. Cloudflare publishes Level 5 ("Agent-Native") on its readiness ladder, gated on four capability checks (MCP Server Card, OAuth Protected Resource, A2A Agent Card, API Catalog) that almost nobody in the measured sample passes. The ladder runs ahead of the field by design, marking what "ready" will look like once the field arrives. Fern's docs-agent scoring is similarly forward-looking. Both vendors are building infrastructure for an ecosystem they expect.
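As described, the Level 5 gate is a conjunction of those four checks. A minimal sketch, assuming the gate is exactly that AND; Cloudflare's actual scoring may stage or weight the checks differently:

```ts
// Sketch of the Level 5 ("Agent-Native") gate as described above: all four
// capability checks must pass. Assumed logic; not Cloudflare's implementation.
interface CapabilityChecks {
  mcpServerCard: boolean;
  oauthProtectedResource: boolean;
  a2aAgentCard: boolean;
  apiCatalog: boolean;
}

const isAgentNative = (c: CapabilityChecks): boolean =>
  c.mcpServerCard && c.oauthProtectedResource && c.a2aAgentCard && c.apiCatalog;
```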
The second posture is the one this article ships from, and the one Respectarium took in publishing the correlation study. Specs exist. Adoption is thin. Less than 5% of leading B2B SaaS brands implement most of the protocol-stack specs. Respectarium measures that, names it, and re-measures on a quarterly cadence. The point is to give buyers and operators an empirical baseline against which forward-leaning vendor claims can be compared.
The reader takeaway is not "implement these 20 specs to improve your AI visibility." The data does not support that claim. The selection effect alone disqualifies it. The reader takeaway is "the agent-readiness category in 2026 is in early formation; treat any measurement of it as a snapshot, not a verdict." A site at 0% adoption on the 20 zero-variance checks is not behind. There is nowhere yet to be ahead. CompetLab AI Visibility tracking reports per-LLM outcomes separately by default precisely because cross-LLM averaging hides the disagreement Article 2 documented; the same logic applies to cross-spec averaging hiding the variance-filter problem this article documents.
Respectarium has committed to quarterly re-runs on the same 50 categories, same 908-brand baseline, same 11-script pipeline. The visible thing to watch is which checks move from below 5% adoption to above 5% between quarters. The first such move marks the moment a particular spec graduates from "interesting on paper" to "measurable in practice." The first 90 days will likely show very little movement; that is also publishable information. The cadence is the point. Quarterly publishing forces honesty in a category where it is otherwise easy to claim a finding that the next data set quietly reverses.
How was this measured, and what are the limits?
The Respectarium correlation study (study-2026-04) is a cross-sectional snapshot of 908 brands across 50 B2B SaaS categories. Three independent agent-readiness scanners produced 72 measurements per brand. Pre-registered analytical thresholds were committed in writing on 2026-04-24, two days before data analysis began on 2026-04-26. All analyses are deterministic. The merged dataset, 11 stats scripts, and all interpretation files are public on GitHub. The signal-by-signal verdict table is the same signals.csv referenced earlier.
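The study reports FDR-adjusted p-values. Assuming the adjustment is the standard Benjamini-Hochberg procedure (an assumption; the materials quoted here do not name it), the computation is short:

```ts
// Benjamini-Hochberg FDR adjustment, sketched. BH is the standard procedure
// behind "FDR-adjusted p"; assumed here rather than confirmed by the study.
function benjaminiHochberg(pValues: number[]): number[] {
  const m = pValues.length;
  const order = pValues.map((p, i) => ({ p, i })).sort((a, b) => a.p - b.p);
  const adjusted = new Array<number>(m);
  let running = 1;
  for (let rank = m; rank >= 1; rank--) {
    const { p, i } = order[rank - 1];
    // Adjusted p at rank k is min over ranks >= k of p * m / rank.
    running = Math.min(running, (p * m) / rank);
    adjusted[i] = running;
  }
  return adjusted;
}

// Example: raw p-values for five outcome correlations.
console.log(benjaminiHochberg([0.01, 0.04, 0.03, 0.2, 0.69]));
```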
Selection effect
Every brand in the dataset was selected because at least one of Claude, GPT, or Gemini named it in response to a category-level "top brands" prompt. The 5% adoption threshold reported in this article is therefore an adoption rate among already-LLM-mentioned brands. A representative web sample would almost certainly show lower adoption rates, not higher ones, because the LLM-mentioned subset skews toward larger and more technically mature brands. The 20-of-66 finding generalizes downward from the sample to the broader brand universe, not upward.
Variance threshold and "zero variance" framing
A check is excluded from correlation analysis when one outcome (typically "fail" or "neutral") covers 95% or more of the sample. This is a standard pre-registered statistical filter, not an editorial judgment. Three of the 20 are flat by measurement convention rather than thin adoption (Respectarium's three llms.txt checks). The remaining 17 are below the variance threshold because real adoption is below 5%. The study reports both buckets rather than collapsing them. The aggregate figure is honest only if the breakdown is named.
Cross-scanner consistency caveat
Same-named checks across different scanners measure different things in practice. Three pairs of "same-named" checks correlate at ρ ≈ 0.03 across scanners. Adoption rates in this article are reported per-scanner; do not assume "MCP Server Card" measured by Cloudflare is identical to "MCP Server Card" measured by Respectarium even where the spec is the same.
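The ρ ≈ 0.03 figure is a rank correlation between two scanners' verdicts for the same-named check. A minimal Spearman sketch with average ranks for ties, which verdict data is full of; the numeric verdict coding is invented for illustration:

```ts
// Spearman rank correlation between two scanners' verdicts for one check.
// Verdicts are coded numerically (e.g. pass=1, fail=0); ties get average ranks.
function avgRanks(xs: number[]): number[] {
  const sorted = xs.map((v, i) => ({ v, i })).sort((a, b) => a.v - b.v);
  const ranks = new Array<number>(xs.length);
  let k = 0;
  while (k < sorted.length) {
    let j = k;
    while (j + 1 < sorted.length && sorted[j + 1].v === sorted[k].v) j++;
    const avg = (k + j) / 2 + 1; // average of 1-based ranks k+1 .. j+1
    for (let t = k; t <= j; t++) ranks[sorted[t].i] = avg;
    k = j + 1;
  }
  return ranks;
}

function spearman(a: number[], b: number[]): number {
  const ra = avgRanks(a), rb = avgRanks(b);
  const n = a.length;
  const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / n;
  const ma = mean(ra), mb = mean(rb);
  let cov = 0, va = 0, vb = 0;
  for (let i = 0; i < n; i++) {
    cov += (ra[i] - ma) * (rb[i] - mb);
    va += (ra[i] - ma) ** 2;
    vb += (rb[i] - mb) ** 2;
  }
  return cov / Math.sqrt(va * vb); // Pearson correlation of the rank vectors
}

// Two scanners' verdicts for the "same" check across five brands:
console.log(spearman([1, 0, 0, 1, 0], [0, 0, 1, 1, 0]));
```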
Conflict-of-interest disclosure
CompetLab authored one of the three scanners evaluated in this study. The pre-registered methodology (auditable in scripts/10-verdicts.ts) and the publication of findings unfavorable to CompetLab's own scanner (the v1 score's mean |ρ| of 0.016 across five outcomes, FDR-adjusted p of 0.69) are the primary mitigations. The full COI section lives in methodology.md §4. The companion Agent-Adoption Specification v1.0 is published under CC-BY 4.0 and is implementable by independent scanners.
The next quarterly re-run is scheduled for late July 2026. The visible thing to watch is whether any of the 20 zero-variance checks moves above 5% adoption. The most likely candidates given current spec momentum are llms.txt, AGENTS.md, and possibly Content Signals if Cloudflare default-enables it across more zones. Predictions are predictions. The point of the cadence is to let the data decide.
Scan your own site against the open spec at competlab.com/tools/agent-adoption-check.
Frequently Asked Questions
What does "20 of 66 zero-variance" actually mean?
A check is excluded from correlation analysis when one outcome covers 95% or more of the sample. Twenty of the 66 measured agent-readiness checks crossed that threshold in the 908-brand study. Three of the 20 are flat by measurement convention (Respectarium's three llms.txt checks return "neutral" by design when the file is missing). The remaining 17 are below 5% adoption because real-world deployment is thin. Both buckets are reported separately rather than collapsed.
Which agent-readiness specs have the lowest adoption?
OAuth Protected Resource (RFC 9728 metadata at /.well-known/oauth-protected-resource) at 0.5% to 1.5% across the sample, MCP Server Cards at 0.7% to 1.1%, AGENTS.md at 0.9%, A2A Agent Cards at 1.2% to 2.4%, Agent Skills Discovery at 1.1% to 1.7%, and Web Bot Auth at 1.2% to 2.2%. All are real specs with full standards-track or vendor-charter documentation. None has crossed the 5% adoption threshold among the LLM-mentioned brand sample.
Does this mean agent-readiness specs are useless?
No. The data measures adoption among 908 already-LLM-mentioned brands as of April 2026. Adoption can change. Specs that test for content-layer behavior already in production (markdown serving, sitemap presence, robots.txt files) have variance to study. Specs that test for the new agent-protocol stack do not yet. The argument is about timing, not about the quality of the specs themselves.
Why aren't more brands shipping these specs?
Two-sided protocol adoption requires both publisher-side movement (sites shipping the artifact) and consumer-side movement (agents fetching and acting on it). The publisher side moves slower because there are more sites and longer release cycles. The consumer side has been moving faster but is still far from saturation. Without a clear revenue or visibility loop, the deployment cycle does not start. The x402 collapse is one example: spec arrived, demand did not.
Will the same numbers hold in 6 months?
Probably not, and that is the point. The Respectarium study is committed to quarterly re-runs. The next is scheduled for late July 2026. The visible thing to track is which of the 20 zero-variance checks moves from below 5% adoption to above 5% between quarters. The most likely candidates given current spec momentum are llms.txt (platform-plugin driven), AGENTS.md (developer-tool driven), and possibly Content Signals if Cloudflare default-enables it across more zones.
How does this connect to LLM visibility?
This study cannot directly answer "does shipping these specs improve your AI visibility." Every brand in the sample was already LLM-mentioned by construction, so the selection effect makes that question unanswerable from this data. The findings describe relative ranking among already-mentioned brands. Among the signals with variance, Claude and GPT disagree on which ones predict being listed, often in opposite directions; among the zero-variance signals, there is nothing for any LLM to disagree about.