Benchmarks · expected ranges · calibrating live

Citation Readiness benchmarks — research synthesis + expected ranges.

The Citation Readiness Score doesn't exist in a vacuum. It's a synthesis of published research (Aleyda Solis's 10-characteristic checklist · Semrush's 2025 AI-search data · Cloudflare's public crawl data) plus pre-launch fleet calibration on ~30 sites. This page is the honest map of what we know, what we expect, and what we'll learn as Free audits accumulate.

Honest framing: the expected ranges below are anchored to research + small-N internal data (~30 fleet sites, pre-launch). They are not yet a population study. As the Free Citation Readiness Score accumulates audits, this page calibrates against real measured data and we publish the updated ranges + the delta. Subscribe via RSS to be notified.

Verified anchors (from the operator's fleet)

The three fleet sites below are ones we operate and can measure directly. The numbers are operator-verifiable and come from Plausible, Cloudflare Web Analytics, and Cloudflare bot-crawler logs.

Site	Archetype	Measured	Source
HoldLens	Reference / financial data	194 unique visitors / 30 days · 10% inbound from chatgpt.com	Operator's own Plausible data, period 2026-05-04 to 2026-05-17
SourceScore	Reference / niche data	8 human UV / 30 days · 3,840 AI crawler hits / 30 days · 480:1 bot-to-human ratio	Cloudflare Web Analytics + bot-crawler logs, period 2026-04-19 to 2026-05-19
txtfeed.com	Reference / llms.txt directory	51,330 AI crawler hits / 30 days (fleet leader)	Cloudflare AI Crawl Control logs, period 2026-04-19 to 2026-05-19
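The bot-to-human ratio in the SourceScore row is a plain division of the two 30-day counts. A minimal sketch of that arithmetic follows; `bot_to_human_ratio` is a hypothetical helper name used only for illustration.

```python
def bot_to_human_ratio(ai_crawler_hits: int, human_visitors: int) -> float:
    """Return AI crawler hits per human unique visitor over the same window."""
    if human_visitors <= 0:
        raise ValueError("human_visitors must be positive")
    return ai_crawler_hits / human_visitors


# SourceScore figures from the table above: 3,840 crawler hits, 8 human UV.
ratio = bot_to_human_ratio(3840, 8)
print(f"{ratio:.0f}:1")  # prints "480:1"
```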

Published research the framework anchors to

The Citation Readiness Score didn't invent its inputs. Four primary sources inform the rubric:

Source

Semrush — AI Search Behaviors 2025

Finding: Cited sources in AI-search responses see 8–18% click-through rate, vs 2–5% for Google SERP position 3.

How it's applied: The size of the gap is why AI citation is a measurable category, not vanity.

Source

Aleyda Solis — 10-characteristic LLM citation checklist (2024-2025)

Finding: LLM-citation eligibility correlates with 10 structural properties: Accessible, Useful, Recognizable, Extractable, Consistent, Corroborated, Credible, Differentiated, Fresh, Transactable.

How it's applied: Forms the core of the Citation Readiness Score's GEO Readiness dimension.

Source

Cloudflare — AI Crawl Control public data (2025)

Finding: Sites with explicit AI-bot allowlists in robots.txt see 2–3× higher AI crawler frequency than sites with default-permissive policies.

How it's applied: Anchors the Bot-Crawl Health dimension — explicit allowlist is high-leverage.
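As a rough illustration of what an explicit allowlist can look like, the sketch below renders one `Allow: /` group per crawler. The three user-agent tokens are an illustrative subset chosen for this example, not a list taken from this page, and `build_allowlist` is a hypothetical helper.

```python
# Illustrative subset only; this page does not enumerate the crawlers it means.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot"]


def build_allowlist(user_agents: list[str]) -> str:
    """Render one explicit 'Allow: /' group per crawler user-agent token."""
    groups = [f"User-agent: {ua}\nAllow: /" for ua in user_agents]
    # Blank line between groups, trailing newline at end of file.
    return "\n\n".join(groups) + "\n"


print(build_allowlist(AI_CRAWLERS))
```

The output is meant to be appended to an existing /robots.txt; any Disallow rules already in that file still need to be reviewed by hand.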

Source

CitationDesk fleet — pre-launch internal use (2026-04 to 2026-05)

Finding: The 5-dimension scoring rubric was applied to ~30 fleet sites pre-launch. Highest variance was in Dual Fit (the structural first-paragraph format).

How it's applied: The rubric was operator-tested before it became a public tool.

Expected ranges by archetype

These ranges are based on synthesis (the four sources above) + small-N internal calibration on the operator's fleet. Low / mid / high columns describe the bottom decile / median / top decile we'd expect.

Archetype	Expected low	Expected mid	Expected high	Typical limiter
Publisher (evergreen editorial)	0.25	0.40	0.65	Editorial lead-paragraphs hurt Dual Fit; entity-coherence gap common.
SaaS marketing site	0.30	0.45	0.70	Pricing + features pages score well; landing pages struggle with Dual Fit.
SaaS docs site	0.25	0.40	0.60	Built for human reading, not LLM extraction. Welcome-paragraphs are the gap.
Reference / data site	0.45	0.60	0.85	Fact-shaped first paragraphs are inherent. Entity coherence is the typical gap.
Programmatic SEO site	0.20	0.35	0.60	Templated thin content + JS-rendered body are common hard-blocks.
Calculator / tool	0.55	0.70	0.90	Highest-scoring archetype. Calculator output IS the extractable fact.
Indie newsletter / blog	0.25	0.40	0.65	Voice-driven content scores low on Dual Fit; freshness signal is the lift.
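To compare a score against the table above, one option is a simple lookup. This is a sketch under stated assumptions: the band labels and the `place_in_range` helper are hypothetical, and only the numeric ranges come from the table.

```python
# (expected low, expected mid, expected high) copied from the table above.
EXPECTED_RANGES = {
    "Publisher (evergreen editorial)": (0.25, 0.40, 0.65),
    "SaaS marketing site": (0.30, 0.45, 0.70),
    "SaaS docs site": (0.25, 0.40, 0.60),
    "Reference / data site": (0.45, 0.60, 0.85),
    "Programmatic SEO site": (0.20, 0.35, 0.60),
    "Calculator / tool": (0.55, 0.70, 0.90),
    "Indie newsletter / blog": (0.25, 0.40, 0.65),
}


def place_in_range(archetype: str, score: float) -> str:
    """Name the band a score falls into for the given archetype."""
    low, mid, high = EXPECTED_RANGES[archetype]
    if score < low:
        return "below low"
    if score < mid:
        return "low to mid"
    if score < high:
        return "mid to high"
    return "at or above high"


print(place_in_range("Calculator / tool", 0.62))  # prints "low to mid"
```

A 0.62 looks strong in isolation, yet it sits below the expected median for the calculator archetype, which is why the comparison is made per archetype.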

Per-dimension expected ranges

Mean scores on a 0.0 to 1.0 scale for each of the five Citation Readiness dimensions. The same caveat applies: these ranges come from synthesis plus internal calibration and will be checked against population data over time.

Dimension	Range across archetypes	Notes
SEO Foundation	0.55 — 0.85	Most sites already meet structural SEO minimums; the dimension is rarely the bottleneck.
GEO Readiness	0.30 — 0.70	Aleyda 10-characteristic checklist; biggest source of variance across archetypes.
Dual Fit	0.25 — 0.85	The biggest gap across archetypes. First-paragraph rewrites are highest-leverage.
Entity Coherence	0.30 — 0.75	Adding Person + Organization schema with sameAs is expected to lift this dimension by 0.20 to 0.30 across archetypes.
Bot-Crawl Health	0.50 — 0.95	Mostly solved by /llms.txt, a robots.txt allowlist, and Cloudflare AI Crawl Control set to "Do not block".
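For the Entity Coherence row, a Person + Organization block with sameAs might be generated along these lines. This is a sketch, not the tool's own output: every name and URL below is a placeholder, `entity_jsonld` is a hypothetical helper, and the two-profile minimum mirrors the "≥2 profiles" note in the fixes table on this page.

```python
import json


def entity_jsonld(org_name: str, org_url: str, org_profiles: list[str],
                  person_name: str, person_profiles: list[str]) -> str:
    """Build a JSON-LD string linking a Person to an Organization via sameAs."""
    if len(org_profiles) < 2 or len(person_profiles) < 2:
        raise ValueError("expected at least two sameAs profiles per entity")
    org_id = f"{org_url}#organization"
    graph = [
        {"@type": "Organization", "@id": org_id, "name": org_name,
         "url": org_url, "sameAs": org_profiles},
        {"@type": "Person", "name": person_name,
         "worksFor": {"@id": org_id}, "sameAs": person_profiles},
    ]
    return json.dumps({"@context": "https://schema.org", "@graph": graph}, indent=2)


# Placeholder values for illustration only.
print(entity_jsonld(
    "Example Co", "https://example.com/",
    ["https://www.linkedin.com/company/example", "https://github.com/example"],
    "Jane Doe",
    ["https://www.linkedin.com/in/janedoe", "https://github.com/janedoe"],
))
```

The resulting string goes inside a single `<script type="application/ld+json">` element in the global layout so that it applies site-wide.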

The 8 fixes with the largest expected lift

Ranked roughly by expected dimension lift when applied to a typical underperforming page. Specific lift magnitude varies by archetype + by where the page started.

Rewrite	Expected lift	Notes
Add quote-ready fact in first paragraph	large — Dual Fit + GEO Readiness	Single highest-leverage fix across all archetypes.
Add Person + Organization schema with sameAs (≥2 profiles)	large — Entity Coherence	One JSON-LD block in global layout. Applies site-wide.
Reduce first paragraph to ≤600 chars	medium — Dual Fit	LLMs extract paragraphs of roughly 50 words; longer ones get truncated.
Add /llms.txt at site root	medium — Bot-Crawl Health	About 200 bytes; Anthropic, OpenAI, and Perplexity honor it.
Add explicit AI-bot allowlist in /robots.txt	medium-large — Bot-Crawl Health	Covers 9 crawlers; signals that crawling is welcome.
Add DefinedTerm schema for key concepts	small-medium — GEO Readiness	Wraps glossary terms for LLM extraction.
Rewrite title to query-intent + 50-60 chars	medium — SEO Foundation	Targets a specific query; not "Home" or brand-only.
Add dateModified within last 30 days	small — GEO Readiness · Freshness	Freshness signal for LLMs that weight recency.
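Three of the fixes above carry explicit thresholds (first paragraph ≤600 chars, title of 50-60 chars, dateModified within 30 days), so they can be checked mechanically. The sketch below assumes the page text, title, and an ISO-format dateModified value have already been extracted; the function names are hypothetical, and how the audit tool itself measures these is not stated on this page.

```python
from datetime import date, timedelta


def first_paragraph_ok(text: str, max_chars: int = 600) -> bool:
    """True when the first paragraph fits the <=600 character target."""
    return len(text.strip()) <= max_chars


def title_length_ok(title: str) -> bool:
    """True when the title falls in the 50-60 character window."""
    return 50 <= len(title.strip()) <= 60


def is_fresh(date_modified: str, today: date, max_age_days: int = 30) -> bool:
    """True when dateModified (YYYY-MM-DD) is at most max_age_days old."""
    modified = date.fromisoformat(date_modified)
    return timedelta(0) <= today - modified <= timedelta(days=max_age_days)


today = date(2026, 5, 19)
print(first_paragraph_ok("A short, fact-shaped opening paragraph."))  # prints True
print(title_length_ok("Home"))                                        # prints False
print(is_fresh("2026-05-01", today))                                  # prints True
```

Length checks only cover the measurable half of these fixes; whether the title targets a real query or the paragraph leads with a quote-ready fact still needs a human read.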

Calibration commitment

The honest version of any benchmark page is: here's the synthesis + the anchors + the small-N data we have today, and here's how it calibrates as more data arrives.

As Free audits accumulate (every URL submitted via /tools/citation-readiness/), the score distribution per archetype updates.
When we have ≥100 audits per archetype, the "expected" columns become "observed" columns with sample sizes published.
Per-LLM citation rate data ships with Phase 1B (the polling engine, which launches once operator credentials are available).
All published numbers will be CC-BY-4.0 with attribution + dataset access for researchers — same intent as the framework itself.

Want to be notified when the calibration updates? Subscribe via RSS or email [email protected].

See where your URL falls in these ranges.

Run the Free Citation Readiness Score and compare your score to the archetype expected range above. Each audit contributes to the calibration dataset for this page.

Run the audit → · See the methodology