Benchmarks · expected ranges · calibrating live
Citation Readiness benchmarks — research synthesis + expected ranges.
The Citation Readiness Score doesn't exist in a vacuum. It's a synthesis of published research (Aleyda Solis's 10-characteristic checklist · Semrush's 2025 AI-search data · Cloudflare's public crawl data) plus pre-launch fleet calibration on ~30 sites. This page is the honest map of what we know, what we expect, and what we'll learn as Free audits accumulate.
Verified anchors (from the operator's fleet)
These are three fleet sites that we operate and can measure directly. The numbers are operator-verifiable and come from Plausible, Cloudflare Web Analytics, and Cloudflare bot-crawler logs.
| Site | Archetype | Measured | Source |
|---|---|---|---|
| HoldLens | Reference / financial data | 194 unique visitors / 30 days · 10% inbound from chatgpt.com | Operator's own Plausible data, period 2026-05-04 to 2026-05-17 |
| SourceScore | Reference / niche data | 8 human UV / 30 days · 3,840 AI crawler hits / 30 days · 480:1 bot-to-human ratio | Cloudflare Web Analytics + bot-crawler logs, period 2026-04-19 to 2026-05-19 |
| txtfeed.com | Reference / llms.txt directory | 51,330 AI crawler hits / 30 days (fleet leader) | Cloudflare AI Crawl Control logs, period 2026-04-19 to 2026-05-19 |
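The bot-to-human ratio in the SourceScore row is a plain division of AI crawler hits by human unique visitors. A minimal Python sketch, assuming both counts cover the same measurement window (the function name is illustrative and not part of any CitationDesk tooling):

```python
def bot_to_human_ratio(ai_crawler_hits: int, human_visitors: int) -> float:
    """Return AI crawler hits per human unique visitor for one window."""
    if human_visitors <= 0:
        # A ratio is undefined without at least one human visitor.
        raise ValueError("human_visitors must be positive")
    return ai_crawler_hits / human_visitors


# SourceScore figures from the table above: 3,840 hits and 8 human UV.
ratio = bot_to_human_ratio(3840, 8)
print(f"{ratio:.0f}:1")  # 480:1
```

The ratio is only meaningful when both inputs come from the same period, which is why the table reports a date range next to each source.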
Published research the framework anchors to
The Citation Readiness Score didn't invent its inputs. Four primary sources inform the rubric:
| Source | Finding | How it's applied |
|---|---|---|
| Semrush, AI Search Behaviors 2025 | Cited sources in AI-search responses see an 8–18% click-through rate, vs 2–5% for Google SERP position 3. | The size of that gap is why AI citation is a measurable category, not a vanity metric. |
| Aleyda Solis, 10-characteristic LLM citation checklist (2024-2025) | LLM-citation eligibility correlates with 10 structural properties: Accessible, Useful, Recognizable, Extractable, Consistent, Corroborated, Credible, Differentiated, Fresh, Transactable. | Forms the core of the Citation Readiness Score's GEO Readiness dimension. |
| Cloudflare, AI Crawl Control public data (2025) | Sites with explicit AI-bot allowlists in robots.txt see 2–3× higher AI crawler frequency than sites with default-permissive policies. | Anchors the Bot-Crawl Health dimension: an explicit allowlist is high-leverage. |
| CitationDesk fleet, pre-launch internal use (2026-04 to 2026-05) | The 5-dimension scoring rubric was applied to ~30 fleet sites pre-launch. Highest variance was in Dual Fit (the structural first-paragraph format). | The rubric was operator-tested before it became a public tool. |
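The Cloudflare finding turns on having an explicit allowlist in robots.txt instead of relying on a default-permissive policy. A minimal Python sketch of generating such a file follows. The user-agent tokens listed are an illustrative assumption (they are publicly documented crawler names, not the specific set of crawlers this page refers to), and the function name is hypothetical:

```python
# Illustrative AI crawler user-agent tokens. This list is an assumption
# for the example, not the page's own allowlist.
AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def build_robots_allowlist(bots, sitemap_url=None):
    """Return robots.txt text with one explicit Allow group per bot."""
    lines = []
    for bot in bots:
        # Each bot gets its own group so the permission is explicit.
        lines += [f"User-agent: {bot}", "Allow: /", ""]
    # All other crawlers keep the default-permissive policy.
    lines += ["User-agent: *", "Allow: /"]
    if sitemap_url:
        lines += ["", f"Sitemap: {sitemap_url}"]
    return "\n".join(lines) + "\n"


print(build_robots_allowlist(AI_BOTS))
```

Naming each crawler in its own group is what makes the policy explicit; a lone `User-agent: *` group would be permissive but would not name any AI bot.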
Expected ranges by archetype
These ranges are based on synthesis (the four sources above) + small-N internal calibration on the operator's fleet. Low / mid / high columns describe the bottom decile / median / top decile we'd expect.
| Archetype | Expected low | Expected mid | Expected high | Typical limiter |
|---|---|---|---|---|
| Publisher (evergreen editorial) | 0.25 | 0.40 | 0.65 | Editorial lead-paragraphs hurt Dual Fit; entity-coherence gap common. |
| SaaS marketing site | 0.30 | 0.45 | 0.70 | Pricing + features pages score well; landing pages struggle with Dual Fit. |
| SaaS docs site | 0.25 | 0.40 | 0.60 | Built for human reading, not LLM extraction. Welcome-paragraphs are the gap. |
| Reference / data site | 0.45 | 0.60 | 0.85 | Fact-shaped first paragraphs are inherent. Entity coherence is the typical gap. |
| Programmatic SEO site | 0.20 | 0.35 | 0.60 | Templated thin content + JS-rendered body are common hard-blocks. |
| Calculator / tool | 0.55 | 0.70 | 0.90 | Highest-scoring archetype. Calculator output IS the extractable fact. |
| Indie newsletter / blog | 0.25 | 0.40 | 0.65 | Voice-driven content scores low on Dual Fit; freshness signal is the lift. |
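To compare a score against the table above, one option is to look up the archetype's low / mid / high values and report which band the score falls in. The sketch below copies the ranges from the table; the band labels and the `place_score` function are illustrative assumptions, not the scoring tool's actual output:

```python
from typing import NamedTuple


class ExpectedRange(NamedTuple):
    low: float   # expected bottom decile
    mid: float   # expected median
    high: float  # expected top decile


# Values copied from the "Expected ranges by archetype" table.
EXPECTED = {
    "Publisher (evergreen editorial)": ExpectedRange(0.25, 0.40, 0.65),
    "SaaS marketing site": ExpectedRange(0.30, 0.45, 0.70),
    "SaaS docs site": ExpectedRange(0.25, 0.40, 0.60),
    "Reference / data site": ExpectedRange(0.45, 0.60, 0.85),
    "Programmatic SEO site": ExpectedRange(0.20, 0.35, 0.60),
    "Calculator / tool": ExpectedRange(0.55, 0.70, 0.90),
    "Indie newsletter / blog": ExpectedRange(0.25, 0.40, 0.65),
}


def place_score(archetype: str, score: float) -> str:
    """Describe where a 0.0 to 1.0 score sits in its archetype's range."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0.0 and 1.0")
    r = EXPECTED[archetype]
    if score < r.low:
        return "below expected low"
    if score < r.mid:
        return "low to mid"
    if score <= r.high:
        return "mid to high"
    return "above expected high"


print(place_score("Reference / data site", 0.52))  # low to mid
```

Because the ranges are expected values and not observed distributions, the band is a rough orientation, not a percentile.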
Per-dimension expected ranges
The ranges below are mean scores, on a 0.0 to 1.0 scale, for each of the five Citation Readiness dimensions. The same caveat applies: they come from synthesis plus internal calibration, and will be measured against population data over time.
| Dimension | Range across archetypes | Notes |
|---|---|---|
| SEO Foundation | 0.55–0.85 | Most sites already meet structural SEO minimums; the dimension is rarely the bottleneck. |
| GEO Readiness | 0.30–0.70 | Aleyda Solis's 10-characteristic checklist; biggest source of variance across archetypes. |
| Dual Fit | 0.25–0.85 | The biggest gap across archetypes. First-paragraph rewrites are highest-leverage. |
| Entity Coherence | 0.30–0.75 | Adding Person + Organization schema with sameAs lifts the score by 0.20–0.30 across the board. |
| Bot-Crawl Health | 0.50–0.95 | Mostly solved by /llms.txt + robots.txt allowlist + Cloudflare AI Crawl Control set to "Do not block". |
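The Entity Coherence note refers to Person + Organization schema with sameAs. A minimal Python sketch of building that JSON-LD follows, using the schema.org `Organization`, `Person`, `sameAs`, and `worksFor` vocabulary. All names and URLs are placeholders, the helper functions are hypothetical, and the two-profile minimum mirrors the "≥2 profiles" guidance in the fixes table further down this page:

```python
import json


def build_entity_graph(org_name, org_url, org_profiles,
                       person_name, person_url, person_profiles):
    """Return a schema.org JSON-LD graph linking a Person to an Organization."""
    for profiles in (org_profiles, person_profiles):
        if len(profiles) < 2:
            raise ValueError("expected at least two sameAs profile URLs")
    org_id = org_url.rstrip("/") + "/#organization"
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "Organization",
                "@id": org_id,
                "name": org_name,
                "url": org_url,
                "sameAs": list(org_profiles),
            },
            {
                "@type": "Person",
                "name": person_name,
                "url": person_url,
                # Reference the Organization node by its @id.
                "worksFor": {"@id": org_id},
                "sameAs": list(person_profiles),
            },
        ],
    }


def to_script_tag(graph):
    """Wrap the graph in the script element used for JSON-LD in HTML."""
    body = json.dumps(graph, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values; replace with real entity data.
graph = build_entity_graph(
    "Example Org", "https://example.com",
    ["https://www.linkedin.com/company/example", "https://github.com/example"],
    "Jane Doe", "https://example.com/about",
    ["https://www.linkedin.com/in/janedoe", "https://github.com/janedoe"],
)
print(to_script_tag(graph))
```

Emitting the block once from a shared layout applies it to every page, which matches the site-wide framing in the fixes table.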
The 8 fixes with the largest expected lift
Ranked roughly by expected dimension lift when applied to a typical underperforming page. The size of the lift varies by archetype and by the page's starting score.
| Rewrite | Expected lift | Notes |
|---|---|---|
| Add quote-ready fact in first paragraph | large — Dual Fit + GEO Readiness | Single highest-leverage fix across all archetypes. |
| Add Person + Organization schema with sameAs (≥2 profiles) | large — Entity Coherence | One JSON-LD block in global layout. Applies site-wide. |
| Reduce first paragraph to ≤600 chars | medium — Dual Fit | LLMs extract ~50-word paragraphs; longer truncates. |
| Add /llms.txt at site root | medium — Bot-Crawl Health | ~200 bytes; Anthropic + OpenAI + Perplexity honor. |
| Add explicit AI-bot allowlist in /robots.txt | medium-large — Bot-Crawl Health | 9 crawlers; signals "crawl welcome". |
| Add DefinedTerm schema for key concepts | small-medium — GEO Readiness | Wraps glossary terms for LLM extraction. |
| Rewrite title to query-intent + 50-60 chars | medium — SEO Foundation | Targets a specific query; not "Home" or brand-only. |
| Add dateModified within last 30 days | small — GEO Readiness · Freshness | Freshness signal for LLMs that weight recency. |
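Three of the fixes above have numeric thresholds that are easy to check mechanically: a first paragraph of at most 600 characters, a title of 50 to 60 characters, and a dateModified within the last 30 days. A minimal Python sketch, assuming characters are counted with `len` (Unicode code points) and that the caller has already extracted the title, first paragraph, and date from the page (the `check_page` function is hypothetical):

```python
from datetime import date, timedelta


def check_page(title: str, first_paragraph: str,
               date_modified: date, today: date) -> dict:
    """Return pass/fail flags for three threshold-based fixes."""
    age = today - date_modified
    return {
        # Title length target from the fixes table: 50-60 chars.
        "title_50_to_60_chars": 50 <= len(title) <= 60,
        # First paragraph target from the fixes table: <=600 chars.
        "first_paragraph_max_600_chars": len(first_paragraph) <= 600,
        # Freshness target: modified within the last 30 days, not in the future.
        "modified_within_30_days": timedelta(0) <= age <= timedelta(days=30),
    }


result = check_page(
    title="Citation Readiness benchmarks and expected score ranges",
    first_paragraph="The Citation Readiness Score is a synthesis of published research.",
    date_modified=date(2026, 5, 10),
    today=date(2026, 5, 19),
)
for name, passed in result.items():
    print(f"{name}: {'ok' if passed else 'needs work'}")
```

These checks cover only the length and date thresholds; whether the title targets a specific query or the paragraph contains a quote-ready fact still needs a content review.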
Calibration commitment
The honest version of any benchmark page is: here's the synthesis + the anchors + the small-N data we have today, and here's how it calibrates as more data arrives.
- As Free audits accumulate (every URL submitted via /tools/citation-readiness/), the score distribution per archetype updates.
- When we have ≥100 audits per archetype, the "expected" columns become "observed" columns with sample sizes published.
- Per-LLM citation rate data ships with Phase 1B (the polling engine that lands when operator credentials drop).
- All published numbers will be CC-BY-4.0 with attribution + dataset access for researchers — same intent as the framework itself.
Want to be notified when the calibration updates? Subscribe via RSS or email [email protected].
See where your URL falls in these ranges.
Run the Free Citation Readiness Score and compare your score to the archetype expected range above. Each audit contributes to the calibration dataset for this page.