Guide · 9 min read
Citation Oracle archetypes — the multipliers behind the score.
Every page is one or more archetypes. Each archetype carries a calibrated citation-potential multiplier. Here's the table, the stacking rules, and the four archetypes we permanently reject.
Why archetypes matter
Pages aren't citable in the abstract. They're citable because of what they are: an original-research dataset, a programmatic-unique-data table, a methodology page, a Person + Organization schema chain, a Wikipedia citation entry. Each of these is an archetype, and each archetype has a measurably different probability of being cited by ChatGPT, Claude, Perplexity, or Gemini.
The Citation Oracle that powers CitationDesk uses 15 archetype multipliers, each recalibrated against per-archetype actuals every time a page ships. The result is a 0–100 projection of weekly citations per page, which feeds the four-Oracle ranking model in the underlying AcePilot framework.
The 15 calibrated multipliers
Higher multiplier = stronger citation expectation. Negative = hard reject.
| Archetype | Multiplier | Why it scores |
|---|---|---|
| wikipedia_citation_secured | +95 | One Wikipedia citation compounds for years. |
| unique_dataset_page | +90 | A fact no LLM has in its training corpus (HoldLens-class). |
| ai_visibility_optimized_page | +85 | Aleyda 10-characteristic score ≥8. |
| methodology_page_first_party | +75 | LLMs love explainers that establish "how we know". |
| definedterm_schema_per_concept | +70 | DefinedTerm structured data on every glossary term. |
| citation_chain_inbound_3_plus | +70 | Your facts cited by ≥3 third parties create a corroboration moat. |
| open_dataset_json_api | +65 | Machine-readable for LLM training + retrieval. |
| schema_dataset_with_distribution | +60 | Dataset + Distribution schema explicit. |
| person_schema_with_sameAs | +50 | Operator identity coherent across the site. |
| podcast_guest_transcript_indexed | +50 | Long-form transcripts get cited as authoritative. |
| haro_qwoted_featured_pitch | +45 | Journalist quotes propagate. |
| reddit_organic_corroboration | +40 | Operator-time; high signal-to-noise. |
| linkedin_zero_click_framework | +35 | Frameworks get screenshot-cited. |
| llms_txt_advertise_with_openapi | +30 | Discoverability + machine-readable endpoint. |
| freshness_per_page | +20 | dateModified within 30 days. |
How they stack on a single page
A single page commonly hits 3–10 archetypes simultaneously. For example, our HoldLens case study hits:
- methodology_page_first_party (+75) — explains the audit
- schema_dataset_with_distribution (+60) — Article schema
- person_schema_with_sameAs (+50) — operator byline
- citation_chain_inbound_3_plus (+70) — Wikipedia + HN inbounds captured
- freshness_per_page (+20) — datePublished + dateModified set
Stacking is multiplicative: every archetype a page hits scales the citation_weight that feeds APS_v20.3, and the geometric mean of those multipliers feeds the projected weekly citation count. A page hitting five archetypes that average +55 (the arithmetic mean of the five listed above) has roughly 3× the projected citations of a page hitting a single archetype at +60.
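To make the stacking arithmetic concrete, here is a minimal Python sketch that computes the geometric mean for the five HoldLens archetypes listed above, using the seed multipliers from the table. The names `holdlens_archetypes` and `geometric_mean` are illustrative, not part of CitationDesk. The guide does not spell out how this value is turned into a projected weekly citation count or how it enters APS_v20.3, so the sketch stops at the mean itself.

```python
from math import prod

# Seed multipliers for the five archetypes the HoldLens case study hits.
holdlens_archetypes = {
    "methodology_page_first_party": 75,
    "schema_dataset_with_distribution": 60,
    "person_schema_with_sameAs": 50,
    "citation_chain_inbound_3_plus": 70,
    "freshness_per_page": 20,
}


def geometric_mean(multipliers):
    """Return the n-th root of the product of n positive multipliers."""
    values = list(multipliers)
    if not values or any(v <= 0 for v in values):
        raise ValueError("expected at least one positive multiplier")
    return prod(values) ** (1 / len(values))


arithmetic_avg = sum(holdlens_archetypes.values()) / len(holdlens_archetypes)
stacked = geometric_mean(holdlens_archetypes.values())

print(f"arithmetic mean: {arithmetic_avg:.1f}")  # arithmetic mean: 55.0
print(f"geometric mean:  {stacked:.1f}")         # geometric mean:  50.1
```

The geometric mean sits below the arithmetic average because the low freshness_per_page multiplier (+20) pulls it down more than it would a simple average.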
The four immutable hard-rejects
Some archetypes carry a multiplier of −1000 and are filtered before ranking. They cannot be re-enabled by any configuration:
- ai_generated_filler: pages written by an LLM with no original substance. LLMs aggressively deprioritize their own training-corpus echoes.
- scraped_content_no_attribution: copies of someone else's data without citation. Beyond the ethics, such content is structurally non-citable.
- cloak_to_bots: serving different HTML to GPTBot than to humans. Detection is automatic and the penalty is severe.
- fake_entity_schema: Organization or Person schema describing entities that don't exist or that misrepresent real ones. LLM entity graphs cross-reference and catch this fast.
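The pre-ranking filter described above might look something like the following sketch. It assumes each page is a plain dict with a hypothetical `archetypes` list; the page shape, the URLs, and the function name `filter_rankable` are invented for illustration. Keeping the reject set as a hard-coded `frozenset` rather than a config value mirrors the "cannot be re-enabled" rule.

```python
# The four hard-reject archetypes, fixed in code rather than read from config.
HARD_REJECTS = frozenset({
    "ai_generated_filler",
    "scraped_content_no_attribution",
    "cloak_to_bots",
    "fake_entity_schema",
})


def filter_rankable(pages):
    """Drop any page that hits at least one hard-reject archetype."""
    return [p for p in pages if not (set(p["archetypes"]) & HARD_REJECTS)]


# Hypothetical input: one clean page, one page flagged as filler.
pages = [
    {"url": "/methodology",
     "archetypes": ["methodology_page_first_party", "freshness_per_page"]},
    {"url": "/auto-blog-17",
     "archetypes": ["ai_generated_filler", "freshness_per_page"]},
]

rankable = filter_rankable(pages)
print([p["url"] for p in rankable])  # ['/methodology']
```

Note that a single hard-reject archetype removes the page outright, regardless of how many positive archetypes it also hits.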
How the multipliers self-calibrate
Every page that ships logs its archetype claim, projected citations, and actual citations (measured 7 and 30 days post-ship) to a per-project state file. Once an archetype has 10 or more rows, its multiplier auto-adjusts toward the mean actual-to-projected ratio, with each adjustment capped at ±50% per cycle to prevent oscillation. The seed multipliers on this page are therefore not gospel: they're a calibration anchor that drifts toward observed reality across thousands of citation events.
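One way to read that calibration rule is sketched below. It rests on several assumptions the guide does not confirm: that "adjusts toward the ratio" means multiplying the current multiplier by the mean actual-to-projected ratio, that the ±50% bound clamps that ratio to the range 0.5 to 1.5, and that each logged row carries a single actual count (how the 7-day and 30-day measurements are combined is not stated). The function name `recalibrate` and the row shape are hypothetical.

```python
from statistics import mean

MIN_ROWS = 10    # calibration starts once an archetype has 10+ rows
MAX_STEP = 0.50  # assumed reading of "bounded ±50% per cycle"


def recalibrate(multiplier, rows):
    """Return the adjusted multiplier for one archetype.

    rows: list of (projected, actual) citation counts, one per shipped page.
    """
    usable = [(p, a) for p, a in rows if p > 0]  # skip rows with no projection
    if len(usable) < MIN_ROWS:
        return multiplier  # not enough evidence yet; keep the seed value
    ratio = mean(a / p for p, a in usable)
    # Clamp the correction so one cycle moves the multiplier at most 50%.
    bounded = min(max(ratio, 1 - MAX_STEP), 1 + MAX_STEP)
    return multiplier * bounded


# Actuals ran at 2x projections: the raw ratio 2.0 is clamped to 1.5.
print(recalibrate(60, [(10, 20)] * 12))  # 90.0
# Only 5 rows logged: the multiplier is left unchanged.
print(recalibrate(60, [(10, 20)] * 5))   # 60
```

Under this reading, an archetype whose actuals consistently double its projections needs more than one cycle to catch up, which is the damping effect the ±50% bound is meant to provide.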
Want to see your own page's archetype composition? Run the Citation Readiness Score: the audit identifies which archetypes a page currently hits and which it could plausibly add.
Score your own site against this guide.
The free Citation Readiness Score runs every signal from this guide against any URL. ~90 seconds, no signup.