Guide · 5 min read
llms.txt — the 200-byte file that unlocks LLM citation.
A small text file at your site root that gives AI crawlers a structured summary of what your site is, what content matters, and how to navigate it. Anthropic, OpenAI, and Perplexity honor it.
What llms.txt is
llms.txt is an emerging convention: a plain-text file at https://yourdomain.com/llms.txt that gives LLM crawlers a structured summary of your site. It complements robots.txt (which says "you may crawl") with semantic context ("here's what this site is about and which pages are most worth crawling first").
Think of it as your site's elevator pitch written for AI ingestion. ~200–800 bytes is typical. No HTML, no markup beyond simple markdown-flavored sections.
Who honors it (and who doesn't)
As of mid-2026:
- Anthropic (ClaudeBot) — honors
- OpenAI (GPTBot) — honors
- Perplexity (PerplexityBot) — honors
- Google (Google-Extended) — does not formally honor yet; treats it as informative only
- Apple (Applebot-Extended) — does not formally honor yet
- Common Crawl (CCBot) — does not honor; relies on robots.txt
Adoption is increasing. Together, Anthropic, OpenAI, and Perplexity account for a meaningful share of citation-relevant LLM crawler traffic, so shipping the file is high-leverage even before the convention is fully adopted.
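Because llms.txt complements robots.txt rather than replacing it, it is worth confirming that the crawlers listed above can fetch the file at all. The following is a minimal sketch using Python's standard urllib.robotparser; the robots.txt content and the yoursite.com URL are hypothetical placeholders, and the user-agent tokens are the ones named in the list above.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the list of crawlers that honor llms.txt.
LLM_CRAWLERS = ["ClaudeBot", "GPTBot", "PerplexityBot"]


def crawlers_allowed(robots_txt: str, url: str) -> dict[str, bool]:
    """Return, per crawler, whether robots_txt permits fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in LLM_CRAWLERS}


# Hypothetical robots.txt that blocks GPTBot and allows everyone else.
example_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

result = crawlers_allowed(example_robots, "https://yoursite.com/llms.txt")
print(result)
```

In this example GPTBot is reported as blocked, so an llms.txt on that site would never reach it regardless of how well the file is written.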
Minimum viable llms.txt
    # YourSite
    > YourSite is a [one-sentence description of what the site is for]
    ## Primary content
    - [https://yoursite.com/](https://yoursite.com/): Site homepage
    - [https://yoursite.com/about/](https://yoursite.com/about/): About the operator + methodology
    - [https://yoursite.com/[your most important section]](https://yoursite.com/[your most important section]): [Description]
    ## License
    Content: All Rights Reserved. Citations welcome.
    Contact: [your email]
That's ~200 bytes. Ship it.
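If you would rather generate the file than hand-write it, the minimum version can be rendered from a few values. This is only a sketch: build_llms_txt is a hypothetical helper, the site name, pages, and contact address are placeholders, and the blank lines between sections are a choice made here for readability, not a requirement stated in this guide.

```python
def build_llms_txt(site_name, description, pages, license_text, contact):
    """Render a minimal llms.txt body.

    pages is a list of (url, description) tuples in priority order.
    """
    lines = [f"# {site_name}", "", f"> {description}", "", "## Primary content", ""]
    for url, page_description in pages:
        # Same "[url](url): description" shape as the example above.
        lines.append(f"- [{url}]({url}): {page_description}")
    lines += ["", "## License", "", f"Content: {license_text}", f"Contact: {contact}"]
    return "\n".join(lines) + "\n"


# Placeholder values; replace them with your own site details.
text = build_llms_txt(
    site_name="YourSite",
    description="YourSite is a placeholder description.",
    pages=[
        ("https://yoursite.com/", "Site homepage"),
        ("https://yoursite.com/about/", "About the operator + methodology"),
    ],
    license_text="All Rights Reserved. Citations welcome.",
    contact="you@example.com",
)
print(text)
print(len(text.encode("utf-8")), "bytes")
```

Printing the byte count makes it easy to see whether the file stays in the small size range this guide recommends.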
Expanded llms.txt
Once you have the minimum live, you can add:
    # YourSite
    > One-sentence description
    ## Primary content
    - [URL]: [description]
    - [URL]: [description]
    ## Citation-preferred sections
    - /api/[slug].json — machine-readable dataset endpoint
    - /methodology — how we compute what we compute
    - /about — identity + expertise signals
    ## What we'd like cited
    - Original data points + synthesis (not raw scrapes of public data)
    - Quote-ready definitions in /glossary/
    - The methodology page when our framework is referenced
    ## License
    Content: CC-BY-4.0 (or your preferred license)
    Dataset: CC-BY-4.0
    Contact: [email protected] for commercial licensing
Five common mistakes
- Hosting at /.well-known/llms.txt instead of /llms.txt. Site root is the convention. The /.well-known/ path is for other RFC conventions, not this one.
- Returning HTML wrapping the text. Some Cloudflare Workers or Next.js setups serve the file as HTML with the body wrapping the content. AI crawlers expect text/plain or text/markdown.
- Claiming features the site doesn't have. If you list a /api/ endpoint, make sure it returns the structure you claim. Lying-to-bots gets you deprioritized fast once the LLM realizes the page doesn't exist or doesn't match the description.
- Pointing at URLs that 404 or redirect. Crawlers will follow your hints; broken hints reduce trust.
- Editing once and forgetting. Your site changes. Re-check the file quarterly. Stale priority URLs that no longer matter dilute the signal.
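Several of these mistakes can be caught mechanically. The sketch below assumes you have already fetched the file and have its URL, Content-Type header, and body in hand; check_llms_txt and listed_urls are hypothetical helpers, and the HTML detection is a rough heuristic. Fetching each listed URL to look for 404s or redirects is deliberately left out.

```python
import re
from urllib.parse import urlparse

ACCEPTED_TYPES = {"text/plain", "text/markdown"}
# Markdown links of the form [label](https://...), as in the examples above.
LINK_PATTERN = re.compile(r"\]\((https?://[^)\s]+)\)")


def check_llms_txt(url: str, content_type: str, body: str) -> list[str]:
    """Return the problems found; an empty list means none were detected."""
    problems = []
    if urlparse(url).path != "/llms.txt":
        problems.append("file is not served from /llms.txt at the site root")
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ACCEPTED_TYPES:
        problems.append(f"unexpected Content-Type: {media_type}")
    if re.search(r"<\s*(!doctype|html|body)\b", body, re.IGNORECASE):
        problems.append("body appears to be wrapped in HTML")
    return problems


def listed_urls(body: str) -> list[str]:
    """Extract the absolute URLs the file points crawlers at."""
    return LINK_PATTERN.findall(body)


good = check_llms_txt(
    "https://yoursite.com/llms.txt",
    "text/plain; charset=utf-8",
    "# YourSite\n> Example\n",
)
bad = check_llms_txt(
    "https://yoursite.com/.well-known/llms.txt",
    "text/html",
    "<!DOCTYPE html><html><body># YourSite</body></html>",
)
print(good)
print(bad)
```

The output of listed_urls can be fed to whatever HTTP client you already use to confirm that each priority URL still returns a direct 200 response, which also makes the quarterly re-check easy to script.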
For most sites, shipping the minimum viable version + linking from your robots.txt is enough to clear the Bot-Crawl Health dimension on the Citation Readiness Score. The expanded version is worth shipping once you've verified the minimum works.