
llms.txt

Definition

llms.txt is a proposed standard file, placed at the root of a website, that lists the site's most important pages in a structured Markdown format. It is designed to help large language models and the AI assistants built on them (ChatGPT, Claude, Perplexity, Gemini) discover and cite the site accurately.
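The file sits at the root of the host rather than beside each page, much like robots.txt, so a client can derive its location from any page URL on the site. A minimal sketch in Python; `llms_txt_url` is a hypothetical helper name and example.com is a placeholder:

```python
from urllib.parse import urlsplit, urlunsplit


def llms_txt_url(page_url: str) -> str:
    """Return the root-level /llms.txt URL for the host serving page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (netloc includes any port); replace the path
    # with /llms.txt and drop the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


print(llms_txt_url("https://example.com/docs/pricing?plan=pro#faq"))
# → https://example.com/llms.txt
```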

Proposed in September 2024 by Jeremy Howard of Answer.AI and documented at llmstxt.org, llms.txt is to AI crawlers what sitemap.xml is to search engines: a curated index that surfaces the pages worth reading. The file lives at /llms.txt and contains:

- A top-level H1 with the site name
- A blockquote summary describing what the site is and who runs it
- H2 sections grouping the key URLs by topic (Product, Guides, Pricing, etc.)
- Each URL paired with a one-line description of what the page contains
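The structure above can be sketched as a small Python generator. It assumes the page list is kept in a dict of H2 section name to (title, URL, description) tuples, and writes each entry as `- [Title](URL): description`, the link-plus-note form used in the llmstxt.org examples. The site name, URLs, descriptions and the `build_llms_txt` helper are all hypothetical placeholders:

```python
# Hypothetical site data: names, URLs and descriptions are placeholders.
SITE_NAME = "Example Co"
SUMMARY = "Example Co makes invoicing software for freelancers."
SECTIONS = {
    "Product": [
        ("Features", "https://example.com/features", "Overview of core invoicing features"),
    ],
    "Pricing": [
        ("Plans", "https://example.com/pricing", "Plan tiers and monthly prices"),
    ],
}


def build_llms_txt(name: str, summary: str, sections: dict) -> str:
    """Render an llms.txt body: H1, blockquote summary, then H2 link sections."""
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, description in links:
            # Each entry is a Markdown link followed by a one-line description.
            lines.append(f"- [{title}]({url}): {description}")
        lines.append("")  # blank line after each section; also gives a trailing newline
    return "\n".join(lines)


print(build_llms_txt(SITE_NAME, SUMMARY, SECTIONS))
```

Generating the file from data rather than hand-editing it is only one option; since the format is plain Markdown, a hand-maintained file works just as well.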

The format is intentionally minimal: plain Markdown, human-editable, with only the H1 title strictly required. Unlike sitemap.xml, which exists to maximise crawl coverage, llms.txt exists to maximise citation accuracy, telling the model "if someone asks about X, this is the page to cite."
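Because the file is plain Markdown, it is also easy to read back programmatically. The sketch below is a minimal parser, assuming entries follow the `- [title](url): note` form; it keeps only the first blockquote line and ignores any free-form prose between sections. `parse_llms_txt` is a hypothetical name, and this is not a description of how any particular AI crawler processes the file:

```python
import re

# Matches "- [title](url)" with an optional ": note" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Extract the H1 title, first blockquote line and H2 link sections."""
    result = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the H2 section currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("# ") and result["title"] is None:
            result["title"] = line[2:].strip()
        elif line.startswith(">") and result["summary"] is None:
            result["summary"] = line[1:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            result["sections"][current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                result["sections"][current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return result


sample = (
    "# Example Co\n\n> Invoicing software.\n\n## Pricing\n\n"
    "- [Plans](https://example.com/pricing): Plan tiers\n"
    "- [FAQ](https://example.com/faq)\n"
)
print(parse_llms_txt(sample)["sections"]["Pricing"])
```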

Adoption is voluntary and uneven. Anthropic's Claude appears to consume llms.txt actively, OpenAI's ChatGPT appears to consume it only sporadically, and Google's AI Overviews does not currently appear to use it. Because adding the file costs almost nothing (a single static file) and can improve how accurately AI assistants cite a site, many sites with a clear product or content strategy now ship one.

A companion convention, llms-full.txt, concatenates the full content of each linked page into a single file. It suits smaller sites whose total content fits within a typical LLM context window (roughly 100k tokens). Larger sites usually omit llms-full.txt and let the model crawl individual URLs on demand.
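The size decision can be sketched as a simple budget check. This assumes a rough heuristic of about 4 characters per token (real tokenizers vary by model and language) and uses the ~100k-token figure from the text as the budget. The page dict, the per-page heading used as a separator, and `build_llms_full` are hypothetical choices for illustration, not a required llms-full.txt layout:

```python
from typing import Optional

CHARS_PER_TOKEN = 4              # assumed rough average for English text
CONTEXT_BUDGET_TOKENS = 100_000  # approximate figure from the text


def estimate_tokens(text: str) -> int:
    """Crude token estimate: ceil(len(text) / CHARS_PER_TOKEN)."""
    return -(-len(text) // CHARS_PER_TOKEN)


def build_llms_full(pages: dict, budget: int = CONTEXT_BUDGET_TOKENS) -> Optional[str]:
    """Concatenate page contents into one body, or return None if over budget."""
    parts = []
    for url, markdown in pages.items():
        # Label each page with its source URL so the model can still cite it.
        parts.append(f"# {url}\n\n{markdown.strip()}\n")
    full = "\n".join(parts)
    if estimate_tokens(full) > budget:
        # Too large: ship llms.txt only and let the model fetch pages on demand.
        return None
    return full


pages = {
    "https://example.com/pricing": "Plans start at $10/month.",
    "https://example.com/faq": "Answers to common questions.",
}
full = build_llms_full(pages)
print(estimate_tokens(full))
```

Returning `None` rather than truncating keeps the choice explicit: either the whole site fits, or the site relies on llms.txt plus individual URLs.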

Related terms