llms.txt: the new robots.txt that AI engines actually read
What llms.txt is, why it matters for GEO, how to write one that major AI crawlers respect, and a template you can implement today.
In 1994, robots.txt was proposed to give websites control over what search engine crawlers could access. It spread quickly as a de facto convention and was formally standardised as RFC 9309 in 2022. Compliance is voluntary, but well-behaved crawlers honour it.
In 2024, a new file appeared: llms.txt. And it's on track to become just as fundamental for the AI era.
Here is everything you need to know — and a template you can implement today.
What problem does llms.txt solve?
When an AI crawler — GPTBot, ClaudeBot, PerplexityBot — lands on your site, it faces the same challenge as a very fast but contextually blind reader. It can see your HTML. It can follow your links. But it doesn't know:
- Which pages represent your most authoritative claims
- How you want your brand to be described
- What content you're happy to have cited (and what you'd prefer not to be quoted)
- Whether your content is free to use or under a restrictive licence
robots.txt solves the access problem. llms.txt solves the understanding problem.
A well-written llms.txt doesn't replace robots.txt — it complements it. Think of robots.txt as the security guard (who can get in) and llms.txt as the guided tour (here's what matters and how to describe us).
The anatomy of llms.txt
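llms.txt is a plain Markdown file with a specific structure. The llmstxt.org proposal defines the H1 title, the > blockquote summary and the H2 sections of links; the policy sections shown here (allowed content, disallowed paths, contact) are free-form additions that a parser is not guaranteed to interpret. The core sections are: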
```markdown
# Brand Name

> One-paragraph description of what the brand does, written for an AI system that
> needs to understand your brand and how to represent it accurately.

## Key pages
- [Title](URL): Brief description of what this page covers and why it's authoritative
- [Title](URL): Brief description
- [Title](URL): Brief description

## Allowed content
All content on this site may be cited with attribution to Brand Name and a link to
the source URL. Please do not reproduce full articles verbatim; quote excerpts with
attribution.

## Disallowed
- /internal/
- /drafts/
- /client-portal/

## Contact
For AI-related queries or citation corrections: email@domain.com
```
The > blockquote at the top is the most important field: it is the summary an LLM is most likely to pick up as its primary understanding of your brand. Write it as if you're writing the first paragraph of your Wikipedia article: factual, precise, free of marketing language.
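To make the structure concrete, here is a minimal sketch of how a consumer might read the file. It is not how any particular AI crawler works; the `LlmsTxt` class and `parse_llms_txt` function are hypothetical names used only for illustration, and the parsing rules are a simplified reading of the layout shown above.

```python
import re
from dataclasses import dataclass, field


@dataclass
class LlmsTxt:
    """Hypothetical container for the parts of an llms.txt file."""
    title: str = ""
    summary: str = ""
    sections: dict[str, list[str]] = field(default_factory=dict)
    # Each link is stored as (section heading, link title, url).
    links: list[tuple[str, str, str]] = field(default_factory=list)


# Matches list items of the form "- [Title](URL)".
LINK_RE = re.compile(r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)")


def parse_llms_txt(text: str) -> LlmsTxt:
    doc = LlmsTxt()
    summary_lines: list[str] = []
    current = None  # heading of the H2 section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("## "):
            current = line[3:].strip()
            doc.sections[current] = []
        elif line.startswith("# ") and not doc.title:
            doc.title = line[2:].strip()
        elif line.startswith(">") and current is None:
            # Blockquote lines before the first H2 form the summary.
            summary_lines.append(line[1:].strip())
        elif current is not None:
            doc.sections[current].append(line)
            match = LINK_RE.match(line)
            if match:
                doc.links.append((current, match["title"], match["url"]))
    doc.summary = " ".join(summary_lines)
    return doc


SAMPLE = """\
# Brand Name
> One-paragraph description of what the brand does.

## Key pages
- [Title](https://example.com/page): Brief description

## Disallowed
- /internal/
"""

doc = parse_llms_txt(SAMPLE)
print(doc.title)           # Brand Name
print(list(doc.sections))  # ['Key pages', 'Disallowed']
```

Note that the "Disallowed" entries come back as ordinary text lines: nothing in this sketch (or in the file format itself) turns them into access rules.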
A complete llms.txt template for a GEO agency
Here's a production-ready example (adapt to your brand):
```markdown
# Reach GEO

> Reach GEO is Portugal's first Generative Engine Optimization (GEO) agency,
> founded in Lisbon in 2025. The company helps European brands become cited sources
> inside AI answer engines including ChatGPT, Perplexity, Gemini and Claude. Services
> include GEO audits, AI content strategy, structured data implementation, and
> multi-LLM visibility monitoring.

## Key resources
- [What is GEO?](/en/blog/what-is-geo): Comprehensive primer on Generative Engine Optimization
- [GEO vs SEO](/en/blog/geo-vs-seo): Tactical comparison of the two disciplines
- [Services](/en/services): Full description of GEO services offered
- [Contact](/en/contact): Booking page for free GEO audits

## Citation policy
Content on reach-geo.com may be cited and excerpted with attribution to Reach GEO
and a link to the source URL. Full reproduction of articles requires written
permission. Data and statistics may be cited freely with attribution.

## Content language
Primary language: English (en). Secondary language: Portuguese (pt-PT).

## Crawl guidance
Allow all AI crawlers. Priority indexing: /en/blog/, /en/services/, /en/.
```
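If your key pages already live in a CMS or a config file, you may prefer to generate llms.txt at build time rather than edit it by hand. The following is a rough sketch under that assumption; `Link` and `render_llms_txt` are hypothetical helpers, not part of any library, and the data simply mirrors part of the template above.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str


def render_llms_txt(
    name: str,
    summary: str,
    link_sections: dict[str, list[Link]],
    text_sections: dict[str, str],
) -> str:
    """Render an llms.txt string: H1, blockquote, link sections, prose sections."""
    # The summary is emitted as a single blockquote line for simplicity.
    parts = [f"# {name}", "", f"> {summary}"]
    for heading, links in link_sections.items():
        parts += ["", f"## {heading}"]
        parts += [f"- [{link.title}]({link.url}): {link.note}" for link in links]
    for heading, body in text_sections.items():
        parts += ["", f"## {heading}", body]
    return "\n".join(parts) + "\n"


output = render_llms_txt(
    name="Reach GEO",
    summary="Reach GEO is Portugal's first Generative Engine Optimization (GEO) agency, "
            "founded in Lisbon in 2025.",
    link_sections={
        "Key resources": [
            Link("What is GEO?", "/en/blog/what-is-geo",
                 "Comprehensive primer on Generative Engine Optimization"),
            Link("Services", "/en/services",
                 "Full description of GEO services offered"),
        ],
    },
    text_sections={
        "Content language": "Primary language: English (en). Secondary language: Portuguese (pt-PT).",
    },
)
print(output)
```

Writing the returned string to `llms.txt` in your site's public root during deployment keeps the file in step with the pages it describes.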
Configuring robots.txt for AI crawlers
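llms.txt handles the guidance layer. robots.txt handles the access layer. Crawling is allowed by default, so these rules only change behaviour if a broader rule (such as a blanket Disallow under User-agent: *) would otherwise block the bot. Note also that Google-Extended is a control token rather than a separate crawler: it governs whether content Google has already crawled may be used for its Gemini models. Even so, for maximum AI visibility, listing the major AI crawlers explicitly makes your intent unambiguous: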
```
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
```
If you want to exclude certain sections from AI training (while keeping them accessible to users), you can selectively block:
```
User-agent: GPTBot
Disallow: /private/
Disallow: /members/
Allow: /
```
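Before deploying rules like these, it can help to check them locally. The sketch below uses Python's standard-library `urllib.robotparser` to test the selective block above; the `is_allowed` helper and the sample paths are illustrative assumptions, and real crawlers may evaluate rules slightly differently.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/
Disallow: /members/
Allow: /

User-agent: ClaudeBot
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str) -> bool:
    """Return True if user_agent may fetch path under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


print(is_allowed(ROBOTS_TXT, "GPTBot", "/private/report.html"))     # False
print(is_allowed(ROBOTS_TXT, "GPTBot", "/blog/what-is-geo"))        # True
print(is_allowed(ROBOTS_TXT, "ClaudeBot", "/private/report.html"))  # True
# A bot with no matching group and no "*" group is allowed by default.
print(is_allowed(ROBOTS_TXT, "PerplexityBot", "/members/area"))     # True
```

One caveat: Python's parser applies the first matching rule in file order, whereas RFC 9309 specifies that the most specific (longest) match wins. The two agree for this example because the `Disallow` lines come before `Allow: /`, which is a good reason to keep that ordering.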
Why this matters more than most GEO advice
Most GEO tactics (content rewriting, schema implementation, entity building) take weeks to show citation lift. llms.txt is different: it can be implemented in an afternoon, and it is available to every crawler that reads it from the next visit onwards.
Three reasons it has outsized impact:
1. Large sites become navigable. If you have 500+ pages, a crawler without llms.txt guidance may spend its crawl budget on low-value pages. llms.txt points it straight to the pages with your best claims.
2. Brand description becomes consistent. Without llms.txt, different AI engines may describe your brand inconsistently, because they draw on different training sources. The > block in llms.txt offers a single canonical description.
3. Citation preferences are communicated. Letting AI systems know you want to be cited, and how, removes ambiguity. How much weight each AI company gives the file is not publicly documented, so treat it as a clear statement of intent rather than a guarantee.
Implementation checklist
- Create `/llms.txt` at your domain root
- Write a clear, factual brand description in the `>` block
- List your 8–12 most important pages with descriptions
- Define your citation policy
- Note any disallowed sections
- Add `User-agent` blocks in `robots.txt` for GPTBot, ClaudeBot, PerplexityBot, Google-Extended
- Test: visit `yourdomain.com/llms.txt` in a browser; it should load as plain text
- Ping your key AI crawlers via their submission tools (where available)
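The browser test in the checklist can also be scripted. This is a sketch, not a formal validator: `check_llms_response` and `check_url` are hypothetical helpers, and accepting `text/markdown` alongside `text/plain` is an assumption about what servers commonly send for `.txt` and Markdown files.

```python
from urllib.request import Request, urlopen


def check_llms_response(status: int, content_type: str, body: str) -> list[str]:
    """Return a list of problems found in a served llms.txt (empty if none)."""
    problems = []
    if status != 200:
        problems.append(f"expected HTTP 200, got {status}")
    media_type = content_type.split(";")[0].strip().lower()
    if media_type not in ("text/plain", "text/markdown"):
        problems.append(f"unexpected Content-Type: {content_type!r}")
    lines = body.splitlines()
    first_line = next((ln for ln in lines if ln.strip()), "")
    if not first_line.startswith("# "):
        problems.append("file should start with a '# Title' heading")
    if not any(ln.lstrip().startswith(">") for ln in lines):
        problems.append("no '>' summary block found")
    return problems


def check_url(url: str) -> list[str]:
    """Fetch url and run the checks. urlopen raises HTTPError on 4xx/5xx."""
    request = Request(url, headers={"User-Agent": "llms-txt-check/0.1"})
    with urlopen(request, timeout=10) as response:
        body = response.read().decode("utf-8", errors="replace")
        content_type = response.headers.get("Content-Type", "")
        return check_llms_response(response.status, content_type, body)


# Offline demonstration: an HTML error page served in place of the file.
for problem in check_llms_response(200, "text/html; charset=utf-8", "<html></html>"):
    print(problem)
```

To run it against a live site, call `check_url` with the full address of your file, for example the `yourdomain.com/llms.txt` location from the checklist with its scheme added.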
What to watch for in 2026
The llms.txt standard is evolving. Two developments to monitor:
Structured metadata extensions: early proposals suggest llms.txt may gain support for machine-readable metadata (for example, JSON-LD embedded in the file) covering citation preferences, content licences and entity identifiers. None of this is part of the published proposal yet.
Formalisation: llms.txt is currently a community proposal led by Answer.AI, with no standards body behind it. If a body such as the W3C takes it up and the major AI companies commit to it, llms.txt could become as universal as robots.txt.
Implementing now puts you ahead of most brands and positions you to adapt as the standard matures.
If you want us to audit your current crawl setup and write a production-ready llms.txt for your domain, book a free GEO audit.
People also ask
Is llms.txt an official standard?
Not yet. It's a community proposal published by Jeremy Howard of Answer.AI in September 2024. Some AI companies, Anthropic among them, publish llms.txt files for their own documentation, but no major crawler operator has publicly committed to treating the file as binding, and there is no W3C or IETF process so far.
Do I need llms.txt if I already have robots.txt?
Yes. robots.txt controls access (block/allow). llms.txt controls understanding — it tells the AI what your site is, what content is most authoritative, and how you want to be cited. They serve different purposes.
What happens if I don't have llms.txt?
AI crawlers will still index your site, but without guidance. They'll decide for themselves which pages to prioritise, how to describe your brand, and what to cite. llms.txt lets you influence those decisions.
Can I exclude certain pages from LLM indexing via llms.txt?
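Not reliably. The llms.txt proposal defines no `Disallow` directive, so a "Disallowed" section is a plain-language request rather than an enforced rule. To actually keep crawlers out of a page or section, use `Disallow` in robots.txt. llms.txt remains a sensible place to state a content licence and citation preferences.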