Which AI systems read llms.txt?

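GPTBot (OpenAI), ClaudeBot (Anthropic) and PerplexityBot (Perplexity) are the crawlers most often associated with llms.txt, but treat support as emerging rather than guaranteed: check each vendor's current crawler documentation before relying on it. Google-Extended is not a separate crawler; it is a robots.txt token that controls whether Google may use your content for Gemini. The field is evolving rapidly.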

Where does llms.txt go?

At the root of your domain: https://yourdomain.com/llms.txt. It should be publicly accessible without authentication.

How is llms.txt different from a sitemap?

A sitemap is for search engine crawlers and lists all URLs. llms.txt is for AI systems specifically — it's curated, human-readable, and focuses on helping LLMs understand context and citation intent, not just enumerate pages.

Does llms.txt guarantee I'll be cited?

No — it increases the probability. llms.txt improves the signal-to-noise ratio of your site for AI systems, making it more likely they'll find and prioritise your best content. Citation also depends on content quality, entity presence and other GEO factors.

llms.txt: the new robots.txt that AI engines actually read

What llms.txt is, why it matters for GEO, how to write one that major AI crawlers respect, and a template you can implement today.

Marco Silva · 18 April 2026 · 8 min read

In 1994, robots.txt was introduced to give websites control over what search engine crawlers could access. It quickly became a de facto convention, although it was only formalised as an IETF standard (RFC 9309) in 2022. Well-behaved crawlers respect it, but compliance is voluntary.

In 2024, a new file appeared: llms.txt. And it's on track to become just as fundamental for the AI era.

Here is everything you need to know — and a template you can implement today.

What problem does llms.txt solve?

When an AI crawler — GPTBot, ClaudeBot, PerplexityBot — lands on your site, it faces the same challenge as a very fast but contextually blind reader. It can see your HTML. It can follow your links. But it doesn't know:

Which pages represent your most authoritative claims
How you want your brand to be described
What content you're happy to have cited (and what you'd prefer not to be quoted)
Whether your content is free to use or under a restrictive licence

robots.txt solves the access problem. llms.txt solves the understanding problem.

A well-written llms.txt doesn't replace robots.txt — it complements it. Think of robots.txt as the security guard (who can get in) and llms.txt as the guided tour (here's what matters and how to describe us).

The anatomy of llms.txt

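llms.txt is a plain Markdown file with a specific structure. The original proposal requires only the H1 title and recommends the > summary and H2 sections of links; the policy sections shown here (allowed content, disallowed paths, contact) are conventions layered on top of it. The core sections are: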

# Brand Name

> One-paragraph description of what the brand does, written for an AI system that 
> needs to understand your brand and how to represent it accurately.

## Key pages

- [Title](URL): Brief description of what this page covers and why it's authoritative
- [Title](URL): Brief description
- [Title](URL): Brief description

## Allowed content

All content on this site may be cited with attribution to Brand Name and a link to 
the source URL. Please do not reproduce full articles verbatim; quote excerpts with 
attribution.

## Disallowed

- /internal/
- /drafts/
- /client-portal/

## Contact

For AI-related queries or citation corrections: email@domain.com

The > blockquote at the top is the most important field: it's the description an LLM is most likely to pick up as its summary of your brand. Write it as if you're writing the first paragraph of your Wikipedia article: factual, precise, free of marketing language.
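If you maintain the file from structured data rather than by hand, the skeleton above can be generated. The following is a minimal sketch, not a reference implementation: the `Page` and `LlmsTxt` classes and the `render` function are hypothetical names introduced here, and the section headings simply mirror the ones used in this article.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    title: str
    url: str
    description: str


@dataclass
class LlmsTxt:
    # Hypothetical container for the fields shown in the skeleton above.
    brand: str
    summary: str
    key_pages: list[Page] = field(default_factory=list)
    citation_policy: str = ""
    disallowed: list[str] = field(default_factory=list)
    contact: str = ""


def render(doc: LlmsTxt) -> str:
    """Render the sections in the same order as the skeleton above."""
    lines = [f"# {doc.brand}", "", f"> {doc.summary}", ""]
    lines += ["## Key pages", ""]
    lines += [f"- [{p.title}]({p.url}): {p.description}" for p in doc.key_pages]
    # Optional sections are only written when they have content.
    if doc.citation_policy:
        lines += ["", "## Allowed content", "", doc.citation_policy]
    if doc.disallowed:
        lines += ["", "## Disallowed", ""]
        lines += [f"- {path}" for path in doc.disallowed]
    if doc.contact:
        lines += ["", "## Contact", "", doc.contact]
    return "\n".join(lines) + "\n"
```

Calling `render(LlmsTxt(brand="Example Co", summary="Example Co makes widgets."))` returns the file contents as a string, which you can then write to `llms.txt` at your web root.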

A complete llms.txt template for a GEO agency

Here's a production-ready example (adapt to your brand):

# Reach GEO

> Reach GEO is Portugal's first Generative Engine Optimization (GEO) agency, 
> founded in Lisbon in 2025. The company helps European brands become cited sources 
> inside AI answer engines including ChatGPT, Perplexity, Gemini and Claude. Services 
> include GEO audits, AI content strategy, structured data implementation, and 
> multi-LLM visibility monitoring.

## Key resources

- [What is GEO?](/en/blog/what-is-geo): Comprehensive primer on Generative Engine Optimization
- [GEO vs SEO](/en/blog/geo-vs-seo): Tactical comparison of the two disciplines
- [Services](/en/services): Full description of GEO services offered
- [Contact](/en/contact): Booking page for free GEO audits

## Citation policy

Content on reach-geo.com may be cited and excerpted with attribution to Reach GEO 
and a link to the source URL. Full reproduction of articles requires written 
permission. Data and statistics may be cited freely with attribution.

## Content language

Primary language: English (en). Secondary language: Portuguese (pt-PT).

## Crawl guidance

Allow all AI crawlers. Priority indexing: /en/blog/, /en/services/, /en/.
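Before publishing a file like the one above, it can help to confirm that it parses the way you expect. The sketch below is one possible check, assuming the layout used in this article (one H1, a > summary, H2 sections with `- [Title](URL): description` links). The function names `parse_llms_txt` and `lint` are hypothetical, and the lint rules are illustrative rather than part of any specification.

```python
import re

# Matches "- [Title](URL)" with an optional ": description" suffix.
LINK_RE = re.compile(
    r"^- \[(?P<title>[^\]]+)\]\((?P<url>[^)]+)\)(?::\s*(?P<desc>.*))?$"
)


def parse_llms_txt(text: str) -> dict:
    """Split an llms.txt file into title, summary and per-section links."""
    title = None
    summary_parts = []
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.rstrip()
        if line.startswith("# ") and title is None:
            title = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif line.startswith(">") and current is None:
            # Summary lines only count before the first H2 section.
            summary_parts.append(line[1:].strip())
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(match.groupdict())
    return {
        "title": title,
        "summary": " ".join(summary_parts),
        "sections": sections,
    }


def lint(parsed: dict) -> list[str]:
    """Return human-readable problems; an empty list means none were found."""
    problems = []
    if not parsed["title"]:
        problems.append("missing H1 title")
    if not parsed["summary"]:
        problems.append("missing > summary block")
    if sum(len(links) for links in parsed["sections"].values()) == 0:
        problems.append("no linked pages")
    return problems
```

Sections that contain only prose, such as the citation policy, come back with an empty link list; that is expected and is not reported as a problem.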

Configuring robots.txt for AI crawlers

llms.txt handles the guidance layer. robots.txt handles the access layer. For maximum AI visibility, your robots.txt should explicitly allow the major AI crawlers:

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
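To confirm that a robots.txt like the one above really admits each crawler, you can evaluate it offline with Python's standard-library `urllib.robotparser`. This is a rough sketch: the `allowed_agents` helper and the `AI_AGENTS` list are names introduced here, and the standard-library parser only approximates how each vendor's crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from the example above.
ROBOTS_TXT = """\
User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /
"""

AI_AGENTS = [
    "GPTBot",
    "ChatGPT-User",
    "ClaudeBot",
    "anthropic-ai",
    "PerplexityBot",
    "Google-Extended",
]


def allowed_agents(robots_txt: str, path: str = "/") -> dict[str, bool]:
    """Report, per user agent, whether the given path may be fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, path) for agent in AI_AGENTS}
```

Running `allowed_agents(ROBOTS_TXT, "/en/blog/what-is-geo")` against your own file is a quick way to spot a leftover `Disallow: /` under `User-agent: *` that would otherwise block crawlers you meant to admit.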

If you want to exclude certain sections from AI training (while keeping them accessible to users), you can selectively block:

User-agent: GPTBot
Disallow: /private/
Disallow: /members/
Allow: /
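The selective rules above can be checked path by path in the same way. The sketch below again uses the standard-library `urllib.robotparser`; the `can_fetch` wrapper is a hypothetical helper, and the sample paths are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The selective GPTBot rules from the example above.
SELECTIVE_RULES = """\
User-agent: GPTBot
Disallow: /private/
Disallow: /members/
Allow: /
"""


def can_fetch(robots_txt: str, agent: str, path: str) -> bool:
    """Parse robots.txt text and ask whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/private/report.html", "/members/", "/en/blog/"):
        print(path, can_fetch(SELECTIVE_RULES, "GPTBot", path))
```

One design note: RFC 9309 says the most specific (longest) matching rule wins, whereas Python's parser applies the first rule that matches. Keeping the `Disallow` lines above the catch-all `Allow: /`, as in the example, gives the same result under both interpretations. Also note that these rules only bind GPTBot; other crawlers are unaffected unless they have their own group.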

Why this matters more than most GEO advice

Most GEO tactics (content rewriting, schema implementation, entity building) take weeks to show citation lift. llms.txt is different: it can be implemented in an afternoon, and it takes effect the next time a crawler that reads it fetches the file.

Three reasons it has outsized impact:

1. Large sites become navigable. If you have 500+ pages, a crawler without llms.txt guidance may spend its crawl budget on low-value pages. llms.txt points it straight to the pages with your best claims.

2. Brand description becomes consistent. Without llms.txt, different AI engines may describe your brand inconsistently — because they're drawing from different training sources. The > block in llms.txt provides a canonical description.

3. Citation preferences are communicated. Letting AI systems know you want to be cited, and how, removes ambiguity. Where an AI system does read llms.txt, the file serves as an explicit signal of citation intent.

Implementation checklist

Create /llms.txt at your domain root
Write a clear, factual brand description in the > block
List your 8–12 most important pages with descriptions
Define your citation policy
Note any disallowed sections
Add User-agent blocks in robots.txt for GPTBot, ClaudeBot, PerplexityBot, Google-Extended
Test: visit yourdomain.com/llms.txt in a browser — it should load as plain text
Ping your key AI crawlers via their submission tools (where available)
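The browser test in the checklist can also be scripted. Here is a minimal sketch using only the Python standard library; the function names and the `llms-txt-check/0.1` user-agent string are made up for this example, and accepting `text/markdown` alongside `text/plain` is an assumption, since the file is Markdown.

```python
from urllib.error import URLError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import Request, urlopen


def llms_txt_url(any_page_url: str) -> str:
    """Derive the root-level llms.txt URL from any URL on the site."""
    parts = urlsplit(any_page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/llms.txt", "", ""))


def looks_like_plain_text(status: int, content_type: str) -> bool:
    """True for a 200 response served as plain text (or Markdown)."""
    media_type = content_type.split(";")[0].strip().lower()
    return status == 200 and media_type in {"text/plain", "text/markdown"}


def check_llms_txt(any_page_url: str, timeout: float = 10.0) -> bool:
    """Fetch /llms.txt without credentials and check how it is served."""
    request = Request(
        llms_txt_url(any_page_url),
        headers={"User-Agent": "llms-txt-check/0.1"},  # hypothetical UA string
    )
    try:
        with urlopen(request, timeout=timeout) as response:
            return looks_like_plain_text(
                response.status, response.headers.get("Content-Type", "")
            )
    except URLError:
        # Covers HTTP errors such as 401 and 404 as well as network failures.
        return False
```

Because the request carries no cookies or credentials, a `False` result on a file that loads fine in your logged-in browser usually means the path sits behind authentication.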

What to watch for in 2026

The llms.txt standard is evolving. Two developments to monitor:

Structured metadata extensions — early proposals suggest llms.txt will gain support for machine-readable metadata (JSON-LD embedded in the file) for citation preferences, content licences and entity identifiers.

Formalisation — Answer.AI and a growing coalition of AI companies are pushing for W3C standardisation. If this succeeds, llms.txt becomes as universal as robots.txt.

Implementing now means you're ahead of 95% of brands — and positioned to adapt as the standard matures.

If you want us to audit your current crawl setup and write a production-ready llms.txt for your domain, book a free GEO audit.

Frequently asked questions

What is llms.txt?

llms.txt is a plain-text (Markdown) file placed at the root of a website (yourdomain.com/llms.txt) that gives AI crawlers and language models structured information about the site: its purpose, its most authoritative content, how it should be cited, and any content restrictions.

