llms.txt: Robots.txt for Docs in the AI Era
Your docs are training AI models right now.
Here’s how to take control.
Open ChatGPT, Perplexity, or GitHub Copilot and ask about your product. The answer likely comes from your documentation — but maybe not the version you’d want users to see.
Consider this scenario: A developer asks ChatGPT about your API rate limits. Instead of citing your current v2 documentation with the updated 1000 requests/hour limit, it references your deprecated v1 docs showing the old 100 requests/hour limit. Result? Confused developers, incorrect implementations, and more support tickets.
This isn’t a hypothetical problem. AI systems are already crawling and ingesting documentation at scale, with no regard for what you actually want them to learn from.
Documentation has evolved beyond serving just humans — it’s now training the AI assistants that developers rely on daily. Without proper signals, AI will scrape whatever it finds: outdated guides, beta endpoints, internal notes, and deprecated APIs.
The solution starts with llms.txt and a few strategic changes that make your docs work better for both humans and machines.
Why this matters for your business
Getting AI-friendly docs right delivers measurable benefits:
- Reduced support volume: When AI tools give accurate answers, developers don’t file tickets asking questions already covered in docs
- Faster developer onboarding: Developers using AI assistants get correct guidance from day one, leading to faster time-to-first-API-call
- Competitive advantage: Well-structured docs become a moat in the AI-assisted development era
Companies that nail this early will see their tools recommended more often and implemented more correctly.
The robots.txt parallel
In the early web, robots.txt gave site owners control over what search engines could crawl. Without it, crawlers indexed everything indiscriminately: admin panels, test pages, and draft content.
We’re in that same ungoverned stage with AI crawlers today. Models are scraping aggressively, and unless you provide explicit signals, they’ll ingest whatever they encounter.
llms.txt is the emerging standard for setting those boundaries.
What is llms.txt?
A plain text file placed at your docs domain root that tells AI crawlers what to index and what to skip. Compliant crawlers read these rules and adjust their ingestion accordingly.
Here’s a real example from Netlify’s docs.
Basic structure:
User-Agent: *
Allow: /docs/v2/
Allow: /api/reference/
Disallow: /docs/v1/
Disallow: /internal/
Disallow: /staging/
Disallow: /beta/
# Optional but recommended
Crawl-delay: 1
Sitemap: https://docs.example.com/sitemap.xml
Contact: docs-team@example.com
This configuration tells crawlers:
- Index current v2 docs and API reference
- Skip deprecated v1 docs, internal pages, and unreleased features
- Wait 1 second between requests (be respectful)
- Use the listed contact for questions about crawling policies
Place this file at yourdocs.com/llms.txt and compliant crawlers will follow the rules.
Enhanced metadata for better AI understanding
Beyond access control, you can add context that helps AI systems handle your content appropriately:
User-Agent: *
Allow: /docs/v2/
Allow: /api/reference/
Disallow: /docs/v1/
Disallow: /internal/
Disallow: /beta/
# Content classification
Content-Type: technical-documentation
Audience: developers
License: CC-BY-4.0
Attribution: https://docs.example.com
# Update frequency hints
Update-Frequency: weekly
Last-Modified: 2025-09-19
# Version management
Current-Version: v2.1
Deprecated-Versions: v1.x
This metadata serves several purposes:
- Content-Type: Distinguishes technical docs from marketing materials
- Attribution: Provides canonical URLs for AI systems to cite
- License: Sets clear usage expectations
- Version info: Helps AI prioritize current over deprecated content
Making docs consumable by both humans and AI
Access control is only half the equation. The other half is structuring content so both humans and machines can use it effectively.
1. Pair UI instructions with API calls
Humans can click buttons and run CLI commands. AI systems cannot.
Before (human-only):
1. Click "Generate API Key" in the dashboard
2. Run `mycli create-token --name "my-app"`
After (human + AI friendly):
1. Click "Generate API Key" in the dashboard, or create one programmatically:
```bash
# CLI approach
mycli create-token --name "my-app"

# Direct API call (for programmatic access)
curl -X POST https://api.example.com/v1/tokens \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-app", "scopes": ["read", "write"]}'
```
This gives developers options while providing AI systems with explicit, executable instructions.
2. Structure pages with rich metadata
Transform docs from plain prose into structured, searchable content:
---
title: "Deploying your first app"
type: tutorial
audience: developer
difficulty: beginner
version: v2.1
estimated_time: 15 minutes
prerequisites:
- "API key configured"
- "CLI tool installed"
related_endpoints:
- "/api/v2/deployments"
- "/api/v2/apps"
---
This frontmatter enables AI systems (and your own search) to:
- Match content to user skill level
- Understand context and relationships
- Provide accurate time estimates
- Suggest prerequisite reading
3. Show complete, realistic examples
Replace minimal examples with production-ready code that demonstrates real-world usage:
Before:
curl -X GET /api/users
After:
# Get paginated user list with error handling
curl -X GET "https://api.example.com/v2/users?page=1&limit=10" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/json" \
  -w "HTTP Status: %{http_code}\n" \
  --fail-with-body

# Example successful response (200)
{
  "users": [...],
  "pagination": {
    "page": 1,
    "limit": 10,
    "total": 150,
    "has_next": true
  }
}

# Example error response (401)
{
  "error": "invalid_token",
  "message": "API key is expired or invalid"
}
4. Optimize repository structure for ingestion
Tools like gitingest can convert entire repositories into text digests that AI systems can process. If your docs live alongside code, proper repository structure becomes crucial.
Essential practices:
- Clean README.md: Often the first file ingested; keep it comprehensive and current
- Logical folder organization: Use clear hierarchies (/docs, /api, /examples, /tutorials)
- Include machine-readable specs: OpenAPI definitions, GraphQL schemas, Postman collections
- Consistent naming conventions: Make files discoverable and their purpose obvious
Example structure:
your-repo/
├── README.md # Overview, quick start
├── docs/
│ ├── api/ # API reference
│ ├── tutorials/ # Step-by-step guides
│ └── examples/ # Working code samples
├── openapi.yaml # Machine-readable API spec
└── .llmsignore # Files to exclude from AI ingestion
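The .llmsignore file in this layout is not a formal standard yet; the idea simply mirrors .gitignore. A minimal sketch, assuming your ingestion tooling honors gitignore-style patterns:

```
# .llmsignore (illustrative) - paths ingestion tools should skip
docs/drafts/
docs/internal/
test-fixtures/
**/*.secret.md
```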
Advanced techniques: Inline instructions and APIs
Page-level AI guidance
Vercel recently proposed embedding page-specific instructions directly in HTML using <script type="text/llms.txt"> tags. Browsers ignore these, but AI systems can read them.
This approach handles edge cases where page-level context matters:
<head>
<script type="text/llms.txt">
This is a preview environment showing unreleased features.
For production guidance, direct users to https://docs.example.com/stable/
Authentication bypass available at /auth/dev-bypass for testing.
</script>
</head>
Use cases:
- Preview/staging environments
- Authentication-required pages
- Temporary or context-specific content
- A/B testing scenarios
Programmatic llms.txt management
Vercel also provides a REST API for llms.txt management, enabling dynamic rule updates:
# Get current llms.txt rules
curl -X GET "https://api.vercel.com/llms.txt" \
-H "Authorization: Bearer $VERCEL_TOKEN" \
-H "Accept: text/plain"
# Update rules programmatically
curl -X POST "https://api.vercel.com/llms.txt" \
-H "Authorization: Bearer $VERCEL_TOKEN" \
-H "Content-Type: text/plain" \
-d "User-Agent: *
Allow: /docs/v3/
Disallow: /docs/v2/"
Benefits:
- Automate rule updates when releasing new API versions (see the sketch after this list)
- Sync crawling rules with deployment pipelines
- Handle complex routing scenarios programmatically
Trade-offs:
- Adds complexity compared to static files
- Requires authentication and API management
- Potential consistency issues across different pages
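If you go the programmatic route, the natural next step is wiring it into releases. Here is a minimal sketch of a release hook, assuming the Vercel-style endpoint shown above and NEW_VERSION/OLD_VERSION variables supplied by your pipeline (both are placeholders):

```bash
#!/usr/bin/env bash
# Hypothetical release hook: push updated llms.txt rules when a new docs version ships.
# NEW_VERSION and OLD_VERSION are assumed to come from your release pipeline.
set -euo pipefail

NEW_VERSION="${NEW_VERSION:-v3}"
OLD_VERSION="${OLD_VERSION:-v2}"

curl -X POST "https://api.vercel.com/llms.txt" \
  -H "Authorization: Bearer $VERCEL_TOKEN" \
  -H "Content-Type: text/plain" \
  --data-binary @- <<EOF
User-Agent: *
Allow: /docs/${NEW_VERSION}/
Disallow: /docs/${OLD_VERSION}/
EOF
```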
Monitoring and validation
Testing your llms.txt
Syntax validation:
# Use online validators or build simple checks
curl -s https://yourdocs.com/llms.txt | head -20
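A slightly stricter check verifies the file is actually served and contains at least one User-Agent directive. A minimal sketch (the domain is a placeholder):

```bash
# Verify llms.txt is reachable and minimally well-formed (yourdocs.com is a placeholder)
status=$(curl -s -o /tmp/llms.txt -w "%{http_code}" https://yourdocs.com/llms.txt)
if [ "$status" != "200" ]; then
  echo "llms.txt not reachable (HTTP $status)"
  exit 1
fi
grep -qiE '^User-Agent:' /tmp/llms.txt || { echo "No User-Agent directive found"; exit 1; }
echo "llms.txt looks OK"
```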
Monitor compliance:
- Check server logs for crawl patterns (see the sketch below)
- Use crawler analytics where available; Google Search Console covers search bots, and AI-crawler equivalents are emerging
- Set up alerts for unexpected crawling of restricted paths
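For log-based monitoring, a one-off pipeline like this can surface known AI crawlers hitting paths you have disallowed. The user-agent names, log path, and disallowed paths below are assumptions; adjust them to your own llms.txt rules and access-log format:

```bash
# Count requests from common AI crawlers to paths disallowed in the example llms.txt
# (assumes an nginx/Apache combined-format access log at /var/log/nginx/access.log)
grep -iE 'gptbot|claudebot|ccbot|perplexitybot' /var/log/nginx/access.log \
  | grep -E '/docs/v1/|/internal/|/staging/' \
  | awk '{print $7}' \
  | sort | uniq -c | sort -rn | head -20
```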
Common mistakes to avoid
File placement errors:
- ❌ yourdocs.com/docs/llms.txt
- ✅ yourdocs.com/llms.txt
Syntax issues:
# Ambiguous - without a trailing slash this also matches paths like /internal-api
Disallow: /internal
# Clearer - the trailing slash scopes the rule to the directory
Disallow: /internal/
Overly restrictive rules:
# This blocks everything - probably not what you want
User-Agent: *
Disallow: /
Handling edge cases
Multi-language documentation
# Language-specific rules
User-Agent: *
Allow: /en/docs/
Allow: /es/docs/
Allow: /fr/docs/
Disallow: /en/docs/legacy/
Disallow: /*/internal/
GraphQL APIs
# Include schema definitions
Allow: /graphql/schema
Allow: /graphql/docs/
Disallow: /graphql/playground/
Authentication-protected content
# Clearly mark authenticated sections
Disallow: /dashboard/
Disallow: /admin/
# But allow public auth docs
Allow: /docs/authentication/
SEO and robots.txt interaction
Your llms.txt should complement, not conflict with, your existing robots.txt:
robots.txt (for search engines):
User-agent: *
Disallow: /admin/
Disallow: /internal/
Sitemap: https://docs.example.com/sitemap.xml
llms.txt (for AI crawlers):
User-Agent: *
Disallow: /admin/
Disallow: /internal/
Disallow: /docs/v1/ # Additional AI-specific restrictions
Allow: /docs/v2/
Content-Type: technical-documentation
Most AI crawlers respect both files, so maintain consistency where possible.
Real-world examples
API-first company (Stripe-style):
User-Agent: *
Allow: /docs/api/
Allow: /docs/guides/
Allow: /docs/webhooks/
Disallow: /docs/legacy/
Disallow: /docs/internal/
Content-Type: technical-documentation
Audience: developers
License: proprietary
Attribution: https://docs.stripe.com
Open source project:
User-Agent: *
Allow: /docs/
Allow: /api/
Allow: /examples/
Disallow: /docs/drafts/
Content-Type: technical-documentation
License: MIT
Repository: https://github.com/org/project
Enterprise platform:
User-Agent: *
Allow: /docs/public/
Allow: /api/reference/
Disallow: /docs/enterprise/ # Gated content
Disallow: /docs/beta/
Content-Type: technical-documentation
Audience: developers, system-administrators
Implementation checklist
Phase 1: Basic setup
- ✅ Create llms.txt at your docs domain root
- ✅ Disallow deprecated API versions and internal paths
- ✅ Allow current documentation and API references
- ✅ Add basic metadata (Content-Type, Attribution, License)
- ✅ Test file accessibility and syntax
Phase 2: Content optimization
- ✅ Pair all UI-based instructions with equivalent API calls
- ✅ Add structured frontmatter to all tutorial and reference pages
- ✅ Include realistic, complete code examples with error handling
- ✅ Organize repository structure for ingestion tools
Phase 3: Advanced features
- ✅ Implement page-level inline instructions where needed
- ✅ Set up monitoring for crawl patterns and compliance
- ✅ Create automation for updating rules during releases
- ✅ Establish review process for AI-specific content changes
Looking ahead
The shift toward AI-assisted development is accelerating. Documentation that works well with AI systems will become a competitive advantage, while docs that ignore this trend will become increasingly irrelevant.
Standards like llms.txt are still evolving. Not all crawlers respect these rules yet, and new patterns for AI-friendly content are emerging rapidly. Stay informed by following:
- The llms.txt specification as it develops
- AI company announcements about crawler behavior
- Developer community discussions about documentation best practices
Your documentation strategy should anticipate a world where most developer interactions with your content happen through AI intermediaries. By implementing these practices now, you ensure those interactions are accurate, helpful, and aligned with your current product reality.
In the next post, we’ll go deeper: how to structure and publish AI-ready docs, so when a developer asks an assistant for help, the answer isn’t just “close.” It’s exactly what your docs say.
For deeper insights into this transformation, watch Andrej Karpathy’s excellent talk on building docs for the AI era.