llms.txt: Robots.txt for Docs in the AI Era
Your docs are training AI models right now.
Here’s how to take control.
Open ChatGPT, Perplexity, or GitHub Copilot and ask about your product. The answer likely comes from your documentation — but maybe not the version you’d want users to see.
Consider this scenario: A developer asks ChatGPT about your API rate limits. Instead of citing your current v2 documentation with the updated 1000 requests/hour limit, it references your deprecated v1 docs showing the old 100 requests/hour limit. Result? Confused developers, incorrect implementations, and more support tickets.
This isn’t a hypothetical problem. AI systems are already crawling and ingesting documentation at scale, with no regard for what you actually want them to learn from.
Documentation has evolved beyond serving just humans — it’s now training the AI assistants that developers rely on daily. Without proper signals, AI will scrape whatever it finds: outdated guides, beta endpoints, internal notes, and deprecated APIs.
The solution starts with llms.txt and a few strategic changes that make your docs work better for both humans and machines.
Why this matters for your business
Getting AI-friendly docs right delivers measurable benefits:
- Reduced support volume: When AI tools give accurate answers, developers don’t file tickets asking questions already covered in docs
- Faster developer onboarding: Developers using AI assistants get correct guidance from day one, leading to faster time-to-first-API-call
- Competitive advantage: Well-structured docs become a moat in the AI-assisted development era
Companies that nail this early will see their tools recommended more often and implemented more correctly.
The robots.txt parallel
In the early web, robots.txt gave site owners control over what search engines could crawl. Without it, crawlers indexed everything indiscriminately: admin panels, test pages, and draft content.
We’re in that same ungoverned stage with AI crawlers today. Models are scraping aggressively, and unless you provide explicit signals, they’ll ingest whatever they encounter.
llms.txt is the emerging standard for setting those boundaries.
What is llms.txt?
A plain text file placed at your docs domain root that tells AI crawlers what to index and what to skip. Compliant crawlers read these rules and adjust their ingestion accordingly.
Here’s a real example from Netlify’s docs.
Basic structure:
User-Agent: *
Allow: /docs/v2/
Allow: /api/reference/
Disallow: /docs/v1/
Disallow: /internal/
Disallow: /staging/
Disallow: /beta/
# Optional but recommended
Crawl-delay: 1
Sitemap: https://docs.example.com/sitemap.xml
Contact: docs-team@example.com
This configuration tells crawlers:
- Index current v2 docs and API reference
- Skip deprecated v1 docs, internal pages, and unreleased features
- Wait 1 second between requests (be respectful)
- Use the listed contact for questions about crawling policies
Place this file at yourdocs.com/llms.txt and compliant crawlers will follow the rules.
Enhanced metadata for better AI understanding
Beyond access control, you can add context that helps AI systems handle your content appropriately:
User-Agent: *
Allow: /docs/v2/
Allow: /api/reference/
Disallow: /docs/v1/
Disallow: /internal/
Disallow: /beta/
# Content classification
Content-Type: technical-documentation
Audience: developers
License: CC-BY-4.0
Attribution: https://docs.example.com
# Update frequency hints
Update-Frequency: weekly
Last-Modified: 2025-09-19
# Version management
Current-Version: v2.1
Deprecated-Versions: v1.x
This metadata serves several purposes:
- Content-Type: Distinguishes technical docs from marketing materials
- Attribution: Provides canonical URLs for AI systems to cite
- License: Sets clear usage expectations
- Version info: Helps AI prioritize current over deprecated content
Making docs consumable by both humans and AI
Access control is only half the equation. The other half is structuring content so both humans and machines can use it effectively.
1. Pair UI instructions with API calls
Humans can click buttons and run CLI commands. AI systems cannot.
Before (human-only):
1. Click "Generate API Key" in the dashboard
2. Run `mycli create-token --name "my-app"`
After (human + AI friendly):
1. Click "Generate API Key" in the dashboard, or create one programmatically:
```bash
# CLI approach
mycli create-token --name "my-app"

# Direct API call (for programmatic access)
curl -X POST https://api.example.com/v1/tokens \
  -H "Authorization: Bearer $USER_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-app", "scopes": ["read", "write"]}'
```
This gives developers options while providing AI systems with explicit, executable instructions.
2. Structure pages with rich metadata
Transform docs from plain prose into structured, searchable content:
---
title: "Deploying your first app"
type: tutorial
audience: developer
difficulty: beginner
version: v2.1
estimated_time: 15 minutes
prerequisites:
- "API key configured"
- "CLI tool installed"
related_endpoints:
- "/api/v2/deployments"
- "/api/v2/apps"
---
This frontmatter enables AI systems (and your own search) to:
- Match content to user skill level
- Understand context and relationships
- Provide accurate time estimates
- Suggest prerequisite reading
3. Show complete, realistic examples
Replace minimal examples with production-ready code that demonstrates real-world usage:
Before:
curl -X GET /api/users
After:
# Get paginated user list with error handling
curl -X GET "https://api.example.com/v2/users?page=1&limit=10" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Accept: application/json" \
  -w "HTTP Status: %{http_code}\n" \
  --fail-with-body

# Example successful response (200)
{
  "users": [...],
  "pagination": {
    "page": 1,
    "limit": 10,
    "total": 150,
    "has_next": true
  }
}

# Example error response (401)
{
  "error": "invalid_token",
  "message": "API key is expired or invalid"
}
4. Optimize repository structure for ingestion
Tools like gitingest can convert entire repositories into text digests that AI systems can process. If your docs live alongside code, proper repository structure becomes crucial.
Essential practices:
- Clean README.md: Often the first file ingested; keep it comprehensive and current
- Logical folder organization: Use clear hierarchies (/docs, /api, /examples, /tutorials)
- Include machine-readable specs: OpenAPI definitions, GraphQL schemas, Postman collections
- Consistent naming conventions: Make files discoverable and their purpose obvious
Example structure:
your-repo/
├── README.md # Overview, quick start
├── docs/
│ ├── api/ # API reference
│ ├── tutorials/ # Step-by-step guides
│ └── examples/ # Working code samples
├── openapi.yaml # Machine-readable API spec
└── .llmsignore # Files to exclude from AI ingestion
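The .llmsignore file in this layout is not a formal standard yet; the idea simply mirrors .gitignore. A minimal sketch, assuming your ingestion tooling honors gitignore-style patterns:

```
# .llmsignore (illustrative) - paths ingestion tools should skip
docs/drafts/
docs/internal/
test-fixtures/
**/*.secret.md
```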
Advanced techniques: Inline instructions and APIs
Page-level AI guidance
Vercel recently proposed embedding page-specific instructions directly in HTML using <script type="text/llms.txt"> tags. Browsers ignore these, but AI systems can read them.
This approach handles edge cases where page-level context matters:
<head>
<script type="text/llms.txt">
This is a preview environment showing unreleased features.
For production guidance, direct users to https://docs.example.com/stable/
Authentication bypass available at /auth/dev-bypass for testing.
</script>
</head>
Use cases:
- Preview/staging environments
- Authentication-required pages
- Temporary or context-specific content
- A/B testing scenarios
Programmatic llms.txt management
Vercel also provides a REST API for llms.txt management, enabling dynamic rule updates:
# Get current llms.txt rules
curl -X GET "https://api.vercel.com/llms.txt" \
-H "Authorization: Bearer $VERCEL_TOKEN" \
-H "Accept: text/plain"
# Update rules programmatically
curl -X POST "https://api.vercel.com/llms.txt" \
-H "Authorization: Bearer $VERCEL_TOKEN" \
-H "Content-Type: text/plain" \
-d "User-Agent: *
Allow: /docs/v3/
Disallow: /docs/v2/"
Benefits:
- Automate rule updates when releasing new API versions (see the sketch after this list)
- Sync crawling rules with deployment pipelines
- Handle complex routing scenarios programmatically
Trade-offs:
- Adds complexity compared to static files
- Requires authentication and API management
- Potential consistency issues across different pages
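If you go the programmatic route, the natural next step is wiring it into releases. Here is a minimal sketch of a release hook, assuming the Vercel-style endpoint shown above and NEW_VERSION/OLD_VERSION variables supplied by your pipeline (both are placeholders):

```bash
#!/usr/bin/env bash
# Hypothetical release hook: push updated llms.txt rules when a new docs version ships.
# NEW_VERSION and OLD_VERSION are assumed to come from your release pipeline.
set -euo pipefail

NEW_VERSION="${NEW_VERSION:-v3}"
OLD_VERSION="${OLD_VERSION:-v2}"

curl -X POST "https://api.vercel.com/llms.txt" \
  -H "Authorization: Bearer $VERCEL_TOKEN" \
  -H "Content-Type: text/plain" \
  --data-binary @- <<EOF
User-Agent: *
Allow: /docs/${NEW_VERSION}/
Disallow: /docs/${OLD_VERSION}/
EOF
```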
Monitoring and validation
Testing your llms.txt
Syntax validation:
# Use online validators or build simple checks
curl -s https://yourdocs.com/llms.txt | head -20
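A slightly stricter check verifies the file is actually served and contains at least one User-Agent directive. A minimal sketch (the domain is a placeholder):

```bash
# Verify llms.txt is reachable and minimally well-formed (yourdocs.com is a placeholder)
status=$(curl -s -o /tmp/llms.txt -w "%{http_code}" https://yourdocs.com/llms.txt)
if [ "$status" != "200" ]; then
  echo "llms.txt not reachable (HTTP $status)"
  exit 1
fi
grep -qiE '^User-Agent:' /tmp/llms.txt || { echo "No User-Agent directive found"; exit 1; }
echo "llms.txt looks OK"
```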
Monitor compliance:
- Check server logs for crawl patterns (see the sketch below)
- Use crawler analytics where available; Google Search Console covers search bots, and AI-crawler equivalents are emerging
- Set up alerts for unexpected crawling of restricted paths
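For log-based monitoring, a one-off pipeline like this can surface known AI crawlers hitting paths you have disallowed. The user-agent names, log path, and disallowed paths below are assumptions; adjust them to your own llms.txt rules and access-log format:

```bash
# Count requests from common AI crawlers to paths disallowed in the example llms.txt
# (assumes an nginx/Apache combined-format access log at /var/log/nginx/access.log)
grep -iE 'gptbot|claudebot|ccbot|perplexitybot' /var/log/nginx/access.log \
  | grep -E '/docs/v1/|/internal/|/staging/' \
  | awk '{print $7}' \
  | sort | uniq -c | sort -rn | head -20
```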
Common mistakes to avoid
File placement errors:
- ❌ yourdocs.com/docs/llms.txt
- ✅ yourdocs.com/llms.txt
Syntax issues:
# Ambiguous - without a trailing slash this also matches paths like /internal-api
Disallow: /internal
# Clearer - the trailing slash scopes the rule to the directory
Disallow: /internal/
Overly restrictive rules:
# This blocks everything - probably not what you want
User-Agent: *
Disallow: /
Handling edge cases
Multi-language documentation
# Language-specific rules
User-Agent: *
Allow: /en/docs/
Allow: /es/docs/
Allow: /fr/docs/
Disallow: /en/docs/legacy/
Disallow: /*/internal/
GraphQL APIs
# Include schema definitions
Allow: /graphql/schema
Allow: /graphql/docs/
Disallow: /graphql/playground/
Authentication-protected content
# Clearly mark authenticated sections
Disallow: /dashboard/
Disallow: /admin/
# But allow public auth docs
Allow: /docs/authentication/
SEO and robots.txt interaction
Your llms.txt should complement, not conflict with, your existing robots.txt:
robots.txt (for search engines):
User-agent: *
Disallow: /admin/
Disallow: /internal/
Sitemap: https://docs.example.com/sitemap.xml
llms.txt (for AI crawlers):
User-Agent: *
Disallow: /admin/
Disallow: /internal/
Disallow: /docs/v1/ # Additional AI-specific restrictions
Allow: /docs/v2/
Content-Type: technical-documentation
Most AI crawlers respect both files, so maintain consistency where possible.
Real-world examples
API-first company (Stripe-style):
User-Agent: *
Allow: /docs/api/
Allow: /docs/guides/
Allow: /docs/webhooks/
Disallow: /docs/legacy/
Disallow: /docs/internal/
Content-Type: technical-documentation
Audience: developers
License: proprietary
Attribution: https://docs.stripe.com
Open source project:
User-Agent: *
Allow: /docs/
Allow: /api/
Allow: /examples/
Disallow: /docs/drafts/
Content-Type: technical-documentation
License: MIT
Repository: https://github.com/org/project
Enterprise platform:
User-Agent: *
Allow: /docs/public/
Allow: /api/reference/
Disallow: /docs/enterprise/ # Gated content
Disallow: /docs/beta/
Content-Type: technical-documentation
Audience: developers, system-administrators
Implementation checklist
Phase 1: Basic setup
- ✅ Create llms.txt at your docs domain root
- ✅ Disallow deprecated API versions and internal paths
- ✅ Allow current documentation and API references
- ✅ Add basic metadata (Content-Type, Attribution, License)
- ✅ Test file accessibility and syntax
Phase 2: Content optimization
- ✅ Pair all UI-based instructions with equivalent API calls
- ✅ Add structured frontmatter to all tutorial and reference pages
- ✅ Include realistic, complete code examples with error handling
- ✅ Organize repository structure for ingestion tools
Phase 3: Advanced features
- ✅ Implement page-level inline instructions where needed
- ✅ Set up monitoring for crawl patterns and compliance
- ✅ Create automation for updating rules during releases
- ✅ Establish review process for AI-specific content changes
Looking ahead
The shift toward AI-assisted development is accelerating. Documentation that works well with AI systems will become a competitive advantage, while docs that ignore this trend will become increasingly irrelevant.
Standards like llms.txt are still evolving. Not all crawlers respect these rules yet, and new patterns for AI-friendly content are emerging rapidly. Stay informed by following:
- The llms.txt specification as it develops
- AI company announcements about crawler behavior
- Developer community discussions about documentation best practices
Your documentation strategy should anticipate a world where most developer interactions with your content happen through AI intermediaries. By implementing these practices now, you ensure those interactions are accurate, helpful, and aligned with your current product reality.
In the next post, we’ll go deeper: how to structure and publish AI-ready docs, so when a developer asks an assistant for help, the answer isn’t just “close.” It’s exactly what your docs say.
For deeper insights into this transformation, watch Andrej Karpathy’s excellent talk on building docs for the AI era.