Free tool · 2026
Block AI from training on your site.
Pick which AI crawlers to allow or block — GPTBot, ClaudeBot, Google-Extended, Perplexity, Bytespider, and 15+ more. Get a clean, copy-pasteable robots.txt with sources documented inline.
Presets:
Training crawlers
Crawl your site to collect training data for future LLMs. Block these if you don't want your content shaping next-gen models.
AI search indexers
Index your site to power AI search/answer products. Allowing these is the new SEO — your pages can be cited (see the sketch after these presets).
On-demand fetchers
Fetch a page only when a user clicks/asks for it inside a chat product. Blocking these can hurt user experience without much training-data benefit.
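
To make the training/search split concrete, here is a minimal sketch using Python's standard urllib.robotparser (the domain and path are placeholders): a file that blocks a training crawler while leaving an AI search indexer free to fetch, and therefore cite, your pages.

import urllib.robotparser

# Hypothetical rules: block the training crawler, allow the search indexer.
rules = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# The training crawler is denied everywhere...
print(rp.can_fetch("GPTBot", "https://yourdomain.com/post"))         # False
# ...while the search indexer can still fetch (and cite) the same page.
print(rp.can_fetch("OAI-SearchBot", "https://yourdomain.com/post"))  # True

The same parser can test any preset combination before you deploy it.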
robots.txt · 13 bots blocked
# Generated by https://klaws.app/tools/ai-crawler-robots
# Last updated: 2026-04-28

# --- OpenAI ---
User-agent: GPTBot
Disallow: /

# --- Anthropic ---
User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

# --- Google ---
User-agent: Google-Extended
Disallow: /

# --- Common Crawl ---
User-agent: CCBot
Disallow: /

# --- Meta ---
User-agent: FacebookBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

# --- Apple ---
User-agent: Applebot-Extended
Disallow: /

# --- ByteDance ---
User-agent: Bytespider
Disallow: /

# --- Cohere ---
User-agent: cohere-ai
Disallow: /

# --- Amazon ---
User-agent: Amazonbot
Disallow: /

# --- Diffbot ---
User-agent: Diffbot
Disallow: /

# --- Webz.io ---
User-agent: omgilibot
Disallow: /

# --- Path-level rules (all crawlers) ---
User-agent: *
Disallow: /private/
Disallow: /admin/
Custom path rules (optional)
How to install
- Copy the generated robots.txt above (or click Download).
- Place it at the root of your domain so it's served at https://yourdomain.com/robots.txt.
- Most static hosts (Vercel, Netlify, Cloudflare Pages, GitHub Pages) just want the file in your public/ directory.
- Verify with curl https://yourdomain.com/robots.txt (or the scripted check below).
- Most major crawlers will pick up changes within a few hours.
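
If you'd rather script the verification than eyeball curl output, a short check along these lines works with only the Python standard library (yourdomain.com is a placeholder):

import urllib.robotparser

rp = urllib.robotparser.RobotFileParser()
rp.set_url("https://yourdomain.com/robots.txt")
rp.read()  # fetches and parses the live file

# A blocked training crawler should be denied at the root...
assert not rp.can_fetch("GPTBot", "https://yourdomain.com/"), "GPTBot is not blocked"
# ...and path-level rules should apply to crawlers not listed by name.
assert not rp.can_fetch("SomeRandomBot", "https://yourdomain.com/private/"), "path rule missing"
print("robots.txt is live and blocking as expected")

Note that rp.read() treats a 404 as "everything allowed", so the first assert also catches a file that failed to deploy.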
Agents that respect the rules
Klaws agents read robots.txt — yours and everyone else's.
When a Klaws agent fetches a page on your behalf — for research, monitoring, or content drafting — it identifies as KlawsBot and obeys the destination's robots.txt. Set per-task allowlists and rate limits in the dashboard.
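
For readers building their own agents, that behavior has roughly the following shape. This is an illustrative sketch, not Klaws's actual code; only the KlawsBot name comes from this page.

import urllib.request
import urllib.robotparser
from urllib.parse import urlsplit

USER_AGENT = "KlawsBot"

def polite_fetch(url: str) -> bytes | None:
    # Check the destination's robots.txt before touching the page itself.
    parts = urlsplit(url)
    rp = urllib.robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()
    if not rp.can_fetch(USER_AGENT, url):
        return None  # the site said no; respect it
    # Identify honestly on the request we do make.
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

A real agent would add caching for the robots.txt fetch and per-host rate limiting on top of this.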
FAQ
AI crawler questions
Do AI crawlers actually respect robots.txt?
The major ones documented here (OpenAI, Anthropic, Google, Apple, Meta, Perplexity, Cohere) publicly commit to respecting robots.txt. Smaller and adversarial scrapers often don't. Robots.txt is a polite request, not a security control — treat it that way.
Will blocking GPTBot hurt my Google ranking?
No. GPTBot is OpenAI's training crawler. Google's regular search crawler is Googlebot, which is unrelated. Blocking Google-Extended specifically opts you out of training Gemini/Vertex without affecting Google Search ranking — that's the point of having it as a separate user agent.
What's the difference between a training crawler and a search/answer crawler?
Training crawlers (GPTBot, ClaudeBot, CCBot, Google-Extended, etc.) feed model training. Search crawlers (OAI-SearchBot, PerplexityBot) index pages to be cited inside the chat product when a user asks a related question. If you want traffic from AI search, allow the second group; if you don't want your content to train models, block the first.
Where do I put the generated robots.txt?
At the root of your domain, served at https://yourdomain.com/robots.txt. It must be plain text and accessible without authentication. Most static-site hosts (Netlify, Vercel, Cloudflare Pages) just want a robots.txt file in the public root.
Why does the file list each user agent separately instead of grouping them?
Most modern crawlers handle grouped User-agent lines correctly, but a few older parsers don't. Giving each crawler its own User-agent block is the most compatible format and the easiest to diff later.
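
As an illustration of the diff argument (a hypothetical toy, not the tool's actual generator), emitting one self-contained block per crawler is a one-liner:

BLOCKED = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]

def render_blocks(agents: list[str]) -> str:
    # One "User-agent / Disallow" block per crawler, blank line between blocks.
    return "\n\n".join(f"User-agent: {a}\nDisallow: /" for a in agents)

print(render_blocks(BLOCKED))

Adding or removing a crawler later changes exactly one block, so the resulting robots.txt diff is a clean add or delete.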
Does Klaws respect robots.txt?
Yes. When a Klaws agent fetches a URL on your behalf, it identifies as KlawsBot and obeys robots.txt. You can also set per-task allowlists/blocklists in the dashboard for finer control.