ToolPilot

AI Training Opt-Out Generator

Generate robots.txt rules, ai.txt, and meta tags to block AI crawlers (GPTBot, CCBot, Google-Extended, Bytespider) from training on your website content.

Select AI Crawlers to Block

Blocking 15 of 18 crawlers

# AI Training Crawler Opt-Out Rules
# Generated by ToolPilot — 2026-03-10
# Blocking 15 AI training crawlers

# OpenAI — GPT model training
User-agent: GPTBot
Disallow: /

# Google — Gemini / AI training
User-agent: Google-Extended
Disallow: /

# Google — Vertex AI grounding
User-agent: Google-CloudVertexBot
Disallow: /

# Common Crawl — Training dataset (used by many AI companies)
User-agent: CCBot
Disallow: /

# ByteDance — TikTok / AI model training
User-agent: Bytespider
Disallow: /

# Anthropic — Claude model training
User-agent: ClaudeBot
Disallow: /

# Anthropic — legacy crawler user-agent
User-agent: anthropic-ai
Disallow: /

# Apple — Apple Intelligence training
User-agent: Applebot-Extended
Disallow: /

# Meta — Meta AI training
User-agent: Meta-ExternalAgent
Disallow: /

# Meta — Meta AI content fetching
User-agent: Meta-ExternalFetcher
Disallow: /

# Amazon — Alexa / AI training
User-agent: Amazonbot
Disallow: /

# Cohere — Cohere model training
User-agent: cohere-ai
Disallow: /

# Diffbot — Web data extraction for AI
User-agent: Diffbot
Disallow: /

# Webz.io — Data collection for AI datasets
User-agent: Omgilibot
Disallow: /

# Timpi — Decentralized search AI training
User-agent: Timpibot
Disallow: /

# Allow regular search engine crawlers
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
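You can sanity-check rules like the ones above with Python's built-in urllib.robotparser before deploying them. The sketch below uses a trimmed version of the generated file and confirms that an AI training crawler is blocked while a regular search crawler is not (the URL is a placeholder):

```python
from urllib import robotparser

# A trimmed version of the generated rules above.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: Googlebot
Allow: /

User-agent: *
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# The AI training crawler is blocked everywhere...
print(rp.can_fetch("GPTBot", "https://yoursite.com/post"))     # False
# ...while the regular search crawler is still allowed.
print(rp.can_fetch("Googlebot", "https://yoursite.com/post"))  # True
```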

How to Block AI Crawlers from Training on Your Website

AI companies like OpenAI, Google, Meta, Anthropic, and ByteDance use web crawlers to scrape website content for training their AI models. If you haven't explicitly blocked these crawlers, your content is likely already being used — and new content will continue to be scraped.

The primary defense is robots.txt — a standard file that tells crawlers what they can and cannot access. Our generator creates comprehensive robots.txt rules for all known AI training crawlers, plus ai.txt (a newer standard specifically for AI crawler policies) and HTML meta tags for page-level control.

Important: robots.txt is a voluntary standard. Well-behaved crawlers (GPTBot, Google-Extended) respect it, but not all crawlers do. Combining robots.txt with meta tags and ai.txt provides the strongest protection currently available.
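For crawlers that ignore robots.txt, blocking can also be enforced at the server level. A minimal nginx sketch (the user-agent list and server block are illustrative; adapt them to your own config — note that Google-Extended is omitted because it is a robots.txt token only, not a user-agent string that appears in requests):

```nginx
# Return 403 to known AI training crawlers, regardless of robots.txt.
map $http_user_agent $block_ai {
    default        0;
    ~*GPTBot       1;
    ~*CCBot        1;
    ~*Bytespider   1;
    ~*ClaudeBot    1;
}

server {
    listen 80;
    server_name yoursite.com;

    if ($block_ai) {
        return 403;
    }
}
```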

This tool stays updated with the latest AI crawler user-agents as new ones emerge. All generation happens in your browser — we don't see or store your domain name.

Frequently Asked Questions

Will this actually stop AI from training on my content?
It blocks well-behaved crawlers like GPTBot (OpenAI), Google-Extended (Google AI), CCBot (Common Crawl), and Bytespider (ByteDance). However, robots.txt is voluntary, and some crawlers may ignore it. Short of server-level measures such as Cloudflare's bot blocking, it is the strongest protection currently available.
What is ai.txt?
ai.txt is an emerging standard (similar to robots.txt) specifically for communicating AI training preferences. It lets you specify whether you consent to AI training, what types of AI use you allow, and your licensing preferences.
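Because the standard is still emerging, field names vary between proposals. A minimal sketch following the robots.txt-like syntax used by the Spawning proposal:

```
# ai.txt — served from https://yoursite.com/ai.txt
# Syntax follows the Spawning-style proposal; exact fields
# may change as the standard evolves.
User-Agent: *
Disallow: /
```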
Will blocking AI crawlers hurt my SEO?
No. Google Search uses Googlebot for indexing, which is separate from Google-Extended (used for AI training). Blocking Google-Extended does not affect your search rankings.
Do I need to block all crawlers?
Not necessarily. You might want to allow some AI crawlers (for example, if you want your content to appear in AI search results) while blocking others. Our tool lets you choose which crawlers to block.
Where do I put these files?
robots.txt goes at the root of your website (e.g., yoursite.com/robots.txt). ai.txt also goes at the root. Meta tags go in the <head> section of your HTML pages.
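For the meta tags, a minimal sketch of the <head> placement. Note that "noai" and "noimageai" are non-standard directives (popularized by DeviantArt), and honoring them is up to each crawler:

```html
<head>
  <!-- Page-level AI training opt-out; non-standard directives,
       respected voluntarily by some crawlers -->
  <meta name="robots" content="noai, noimageai">
</head>
```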

Related Tools