If you’ve checked your server logs lately, you’ve probably noticed a flood of new bot names: GPTBot, ClaudeBot, PerplexityBot, CCBot, Bytespider. They all have “AI” in common — but they’re doing completely different things to your website.
Some of them take your content to train AI models. Others index your site for AI-powered search and send you traffic. Treating them all the same way is a mistake: block everything and you lose search visibility, allow everything and your content ends up in training datasets.
Here’s how to tell them apart — and what to do about each.
The Two Categories
AI Training Bots — They Take Your Content
These bots crawl your website to collect data used to train large language models (LLMs). When GPTBot visits your site, it’s harvesting text to teach future versions of ChatGPT. Your content becomes part of the training dataset.
Key AI training bots:
| Bot | Company | Model |
|---|---|---|
| GPTBot | OpenAI | GPT-4, future models |
| ClaudeBot | Anthropic | Claude |
| CCBot | Common Crawl | Used by OpenAI, Meta, Google, and dozens more |
| Bytespider | ByteDance (TikTok) | ByteDance AI models |
| Google-Extended | Google | Gemini AI |
| Meta-ExternalAgent | Meta | Llama models |
What they do with your content:
- Extract text from your pages
- Store it in massive training datasets
- Use it to teach AI models how to generate text
- May reproduce your content in AI outputs (without attribution)
Traffic impact: Zero. These bots don’t send users to your site.
AI Search Bots — They Send You Traffic
These bots crawl your website to power AI-driven search engines. When PerplexityBot visits, it’s indexing your content so it can cite you as a source when users ask relevant questions. Users who see your site cited may click through.
Key AI search bots:
| Bot | Company | Search Product |
|---|---|---|
| PerplexityBot | Perplexity AI | Perplexity.ai |
| OAI-SearchBot | OpenAI | SearchGPT / ChatGPT Search |
| Googlebot | Google | Google Search (also AI Overviews) |
| Applebot | Apple | Siri, Spotlight, Safari |
What they do with your content:
- Index your pages for AI search results
- Cite your site as a source in answers
- Display your content with attribution and links
- Drive referral traffic when users click citations
Traffic impact: Positive — citations drive clicks.
Why This Distinction Matters
Blocking the wrong bot hurts you
If you block PerplexityBot thinking it’s an AI training crawler, you’ve just removed yourself from Perplexity’s search index. Users asking questions your content could answer won’t see your site. That’s lost traffic.
If you allow GPTBot hoping it will send traffic, it won’t. OpenAI’s training crawler doesn’t index your site for search results. It only collects data.
The same company can operate both types
OpenAI is the clearest example:
- GPTBot = training crawler (takes content, no traffic)
- OAI-SearchBot = search indexer (drives traffic via SearchGPT)
- ChatGPT-User = real-time browsing (when a user asks ChatGPT to visit a URL)
You might want to block GPTBot but allow OAI-SearchBot — and you can, because they have different user agents.
Google does the same:
- Googlebot = search indexer (allow — drives traffic)
- Google-Extended = AI training (optional — block if you want to opt out of Gemini training)
Apple too:
- Applebot = Siri/Spotlight indexer (allow — drives visibility)
- Applebot-Extended = Apple Intelligence training (block if you want to opt out)
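To see which kind of bot is actually hitting your server, you can sort log entries by user agent. Here’s a minimal Python sketch, assuming a case-insensitive substring match is good enough for your logs. The `BOT_ROLES` mapping and `classify_user_agent` function are hypothetical names made up for illustration, and the sample user-agent strings are simplified. Google-Extended and Applebot-Extended are left out because, as far as we know, they are robots.txt control tokens rather than crawlers that send their own User-Agent header.

```python
from typing import Optional

# Substrings to look for in the User-Agent header, mapped to the roles
# described in this article. This mapping is an illustrative assumption,
# not an official list from any vendor.
BOT_ROLES = {
    "gptbot": "training",
    "claudebot": "training",
    "ccbot": "training",
    "bytespider": "training",
    "meta-externalagent": "training",
    "perplexitybot": "search",
    "oai-searchbot": "search",
    "googlebot": "search",
    "applebot": "search",
    "chatgpt-user": "user-request",
}


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return 'training', 'search', 'user-request', or None if unrecognized."""
    ua = user_agent.lower()
    for token, role in BOT_ROLES.items():
        if token in ua:
            return role
    return None


if __name__ == "__main__":
    # Simplified sample strings; real headers are usually longer.
    samples = [
        "Mozilla/5.0 (compatible; GPTBot/1.1)",
        "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/125.0",
    ]
    for sample in samples:
        print(classify_user_agent(sample), "<-", sample)
```

Running this over the user-agent column of your access log gives a rough count of training versus search crawls before you decide what to block.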
How to Block AI Training Bots Only
If you want to opt out of AI training while staying visible in AI search, use this robots.txt configuration:
# Block AI training crawlers
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: CCBot
User-agent: Bytespider
User-agent: Google-Extended
User-agent: Applebot-Extended
User-agent: Meta-ExternalAgent
Disallow: /
# AI search bots — allow (they drive traffic)
# PerplexityBot: no rules needed, allowed by default
# OAI-SearchBot: no rules needed, allowed by default
# Applebot: no rules needed, allowed by default
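Before deploying, it’s worth checking that the rules do what you expect. The sketch below uses Python’s standard `urllib.robotparser` to test the configuration above against a few bot names. The `allowed_bots` helper and the `https://example.com/` URL are assumptions for illustration, and Python’s matching logic may differ in small ways from how each real crawler interprets robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The training-only block list from this article.
ROBOTS_TXT = """\
# Block AI training crawlers
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: CCBot
User-agent: Bytespider
User-agent: Google-Extended
User-agent: Applebot-Extended
User-agent: Meta-ExternalAgent
Disallow: /
"""


def allowed_bots(robots_txt, bots, url="https://example.com/"):
    """Map each bot name to True (may fetch url) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in bots}


result = allowed_bots(
    ROBOTS_TXT,
    ["GPTBot", "CCBot", "PerplexityBot", "OAI-SearchBot", "Applebot"],
)

if __name__ == "__main__":
    for bot, allowed in result.items():
        print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```

With this configuration, the training crawlers should come back blocked and the search bots allowed, since no rule mentions them.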
How to Block All AI Bots (Training + Search)
If you want maximum control and would rather opt out of everything AI:
# Block all AI training bots
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: CCBot
User-agent: Bytespider
User-agent: Google-Extended
User-agent: Applebot-Extended
User-agent: Meta-ExternalAgent
Disallow: /
# Block AI search bots
User-agent: PerplexityBot
User-agent: OAI-SearchBot
Disallow: /
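Note: This configuration deliberately leaves Applebot and Googlebot alone. Blocking Applebot (not just Applebot-Extended) would remove you from Siri and Spotlight, and blocking Googlebot would remove you from Google Search. Both are significant trade-offs.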
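If you maintain several sites, or expect to switch between the two policies, you could generate the file instead of editing it by hand. This is a rough sketch only: `build_robots_txt` and its `block_search` flag are hypothetical names, and the bot lists simply mirror the ones in this article.

```python
# Bot names copied from the robots.txt examples in this article.
TRAINING_BOTS = [
    "GPTBot",
    "ClaudeBot",
    "anthropic-ai",
    "CCBot",
    "Bytespider",
    "Google-Extended",
    "Applebot-Extended",
    "Meta-ExternalAgent",
]
SEARCH_BOTS = ["PerplexityBot", "OAI-SearchBot"]


def build_robots_txt(block_search: bool = False) -> str:
    """Build robots.txt rules that block training bots, optionally search bots too."""
    groups = [("# Block AI training crawlers", TRAINING_BOTS)]
    if block_search:
        groups.append(("# Block AI search bots", SEARCH_BOTS))

    lines = []
    for comment, bots in groups:
        lines.append(comment)
        lines.extend(f"User-agent: {bot}" for bot in bots)
        lines.append("Disallow: /")
        lines.append("")  # blank line separates rule groups
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_robots_txt(block_search=True))
```

Calling `build_robots_txt()` with no arguments produces the training-only policy; passing `block_search=True` adds the second group for PerplexityBot and OAI-SearchBot.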
Should You Block AI Training Bots?
The right choice depends on your situation:
Block AI training bots if:
- You’re a content creator or publisher protecting intellectual property
- You run a news, media, or subscription site
- You object to unpaid commercial use of your content
- You want to license your data rather than give it away
Allow AI training bots if:
- You’re comfortable contributing to AI development
- Your content is already widely distributed
- You’re an open-source or educational resource
- You want to potentially influence AI model behavior
Should You Block AI Search Bots?
Almost certainly not — unless you have specific concerns about content summarization reducing your page views (a legitimate issue for some publishers).
Allow AI search bots if:
- You want visibility in AI-powered search results
- You want citation-based referral traffic
- You want to reach users of Perplexity, SearchGPT, etc.
Block AI search bots if:
- You run paywalled content being summarized without payment
- You’ve negotiated or want to negotiate licensing deals
- You have specific objections to AI summarization
Quick Reference
| Bot | Type | Used for Training? | Sends Traffic? | Recommendation |
|---|---|---|---|---|
| GPTBot | Training | Yes | No | Block (optional) |
| ClaudeBot | Training | Yes | No | Block (optional) |
| CCBot | Training | Yes (many companies) | No | Block (optional) |
| Bytespider | Training | Yes | No | Block (recommended) |
| PerplexityBot | Search | No | Yes | Allow |
| OAI-SearchBot | Search | No | Yes | Allow |
| Applebot | Search | No | Yes | Allow |
| Applebot-Extended | Training | Yes | No | Block (optional) |
Check Your Current Bot Access
Use our AI Bot Checker to see which AI training and search bots can currently access your website, and whether your robots.txt is correctly configured.
Related Bot Guides:
- GPTBot - OpenAI’s training crawler
- ClaudeBot - Anthropic’s training crawler
- CCBot - Common Crawl dataset
- Bytespider - ByteDance’s aggressive crawler
- PerplexityBot - Perplexity AI search crawler