What is ClaudeBot?
ClaudeBot is Anthropic’s official web crawler. It collects publicly available data from the internet to train Claude AI models. It is separate from Claude’s web browsing capability, which fetches pages in real time on behalf of users.
ClaudeBot crawls the web autonomously to build training datasets. If you see it in your server logs, Anthropic is collecting your content for AI model training.
Anthropic’s Crawler Ecosystem
Anthropic operates several bots with different purposes:
| Bot | User Agent | Purpose |
|---|---|---|
| ClaudeBot | claudebot | General AI training data collection |
| anthropic-ai | anthropic-ai | AI research and data collection |
| Claude-Web | Claude-Web | Claude’s real-time web search |
| Claude-SearchBot | Claude-SearchBot | Claude search indexing |
| Claude-User | Claude-User | User-initiated browsing |
This page focuses on ClaudeBot — the training data crawler.
User Agent
Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)
Alternative form:
claudebot
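To spot these requests in application code or when scanning logs, a case-insensitive substring match on the user agent is usually enough. The sketch below is a minimal example, not an official detection method: the token list is copied from the bot table on this page, and the function name `anthropic_bot_token` is a hypothetical helper introduced here for illustration.

```python
import re
from typing import Optional

# Tokens copied from the bot table on this page (assumption: the list is complete).
ANTHROPIC_TOKENS = (
    "ClaudeBot",
    "anthropic-ai",
    "Claude-Web",
    "Claude-SearchBot",
    "Claude-User",
)

# Case-insensitive, because the crawler may appear as "ClaudeBot" or "claudebot".
_TOKEN_RE = re.compile(
    "|".join(re.escape(token) for token in ANTHROPIC_TOKENS),
    re.IGNORECASE,
)


def anthropic_bot_token(user_agent: str) -> Optional[str]:
    """Return the first listed token found in user_agent, or None if none match."""
    match = _TOKEN_RE.search(user_agent or "")
    if match is None:
        return None
    found = match.group(0).lower()
    # Map the matched text back to the spelling used in ANTHROPIC_TOKENS.
    return next(token for token in ANTHROPIC_TOKENS if token.lower() == found)
```

A user agent string is trivially spoofed, so a match only tells you what the client claims to be.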
What Data Does ClaudeBot Collect?
ClaudeBot crawls publicly accessible pages to collect:
- Web page text content
- Articles and blog posts
- Documentation and technical content
- Public discussions and forums
It is designed to skip paywalled content, login-required pages, and content blocked via robots.txt.
Should You Block ClaudeBot?
The considerations are similar to those for GPTBot:
Allow ClaudeBot if:
- You’re comfortable with your content training AI models
- You want to contribute to safety-focused AI development
- You’re a researcher, educator, or open-knowledge advocate
- You don’t have strong copyright concerns
Block ClaudeBot if:
- You’re a publisher protecting intellectual property
- You object to commercial use of your content without compensation
- You run a subscription or premium content site
- You want to opt out of all AI training comprehensively
- You’re concerned about content reproduction in AI outputs
Note: Anthropic’s Constitutional AI approach concerns how its models are trained to behave, not how training content is sourced or paid for. The fundamental copyright and compensation concerns remain the same as with other AI trainers.
How to Block ClaudeBot
Add to your robots.txt:
User-agent: ClaudeBot
Disallow: /
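To block ClaudeBot and anthropic-ai together (Claude-Web, Claude-SearchBot, and Claude-User are not covered and need their own User-agent lines):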
User-agent: ClaudeBot
User-agent: anthropic-ai
Disallow: /
Block specific sections:
User-agent: ClaudeBot
Disallow: /premium/
Disallow: /members/
Disallow: /paid-content/
Allow: /
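Before deploying rules like these, it can help to check how a robots.txt parser reads them. The sketch below feeds the grouped and section-specific rules from this page into Python’s standard `urllib.robotparser`. The `example.com` URLs and the helper name `parse_rules` are placeholders, and Python’s parser is only an approximation of how any real crawler interprets the file.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# One group that applies the same rule to two user agents.
BLOCK_BOTH = """\
User-agent: ClaudeBot
User-agent: anthropic-ai
Disallow: /
"""

# Block only selected sections for ClaudeBot.
BLOCK_SECTIONS = """\
User-agent: ClaudeBot
Disallow: /premium/
Disallow: /members/
Disallow: /paid-content/
Allow: /
"""

both = parse_rules(BLOCK_BOTH)
sections = parse_rules(BLOCK_SECTIONS)

# Python's parser applies the first matching rule in file order, which is
# why the specific Disallow lines come before the catch-all "Allow: /".
for agent, url in [
    ("ClaudeBot", "https://example.com/premium/report"),
    ("ClaudeBot", "https://example.com/blog/post"),
]:
    print(agent, url, "allowed" if sections.can_fetch(agent, url) else "blocked")
```

Agents with no matching group, and no `User-agent: *` fallback, are treated as allowed.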
How to Block at Server Level
Nginx
if ($http_user_agent ~* "(ClaudeBot|anthropic-ai)") {
    return 403;
}
Apache (.htaccess)
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "ClaudeBot|anthropic-ai" [NC]
RewriteRule .* - [F,L]
Cloudflare
Create a WAF rule that matches a User Agent containing ClaudeBot or anthropic-ai, and set the action to Block.
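If you cannot change the web server or CDN configuration, the same user-agent match can be applied inside the application. The following is a minimal sketch for a Python WSGI app, assuming the same two tokens as the Nginx and Apache rules above. `BlockBotsMiddleware` and `demo_app` are hypothetical names used only for illustration.

```python
import re

# Same case-insensitive pattern as the Nginx and Apache rules on this page.
BLOCKED_UA = re.compile(r"ClaudeBot|anthropic-ai", re.IGNORECASE)


class BlockBotsMiddleware:
    """WSGI middleware that answers 403 when the User-Agent matches a pattern."""

    def __init__(self, app, pattern=BLOCKED_UA):
        self.app = app
        self.pattern = pattern

    def __call__(self, environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if self.pattern.search(user_agent):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        # Everything else passes through to the wrapped application untouched.
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Placeholder application standing in for your real WSGI app."""
    body = b"Hello\n"
    start_response(
        "200 OK",
        [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
    )
    return [body]


app = BlockBotsMiddleware(demo_app)
```

Blocking at the server or CDN is usually cheaper, because the request never reaches the application.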
Verifying ClaudeBot
To confirm a request is actually from Anthropic:
# Reverse DNS
host [IP address]
# Should resolve to Anthropic infrastructure
# Forward confirmation
host [resolved hostname]
# Should return the original IP
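The two lookups above can be scripted. This is a hedged sketch using Python’s standard `socket` module. It checks only that the reverse and forward lookups agree. This page does not say which hostnames count as Anthropic infrastructure, so `allowed_suffixes` is left for the caller to supply, and `crawler.example` in the usage note below is a placeholder, not a real Anthropic domain. The resolver parameters exist only so the function can be exercised without network access.

```python
import socket
from typing import Iterable, Optional


def forward_confirmed_hostname(
    ip: str,
    reverse=socket.gethostbyaddr,
    forward=socket.gethostbyname_ex,
) -> Optional[str]:
    """Return the reverse-DNS hostname for ip if it resolves back to ip, else None.

    Note: socket.gethostbyname_ex only handles IPv4 addresses.
    """
    try:
        hostname, _aliases, _addresses = reverse(ip)      # reverse DNS (PTR)
        _name, _aliases, addresses = forward(hostname)    # forward confirmation
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname if ip in addresses else None


def hostname_matches(hostname: str, allowed_suffixes: Iterable[str]) -> bool:
    """True if hostname equals, or is a subdomain of, one of allowed_suffixes."""
    host = hostname.rstrip(".").lower()
    return any(
        host == suffix.lower() or host.endswith("." + suffix.lower())
        for suffix in allowed_suffixes
    )
```

A typical call would be `name = forward_confirmed_hostname(ip)`, followed by `hostname_matches(name, ["crawler.example"])` when `name` is not `None`, with the placeholder replaced by whatever suffix Anthropic’s own documentation specifies.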
Anthropic publishes documentation about their crawlers at their official website.
Does ClaudeBot Respect robots.txt?
Yes. Anthropic has stated that ClaudeBot honors robots.txt directives. This is consistent with community reports — blocking via robots.txt is effective.
ClaudeBot vs Claude’s Search Features
An important distinction:
- ClaudeBot — autonomous training crawler (covered here)
- Claude-Web / Claude-SearchBot — used to answer user questions
- Claude-User — when a Claude user asks it to visit a URL
Blocking ClaudeBot with robots.txt will prevent AI training data collection. It will NOT prevent Claude from browsing your site when a user explicitly asks it to — though separate directives can control that too.
Crawl Behavior
ClaudeBot is generally considered a moderate crawler:
- Less aggressive than Bytespider
- Similar volume to GPTBot
- Respects crawl-delay directives
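Since crawl-delay is honored, you can slow the crawler down instead of blocking it outright. The sketch below shows a robots.txt group with a `Crawl-delay` line and reads it back with Python’s standard `urllib.robotparser`. The 10-second value and the `/premium/` path are illustrative assumptions, not recommendations from Anthropic.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: throttle ClaudeBot and keep it out of one section.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Crawl-delay: 10
Disallow: /premium/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# crawl_delay() returns the delay in seconds for a matching group,
# or None when no group (and no "*" fallback) sets one.
claudebot_delay = parser.crawl_delay("ClaudeBot")
other_delay = parser.crawl_delay("SomeOtherBot")

print("ClaudeBot delay:", claudebot_delay)
print("Other bot delay:", other_delay)
```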
Test ClaudeBot Access to Your Site
Use our AI Bot Checker to verify if ClaudeBot or other Anthropic crawlers can access your website.
Related AI Training Bots:
- GPTBot - OpenAI’s AI training crawler
- CCBot - Common Crawl dataset (used by many AI companies)
- Bytespider - ByteDance aggressive AI crawler
AI Search Bots (different purpose — drives traffic):
- PerplexityBot - Perplexity AI search crawler
For comprehensive bot testing, explore our free bot detection tools.