ClaudeBot: Anthropic's Web Crawler Explained

What is ClaudeBot?

ClaudeBot is Anthropic’s official web crawler. It collects publicly available data from the internet to train Claude AI models. It is separate from Claude’s web browsing capability, which fetches pages in real time when a user asks for them.

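ClaudeBot crawls the web autonomously to build training datasets. If it appears in your server logs, Anthropic is most likely collecting your content for AI model training. User agents can be spoofed, so confirm the source as described under Verifying ClaudeBot.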

Anthropic’s Crawler Ecosystem

Anthropic operates several bots with different purposes:

Bot	User Agent	Purpose
ClaudeBot	`ClaudeBot`	General AI training data collection
anthropic-ai	`anthropic-ai`	AI research and data collection
Claude-Web	`Claude-Web`	Claude’s real-time web search
Claude-SearchBot	`Claude-SearchBot`	Claude search indexing
Claude-User	`Claude-User`	User-initiated browsing

This page focuses on ClaudeBot — the training data crawler.

User Agent

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)

Short token, matched case-insensitively in robots.txt:

claudebot
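For log analysis, the tokens from the crawler table can be matched against a raw User-Agent header. The following is a minimal Python sketch, not an official detection method: the function name `identify_anthropic_bot` and the constant `ANTHROPIC_TOKENS` are hypothetical, and a match only shows what the client claims to be.

```python
import re
from typing import Optional

# Tokens taken from the crawler table in this article.
ANTHROPIC_TOKENS = (
    "ClaudeBot",
    "anthropic-ai",
    "Claude-Web",
    "Claude-SearchBot",
    "Claude-User",
)

# Case-insensitive, so both "ClaudeBot" and "claudebot" match.
_TOKEN_RE = re.compile(
    "|".join(re.escape(token) for token in ANTHROPIC_TOKENS),
    re.IGNORECASE,
)


def identify_anthropic_bot(user_agent: str) -> Optional[str]:
    """Return the first Anthropic token found in the header, or None.

    Hypothetical helper: it inspects the claimed user agent only and
    does not prove the request came from Anthropic.
    """
    match = _TOKEN_RE.search(user_agent)
    if match is None:
        return None
    found = match.group(0).lower()
    # Map the matched text back to the spelling used in the table.
    for token in ANTHROPIC_TOKENS:
        if token.lower() == found:
            return token
    return None
```

Passing the full user agent string shown above, or the short `claudebot` form, should both identify the crawler as `ClaudeBot`; an ordinary browser string yields `None`.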

What Data Does ClaudeBot Collect?

ClaudeBot crawls publicly accessible pages to collect:

Web page text content
Articles and blog posts
Documentation and technical content
Public discussions and forums

It is designed to skip paywalled content, login-required pages, and content blocked via robots.txt.

Should You Block ClaudeBot?

The considerations are similar to those for GPTBot:

Allow ClaudeBot if:

You’re comfortable with your content training AI models
You want to contribute to safety-focused AI development
You’re a researcher, educator, or open-knowledge advocate
You don’t have strong copyright concerns

Block ClaudeBot if:

You’re a publisher protecting intellectual property
You object to commercial use of your content without compensation
You run a subscription or premium content site
You want to opt out of all AI training comprehensively
You’re concerned about content reproduction in AI outputs

Note: Anthropic describes its training approach as Constitutional AI, but that concerns how the model behaves, not how content is licensed. The fundamental copyright and compensation concerns remain the same as with other AI training crawlers.

How to Block ClaudeBot

Add to your robots.txt:

User-agent: ClaudeBot
Disallow: /

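To block all the Anthropic bots listed in the table above simultaneously:

User-agent: ClaudeBot
User-agent: anthropic-ai
User-agent: Claude-Web
User-agent: Claude-SearchBot
User-agent: Claude-User
Disallow: /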

Block specific sections:

User-agent: ClaudeBot
Disallow: /premium/
Disallow: /members/
Disallow: /paid-content/
Allow: /
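Before deploying rules like these, it can help to check how a parser interprets them. The sketch below uses Python's standard `urllib.robotparser` on the section-level rules from the example above; the helper name `is_allowed` and the `example.com` URLs are illustrative assumptions. Python's parser applies rules in file order, which may differ from how a particular crawler resolves overlapping rules, so treat the result as a sanity check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The section-level rules from the example above.
ROBOTS_TXT = """\
User-agent: ClaudeBot
Disallow: /premium/
Disallow: /members/
Disallow: /paid-content/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether user_agent may fetch url.

    Hypothetical helper built on the standard-library parser.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, `is_allowed(ROBOTS_TXT, "ClaudeBot", "https://example.com/premium/report")` returns `False`, while the same call for `/blog/post` returns `True`. A user agent with no matching group, such as `GPTBot` here, is unaffected.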

How to Block at Server Level

Nginx

# Place inside a server { } or location { } block.
# ~* makes the match case-insensitive.
if ($http_user_agent ~* "(ClaudeBot|anthropic-ai)") {
    return 403;
}

Apache (.htaccess)

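# Requires mod_rewrite; [NC] makes the match case-insensitive.
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} "ClaudeBot|anthropic-ai" [NC]
# [F] returns 403 Forbidden for every matching request.
RewriteRule .* - [F,L]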

Cloudflare

Create a WAF custom rule that matches when the User Agent field contains ClaudeBot or anthropic-ai, and set the action to Block.
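Where the web server or CDN configuration is not available, the same check can be applied inside the application. The following is a minimal WSGI middleware sketch in Python, using the same pattern as the Nginx and Apache rules above; the names `block_anthropic_crawlers` and `BLOCKED_AGENTS` are hypothetical, and this is one possible approach rather than a recommended standard.

```python
import re
from typing import Callable, Iterable

# Same pattern as the Nginx and Apache rules, matched case-insensitively.
BLOCKED_AGENTS = re.compile(r"ClaudeBot|anthropic-ai", re.IGNORECASE)


def block_anthropic_crawlers(app: Callable) -> Callable:
    """Wrap a WSGI app so matching user agents receive 403 Forbidden."""

    def middleware(environ: dict, start_response: Callable) -> Iterable[bytes]:
        user_agent = environ.get("HTTP_USER_AGENT", "")
        if BLOCKED_AGENTS.search(user_agent):
            body = b"Forbidden"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Any other client is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Wrapping an existing WSGI callable, for example `application = block_anthropic_crawlers(application)`, applies the block to every route. Blocking at the server or CDN layer is usually cheaper, since the request never reaches the application.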

Verifying ClaudeBot

To confirm a request is actually from Anthropic:

# Reverse DNS
host [IP address]
# Should resolve to Anthropic infrastructure

# Forward confirmation
host [resolved hostname]
# Should return the original IP

Anthropic publishes documentation about its crawlers on its official website.
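The two lookups above (reverse DNS, then forward confirmation) can be scripted. The sketch below is a minimal Python version under stated assumptions: this article does not say which hostnames Anthropic's infrastructure resolves to, so `expected_suffix` is a caller-supplied parameter that you would need to take from Anthropic's own documentation. The function name `forward_confirmed` and the injectable `reverse` and `forward` resolvers are hypothetical conveniences for testing.

```python
import socket
from typing import Callable, List


def _reverse(ip: str) -> str:
    # PTR lookup: IP address -> hostname.
    return socket.gethostbyaddr(ip)[0]


def _forward(hostname: str) -> List[str]:
    # A/AAAA lookup: hostname -> list of IP addresses.
    return sorted({info[4][0] for info in socket.getaddrinfo(hostname, None)})


def forward_confirmed(
    ip: str,
    expected_suffix: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Return True if ip reverse-resolves to a host under expected_suffix
    and that host resolves back to the same ip.

    expected_suffix is an assumption supplied by the caller, not a value
    taken from this article.
    """
    try:
        hostname = reverse(ip).rstrip(".").lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    suffix = expected_suffix.lower().lstrip(".")
    # Require a whole-label match so "fakecrawler.example" does not pass.
    if hostname != suffix and not hostname.endswith("." + suffix):
        return False
    try:
        return ip in forward(hostname)
    except OSError:
        return False
```

A call such as `forward_confirmed(request_ip, suffix_from_official_docs)` returns `False` on any lookup failure, which errs on the side of treating the request as unverified.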

Does ClaudeBot Respect robots.txt?

Yes. Anthropic has stated that ClaudeBot honors robots.txt directives. This is consistent with community reports — blocking via robots.txt is effective.

ClaudeBot vs Claude’s Search Features

An important distinction:

ClaudeBot — autonomous training crawler (covered here)
Claude-Web / Claude-SearchBot — used to answer user questions
Claude-User — when a Claude user asks it to visit a URL

Blocking ClaudeBot with robots.txt will prevent AI training data collection. It will NOT prevent Claude from browsing your site when a user explicitly asks it to. That behavior belongs to Claude-User, which can be controlled with its own User-agent directive.

Crawl Behavior

ClaudeBot is generally considered a moderate crawler:

Less aggressive than Bytespider
Similar volume to GPTBot
Respects crawl-delay directives
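To slow the crawler down rather than block it, a Crawl-delay line can be added to the ClaudeBot group in robots.txt. The sketch below shows such a group and reads it back with Python's standard `urllib.robotparser`; the 10-second value is purely illustrative, and the helper name `crawl_delay_for` is hypothetical. Crawl-delay is a non-standard robots.txt extension, so other crawlers may ignore it.

```python
from typing import Optional
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt group: the 10-second delay is an example value.
ROBOTS_WITH_DELAY = """\
User-agent: ClaudeBot
Crawl-delay: 10
"""


def crawl_delay_for(robots_txt: str, user_agent: str) -> Optional[int]:
    """Return the Crawl-delay that applies to user_agent, or None."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)
```

Here `crawl_delay_for(ROBOTS_WITH_DELAY, "ClaudeBot")` returns `10`, and a user agent without a matching group returns `None`.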

Test ClaudeBot Access to Your Site

Use our AI Bot Checker to verify if ClaudeBot or other Anthropic crawlers can access your website.

Related AI Training Bots:

GPTBot - OpenAI’s AI training crawler
CCBot - Common Crawl dataset (used by many AI companies)
Bytespider - ByteDance’s aggressive AI crawler

AI Search Bots (different purpose — drives traffic):

PerplexityBot - Perplexity AI search crawler

For comprehensive bot testing, explore our free bot detection tools.