What is GPTBot?

GPTBot is OpenAI’s official web crawler. It collects data from the public internet to train future AI models, including the GPT models that power ChatGPT. It was first publicly documented in August 2023, when OpenAI published its user agent and IP range details.

Unlike ChatGPT’s browsing feature (which fetches pages on behalf of users), GPTBot runs autonomously in the background to collect training data at scale.

What Does GPTBot Do?

GPTBot crawls publicly accessible web pages to:

  • Build training datasets for future versions of GPT models
  • Improve model quality by exposing models to a wide variety of text
  • Discover new content across the web continuously

It does NOT crawl on behalf of users in real time. If genuine GPTBot requests appear in your logs, OpenAI is collecting your content for model training, not fetching it to answer someone’s question.

User Agent

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.1; +https://openai.com/gptbot)

Older version:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
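To spot GPTBot in your own tooling, a minimal Python sketch can match the `GPTBot/<version>` token that both strings above share. The function name `gptbot_version` is illustrative, and a match only shows what the client claims to be, since user agents can be spoofed.

```python
from __future__ import annotations

import re

# Matches the "GPTBot/<major>.<minor>" token found in both user agent
# strings above. Case-insensitive, like the Nginx (~*) and Apache ([NC])
# rules later in this guide.
GPTBOT_TOKEN = re.compile(r"GPTBot/(\d+)\.(\d+)", re.IGNORECASE)


def gptbot_version(user_agent: str) -> tuple[int, int] | None:
    """Return (major, minor) if the user agent claims to be GPTBot, else None.

    This inspects the string only: a match is a claim, not proof.
    """
    match = GPTBOT_TOKEN.search(user_agent)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    ua = ("Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
          "GPTBot/1.1; +https://openai.com/gptbot)")
    print(gptbot_version(ua))                                 # (1, 1)
    print(gptbot_version("Mozilla/5.0 (X11; Linux x86_64)"))  # None
```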

GPTBot IP Ranges

OpenAI publishes its official IP ranges at: https://openai.com/gptbot-ranges.txt

You can also verify GPTBot with reverse DNS — legitimate requests resolve to OpenAI’s infrastructure.
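Checking a request’s IP against the published list is a simple CIDR membership test. The sketch below uses Python’s standard `ipaddress` module and assumes the list is plain text with one CIDR range per line. The entries in `PUBLISHED_RANGES` are documentation placeholders, not OpenAI’s ranges, so swap in the real entries (and adapt the parsing if the published file uses a different format).

```python
from __future__ import annotations

import ipaddress

# PLACEHOLDER ranges from the documentation-only blocks (RFC 5737, RFC 3849).
# These are NOT OpenAI's ranges: replace them with the published entries.
PUBLISHED_RANGES = """\
# one CIDR per line (assumed format)
192.0.2.0/24
198.51.100.0/24
2001:db8::/32
"""


def load_networks(text: str) -> list[ipaddress.IPv4Network | ipaddress.IPv6Network]:
    """Parse one CIDR per line, skipping blank lines and '#' comments."""
    networks = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            networks.append(ipaddress.ip_network(line))
    return networks


def in_published_ranges(ip: str, networks) -> bool:
    """True if `ip` is inside any network. Mixed IPv4/IPv6 pairs simply
    count as 'not contained' rather than raising an error."""
    address = ipaddress.ip_address(ip)
    return any(address in net for net in networks)


if __name__ == "__main__":
    nets = load_networks(PUBLISHED_RANGES)
    print(in_published_ranges("192.0.2.45", nets))   # True
    print(in_published_ranges("203.0.113.7", nets))  # False
```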

Should You Block GPTBot?

This is one of the most debated bot decisions in 2025–2026. Here’s the breakdown:

Allow GPTBot if:

  • You want your content to influence future AI model behavior
  • You’re a researcher or educator wanting broad reach
  • You support open AI development
  • You don’t have copyright concerns about your content

Block GPTBot if:

  • You’re a publisher or content creator protecting intellectual property
  • You object to unpaid commercial use of your content
  • You run a news, media, or subscription-based site
  • You want to opt out of AI training entirely
  • You’re concerned about your content being reproduced in AI outputs

Important: Many major publishers (NYT, BBC, Reuters) have blocked GPTBot citing copyright and fair compensation concerns.

How to Block GPTBot

Add to your robots.txt:

User-agent: GPTBot
Disallow: /

Block specific sections only:

User-agent: GPTBot
Disallow: /private/
Disallow: /premium/
Disallow: /members/
Allow: /blog/
Allow: /
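Before deploying rules like these, you can check how a parser reads them. This sketch uses Python’s standard-library `urllib.robotparser`; `gptbot_allowed` is a hypothetical helper. One caveat: this parser applies the first matching rule in file order, while RFC 9309 crawlers use the most specific (longest) match, so order-sensitive files can be judged differently. For the rules above, both approaches give the same answers.

```python
from urllib.robotparser import RobotFileParser

# The partial-block rules from above.
PARTIAL_RULES = """\
User-agent: GPTBot
Disallow: /private/
Disallow: /premium/
Disallow: /members/
Allow: /blog/
Allow: /
"""


def gptbot_allowed(robots_txt: str, path: str) -> bool:
    """Ask the stdlib parser whether GPTBot may fetch `path`."""
    parser = RobotFileParser()
    # parse() also records a load time, which can_fetch() requires.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("GPTBot", path)


if __name__ == "__main__":
    for path in ("/private/report.html", "/blog/post-1", "/about"):
        print(path, gptbot_allowed(PARTIAL_RULES, path))
    # Full block: every path is disallowed.
    print(gptbot_allowed("User-agent: GPTBot\nDisallow: /\n", "/"))
```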

How to Block GPTBot at Server Level

Nginx

# Place inside the server { } block for your site
if ($http_user_agent ~* "GPTBot") {   # ~* = case-insensitive regex match
    return 403;
}

Apache (.htaccess)

# Requires mod_rewrite. [NC] = case-insensitive, [F] = 403 Forbidden
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} GPTBot [NC]
RewriteRule .* - [F,L]

Cloudflare (WAF Rule)

Create a WAF custom rule:

  • Field: User Agent
  • Operator: Contains
  • Value: GPTBot
  • Action: Block
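If you can’t change the web server or CDN configuration, the same check can run in application code. Here is a minimal sketch as Python WSGI middleware, assuming a WSGI-based app; `BlockGPTBot`, `hello_app`, and `call` are hypothetical names. Like the rules above, it matches the user agent case-insensitively, so it only stops clients that identify themselves as GPTBot.

```python
import re

# Same case-insensitive "GPTBot" match as the Nginx and Apache rules above.
GPTBOT_RE = re.compile(r"gptbot", re.IGNORECASE)


class BlockGPTBot:
    """Hypothetical WSGI middleware: returns 403 when the User-Agent
    contains "GPTBot", otherwise hands the request to the wrapped app."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if GPTBOT_RE.search(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Stand-in application for the demo."""
    body = b"Hello\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


app = BlockGPTBot(hello_app)


def call(wsgi_app, user_agent):
    """Invoke a WSGI app directly and return its status line."""
    statuses = []
    wsgi_app({"HTTP_USER_AGENT": user_agent},
             lambda status, headers: statuses.append(status))
    return statuses[0]


if __name__ == "__main__":
    print(call(app, "Mozilla/5.0 (compatible; GPTBot/1.1; +https://openai.com/gptbot)"))
    print(call(app, "Mozilla/5.0 (X11; Linux x86_64)"))
```

To serve it, pass `app` to any WSGI server, for example the standard library’s `wsgiref.simple_server.make_server(host, port, app)` for local testing.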

How to Verify It’s Real GPTBot

User agent strings can be spoofed. To verify:

# Step 1: reverse DNS lookup
host [IP address]
# Should return something like: crawl-xxx.openai.com

# Step 2: forward DNS confirmation
host crawl-xxx.openai.com
# Should return the original IP

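A legitimate GPTBot request should resolve back to OpenAI infrastructure. If the lookups don’t confirm it, check the IP against OpenAI’s published ranges before trusting the user agent.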
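The two-step check above is forward-confirmed reverse DNS. Here is a sketch in Python using the standard `socket` module, where `verify_crawler_ip` is a hypothetical helper. The accepted suffix `.openai.com` is an assumption based on the example hostname above, and `gethostbyname_ex` returns IPv4 addresses only. The resolvers are passed in as parameters so the logic can be exercised without network access.

```python
from __future__ import annotations

import socket
from typing import Callable

# ASSUMPTION: genuine GPTBot hostnames end in ".openai.com", based on the
# example hostname above. Change this if OpenAI documents otherwise.
ALLOWED_SUFFIXES = (".openai.com",)


def verify_crawler_ip(
    ip: str,
    allowed_suffixes: tuple[str, ...] = ALLOWED_SUFFIXES,
    reverse: Callable = socket.gethostbyaddr,
    forward: Callable = socket.gethostbyname_ex,
) -> bool:
    """Forward-confirmed reverse DNS.

    Step 1: reverse-resolve the IP and require an allowed hostname suffix.
    Step 2: forward-resolve that hostname and require the original IP
    among the results. gethostbyname_ex() returns IPv4 addresses only.
    """
    try:
        hostname, _aliases, _ips = reverse(ip)
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(allowed_suffixes):
        return False
    try:
        _name, _aliases, forward_ips = forward(hostname)
    except OSError:
        return False
    return ip in forward_ips
```

Requiring a leading dot in the suffix means look-alike names such as `evilopenai.com` are rejected, and the forward lookup defeats a spoofer who controls only their own reverse DNS record.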

Does GPTBot Respect robots.txt?

Yes — OpenAI has stated that GPTBot respects the robots.txt standard. In practice, most reports from webmasters confirm it does honor Disallow directives.

However, data already crawled before you added the block may have been used in previous training runs.
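To check this on your own site, compare the paths a verified GPTBot requested (taken from your access logs) against your robots.txt. This sketch reuses Python’s `urllib.robotparser`; `disallowed_hits` is a hypothetical helper. Crawlers may cache robots.txt for a while (RFC 9309 suggests no more than 24 hours), so hits shortly after a rule change are not necessarily violations.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser


def disallowed_hits(robots_txt: str, requested_paths, agent: str = "GPTBot") -> list[str]:
    """Return the requested paths that robots.txt disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in requested_paths if not parser.can_fetch(agent, p)]


if __name__ == "__main__":
    rules = "User-agent: GPTBot\nDisallow: /private/\n"
    # In practice these paths would come from verified GPTBot log entries.
    paths = ["/blog/a", "/private/b", "/about"]
    print(disallowed_hits(rules, paths))  # ['/private/b']
```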

GPTBot vs ChatGPT-User vs OAI-SearchBot

OpenAI operates multiple bots with different purposes:

Bot             Purpose                                       Should You Block?
GPTBot          AI model training                             Optional (consider copyright)
ChatGPT-User    User-initiated browsing                       Usually allow (drives traffic)
OAI-SearchBot   ChatGPT search indexing (formerly SearchGPT)  Usually allow (search visibility)

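Blocking GPTBot does NOT prevent your site from appearing in ChatGPT search results. Those are handled by OAI-SearchBot (and user-initiated visits by ChatGPT-User), each controlled by its own User-agent group in robots.txt.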
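To opt out of training while staying visible in ChatGPT search, give each bot its own group in robots.txt. The sketch below checks such a file with Python’s `urllib.robotparser`; the file contents and the `allowed` helper are illustrative, and the explicit `Allow: /` groups are optional because anything not disallowed is allowed by default.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: opt out of training, stay in ChatGPT search.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(bot: str, path: str = "/articles/some-post") -> bool:
    """Whether `bot` may fetch `path` under ROBOTS_TXT."""
    return parser.can_fetch(bot, path)


if __name__ == "__main__":
    for bot in ("GPTBot", "OAI-SearchBot", "ChatGPT-User"):
        print(bot, allowed(bot))
    # GPTBot False
    # OAI-SearchBot True
    # ChatGPT-User True
```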

Is GPTBot Harmful?

GPTBot is not malicious in the traditional sense — it won’t attack your server or steal credentials. However:

  • It can consume significant bandwidth on large sites
  • It collects your content without compensation
  • Your content may be reproduced in AI outputs without attribution
  • It may still ingest soft-paywalled content if the full text is served at a public URL, although OpenAI says it filters out sources that require paywall access

Crawl Volume

GPTBot is widely reported to be one of the more aggressive AI training crawlers. Site operators report:

  • Hundreds to thousands of requests per day
  • Crawls every few days to weekly for active sites
  • Multiple concurrent requests from different IPs
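To see how much GPTBot traffic your own site gets, you can count matching requests per day in your access logs. This Python sketch assumes the common "combined" log format (timestamp in brackets, user agent as the last quoted field) that Nginx and Apache can emit; adjust `LINE_RE` for other formats. It counts requests that claim to be GPTBot, so pair it with the verification steps to exclude spoofed traffic. The script name in the usage comment is hypothetical.

```python
import re
import sys
from collections import Counter

# ASSUMPTION: "combined" log format, e.g.
# 192.0.2.1 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "<user agent>"
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]*\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def gptbot_requests_per_day(lines) -> Counter:
    """Count lines whose user agent contains "GPTBot", keyed by date.

    Lines that do not match the expected format are skipped.
    """
    per_day = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and "gptbot" in match.group("ua").lower():
            per_day[match.group("day")] += 1
    return per_day


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_gptbot.py /var/log/nginx/access.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for day, count in gptbot_requests_per_day(log).items():
            print(day, count)
```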

What Percentage of Sites Block GPTBot?

Since its launch in 2023, adoption of GPTBot blocks has grown significantly:

  • Within weeks of launch, many major sites added blocks
  • Published analyses have reported that 20–30% or more of top websites block GPTBot
  • Media and news sites have the highest blocking rates

Test GPTBot Access to Your Site

Use our AI Bot Checker to verify if GPTBot can access your website and which pages are exposed to OpenAI’s crawler.

Related AI Training Bots:

  • ClaudeBot - Anthropic’s AI training crawler
  • CCBot - Common Crawl, used by many AI companies
  • Bytespider - ByteDance/TikTok AI training bot

For comprehensive bot testing, explore our free bot detection tools.