Most people know Googlebot — it’s the crawler that indexes your site for Google Search. But since 2023, Google has been operating a second crawler: Google-Extended. It has a completely different purpose, and you can control it independently.
Understanding the difference between these two crawlers is essential for anyone managing their relationship with Google’s AI products.
The Short Version
| | Googlebot | Google-Extended |
|---|---|---|
| Purpose | Google Search indexing | Gemini AI + Bard training data |
| Blocks SEO? | Yes, if blocked | No |
| Blocks AI training? | No | Yes |
| Should you block? | Almost never | Optional |
| User agent | Googlebot | Google-Extended |
Blocking Googlebot kills your search rankings. Blocking Google-Extended has no impact on SEO — it only opts you out of AI training data collection.
What Is Googlebot?
Googlebot is Google’s primary search crawler. It’s been crawling the web since 1998 and is the backbone of Google Search.
When Googlebot visits your site:
- It renders your HTML, CSS, and JavaScript
- It indexes your content for Google Search
- It follows links to discover new pages
- It updates its index when your content changes
Traffic impact: Massive. Blocking Googlebot means your pages don’t appear in Google Search. For most websites, this is traffic suicide.
Googlebot variants:
- Googlebot — main desktop crawler
- Googlebot-Smartphone — mobile crawler
- Googlebot-Image — Google Image Search
- Googlebot-Video — Google Video
- Googlebot-News — Google News
- Google-InspectionTool — Search Console testing
What Is Google-Extended?
Google-Extended was introduced in September 2023, shortly after OpenAI launched GPTBot. It’s a separate robots.txt user agent token that lets webmasters opt out of Google’s AI training data collection without affecting regular Googlebot crawling. Strictly speaking, it is a control token rather than a crawler with its own HTTP user-agent string: Google’s existing crawlers do the fetching, so you won’t see “Google-Extended” in your server logs.
How Google-Extended works:
- It controls whether your content is used to train Google’s AI models (Gemini, formerly Bard)
- It also covers Google’s Vertex AI and other ML products
- It is not used for Google Search indexing
- Blocking it has zero impact on your search rankings
Google created this separate user agent specifically so webmasters could make an independent choice about AI training without having to choose between “allow everything” or “block Google entirely.”
Why Google Created Two Crawlers
Before Google-Extended existed, webmasters faced an impossible choice:
- Allow Googlebot → maintain search rankings but also feed AI training
- Block Googlebot → lose all Google traffic
Google-Extended solves this by separating the two functions. Now you can:
- Allow Googlebot → stay in Google Search
- Block Google-Extended → opt out of Gemini AI training
This is actually the most publisher-friendly approach any major AI company has taken — more flexible than OpenAI’s single GPTBot or Anthropic’s single ClaudeBot.
How to Control Each Crawler
Allow both (default — no action needed)
If you don’t add any rules, both crawlers are allowed. Your site gets indexed and your content may be used for AI training.
Block Google-Extended only (recommended if you want to opt out of AI training)
User-agent: Google-Extended
Disallow: /
This removes you from Gemini’s training data while keeping full Google Search indexing. No SEO impact.
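One way to sanity-check this rule before deploying it is to parse it with Python’s standard-library `urllib.robotparser`. The sketch below is illustrative only: `example.com` is a placeholder domain, and Python’s parser is not Google’s own matcher, so treat the result as a rough check rather than a guarantee.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules from this section, as a string.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder domain used only for illustration.
url = "https://example.com/any-page"
extended_allowed = parser.can_fetch("Google-Extended", url)
googlebot_allowed = parser.can_fetch("Googlebot", url)

print("Google-Extended allowed:", extended_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

With only a Google-Extended group in the file, the parser should report the token as blocked and Googlebot as allowed, which is the outcome this rule is meant to produce.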
Block specific sections from Google-Extended
User-agent: Google-Extended
Disallow: /premium/
Disallow: /paid-content/
Allow: /
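A quick way to see which paths these rules cover is to test a few URLs against them. This is a minimal sketch using Python’s standard-library `urllib.robotparser`; the paths and the `example.com` domain are hypothetical. Note one assumption: Python’s parser applies the first matching rule in file order, whereas Google applies the most specific (longest) matching rule. For this group the two approaches agree, because the Disallow lines come first and are also the longer matches.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /premium/
Disallow: /paid-content/
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Hypothetical paths on a placeholder domain.
paths = ["/premium/report", "/paid-content/issue-1", "/blog/post"]
results = {
    path: parser.can_fetch("Google-Extended", "https://example.com" + path)
    for path in paths
}

for path, allowed in results.items():
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```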
Block Googlebot only (almost never recommended)
User-agent: Googlebot
Disallow: /
This kills your Google Search presence. Only appropriate for staging sites, private tools, or sites that explicitly don’t want Google traffic.
Block Googlebot-specific variants
You can control individual Googlebot variants:
# Block Google Image Search
User-agent: Googlebot-Image
Disallow: /
# Block Google News
User-agent: Googlebot-News
Disallow: /news/
# Keep main Googlebot allowed
User-agent: Googlebot
Allow: /
Does Blocking Google-Extended Affect AI Overviews?
Google AI Overviews (formerly Search Generative Experience) are a different case. They pull information from Google’s existing search index — content Googlebot has already crawled — rather than from Google-Extended’s dataset.
Blocking Google-Extended does NOT remove you from AI Overviews.
To limit how your content appears in AI Overviews, use robots meta directives such as nosnippet or max-snippet, which control how Google may use your content in search snippets.
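In HTML, the nosnippet directive is typically written as `<meta name="robots" content="nosnippet">` in the page head. If you want to confirm which pages actually carry it, a small audit script can help. The following is a rough sketch using Python’s standard-library `html.parser`; the class and function names are made up for this example, and it only inspects meta tags, not the `X-Robots-Tag` HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key.lower(): (value or "") for key, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            # The content attribute is a comma-separated list of directives.
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def has_nosnippet(html: str) -> bool:
    """Return True if the page's robots meta tags include nosnippet."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "nosnippet" in parser.directives


# Sample markup invented for illustration.
SAMPLE = (
    '<html><head>'
    '<meta name="robots" content="nosnippet, max-image-preview:large">'
    '</head><body>Hello</body></html>'
)
print(has_nosnippet(SAMPLE))
```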
Does Blocking Google-Extended Affect Google Bard/Gemini Answers?
Partially. Google-Extended collects data for training Gemini’s underlying model. If you block it, your content may eventually be less represented in Gemini’s training data — but this is a long-term effect, not immediate.
Gemini’s answers also pull from Google’s live search index, so Googlebot-crawled content still influences responses.
Should You Block Google-Extended?
Block Google-Extended if:
- You’re a content creator or publisher with copyright concerns
- You run a news or media site that hasn’t entered a licensing deal with Google
- You’re blocking all AI training crawlers as policy (GPTBot, ClaudeBot, CCBot, etc.)
- You want to negotiate data licensing terms rather than give content away
Allow Google-Extended if:
- You’re comfortable with Google using your content for AI training
- You want maximum contribution to Google’s AI products
- You’re an educational or research organization supporting open AI development
Key point: Unlike blocking Googlebot, blocking Google-Extended has no downside for SEO. If you’re uncertain, blocking it is the lower-risk choice.
Comparison With Other AI Training Crawlers
Google-Extended is actually the most transparent implementation in the industry:
| Crawler | Company | Separate from search? | robots.txt controlled? |
|---|---|---|---|
| Google-Extended | Google | Yes ✓ | Yes ✓ |
| Applebot-Extended | Apple | Yes ✓ | Yes ✓ |
| GPTBot | OpenAI | N/A (no search) | Yes ✓ |
| ClaudeBot | Anthropic | N/A (no search) | Yes ✓ |
| CCBot | Common Crawl | N/A | Yes ✓ |
| Bytespider | ByteDance | N/A | Inconsistent ⚠️ |
The separate user agent approach by Google and Apple is genuinely better for publishers than a single crawler that does both.
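If you decide to opt out of every AI training crawler in the table, the robots.txt groups all follow the same pattern, so they can be generated rather than typed by hand. This is a small sketch, not an official template: the token list is copied from the table above, the function name is invented for this example, and a crawler marked as inconsistent (such as Bytespider) may ignore the rules anyway.

```python
# User agent tokens taken from the comparison table in this article.
AI_TRAINING_TOKENS = [
    "Google-Extended",
    "Applebot-Extended",
    "GPTBot",
    "ClaudeBot",
    "CCBot",
    "Bytespider",
]


def build_opt_out_rules(tokens):
    """Build one 'Disallow: /' group per token, separated by blank lines."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    return "\n\n".join(groups) + "\n"


rules = build_opt_out_rules(AI_TRAINING_TOKENS)
print(rules)
```

Googlebot is deliberately absent from the list, so appending this output to an existing robots.txt leaves search crawling untouched.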
Verify Googlebot Is Real
Fake Googlebots are common — scrapers often spoof Google’s user agent assuming sites won’t block “Googlebot.” Always verify:
# Step 1: reverse DNS
host [IP address]
# Should return: crawl-xxx-xxx-xxx-xxx.googlebot.com
# Step 2: forward confirmation
host crawl-xxx-xxx-xxx-xxx.googlebot.com
# Should return the original IP
Legitimate Googlebot resolves to googlebot.com or google.com domains.
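The same two-step check can be scripted for use in log analysis. Below is a minimal sketch using Python’s standard-library `socket` module. The function names are made up for this example, the accepted domain suffixes are the two named above, and the resolver functions are passed in as parameters so the logic can be exercised without network access.

```python
import socket


def reverse_lookup(ip):
    """Step 1: return the PTR hostname for an IP, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Step 2: return the set of IPs a hostname resolves to (empty on failure)."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Return True only if reverse DNS points at a Google domain
    and the forward lookup of that hostname includes the original IP."""
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    # Require a leading dot so "evilgooglebot.com" does not pass.
    if not hostname.endswith((".googlebot.com", ".google.com")):
        return False
    return ip in forward(hostname)
```

Calling `is_verified_googlebot(ip)` with an address from your access log performs live DNS queries, so results depend on your resolver. Cache the answers if you check many requests.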
Check Your Google Crawler Access
Use our SEO Bot Checker to verify Googlebot access and AI Bot Checker to check Google-Extended status.