Googlebot vs Google-Extended: Google's Two Crawlers Explained

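Most people know Googlebot: it's the crawler that indexes your site for Google Search. But since September 2023, Google has also recognized a second user agent token, Google-Extended. Strictly speaking, it is a robots.txt control token rather than a crawler with its own HTTP user agent string (Google's existing crawlers do the fetching), but it serves a completely different purpose, and you can control it independently.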

Understanding the difference between these two crawlers is essential for anyone managing their relationship with Google’s AI products.

The Short Version

	Googlebot	Google-Extended
Purpose	Google Search indexing	Gemini (formerly Bard) AI training data
Hurts SEO if blocked?	Yes	No
Opts out of AI training if blocked?	No	Yes
Should you block?	Almost never	Optional
robots.txt token	`Googlebot`	`Google-Extended`

Blocking Googlebot kills your search rankings. Blocking Google-Extended has no impact on SEO — it only opts you out of AI training data collection.

What Is Googlebot?

Googlebot is Google’s primary search crawler. It’s been crawling the web since 1998 and is the backbone of Google Search.

When Googlebot visits your site:

It renders your HTML, CSS, and JavaScript
It indexes your content for Google Search
It follows links to discover new pages
It updates its index when your content changes

Traffic impact: Massive. Blocking Googlebot means your pages don’t appear in Google Search. For most websites, this is traffic suicide.

Googlebot variants:

Googlebot - main crawler (desktop)
Googlebot Smartphone - mobile crawler; it obeys the same `Googlebot` robots.txt token as the desktop crawler
Googlebot-Image - Google Image Search
Googlebot-Video - Google Video
Googlebot-News - Google News
Google-InspectionTool - Search Console testing

What Is Google-Extended?

Google-Extended was introduced in September 2023, shortly after OpenAI launched GPTBot. It's a separate robots.txt user agent token that lets webmasters opt out of having their content used for Google's AI training without affecting regular Googlebot crawling. It does not show up in your logs as its own user agent string; the fetching is done by Google's existing crawlers.

What the Google-Extended token controls:

Whether content crawled from your site may be used to train Google's AI models (Gemini, formerly Bard)
Whether that content may be used by Google's Vertex AI generative products
It plays no part in Google Search indexing
Blocking it has zero impact on your search rankings

Google created this separate user agent specifically so webmasters could make an independent choice about AI training without having to choose between “allow everything” or “block Google entirely.”

Why Google Created Two Crawlers

Before Google-Extended existed, webmasters faced an impossible choice:

Allow Googlebot → maintain search rankings but also feed AI training
Block Googlebot → lose all Google traffic

Google-Extended solves this by separating the two functions. Now you can:

Allow Googlebot → stay in Google Search
Block Google-Extended → opt out of Gemini AI training

This is arguably one of the more publisher-friendly approaches among major AI companies: because Google also runs a search engine, a dedicated token lets you opt out of AI training without touching search crawling at all.

How to Control Each Crawler

Allow both (default — no action needed)

If you don’t add any rules, both crawlers are allowed. Your site gets indexed and your content may be used for AI training.

Block Google-Extended only (recommended if you want to opt out of AI training)

User-agent: Google-Extended
Disallow: /

This opts your content out of Gemini training going forward while keeping full Google Search indexing. No SEO impact.
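To sanity-check a rule like this before deploying it, you can run it through a robots.txt parser. The sketch below uses Python's standard-library `urllib.robotparser`; the `allowed` helper and the `example.com` URL are placeholders invented for illustration, and Python's parser is only an approximation of how Google itself interprets the file.

```python
from urllib.robotparser import RobotFileParser

# The rules from the example above.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# example.com is a placeholder domain, not a real target.
for agent in ("Googlebot", "Google-Extended"):
    print(agent, allowed(ROBOTS_TXT, agent, "https://example.com/blog/post"))
```

If the rule is written correctly, Googlebot should come back as allowed and Google-Extended as blocked.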

Block specific sections from Google-Extended

User-agent: Google-Extended
Disallow: /premium/
Disallow: /paid-content/
Allow: /
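A quick way to confirm that only the intended sections are blocked is to test a few paths against the rules. This is a minimal sketch using Python's `urllib.robotparser`; the three paths are hypothetical. One caveat: Python's parser applies the first matching rule in file order, whereas Google documents a most-specific-match rule. With `Allow: /` listed last, as here, both readings should agree.

```python
from urllib.robotparser import RobotFileParser

# The section-level rules from the example above.
SECTION_RULES = """\
User-agent: Google-Extended
Disallow: /premium/
Disallow: /paid-content/
Allow: /
"""

parser = RobotFileParser()
parser.parse(SECTION_RULES.splitlines())

# Hypothetical paths, used only for illustration.
paths = ["/premium/report", "/paid-content/guide", "/blog/post"]
results = {path: parser.can_fetch("Google-Extended", path) for path in paths}

for path, ok in results.items():
    print(f"{path}: {'allowed' if ok else 'blocked'}")
```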

Block Googlebot only (almost never recommended)

User-agent: Googlebot
Disallow: /

This kills your Google Search presence. Only appropriate for staging sites, private tools, or sites that explicitly don’t want Google traffic.

Block specific Googlebot variants

You can control individual Googlebot variants:

# Block Google Image Search
User-agent: Googlebot-Image
Disallow: /

# Keep the /news/ section out of Google News
User-agent: Googlebot-News
Disallow: /news/

# Keep main Googlebot allowed
User-agent: Googlebot
Allow: /
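With several groups in one file, it helps to check that each token lands in the group you expect. The sketch below runs the rules above through Python's `urllib.robotparser` with hypothetical paths. Treat it as an approximation: Python matches user agent names by substring, in file order, so if the `Googlebot` group were listed first it would also capture `Googlebot-Image` in this parser, which is not how Google picks the most specific group.

```python
from urllib.robotparser import RobotFileParser

# The variant rules from the example above, comments removed.
VARIANT_RULES = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: Googlebot-News
Disallow: /news/

User-agent: Googlebot
Allow: /
"""

parser = RobotFileParser()
parser.parse(VARIANT_RULES.splitlines())

# (token, path) pairs; the paths are hypothetical.
checks = [
    ("Googlebot-Image", "/images/photo.jpg"),
    ("Googlebot-News", "/news/story"),
    ("Googlebot-News", "/blog/post"),
    ("Googlebot", "/news/story"),
]
variant_results = {check: parser.can_fetch(*check) for check in checks}

for (agent, path), ok in variant_results.items():
    print(f"{agent} {path}: {'allowed' if ok else 'blocked'}")
```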

Does Blocking Google-Extended Affect AI Overviews?

Google AI Overviews (formerly Search Generative Experience) are a different case. They pull information from Google’s existing search index — content Googlebot has already crawled — rather than from Google-Extended’s dataset.

Blocking Google-Extended does NOT remove you from AI Overviews.

To influence AI Overviews, you'd need to use snippet controls such as the nosnippet robots meta directive (or related controls like max-snippet and data-nosnippet), which affect how Google uses your content in search snippets.
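If you go the nosnippet route, you may want to audit which pages actually carry the directive. Here is a minimal sketch using Python's standard-library `html.parser`; the `RobotsMetaParser` class, the `robots_directives` helper, and the sample page are all invented for illustration, and the sketch only looks at `<meta>` tags, not `X-Robots-Tag` HTTP headers.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            for item in attr.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML string."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical page markup.
PAGE = '<html><head><meta name="robots" content="index, nosnippet"></head></html>'
print("nosnippet" in robots_directives(PAGE))
```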

Does Blocking Google-Extended Affect Google Bard/Gemini Answers?

Partially. Google-Extended collects data for training Gemini’s underlying model. If you block it, your content may eventually be less represented in Gemini’s training data — but this is a long-term effect, not immediate.

Gemini’s answers also pull from Google’s live search index, so Googlebot-crawled content still influences responses.

Should You Block Google-Extended?

Block Google-Extended if:

You’re a content creator or publisher with copyright concerns
You run a news or media site that hasn’t entered a licensing deal with Google
You’re blocking all AI training crawlers as policy (GPTBot, ClaudeBot, CCBot, etc.)
You want to negotiate data licensing terms rather than give content away

Allow Google-Extended if:

You’re comfortable with Google using your content for AI training
You want maximum contribution to Google’s AI products
You’re an educational or research organization supporting open AI development

Key point: Unlike blocking Googlebot, blocking Google-Extended has no downside for SEO. If you’re uncertain, blocking it is the lower-risk choice.

Comparison With Other AI Training Crawlers

Google-Extended is among the most transparent implementations in the industry:

Crawler	Company	Separate from search?	robots.txt controlled?
Google-Extended	Google	Yes ✓	Yes ✓
Applebot-Extended	Apple	Yes ✓	Yes ✓
GPTBot	OpenAI	Yes ✓ (search uses OAI-SearchBot)	Yes ✓
ClaudeBot	Anthropic	N/A (no search)	Yes ✓
CCBot	Common Crawl	N/A	Yes ✓
Bytespider	ByteDance	N/A	Inconsistent ⚠️

The separate user agent approach by Google and Apple is genuinely better for publishers than a single crawler that does both.
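If your policy is to opt out of every training crawler in the table, the robots.txt groups can be generated rather than typed by hand. This is a small sketch; the `build_block_rules` helper is a made-up name, and the token list is simply copied from the table above. Remember that robots.txt only works for crawlers that choose to honor it, which the table flags as inconsistent for Bytespider.

```python
# Tokens taken from the comparison table above.
AI_TRAINING_TOKENS = [
    "Google-Extended",
    "Applebot-Extended",
    "GPTBot",
    "ClaudeBot",
    "CCBot",
    "Bytespider",
]


def build_block_rules(tokens):
    """Return robots.txt text with one 'Disallow: /' group per token."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    return "\n\n".join(groups) + "\n"


print(build_block_rules(AI_TRAINING_TOKENS))
```

The output can be appended to an existing robots.txt; because no group names Googlebot, search crawling is left untouched.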

Verify Googlebot Is Real

Fake Googlebots are common — scrapers often spoof Google’s user agent assuming sites won’t block “Googlebot.” Always verify:

# Step 1: reverse DNS
host [IP address]
# Should return: crawl-xxx-xxx-xxx-xxx.googlebot.com

# Step 2: forward confirmation
host crawl-xxx-xxx-xxx-xxx.googlebot.com
# Should return the original IP

Legitimate Googlebot resolves to googlebot.com or google.com domains.
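The same two-step check can be scripted for log analysis. The sketch below uses Python's `socket` module; the function names are invented for illustration, the domain suffixes are the two named above, and `gethostbyname_ex` only returns IPv4 addresses, so IPv6 visitors would need a different forward lookup. The lookups are passed in as parameters so the logic can be tested without network access.

```python
import socket

# Domains named above as legitimate for Googlebot.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Step 1: reverse DNS, returning the primary hostname for an IP."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> list:
    """Step 2: forward DNS, returning the IPv4 addresses for a hostname."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_googlebot(ip, reverse=reverse_lookup, forward=forward_lookup) -> bool:
    """Return True only if reverse and forward DNS both point at Google."""
    try:
        hostname = reverse(ip).lower().rstrip(".")
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        # The hostname must resolve back to the IP we started with.
        return ip in forward(hostname)
    except OSError:
        # Covers failed lookups (socket.herror and socket.gaierror).
        return False
```

Call `is_verified_googlebot(ip)` with an address taken from your access logs; anything that claims to be Googlebot in its user agent but fails this check can be treated as a spoof.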

Check Your Google Crawler Access

Use our SEO Bot Checker to verify Googlebot access and AI Bot Checker to check Google-Extended status.

Related guides:

Googlebot - Complete Googlebot guide
AI Training Bots vs AI Search Bots - Understanding the difference
How to Block All AI Crawlers in 2026 - Complete blocking template
GPTBot - OpenAI’s equivalent training crawler