Most people know Googlebot — it’s the crawler that indexes your site for Google Search. But since 2023, Google has been operating a second crawler: Google-Extended. It has a completely different purpose, and you can control it independently.

Understanding the difference between these two crawlers is essential for anyone managing their relationship with Google’s AI products.

The Short Version

                               | Googlebot              | Google-Extended
Purpose                        | Google Search indexing | AI training data for Gemini (formerly Bard)
Does blocking hurt SEO?        | Yes                    | No
Does blocking stop AI training?| No                     | Yes
Should you block?              | Almost never           | Optional
User agent token               | Googlebot              | Google-Extended

Blocking Googlebot kills your search rankings. Blocking Google-Extended has no impact on SEO — it only opts you out of AI training data collection.

What Is Googlebot?

Googlebot is Google’s primary search crawler. It’s been crawling the web since 1998 and is the backbone of Google Search.

When Googlebot visits your site:

  • It renders your HTML, CSS, and JavaScript
  • It indexes your content for Google Search
  • It follows links to discover new pages
  • It updates its index when your content changes

Traffic impact: Massive. Blocking Googlebot means your pages don’t appear in Google Search. For most websites, this is traffic suicide.

Googlebot variants:

  • Googlebot — main desktop crawler
  • Googlebot-Smartphone — mobile crawler
  • Googlebot-Image — Google Image Search
  • Googlebot-Video — Google Video
  • Googlebot-News — Google News
  • Google-InspectionTool — Search Console testing

What Is Google-Extended?

Google-Extended was introduced in September 2023, shortly after OpenAI launched GPTBot. It’s a separate user agent that allows webmasters to opt out of Google’s AI training data collection without affecting regular Googlebot crawling.

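What Google-Extended controls:

  • Whether content Google crawls from your site may be used to train Google’s AI models (Gemini, formerly Bard)
  • Whether that content feeds Google’s Vertex AI and other generative products
  • It is not used for Google Search indexing
  • Blocking it has zero impact on your search rankings

Strictly speaking, Google-Extended is a robots.txt control token rather than a crawler with its own HTTP user-agent string. Google’s existing crawlers do the fetching, so “Google-Extended” will not show up in your server logs; the token only tells Google how the fetched content may be used.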

Google created this separate user agent specifically so webmasters could make an independent choice about AI training without having to choose between “allow everything” or “block Google entirely.”

Why Google Created Two Crawlers

Before Google-Extended existed, webmasters faced an impossible choice:

  • Allow Googlebot → maintain search rankings but also feed AI training
  • Block Googlebot → lose all Google traffic

Google-Extended solves this by separating the two functions. Now you can:

  • Allow Googlebot → stay in Google Search
  • Block Google-Extended → opt out of Gemini AI training

This is actually the most publisher-friendly approach any major AI company has taken — more flexible than OpenAI’s single GPTBot or Anthropic’s single ClaudeBot.

How to Control Each Crawler

Allow both (default — no action needed)

If you don’t add any rules, both crawlers are allowed. Your site gets indexed and your content may be used for AI training.

Block Google-Extended only (keep Googlebot allowed)

User-agent: Google-Extended
Disallow: /

This removes you from Gemini’s training data while keeping full Google Search indexing. No SEO impact.
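
To confirm that a rule like this behaves as described before deploying it, the robots.txt text can be checked locally. The following is a minimal sketch using Python's standard `urllib.robotparser`; the helper name `is_allowed` and the sample path are illustrative assumptions, and Python's parser is only an approximation of how Google itself evaluates robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The rule from this section: opt out of AI training, leave Search untouched.
ROBOTS_TXT = """\
User-agent: Google-Extended
Disallow: /
"""

def is_allowed(robots_txt: str, token: str, path: str) -> bool:
    """Return True if `token` may fetch `path` under the given robots.txt text.

    Hypothetical helper for illustration; it parses the text in memory
    and makes no network requests.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(token, path)

print(is_allowed(ROBOTS_TXT, "Google-Extended", "/blog/post"))  # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "/blog/post"))        # True
```

Because no group names Googlebot and there is no `User-agent: *` group, Googlebot falls through to the default of "allowed".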

Block specific sections from Google-Extended

User-agent: Google-Extended
Disallow: /premium/
Disallow: /paid-content/
Allow: /

Block Googlebot (not recommended)

User-agent: Googlebot
Disallow: /

This kills your Google Search presence. Only appropriate for staging sites, private tools, or sites that explicitly don’t want Google traffic.

Block Googlebot-specific variants

You can control individual Googlebot variants:

# Block Google Image Search
User-agent: Googlebot-Image
Disallow: /

# Block Google News
User-agent: Googlebot-News
Disallow: /news/

# Keep main Googlebot allowed
User-agent: Googlebot
Allow: /
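
When several variant groups sit in one file, it helps to see the resulting access per token and path in one place. The sketch below feeds the rules above to Python's standard `urllib.robotparser`; the function name `access_report` and the sample paths are assumptions made for illustration. One caveat: Python's parser picks the first group whose name is a substring of the token, so a `Googlebot` group listed before `Googlebot-Image` would also capture the image crawler here, whereas Google documents choosing the most specific matching group.

```python
from urllib.robotparser import RobotFileParser

# Same rules as above, with the most specific groups listed first.
VARIANT_RULES = """\
User-agent: Googlebot-Image
Disallow: /

User-agent: Googlebot-News
Disallow: /news/

User-agent: Googlebot
Allow: /
"""

def access_report(robots_txt, tokens, paths):
    """Map every (token, path) pair to True (allowed) or False (blocked)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {(t, p): parser.can_fetch(t, p) for t in tokens for p in paths}

report = access_report(
    VARIANT_RULES,
    ["Googlebot", "Googlebot-Image", "Googlebot-News"],
    ["/news/story", "/blog/post"],
)
for (token, path), allowed in report.items():
    print(f"{token:16} {path:12} {'allowed' if allowed else 'blocked'}")
```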

Does Blocking Google-Extended Affect AI Overviews?

Google AI Overviews (formerly Search Generative Experience) are a different case. They pull information from Google’s existing search index — content Googlebot has already crawled — rather than from Google-Extended’s dataset.

Blocking Google-Extended does NOT remove you from AI Overviews.

To influence AI Overviews, you’d need to use the nosnippet meta tag or robots meta directives, which affect how Googlebot processes your content for snippets.
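
If you rely on nosnippet, it is worth checking that the directive is actually present in the HTML your pages serve. This is a rough sketch using Python's standard `html.parser`; the class and function names are hypothetical, and it only inspects `<meta name="robots">` and `<meta name="googlebot">` tags, not HTTP headers.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from robots/googlebot meta tags (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in {"robots", "googlebot"}:
            # The content attribute is a comma-separated directive list.
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())

def has_nosnippet(html: str) -> bool:
    """Return True if the page declares nosnippet in a robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "nosnippet" in parser.directives

page = '<html><head><meta name="robots" content="index, nosnippet"></head></html>'
print(has_nosnippet(page))  # True
```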

Does Blocking Google-Extended Affect Google Bard/Gemini Answers?

Partially. Google-Extended collects data for training Gemini’s underlying model. If you block it, your content may eventually be less represented in Gemini’s training data — but this is a long-term effect, not immediate.

Gemini’s answers also pull from Google’s live search index, so Googlebot-crawled content still influences responses.

Should You Block Google-Extended?

Block Google-Extended if:

  • You’re a content creator or publisher with copyright concerns
  • You run a news or media site that hasn’t entered a licensing deal with Google
  • You’re blocking all AI training crawlers as policy (GPTBot, ClaudeBot, CCBot, etc.)
  • You want to negotiate data licensing terms rather than give content away

Allow Google-Extended if:

  • You’re comfortable with Google using your content for AI training
  • You want maximum contribution to Google’s AI products
  • You’re an educational or research organization supporting open AI development

Key point: Unlike blocking Googlebot, blocking Google-Extended has no downside for SEO. If you’re uncertain, blocking it is the lower-risk choice.
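
For sites that block AI training crawlers as a blanket policy, the rules can be generated from a single list so the tokens stay in one place. This is a minimal sketch: the token list contains only names mentioned in this article, the function name `build_ai_optout` is hypothetical, and you should confirm each vendor's current token before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Tokens named in this article; treat the list as a starting point, not a registry.
AI_TRAINING_TOKENS = ["Google-Extended", "GPTBot", "ClaudeBot", "CCBot"]

def build_ai_optout(tokens):
    """Build robots.txt groups that disallow everything for each token."""
    groups = [f"User-agent: {token}\nDisallow: /" for token in tokens]
    return "\n\n".join(groups) + "\n"

rules = build_ai_optout(AI_TRAINING_TOKENS)
print(rules)

# Sanity check with the standard-library parser: Search stays open.
checker = RobotFileParser()
checker.parse(rules.splitlines())
print(checker.can_fetch("Googlebot", "/"))  # True
print(checker.can_fetch("GPTBot", "/"))     # False
```

Append the generated groups to your existing robots.txt rather than replacing it, so rules for other crawlers are preserved.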

Comparison With Other AI Training Crawlers

Google-Extended is among the more transparent implementations in the industry:

Crawler           | Company      | Separate from search? | robots.txt controlled?
Google-Extended   | Google       | Yes ✓                 | Yes ✓
Applebot-Extended | Apple        | Yes ✓                 | Yes ✓
GPTBot            | OpenAI       | N/A (no search)       | Yes ✓
ClaudeBot         | Anthropic    | N/A (no search)       | Yes ✓
CCBot             | Common Crawl | N/A                   | Yes ✓
Bytespider        | ByteDance    | N/A                   | Inconsistent ⚠️

The separate user agent approach by Google and Apple is genuinely better for publishers than a single crawler that does both.

Verify Googlebot Is Real

Fake Googlebots are common — scrapers often spoof Google’s user agent assuming sites won’t block “Googlebot.” Always verify:

# Step 1: reverse DNS
host [IP address]
# Should return: crawl-xxx-xxx-xxx-xxx.googlebot.com

# Step 2: forward confirmation
host crawl-xxx-xxx-xxx-xxx.googlebot.com
# Should return the original IP

Legitimate Googlebot resolves to googlebot.com or google.com domains.
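
The same two-step check can be automated for log analysis. The sketch below mirrors the reverse and forward lookups above with Python's standard `socket` module; the function name `is_real_googlebot` and the injectable `reverse`/`forward` parameters are assumptions added so the logic can be exercised without live DNS, and the accepted suffixes are the two domains named in this article.

```python
import socket

# Domains this article lists for legitimate Googlebot hostnames.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def _reverse_name(ip):
    """Step 1: reverse DNS, returning the primary hostname for the IP."""
    return socket.gethostbyaddr(ip)[0]

def _forward_ips(hostname):
    """Step 2: forward DNS, returning every address the hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}

def is_real_googlebot(ip, reverse=_reverse_name, forward=_forward_ips):
    """Return True only if reverse and forward lookups agree on a Google host."""
    try:
        hostname = reverse(ip).rstrip(".").lower()
        # The suffix must end the name, so "googlebot.com.evil.example" fails.
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward(hostname)
    except OSError:
        # Covers socket.herror and socket.gaierror: treat lookup failure as unverified.
        return False
```

Call `is_real_googlebot(ip)` with an address taken from your access logs. Lookups are slow, so caching results per IP is advisable on busy sites.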


Check Your Google Crawler Access

Use our SEO Bot Checker to verify Googlebot access and our AI Bot Checker to check your Google-Extended status.
