
Googlebot

Google's primary web crawler for search indexing. Content it fetches may also be used to train Google's AI models such as Gemini (formerly Bard), unless the site opts out with the Google-Extended token.

Purpose: Search indexing and AI model training

Quick Facts

Company: Google
Respects robots.txt: Yes
Last Updated: 2025-05

📊 Popularity & Traffic

Ranking among AI crawlers: #1
Traffic share: ~4.5% of all HTML requests, >25% of verified bot traffic

Googlebot is the largest crawler on the web. Its traffic grew ~96% year over year as Google ramped up AI training.

🤖 User Agent Strings

Use these patterns to identify Googlebot in your server logs or configure your robots.txt file.

Googlebot

Respects robots.txt

Primary Google crawler for search and AI

Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Google-Extended

Respects robots.txt

Robots.txt token to control AI training usage (not a crawler itself)

Google-Extended
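
As a rough illustration of matching the Googlebot user agent string above in server logs, here is a minimal Python sketch. The function name `claims_to_be_googlebot` and the regular expression are assumptions made for this example, not something Google publishes. The pattern only covers the `Googlebot/<version>` token shown above, and a match only means the request claims to be Googlebot, since user agent strings are easy to fake.

```python
import re

# Assumed pattern: matches the "Googlebot/<major>.<minor>" token, e.g. "Googlebot/2.1".
# Other Google user agents (such as image or ads crawlers) are deliberately not covered.
GOOGLEBOT_UA = re.compile(r"\bGooglebot/\d+\.\d+", re.IGNORECASE)


def claims_to_be_googlebot(user_agent: str) -> bool:
    """Return True if the user agent string contains a Googlebot version token."""
    return GOOGLEBOT_UA.search(user_agent) is not None
```

Calling `claims_to_be_googlebot(...)` on the user agent field of each log line is enough for a first pass; Google-Extended never appears in logs because it is a robots.txt token rather than a crawler.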

🌐 IP Ranges

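Source: DNS verification (googlebot.com PTR lookups)

No IP ranges are listed here. User agent strings can be spoofed, so confirm a suspected Googlebot request with a reverse DNS (PTR) lookup on the source IP: the hostname should end in googlebot.com or google.com, and a forward lookup of that hostname should return the same IP. Google also publishes its Googlebot IP ranges as a JSON file in its official documentation.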
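
The DNS verification mentioned above can be sketched as a reverse (PTR) lookup followed by a confirming forward lookup. This is a minimal Python sketch under stated assumptions: the function names and the injectable `reverse_lookup` / `forward_lookup` parameters are hypothetical helpers for this example, and the accepted hostname suffixes (`.googlebot.com`, `.google.com`) should be checked against Google's current guidance before you rely on them.

```python
import socket

# Assumed hostname suffixes for Google crawlers; verify against Google's documentation.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def _forward_lookup(hostname):
    """Return the set of IP address strings that hostname resolves to."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return set()
    return {info[4][0] for info in infos}


def is_verified_googlebot(ip, reverse_lookup=_reverse_lookup,
                          forward_lookup=_forward_lookup):
    """Check ip with a reverse lookup, a suffix check, then a forward lookup."""
    hostname = reverse_lookup(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    if not hostname.endswith(GOOGLE_SUFFIXES):
        return False
    # The forward lookup must map back to the original IP (plain string comparison,
    # so differently formatted IPv6 addresses may need normalizing first).
    return ip in forward_lookup(hostname)
```

The lookups are passed in as parameters so the check can be exercised without network access. In production you would likely cache results, since two DNS queries per request are expensive.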

📝 Robots.txt Configuration

Add one of the following rule groups to your robots.txt file, depending on whether you want to opt out of AI training only or block Googlebot entirely:

# Block AI training but allow search indexing:
User-agent: Google-Extended
Disallow: /

# Block everything from Google:
User-agent: Googlebot
Disallow: /
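
To sanity-check rules like these before deploying them, one option is Python's standard-library `urllib.robotparser`. The sketch below is illustrative only: the `is_allowed` helper and the `AI_OPT_OUT_RULES` constant are names invented for this example, and Python's matching logic approximates rather than reproduces Google's own robots.txt parser.

```python
from urllib.robotparser import RobotFileParser

# The "block AI training but allow search indexing" rule group from above.
AI_OPT_OUT_RULES = """\
User-agent: Google-Extended
Disallow: /
"""


def is_allowed(rules: str, token: str, url: str) -> bool:
    """Parse robots.txt text and report whether token may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(token, url)
```

With `AI_OPT_OUT_RULES`, `is_allowed(AI_OPT_OUT_RULES, "Google-Extended", "https://example.com/page")` is false while the same call with `"Googlebot"` is true, which mirrors the intent of opting out of AI training without blocking search crawling.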

💡 Important Notes

  • Google-Extended is NOT a separate crawler - it's a robots.txt token for AI opt-out
  • Disallowing Google-Extended blocks AI training but allows normal search indexing
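  • Disallowing Googlebot stops Google from crawling your site, so your pages drop out of Google Search (URLs linked from elsewhere can still appear without a description)
  • Opting out via Google-Extended does not affect search ranking or SGE (now AI Overviews) results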