Google
Googlebot
Google's primary web crawler for search indexing. Also used to gather data for Google's AI models, such as Gemini (formerly called Bard).
Purpose: Search indexing and AI model training
📊 Popularity & Traffic
Ranking among AI crawlers: #1
Traffic share: ~4.5% of all HTML requests, >25% of verified bot traffic
The largest crawler on the web. Traffic grew ~96% YoY as Google ramped up AI training.
🤖 User Agent Strings
Use these patterns to identify Googlebot in your server logs or configure your robots.txt file.
Googlebot
Respects robots.txt. Primary Google crawler for search and AI.
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Google-Extended
Respects robots.txt. Robots.txt token to control AI training usage (not a crawler itself, so it never appears as a separate user agent in server logs).
Google-Extended
🌐 IP Ranges
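Source: DNS verification (reverse DNS/PTR lookups that resolve to googlebot.com or google.com)
Google also publishes Googlebot's IP ranges in an official JSON file (googlebot.json) in its crawler documentation. Because the user agent string can be spoofed, confirm a claimed Googlebot request with a reverse DNS lookup on the client IP, followed by a forward lookup on the returned hostname.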
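Matching the User-Agent is only a first filter when you identify Googlebot in server logs, because anyone can send that header. The Python sketch below pairs that match with the DNS verification named as the source above. It runs three checks: a reverse (PTR) lookup on the client IP, a test that the hostname is googlebot.com, google.com, or a subdomain of either, and a forward lookup confirming that the hostname maps back to the same IP. It is a minimal sketch that assumes only the standard library and the host's DNS resolver. The function names are hypothetical, and the accepted domain list is an assumption based on Google's verification guidance for Googlebot.

```python
import ipaddress
import socket

# Assumption: Googlebot's PTR hostnames end in one of these domains,
# following Google's crawler-verification guidance.
GOOGLE_CRAWLER_DOMAINS = ("googlebot.com", "google.com")


def claims_googlebot(user_agent: str) -> bool:
    """Cheap first filter: does the User-Agent claim to be Googlebot? (Spoofable.)"""
    return "googlebot" in user_agent.lower()


def hostname_is_google(hostname: str) -> bool:
    """True if a PTR hostname is a Google crawler domain or a subdomain of one."""
    host = hostname.rstrip(".").lower()
    return any(host == d or host.endswith("." + d) for d in GOOGLE_CRAWLER_DOMAINS)


def verify_googlebot(ip: str) -> bool:
    """Reverse DNS on the IP, domain check, then forward DNS back to the same IP."""
    try:
        addr = ipaddress.ip_address(ip)
    except ValueError:
        return False  # not a valid IPv4/IPv6 address
    try:
        hostname, _, _ = socket.gethostbyaddr(str(addr))  # PTR lookup
        if not hostname_is_google(hostname):
            return False
        # Forward lookup (A and AAAA records) on the PTR hostname.
        infos = socket.getaddrinfo(hostname, None)
    except OSError:
        return False  # any lookup failure counts as unverified
    return any(ipaddress.ip_address(info[4][0]) == addr for info in infos)
```

In a log pipeline, you would call claims_googlebot on each request's User-Agent first. Only the matches would go to verify_googlebot, which performs live DNS lookups, ideally with results cached per IP.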
📝 Robots.txt Configuration
Add one of the following rule groups to your robots.txt file, depending on what you want to block:
# Block AI training but allow search indexing:
User-agent: Google-Extended
Disallow: /
# Block Googlebot entirely (this also stops search crawling):
User-agent: Googlebot
Disallow: /
💡 Important Notes
- Google-Extended is NOT a separate crawler; it's a robots.txt token for AI opt-out
- Disallowing Google-Extended blocks AI training but allows normal search indexing
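- Disallowing Googlebot stops Google from crawling your site, which effectively removes it from Google Search (URLs linked from elsewhere can still be listed without a description)
- Opting out via Google-Extended does not affect search ranking or inclusion in SGE (now AI Overviews) results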
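You can sanity-check the robots.txt rule groups above before deploying them with Python's standard urllib.robotparser module, which evaluates a robots.txt body against a user-agent token. This is a sketch, not Google's own parser: robotparser uses its own simplified matching rules, and the example URL is a hypothetical placeholder. It shows that disallowing Google-Extended leaves Googlebot free to crawl, while disallowing Googlebot blocks it.

```python
from urllib.robotparser import RobotFileParser

# The two rule groups from the robots.txt configuration above.
AI_OPT_OUT = """\
User-agent: Google-Extended
Disallow: /
"""

FULL_BLOCK = """\
User-agent: Googlebot
Disallow: /
"""

# Hypothetical page on your site.
EXAMPLE_URL = "https://example.com/some-page"


def allowed(robots_txt: str, token: str, url: str = EXAMPLE_URL) -> bool:
    """Return True if `token` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(token, url)


if __name__ == "__main__":
    print("AI opt-out, Googlebot:      ", allowed(AI_OPT_OUT, "Googlebot"))        # True
    print("AI opt-out, Google-Extended:", allowed(AI_OPT_OUT, "Google-Extended"))  # False
    print("Full block, Googlebot:      ", allowed(FULL_BLOCK, "Googlebot"))        # False
```

robotparser only evaluates the rules. It cannot show what Google does with pages Googlebot has already fetched, and that downstream use is what the Google-Extended token governs.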