Allen Institute for AI
AI2Bot
Academic crawler supporting open AI research initiatives like Semantic Scholar and the Dolma multi-trillion token dataset.
Purpose: Scientific and open-source AI research
📊 Popularity & Traffic
Crawl volume is lower than that of commercial AI bots, but the data it collects is significant for open-source AI research and benchmarks.
🤖 User Agent Strings
Use these patterns to identify AI2Bot in your server logs or configure your robots.txt file.
AI2Bot
Respects robots.txt. General research crawler.
AI2Bot-Dolma
Respects robots.txt. Crawler for the Dolma open dataset.
🌐 IP Ranges
Source: Allen Institute infrastructure
No specific IP ranges published. Identify this bot using the User Agent strings above.
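Since no IP ranges are published, matching has to rely on the User-Agent header. The sketch below is one way to do that in Python: it looks for the two tokens listed above anywhere in the header, case-insensitively. The function names and the sample User-Agent strings are hypothetical and only illustrate the shape of the check; the full header the crawler actually sends may differ.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# Tokens from the user agent list above. The optional "-Dolma" suffix is
# captured so the two crawlers can be told apart.
AI2_PATTERN = re.compile(r"AI2Bot(-Dolma)?", re.IGNORECASE)


def classify_user_agent(user_agent: str) -> Optional[str]:
    """Return "AI2Bot", "AI2Bot-Dolma", or None if neither token appears."""
    match = AI2_PATTERN.search(user_agent)
    if match is None:
        return None
    return "AI2Bot-Dolma" if match.group(1) else "AI2Bot"


def count_ai2_hits(user_agents: Iterable[str]) -> Counter:
    """Count requests per AI2 crawler across a sequence of User-Agent values."""
    hits: Counter = Counter()
    for user_agent in user_agents:
        label = classify_user_agent(user_agent)
        if label is not None:
            hits[label] += 1
    return hits


if __name__ == "__main__":
    # Hypothetical User-Agent values, not copied from real traffic.
    sample = [
        "Mozilla/5.0 (compatible; AI2Bot; +https://example.org/bot)",
        "Mozilla/5.0 (compatible; AI2Bot-Dolma; +https://example.org/bot)",
        "Mozilla/5.0 (X11; Linux x86_64) Firefox/125.0",
    ]
    print(count_ai2_hits(sample))
```

User-Agent headers can be spoofed, so a match indicates a request that claims to be AI2Bot rather than proof of its origin.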
📝 Robots.txt Configuration
Add the following to your robots.txt file to block AI2Bot:
User-agent: AI2Bot
Disallow: /
💡 Important Notes
- AI2 is a non-profit founded by Paul Allen with a focus on 'AI for the common good'
- Crawled data feeds open datasets such as Dolma, which are used to train the OLMo (Open Language Model) family of models
- Known to be very respectful of site owner preferences and robots.txt
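Because the crawler is reported to honor robots.txt, it can be worth checking that a rule set actually parses the way you intend before deploying it. The following is a minimal sketch using Python's standard `urllib.robotparser`. It assumes a rule group that names both user agent tokens listed on this page, since whether a rule for `AI2Bot` alone also covers `AI2Bot-Dolma` depends on how each crawler matches tokens. The `is_allowed` helper and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Assumed rule set: one group naming both tokens explicitly, so the result
# does not depend on prefix or substring matching by the crawler.
ROBOTS_TXT = """\
User-agent: AI2Bot
User-agent: AI2Bot-Dolma
Disallow: /
"""


def is_allowed(user_agent: str, url: str = "https://example.com/any/page") -> bool:
    """Report whether the rules above permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("AI2Bot", "AI2Bot-Dolma", "SomeOtherBot"):
        print(agent, "allowed" if is_allowed(agent) else "blocked")
```

This only shows how Python's parser reads the file. It does not prove how the crawler itself interprets the rules, so server logs remain the final check.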