Allen Institute for AI

AI2Bot

Academic crawler supporting open AI research initiatives such as Semantic Scholar and Dolma, a multi-trillion-token dataset.

Purpose: Scientific and open-source AI research

Quick Facts

Respects robots.txt: Yes
Last Updated: 2025-05

📊 Popularity & Traffic

AI2Bot crawls at a lower volume than commercial bots, but the data it collects is significant for open-source AI benchmarks.

🤖 User Agent Strings

Use these patterns to identify AI2Bot in your server logs or configure your robots.txt file.

AI2Bot
Respects robots.txt: Yes
Purpose: General research crawler
Pattern: AI2Bot

AI2Bot-Dolma
Respects robots.txt: Yes
Purpose: Crawler for the Dolma open dataset
Pattern: AI2Bot-Dolma

🌐 IP Ranges

Source: Allen Institute infrastructure

No specific IP ranges published. Identify this bot using the User Agent strings above.
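Since identification rests on the user agent alone, a log scan can be sketched as follows. This is a minimal sketch, not an official tool: it assumes access logs in the common "combined" format, where the user agent is the last quoted field, and the function names and the sample log lines are hypothetical. Only the two tokens, AI2Bot and AI2Bot-Dolma, come from this page.

```python
import re
from collections import Counter

# Tokens from the user agent patterns on this page. The longer token comes
# first so that "AI2Bot-Dolma" is not reported as plain "AI2Bot".
AI2_TOKENS = ("AI2Bot-Dolma", "AI2Bot")

# Assumption: "combined" log format, user agent is the last quoted field.
_UA_FIELD = re.compile(r'"([^"]*)"\s*$')


def classify_user_agent(user_agent):
    """Return the matching AI2 token, or None if the agent is not AI2Bot."""
    lowered = user_agent.lower()
    for token in AI2_TOKENS:
        if token.lower() in lowered:
            return token
    return None


def count_ai2_hits(log_lines):
    """Count requests per AI2 token across an iterable of log lines."""
    hits = Counter()
    for line in log_lines:
        match = _UA_FIELD.search(line)
        if match is None:
            continue  # line does not end with a quoted field
        token = classify_user_agent(match.group(1))
        if token is not None:
            hits[token] += 1
    return hits


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    sample = [
        '203.0.113.7 - - [01/May/2025:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible) AI2Bot-Dolma"',
        '203.0.113.8 - - [01/May/2025:10:00:01 +0000] "GET /a HTTP/1.1" 200 128 "-" "Mozilla/5.0 (compatible) AI2Bot"',
        '203.0.113.9 - - [01/May/2025:10:00:02 +0000] "GET /b HTTP/1.1" 200 64 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    ]
    print(count_ai2_hits(sample))
```

A user agent header can be set by any client, so without published IP ranges a match shows only that a request claimed to be AI2Bot.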

📝 Robots.txt Configuration

Add the following to your robots.txt file to block AI2Bot (to address the Dolma crawler explicitly, repeat the group with User-agent: AI2Bot-Dolma):

User-agent: AI2Bot
Disallow: /
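
Before deploying the rule, it can be sanity-checked locally. The sketch below feeds the two lines above to Python's standard urllib.robotparser and asks whether a given agent may fetch a URL. The helper name is_allowed and the example.com URL are hypothetical, and Python's parser is only a stand-in for how the crawler itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# The rule from this page, as it would appear in robots.txt.
RULES = """\
User-agent: AI2Bot
Disallow: /
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if rules_text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URL and agent names, for illustration only.
    for agent in ("AI2Bot", "SomeOtherBot"):
        verdict = "allowed" if is_allowed(RULES, agent, "https://example.com/page") else "blocked"
        print(agent, verdict)
```

Passing the text of a full robots.txt file instead of RULES shows how the AI2Bot group interacts with any other groups already present.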

💡 Important Notes

  • AI2 is a non-profit founded by Paul Allen with a focus on 'AI for the common good'
  • Its crawl data is used to build datasets for OLMo (Open Language Model)
  • Known to be very respectful of site owner preferences and robots.txt