As AI tools like ChatGPT become increasingly integrated into how people find and consume information online, understanding what these systems can and cannot "see" on your website is crucial for content creators, marketers, and developers.
With the rise of AI-powered search and retrieval features such as ChatGPT Search and Bing-powered search, the way users discover and interact with web content is fundamentally changing. Instead of browsing through search results and visiting websites directly, users increasingly ask AI assistants to find and summarize information for them.
Why This Matters
If ChatGPT can't "see" certain content on your website, that content effectively becomes invisible to users who rely on AI assistants to find information. Understanding these limitations is crucial for ensuring your content remains discoverable in an AI-first world.
Critical Questions We Set Out to Answer
Can AI systems like ChatGPT effectively "see" and retrieve all the content on my website?
What types of content structures or technical implementations might prevent AI systems from accessing my content?
How can I optimize my website to ensure it's fully accessible to AI retrieval systems?
Research Methodology
We conducted a methodical audit of ChatGPT's web retrieval capabilities across a series of controlled test websites with varying levels of complexity:
Built seven test websites with controlled content structures
Each site contained both general content and specific "marker content" with unique identifiers
Deployed all sites using Vercel connected to GitHub repositories
Tested ChatGPT's ability to retrieve specific information using the "Browse with Bing" feature
Documented results to understand what ChatGPT could and couldn't access
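The marker-content approach above can be sketched in a few lines of Python. This is only an illustration of the idea, not the harness used in the audit: the `MarkerTest` class, the `score_responses` function, and the marker strings are all hypothetical, and the assistant's answers are assumed to have been collected by hand.

```python
from dataclasses import dataclass


@dataclass
class MarkerTest:
    """One planted marker: a unique string that appears nowhere else on the web."""
    site: str
    marker: str


def score_responses(tests: list[MarkerTest], responses: dict[str, str]) -> dict[str, bool]:
    """Return, per site, whether the assistant's answer contains the planted marker.

    `responses` maps a site name to the answer text copied from the assistant.
    A missing answer counts as a failed retrieval.
    """
    return {
        t.site: t.marker.lower() in responses.get(t.site, "").lower()
        for t in tests
    }
```

Because each marker is unique to one page, a match in the answer is strong evidence that the page was actually read rather than guessed.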
Test Sites Overview
Site 1: Text-Only Website
Result: Success
Purpose: Baseline test for simple content retrieval
Description: Basic static HTML with plain text content to establish a retrieval baseline
Site 2: Blog Article Website
Result: Failed
Purpose: Test link following with generic titles
Description: Main page with links to blog articles using non-descriptive titles
Site 3: Blog Title Hint Website
Result: Success
Purpose: Test link following with descriptive titles
Description: Blog articles with explicit, keyword-rich titles in link text
Site 4: Blog Content Hint Website
Result: Failed
Purpose: Test partial title matching
Description: Articles with titles containing partial query matches
Site 5: JavaScript-Hidden Content
Result: Mixed
Purpose: Test JavaScript execution capabilities
Description: Content hidden in tabs, accordions, and dynamically loaded sections
Site 6: Image-Based Content (OCR)
Result: Failed
Purpose: Test text extraction from images
Description: Critical information embedded only in image files
Site 7: White-on-White Text
Result: Success
Purpose: Test HTML parsing vs visual rendering
Description: Text hidden with CSS styling but present in HTML
Key Findings
Static Content Accessibility
ChatGPT reliably retrieves static HTML content on main pages
Link Following Behavior
Follows only links with explicit, relevant titles and ignores links with generic titles
JavaScript Limitations
Cannot execute JavaScript for dynamic content loading
Search Engine Workaround for JavaScript Content
OpenAI uses Bing and Google searches as a workaround to access JavaScript-rendered pages that have been indexed by search engines
CSS vs HTML Processing
Processes HTML content regardless of CSS styling/visibility
Image Text Extraction
Cannot extract text from images and may hallucinate responses instead
Important Caveat: Search Engine Workarounds
Good news for JavaScript-heavy sites: While our direct testing shows ChatGPT cannot execute JavaScript, OpenAI uses Bing and Google searches as workarounds to access pages that are behind JavaScript but have been indexed by search engines.
What This Means for Your Site
Don't panic if you have JavaScript content: first ensure these pages are properly indexed by Bing and Google
Optimize further by moving critical content above the JavaScript fold, that is, into the static HTML, especially the parts that answer questions users commonly ask ChatGPT (Aiso can help identify these)
Monitor developments: we expect OpenAI to improve their crawlers over time
Future Outlook
Experts like Elie Berreby expect AI crawlers to improve their JavaScript rendering capabilities over time. However, the timeline remains uncertain and this doesn't appear to be a top priority on OpenAI's current roadmap.
Complete Audit Results
Content Type | ChatGPT Can Access | Notes |
---|---|---|
Static content on main page | Yes | Reliably retrieved across all test sites |
Content on linked pages (generic titles) | No | Does not follow links without explicit relevance signals |
Content on linked pages (explicit titles) | Yes | Will follow links when titles clearly indicate relevance |
Content on linked pages (vague titles) | No | Partial matches in titles are insufficient |
JavaScript tab content | No | Cannot execute JavaScript to reveal tabbed content |
JavaScript accordion content | No | Cannot access content in collapsed accordions |
Dynamically loaded content (after user action) | No | Cannot access content loaded via JavaScript or external files |
Text embedded in images | No | Cannot extract text from images, may hallucinate responses instead |
Text hidden with CSS (white-on-white) | Yes | Processes raw HTML content regardless of visual styling |
6 Key Actions to Optimize Your Website for AI Retrieval
1. Use Explicit, Descriptive Link Text
Priority: Critical
Ensure links to important content contain explicit keywords that clearly indicate what information can be found on the linked page
Implementation:
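One way to act on this is to audit your own pages for non-descriptive anchor text. The sketch below uses Python's standard `html.parser` module. The list of "generic" phrases is an assumption for illustration, not something measured in the audit, and `find_generic_links` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Assumed examples of non-descriptive anchor text; extend this for your own site.
GENERIC_PHRASES = {"click here", "read more", "learn more", "here", "more", "link"}


class LinkTextCollector(HTMLParser):
    """Collects (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # None means "not currently inside an <a>"
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace so "Read\n  more" matches "read more".
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_generic_links(html: str) -> list[tuple[str, str]]:
    """Return links whose text is empty or matches a generic phrase."""
    parser = LinkTextCollector()
    parser.feed(html)
    parser.close()
    return [
        (href, text)
        for href, text in parser.links
        if not text or text.lower() in GENERIC_PHRASES
    ]
```

Any link this flags is a candidate for rewriting so that the anchor text names the topic of the target page.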
2. Optimize Page Titles
Priority: High
Explicit page titles significantly improve content discovery, while vague titles result in content being missed entirely
Implementation:
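A quick check is to compare each page's `<title>` against the terms a user would likely put in a question. The sketch below is a rough heuristic under that assumption: the exact-word matching rule and the `title_covers_query` name are illustrative, and ChatGPT's real relevance judgment is not known to work this way.

```python
import re
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Captures the text inside the first-seen <title> element(s)."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def title_covers_query(html: str, query_terms: list[str]) -> tuple[str, list[str]]:
    """Return the page title and the query terms that do not appear in it."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    words = set(re.findall(r"[a-z0-9]+", parser.title.lower()))
    missing = [term for term in query_terms if term.lower() not in words]
    return parser.title.strip(), missing
```

A title such as "Blog Post 3" would report every query term as missing, which mirrors the generic-title failure seen in the test sites.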
3. Keep Critical Content in Static HTML
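Priority: High
Content present in the static HTML is processed regardless of styling, but in our tests content behind JavaScript tabs and accordions, and content loaded dynamically after user interactions, was invisible to ChatGPT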
Implementation:
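To see roughly what a crawler that does not run JavaScript has to work with, you can extract the text of the raw HTML while ignoring `<script>` and `<style>` bodies. The sketch below is an approximation under that assumption, not a reproduction of ChatGPT's actual fetcher. The `static_text` helper is a hypothetical name.

```python
from html.parser import HTMLParser

# Element bodies that are code, not readable page content.
SKIPPED_TAGS = {"script", "style"}


class StaticTextExtractor(HTMLParser):
    """Collects text nodes from raw HTML without executing any scripts."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # CSS styling is ignored on purpose: hidden text still counts as present.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def static_text(html: str) -> str:
    """Return the text available in the HTML source before any JavaScript runs."""
    parser = StaticTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

If a critical phrase is missing from `static_text(page_source)`, it is being injected by JavaScript and is a candidate for moving into the server-rendered HTML.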
4. Place Important Information on Main Pages
Priority: Medium
Content on main pages is reliably accessed, while secondary pages are only discovered under specific conditions
Implementation:
5. Avoid Image-Only Content for Critical Information
Priority: High
ChatGPT cannot reliably extract text from images, while it processes HTML content regardless of styling
Implementation:
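A simple starting point is to list images that carry no text alternative in the HTML. Note the assumption here: the audit did not test whether ChatGPT reads `alt` attributes, so treat `alt` text as a complement to, not a replacement for, repeating the information in ordinary page text. The `images_missing_alt` helper is a hypothetical name.

```python
from html.parser import HTMLParser


class ImageAltAuditor(HTMLParser):
    """Records the src of every <img> that has no non-empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags are also routed here by HTMLParser.
        if tag == "img":
            attributes = dict(attrs)
            if not (attributes.get("alt") or "").strip():
                self.missing_alt.append(attributes.get("src") or "")


def images_missing_alt(html: str) -> list[str]:
    """Return the src values of images with missing or empty alt text."""
    auditor = ImageAltAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.missing_alt
```

Images flagged here that contain prices, opening hours, or similar facts are the ones to duplicate as HTML text first.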
6. Ensure Search Engine Indexing
Priority: Medium
Even JavaScript-dependent content can be discoverable if it's properly indexed by search engines
Implementation:
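One precondition for indexing is that your `robots.txt` does not block the search engines' crawlers. The sketch below uses Python's standard `urllib.robotparser` to check this for a given URL. It assumes the crawler tokens `bingbot` and `Googlebot`, and `crawlers_allowed` is a hypothetical helper name. Passing this check does not prove a page is indexed, so confirm indexing in each search engine's webmaster tools as well.

```python
from urllib.robotparser import RobotFileParser


def crawlers_allowed(
    robots_txt: str,
    url: str,
    agents: tuple[str, ...] = ("bingbot", "Googlebot"),
) -> dict[str, bool]:
    """Return, per crawler, whether the robots.txt rules permit fetching `url`.

    `robots_txt` is the file's content as a string, so no network access is needed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}
```

A `False` for either crawler on a page that matters means search-engine indexing, and with it the workaround described above, is unavailable for that page.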
Conclusion: Preparing for an AI-First World
Our audit of ChatGPT's web retrieval capabilities has revealed both strengths and limitations in how AI systems access and retrieve web content. ChatGPT reliably retrieves static content and processes HTML regardless of CSS styling. However, it cannot execute JavaScript to reveal content in tabs and accordions, does not follow links without explicit relevance signals, and cannot access content that loads dynamically after user interactions.
These findings have significant implications for website owners, content creators, and marketers who want to ensure their content remains discoverable in an AI-first world. By implementing the actionable insights from our research, you can optimize your website for AI retrieval and ensure that your valuable content doesn't become invisible to users who rely on AI assistants to find information.
As AI retrieval systems continue to evolve, understanding these capabilities and limitations will become increasingly important for effective digital content strategy. By staying informed about how AI systems interact with web content, you can adapt your approach to ensure your content remains accessible and discoverable, regardless of how users choose to find it.
Final Thought
The rise of AI assistants represents a fundamental shift in how users discover and consume web content. Just as websites had to adapt to mobile devices and search engine algorithms in the past, they must now adapt to AI retrieval systems to remain visible and relevant in the evolving digital landscape.
About the Author
Ben Tannenbaum
Ben Tannenbaum is the founder of Aiso, a marketing tech company helping brands be visible in AI responses. With expertise in AI search optimization and content strategy, Ben helps businesses adapt to the evolving landscape of AI-powered search.