What ChatGPT Can (and Cannot) See on Your Website: A Comprehensive Audit

As AI tools like ChatGPT become increasingly integrated into how people find and consume information online, understanding what these systems can and cannot "see" on your website is crucial for content creators, marketers, and developers.

Introduction

With the rise of AI-powered search and retrieval features, such as ChatGPT's search and other Bing-powered tools, the way users discover and interact with web content is fundamentally changing. Instead of browsing through search results and visiting websites directly, users increasingly ask AI assistants to find and summarize information for them.

This shift raises critical questions for website owners and content creators:

  • Can AI systems like ChatGPT effectively "see" and retrieve all the content on my website?
  • What types of content structures or technical implementations might prevent AI systems from accessing my content?
  • How can I optimize my website to ensure it's fully accessible to AI retrieval systems?

To answer these questions, we conducted a methodical audit of ChatGPT's web retrieval capabilities across a series of controlled test websites with varying levels of complexity.

Why This Matters: If ChatGPT can't "see" certain content on your website, that content effectively becomes invisible to users who rely on AI assistants to find information. Understanding these limitations is crucial for ensuring your content remains discoverable in an AI-first world.

Methodology

We created a series of increasingly complex test websites, each designed to test specific aspects of ChatGPT's retrieval capabilities:

  1. We built five numbered test websites, plus two variants of Site 2 (2A and 2B), with controlled content structures
  2. Each site contained both general content and specific "marker content" with unique identifiers
  3. We deployed all sites using Vercel connected to GitHub repositories
  4. We tested ChatGPT's ability to retrieve specific information from each site using the "Browse with Bing" feature
  5. We documented the results to understand what ChatGPT could and couldn't access
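
The grading in steps 4 and 5 can be pictured as a simple marker check. The following is a minimal sketch, not the tooling used in the audit: the function name grade_response and the FOUND / NOT FOUND labels are assumptions for illustration, and the sample response and marker are taken from the Site 3 tabbed-content test reported later in this article.

```python
def grade_response(response: str, expected_markers: list[str]) -> str:
    """Return 'FOUND' if every expected marker appears in the response.

    Hypothetical helper: matching is a case-insensitive substring test,
    which is only a rough stand-in for reading the answer by hand.
    """
    text = response.casefold()
    missing = [m for m in expected_markers if m.casefold() not in text]
    return "FOUND" if not missing else "NOT FOUND"


# Response quoted from the tabbed-content test; the marker is the expected
# number of light sources for the Celestial Dragon constellation.
response = (
    "I'm unable to find information about the Eldorian Festival of Lights "
    "or the Celestial Dragon constellation on the provided website."
)
print(grade_response(response, ["1,722"]))
```

Because each marker is a unique, invented fact, a substring match is unlikely to pass by accident. A fabricated answer, like the one in the Site 4 test, would simply fail to contain it.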

Test Sites Overview

We created the following test sites, each with a specific purpose:

Site 1: Text-Only Website

URL: https://ai-retrieval-test-site-1.vercel.app/

Purpose: Establish a baseline for ChatGPT's ability to retrieve static content from a simple, single-page website.

Structure: Basic HTML with plain text content about a fictional country called Zephyria.

Site 2: Blog Article Website

URL: https://ai-retrieval-test-site-2.vercel.app/

Purpose: Test if ChatGPT can navigate to and retrieve content from linked blog pages.

Structure: Main page with links to blog articles containing specific information about fictional artifacts and calendars.

Site 2A: Blog Title Hint Website

URL: https://ai-retrieval-test-site-2a.vercel.app/

Purpose: Test if explicit blog titles improve ChatGPT's ability to find relevant content.

Structure: Similar to Site 2, but with blog titles that explicitly mention key information (e.g., "The Crystalline Codex: 7 Elements of Atlantean Science").

Site 2B: Blog Content Hint Website

URL: https://ai-retrieval-test-site-2b.vercel.app/

Purpose: Test if vague blog titles affect ChatGPT's ability to find relevant content.

Structure: Similar to Site 2, but with vague blog titles that only hint at the content (e.g., "Ancient Symbols" instead of explicitly mentioning the Crystalline Codex).

Site 3: JavaScript-Hidden Content - Testing ChatGPT's JavaScript Capabilities

URL: https://ai-retrieval-test-site-3.vercel.app/

Purpose: Test if ChatGPT can access content hidden behind JavaScript interactions.

Structure: Content hidden in various JavaScript-powered UI components:

  • Static content (visible without JavaScript)
  • Tabbed content (requires JavaScript to switch tabs)
  • Accordion content (requires JavaScript to expand/collapse)
  • Dynamically loaded content (requires user interaction to load via JavaScript)

Update: Thanks to Martin Mustermann's insights, we discovered that our initial test design needed refinement to truly validate ChatGPT's JavaScript capabilities. The following results reflect our corrected methodology using truly JavaScript-dependent content.

Overview

Tests with the revised Site 3 (https://ai-retrieval-test-site-3.vercel.app/) revealed important findings about ChatGPT's ability to access JavaScript-dependent content.

Test Design

We created a website with content hidden behind various JavaScript-dependent mechanisms:

  • Tabbed content (Culture of Eldoria in Tab 3)
  • Accordion sections (The Great Library section)
  • Content loaded via external JSON file
  • No content present in the source HTML

Test Questions and Results

Test 1: Tabbed Content

Question: "Can you visit https://ai-retrieval-test-site-3.vercel.app/ and tell me about the Eldorian Festival of Lights and how many light sources were needed for the Celestial Dragon constellation?"

Result:

✗ FAILED - ChatGPT could not access the JavaScript-dependent content.

ChatGPT's Response: "I'm unable to find information about the Eldorian Festival of Lights or the Celestial Dragon constellation on the provided website."

Expected Information: The Celestial Dragon constellation required 1,722 light sources

Status: ChatGPT failed to access information in tabbed content

Test 2: Accordion Content

Question: "Can you visit https://ai-retrieval-test-site-3.vercel.app/ and tell me about the Septimal Codex system used in the Great Library of Eldoria? What were its primary categories?"

Result:

✗ FAILED - ChatGPT could not access the JavaScript-dependent content.

ChatGPT's Response: "The site does not provide information about the Septimal Codex system or the Great Library of Eldoria."

Expected Information: Seven categories (Cosmos, Nature, Body, Mind, Society, Expression, and Essence)

Status: ChatGPT failed to access information in accordion content

Test 3: Dynamically Loaded Content

Question: "Can you visit https://ai-retrieval-test-site-3.vercel.app/ and tell me about the Eldorian Heptad and its significance in the city's history?"

Result:

✗ FAILED - ChatGPT could not access the dynamically loaded content.

ChatGPT's Response: "I'm unable to find information about the Eldorian Heptad on the provided website."

Expected Information: The Eldorian Heptad was a council of seven wise scholars who guided the city's development

Status: ChatGPT failed to access dynamically loaded content

Key Findings

JavaScript Execution Limitations

  • ChatGPT cannot execute JavaScript to reveal hidden content
  • Content loaded via JavaScript remains inaccessible to ChatGPT's crawler
  • Our previous assumptions about ChatGPT's JavaScript capabilities were incorrect: the earlier test content was only visually hidden and still present in the HTML source

Content Accessibility

  • Content must be present in the initial HTML source to be accessible
  • JavaScript-dependent content is effectively invisible to ChatGPT
  • Dynamic loading through external JSON files prevents ChatGPT from accessing the content
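
To see why such content is invisible, it helps to model a retriever that reads the HTML source but never runs scripts. The sketch below is an assumption-laden illustration, not ChatGPT's actual fetcher: it uses Python's standard html.parser, and the two sample pages and the content.json file name are hypothetical stand-ins for the test site.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes from raw HTML, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Scripts are never executed here; their source text is just ignored.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page with the content in the initial HTML.
static_page = (
    "<html><body><h1>Eldoria</h1>"
    "<p>The Eldorian Heptad was a council of seven wise scholars.</p>"
    "</body></html>"
)

# Hypothetical page that injects the same content from a JSON file at runtime.
js_page = """
<html><body><h1>Eldoria</h1><div id="dynamic"></div>
<script>
fetch('content.json').then(r => r.json()).then(d => {
  document.getElementById('dynamic').textContent = d.heptad;
});
</script></body></html>
"""

print(extract_text(static_page))
print(extract_text(js_page))
```

For the second page the extractor returns only the heading, because the paragraph does not exist until a browser runs the script.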

Implementation Implications

These findings have significant implications for developers:

  • If you want content to be accessible to ChatGPT, it must be included in the initial HTML
  • Rendering that depends entirely on client-side JavaScript hides content from AI systems
  • Dynamic loading patterns need careful consideration if AI accessibility is important

Technical Implementation

The test site was implemented with:

  • Content stored in external JSON file
  • JavaScript to load and inject content
  • No content present in initial HTML source
  • Proper separation of content and presentation

Recommendations for Developers

Critical Content Placement

  • Include important content in the initial HTML
  • Use JavaScript for enhancement, not core content delivery
  • Consider providing static alternatives for JavaScript-dependent content

SEO and AI Accessibility

  • Balance dynamic loading with content accessibility
  • Consider server-side rendering for critical content
  • Use progressive enhancement rather than complete JavaScript dependency
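
One way to follow these recommendations is to inline the content at build time, so it is already in the HTML the server sends. The following is a minimal sketch under stated assumptions: the prerender function, the "<!-- CONTENT -->" placeholder, and the JSON shape (section title mapped to body text) are all hypothetical, and a real site would more likely rely on its framework's server-side rendering or static generation.

```python
import json
from html import escape


def prerender(template: str, content_json: str) -> str:
    """Inline JSON content into the template so it is in the initial HTML source.

    Assumes content_json is an object mapping section titles to body text.
    """
    content = json.loads(content_json)
    sections = "\n".join(
        f"<section><h2>{escape(title)}</h2><p>{escape(body)}</p></section>"
        for title, body in content.items()
    )
    return template.replace("<!-- CONTENT -->", sections)


# Hypothetical template and content file contents.
template = "<main><!-- CONTENT --></main>"
content_json = json.dumps(
    {"The Eldorian Heptad": "A council of seven wise scholars."}
)

page = prerender(template, content_json)
print(page)
```

JavaScript can still add tabs or accordions on top of this markup. The difference is that the text is present before any script runs.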

Testing Approach

  • Verify content visibility with JavaScript disabled
  • Check source HTML for critical content
  • Consider AI accessibility in architectural decisions
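
The first two checks can be automated with a short script. This is a sketch rather than a definitive tool: fetch_raw_html and audit_source are hypothetical helper names, the User-Agent string is arbitrary, and the sample HTML is invented to resemble an empty accordion container.

```python
from urllib.request import Request, urlopen


def fetch_raw_html(url: str, timeout: float = 10.0) -> str:
    """Download the page source without executing any JavaScript."""
    request = Request(url, headers={"User-Agent": "raw-html-audit/0.1"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def audit_source(html: str, critical_phrases: list[str]) -> dict[str, bool]:
    """Map each critical phrase to whether it appears in the raw HTML source."""
    haystack = html.casefold()
    return {phrase: phrase.casefold() in haystack for phrase in critical_phrases}


# Invented sample: the heading is static, the accordion body is filled in later by JS.
sample = "<h1>Great Library</h1><div id='accordion'></div>"
result = audit_source(sample, ["Great Library", "Septimal Codex"])
print(result)
```

To audit a live page, pass the output of fetch_raw_html(url) to audit_source instead of the sample string. Note that this is a plain substring test, so a phrase that appears only inside an inline script or JSON blob would also count as present.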

Conclusion: In our tests, ChatGPT could not access content that requires JavaScript execution or is loaded dynamically. This corrects our earlier assumptions and gives developers clear guidance: content that must be visible to AI systems belongs in the initial HTML. Every JavaScript-dependent item on the test site stayed hidden from ChatGPT, which shows how much the choice of content delivery method matters for AI accessibility.

Special thanks to Martin Mustermann for highlighting the need to make content truly JavaScript-dependent rather than just visually hidden in the HTML source, which led to more accurate testing results.

Site 4: Image-Based Content (OCR Test)

Test: Mythica Capital and Founding Date

Question: "Can you visit https://ai-retrieval-test-site-4.vercel.app/ and tell me what is the capital of Mythica and when it was founded?"

Result:

✗ NOT FOUND - ChatGPT could not extract text from the image and provided incorrect information.

ChatGPT's Response: "The capital of Mythica is Mythos, founded in 1234 AD."

Correct Answer (in image): The capital is "Crystallis" and it was founded in "827 BCE."

Analysis: This experiment revealed significant limitations in ChatGPT's OCR capabilities. Despite the image containing clearly legible text about the fictional kingdom of Mythica, ChatGPT was unable to extract this information. Instead, it generated a plausible but entirely fabricated response. This demonstrates that ChatGPT cannot reliably "read" text embedded in images, even when that text is clearly visible and formatted in a standard font. Notably, ChatGPT delivered its incorrect response with apparent confidence, without indicating any uncertainty about the information.

Site 5: White-on-White Text Test

Test: Hidden Treasure Location

Question: "Can you visit https://ai-retrieval-test-site-5.vercel.app/ and tell me where the secret treasure of Eldoria is hidden?"

Result:

✓ FOUND - ChatGPT successfully retrieved the text that was visually hidden with CSS styling.

ChatGPT's Response: "The secret treasure of Eldoria is hidden in the Crystal Caverns beneath Mount Lumina. Only those who can decipher the Starlight Map can find the entrance."

Analysis: This experiment revealed that ChatGPT processes the raw HTML content of webpages rather than a visual representation. Despite the text being styled as white-on-white (completely invisible to human visitors without selecting the text or inspecting the code), ChatGPT was able to perfectly extract this information. This demonstrates that CSS styling which affects visual presentation (like text color) has no impact on ChatGPT's ability to access and retrieve content. This finding stands in stark contrast to our OCR test, highlighting that ChatGPT can "see" invisible text in HTML but cannot "see" visible text in images.

Contrasting OCR and White-on-White Tests: Our experiments reveal a fundamental aspect of how ChatGPT processes web content. It accesses the underlying HTML structure directly, ignoring visual styling that would hide content from humans, but struggles with content embedded in images that humans can easily see. This means that the actual HTML content of a page, rather than its visual presentation, determines what information is accessible to AI retrieval systems.
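
The contrast can be reproduced with a parser that reads markup and ignores presentation. The sketch below is an illustration under assumptions, not a description of ChatGPT's internals: the class name, the sample sentence, and the mythica.png file name are hypothetical.

```python
from html.parser import HTMLParser


class TextAndAltCollector(HTMLParser):
    """Collects text nodes and image alt attributes; CSS styling is never evaluated."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.image_alts = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            # A missing or empty alt attribute leaves nothing to read for this image.
            self.image_alts.append(dict(attrs).get("alt") or "")

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


# Hypothetical page: white-on-white paragraph plus an image without alt text.
page = (
    '<p style="color:#fff;background:#fff">'
    "The treasure is hidden in the Crystal Caverns.</p>"
    '<img src="mythica.png">'
)

collector = TextAndAltCollector()
collector.feed(page)
collector.close()
print(collector.text)
print(collector.image_alts)
```

The white-on-white sentence is captured because the style attribute is just another attribute to the parser. The image contributes nothing unless its text is repeated in an alt attribute or in the surrounding HTML.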

Summary of Results

Content Type | Can ChatGPT Access? | Notes
Static content on main page | Yes ✓ | Reliably retrieved across all test sites
Content on linked pages (generic titles) | No ✗ | Does not follow links without explicit relevance signals
Content on linked pages (explicit titles) | Yes ✓ | Will follow links when titles clearly indicate relevance
Content on linked pages (vague titles) | No ✗ | Partial matches in titles are insufficient
JavaScript tab content | No ✗ | Cannot execute JavaScript to reveal tabbed content
JavaScript accordion content | No ✗ | Cannot access content in collapsed accordions
Dynamically loaded content (after user action) | No ✗ | Cannot access content loaded via JavaScript or external files
Text embedded in images | No ✗ | Cannot extract text from images, may hallucinate responses instead
Text hidden with CSS (white-on-white) | Yes ✓ | Processes raw HTML content regardless of visual styling

Actionable Insights for Marketers and Developers

6 Key Actions to Optimize Your Website for AI Retrieval

  1. Use Explicit, Descriptive Link Text: Based on our tests with Site 2A vs Site 2, ensure that links to important content contain explicit keywords that clearly indicate what information can be found on the linked page. Our experiments showed that ChatGPT only followed links when the title explicitly mentioned the query topic.
  2. Optimize Page Titles: Our experiments with Site 2A showed that explicit page titles significantly improve content discovery, while the vague titles in Site 2B resulted in content being missed entirely, even when the query terms matched parts of the title.
  3. Keep Critical Content in Static HTML: Our corrected Site 3 tests showed that ChatGPT cannot access content that depends on JavaScript, whether it sits in tabs, in accordions, or is loaded dynamically from an external file. Content that is present in the initial HTML, even if visually hidden, remains accessible.
  4. Place Important Information on Main Pages: Our tests consistently showed that ChatGPT reliably accesses content on the main page of a website, while content on secondary pages was only discovered under specific conditions.
  5. Avoid Image-Only Content for Critical Information: Our OCR test (Site 4) demonstrated that ChatGPT cannot reliably extract text from images, while our white-on-white test (Site 5) confirmed it processes HTML content regardless of styling. Ensure critical information exists as actual text in your HTML, not just in images.
  6. Ensure Search Engine Indexing of JavaScript Content: While ChatGPT cannot directly execute JavaScript, its web tool can access content that has been indexed by search engines like Google and Bing. This means that even JavaScript-dependent content can be discoverable if it's properly indexed by search engines. Focus on ensuring your dynamic content is crawlable and indexable by search engines.
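
For the first two actions, a rough first pass is to list links whose anchor text says nothing about the destination. The following is a minimal sketch with clear limits: the GENERIC_LINK_TEXT set, the helper names, and the /blog/ paths are hypothetical, and a phrase list like this only catches the crudest cases. It would not flag a vague but non-generic title such as "Ancient Symbols", which still needs human review.

```python
from html.parser import HTMLParser

# Hypothetical list of anchor texts that carry no topical signal.
GENERIC_LINK_TEXT = {"read more", "click here", "learn more", "here", "more"}


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs from raw HTML."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._parts = []

    def handle_data(self, data):
        if self._href is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._parts).split())
            self.links.append((self._href, text))
            self._href = None


def vague_links(html: str) -> list[tuple[str, str]]:
    """Return links whose anchor text is empty or a generic phrase."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return [
        (href, text)
        for href, text in collector.links
        if not text or text.casefold() in GENERIC_LINK_TEXT
    ]


sample = (
    '<a href="/blog/1">Read more</a>'
    '<a href="/blog/2">The Crystalline Codex: 7 Elements of Atlantean Science</a>'
)
print(vague_links(sample))
```

Links flagged this way are candidates for rewriting so that the anchor text names the topic of the linked page, as the explicit titles on Site 2A did.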

Conclusion

Our comprehensive audit of ChatGPT's web retrieval capabilities has revealed both strengths and limitations in how AI systems access and retrieve web content. ChatGPT reliably retrieves static content present in the initial HTML, including text hidden only by CSS styling, and it follows links whose titles explicitly signal relevance. It does not execute JavaScript, so content in JavaScript-driven tabs, accordions, and dynamically loaded sections is inaccessible to it, and it cannot read text embedded in images.

These findings have significant implications for website owners, content creators, and marketers who want to ensure their content remains discoverable in an AI-first world. By implementing the actionable insights from our research, you can optimize your website for AI retrieval and ensure that your valuable content doesn't become invisible to users who rely on AI assistants to find information.

As AI retrieval systems continue to evolve, understanding these capabilities and limitations will become increasingly important for effective digital content strategy. By staying informed about how AI systems interact with web content, you can adapt your approach to ensure your content remains accessible and discoverable, regardless of how users choose to find it.

Final Thought: The rise of AI assistants represents a fundamental shift in how users discover and consume web content. Just as websites had to adapt to mobile devices and search engine algorithms in the past, they must now adapt to AI retrieval systems to remain visible and relevant in the evolving digital landscape.