Technical Research

What ChatGPT Can (and Cannot) See on Your Website: A Comprehensive Audit

Ben Tannenbaum

Comprehensive technical audit revealing ChatGPT's web retrieval capabilities and limitations, with actionable insights for optimizing your website for AI discovery.

As AI tools like ChatGPT become increasingly integrated into how people find and consume information online, understanding what these systems can and cannot "see" on your website is crucial for content creators, marketers, and developers.

With the rise of AI-powered search and retrieval features such as ChatGPT's search and Bing-powered search, the way users discover and interact with web content is fundamentally changing. Instead of browsing search results and visiting websites directly, users increasingly ask AI assistants to find and summarize information for them.

Why This Matters

If ChatGPT can't "see" certain content on your website, that content effectively becomes invisible to users who rely on AI assistants to find information. Understanding these limitations is crucial for ensuring your content remains discoverable in an AI-first world.

Critical Questions We Set Out to Answer

Can AI systems like ChatGPT effectively "see" and retrieve all the content on my website?

What types of content structures or technical implementations might prevent AI systems from accessing my content?

How can I optimize my website to ensure it's fully accessible to AI retrieval systems?

Research Methodology

We conducted a methodical audit of ChatGPT's web retrieval capabilities across a series of controlled test websites with varying levels of complexity:

1. Built seven test websites with controlled content structures
2. Gave each site both general content and specific "marker content" with unique identifiers
3. Deployed all sites using Vercel connected to GitHub repositories
4. Tested ChatGPT's ability to retrieve specific information using the "Browse with Bing" feature
5. Documented results to understand what ChatGPT could and couldn't access
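
The marker-content approach described above can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the audit: the marker format, the helper names `make_marker` and `marker_retrieved`, and the sample answers are all assumptions made for the example.

```python
import uuid


def make_marker(site_name: str) -> str:
    """Build a unique identifier that an assistant is unlikely to guess."""
    # Hypothetical format: a site prefix plus a random hex token.
    return f"{site_name}-MARKER-{uuid.uuid4().hex[:12]}"


def marker_retrieved(marker: str, assistant_answer: str) -> bool:
    """Return True if the assistant's answer reproduces the marker."""
    return marker.lower() in assistant_answer.lower()


marker = make_marker("site1")

# The marker is embedded in the page content before deployment.
page_html = f"<p>The secret product code is {marker}.</p>"

# Simulated answers; in a real audit these would be copied from ChatGPT.
answer_hit = f"According to the page, the code is {marker}."
answer_miss = "I could not find a product code on that page."

print(marker_retrieved(marker, answer_hit))
print(marker_retrieved(marker, answer_miss))
```

Because the marker is random, an answer that contains it can only have come from the page itself, which separates genuine retrieval from a plausible-sounding guess.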

🧪Test Sites Overview

Site 1: Text-Only Website

Result: Success

Purpose: Baseline test for simple content retrieval

Description: Basic static HTML with plain text content to establish retrieval baseline

Site 2: Blog Article Website

Result: Failed

Purpose: Test link following with generic titles

Description: Main page with links to blog articles using non-descriptive titles

Site 3: Blog Title Hint Website

Result: Success

Purpose: Test link following with descriptive titles

Description: Blog articles with explicit, keyword-rich titles in link text

Site 4: Blog Content Hint Website

Result: Failed

Purpose: Test partial title matching

Description: Articles with titles containing partial query matches

Site 5: JavaScript-Hidden Content

Result: Mixed

Purpose: Test JavaScript execution capabilities

Description: Content hidden in tabs, accordions, and dynamically loaded sections

Site 6: Image-Based Content (OCR)

Result: Failed

Purpose: Test text extraction from images

Description: Critical information embedded only in image files

Site 7: White-on-White Text

Result: Success

Purpose: Test HTML parsing vs visual rendering

Description: Text hidden with CSS styling but present in HTML

Key Findings

Static Content Accessibility

High Impact

ChatGPT reliably retrieves static HTML content on main pages

Action Required: Ensure critical information is in static HTML on primary pages

Link Following Behavior

Critical Impact

ChatGPT follows only links with explicit, relevant titles and ignores links with generic titles

Action Required: Use descriptive, keyword-rich anchor text for internal links

JavaScript Limitations

High Impact

ChatGPT does not execute JavaScript, so content that is loaded or revealed dynamically is not retrieved

Action Required: Make critical content available without user interactions

Search Engine Workaround for JavaScript Content

Medium Impact

OpenAI uses Bing and Google searches as a workaround to access JavaScript-rendered pages that have been indexed by search engines

Action Required: Ensure JavaScript pages are indexed by Bing and Google; move critical content (especially FAQ answers) above the JavaScript fold, that is, into the HTML that is served before any JavaScript runs

CSS vs HTML Processing

Medium Impact

ChatGPT processes the raw HTML regardless of CSS styling or visibility

Action Required: Treat any text in your HTML as readable by ChatGPT, even when it is visually hidden

Image Text Extraction

High Impact

ChatGPT cannot extract text from images and may hallucinate a response instead

Action Required: Provide text alternatives for image-based content

Important Caveat: Search Engine Workarounds

Good news for JavaScript-heavy sites: While our direct testing shows ChatGPT cannot execute JavaScript, OpenAI uses Bing and Google searches as workarounds to access pages that are behind JavaScript but have been indexed by search engines.

What This Means for Your Site

1. Don't panic if you have JavaScript content. First ensure these pages are properly indexed by Bing and Google.
2. Optimize further by moving critical content above the JavaScript fold, that is, into the HTML that is served before any JavaScript runs. Prioritize the parts that answer questions users commonly ask ChatGPT (Aiso can help identify these).
3. Monitor developments. We expect OpenAI to improve its crawlers over time.

🔮Future Outlook

Experts like Elie Berreby expect AI crawlers to improve their JavaScript rendering capabilities over time. However, the timeline remains uncertain and this doesn't appear to be a top priority on OpenAI's current roadmap.

Complete Audit Results

| Content Type | ChatGPT Can Access | Notes |
| --- | --- | --- |
| Static content on main page | Yes ✓ | Reliably retrieved across all test sites |
| Content on linked pages (generic titles) | No ✗ | Does not follow links without explicit relevance signals |
| Content on linked pages (explicit titles) | Yes ✓ | Will follow links when titles clearly indicate relevance |
| Content on linked pages (vague titles) | No ✗ | Partial matches in titles are insufficient |
| JavaScript tab content | No ✗ | Cannot execute JavaScript to reveal tabbed content |
| JavaScript accordion content | No ✗ | Cannot access content in collapsed accordions |
| Dynamically loaded content (after user action) | No ✗ | Cannot access content loaded via JavaScript or external files |
| Text embedded in images | No ✗ | Cannot extract text from images, may hallucinate responses instead |
| Text hidden with CSS (white-on-white) | Yes ✓ | Processes raw HTML content regardless of visual styling |

6 Key Actions to Optimize Your Website for AI Retrieval

1. Use Explicit, Descriptive Link Text

Critical

Ensure links to important content contain explicit keywords that clearly indicate what information can be found on the linked page

Implementation:
Include target keywords in anchor text
Avoid generic 'Click here' or 'Read more'
Make link text descriptive and specific
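
One way to find generic link text on an existing page is to parse the HTML and flag anchors whose visible text is a stock phrase. The sketch below uses Python's standard `html.parser`; the `GENERIC_PHRASES` list, the `generic_links` helper, and the sample markup are assumptions for illustration and should be adapted to your own site.

```python
from html.parser import HTMLParser

# Assumed list of generic phrases; extend it for your own content.
GENERIC_PHRASES = {"click here", "read more", "learn more", "here", "more"}


class AnchorCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished (href, text) pairs
        self._href = None     # href of the <a> currently open
        self._inside = False  # True while parsing inside an <a>
        self._text = []       # text fragments of the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._inside:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._inside:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._inside = False


def generic_links(html: str):
    """Return links whose anchor text is empty or a generic phrase."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [(href, text) for href, text in parser.links
            if not text or text.lower() in GENERIC_PHRASES]


sample = """
<a href="/blog/1">Read more</a>
<a href="/blog/pricing">Pricing plans and enterprise discounts</a>
<a href="/blog/3">Click here</a>
"""
for href, text in generic_links(sample):
    print(f"{href}: generic anchor text {text!r}")
```

The check is deliberately strict: it only flags anchors whose entire text is a generic phrase, so descriptive links that happen to contain the word "more" are left alone.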

2. Optimize Page Titles

High

Explicit page titles significantly improve content discovery, while vague titles result in content being missed entirely

Implementation:
Use specific, keyword-rich page titles
Include query-relevant terms in titles
Avoid vague or generic titles
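
To get a rough sense of whether a title carries the terms a user is likely to ask about, you can measure how many query terms appear in it. The sketch below is a simple heuristic of our own for illustration; it is not how ChatGPT decides which links to follow, and the stopword list, function names, and sample query are assumptions.

```python
import re

# Small assumed stopword list; a real check would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and",
             "is", "what", "how", "our"}


def terms(text: str) -> set[str]:
    """Lowercase the text and return its words, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}


def title_coverage(title: str, query: str) -> float:
    """Fraction of the query's terms that also appear in the title."""
    query_terms = terms(query)
    if not query_terms:
        return 0.0
    return len(query_terms & terms(title)) / len(query_terms)


query = "refund policy for annual plans"
for title in ("Blog Post 3",
              "Our Policy Updates",
              "Refund Policy for Annual Plans"):
    print(f"{title!r}: {title_coverage(title, query):.2f}")
```

A title with low coverage for the questions you expect users to ask is a candidate for rewriting, in line with the finding that partial matches were not enough.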

3. Keep Critical Content in Static HTML

High

ChatGPT reads content that is present in the initial HTML, even when it is visually hidden, but content that JavaScript reveals or loads after a user interaction (tabs, accordions, dynamically loaded sections) is invisible

Implementation:
Place important info in static HTML
Avoid user-interaction dependencies
Ensure content is crawlable
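
A quick way to check whether a given sentence is in the static HTML is to extract the text from the raw page source, without running any scripts, and search it. The sketch below approximates what a non-rendering fetcher would see; it is an assumption about such fetchers in general, not a description of OpenAI's crawler, and the helper names and sample page are hypothetical.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collect text from raw HTML, skipping <script> and <style> bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside script/style
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def static_text(html: str) -> str:
    """Return the page text with whitespace collapsed. CSS is never applied."""
    parser = StaticTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def in_static_html(html: str, phrase: str) -> bool:
    return phrase.lower() in static_text(html).lower()


page = """
<h1>Shipping FAQ</h1>
<p style="color:#fff;background:#fff">Orders ship within 2 business days.</p>
<div id="returns-tab"></div>
<script>
  document.getElementById("returns-tab").textContent =
      "Returns accepted for 30 days.";
</script>
"""
print(in_static_html(page, "Orders ship within 2 business days"))
print(in_static_html(page, "Returns accepted for 30 days"))
```

The first phrase is found even though it is styled white-on-white, because no CSS is applied. The second is not found, because it only exists on the page after the script runs.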

4. Place Important Information on Main Pages

Medium

Content on main pages is reliably accessed, while secondary pages are only discovered under specific conditions

Implementation:
Feature key content prominently
Include summaries on main pages
Use clear navigation structure

5. Avoid Image-Only Content for Critical Information

High

ChatGPT cannot reliably extract text from images, while it processes HTML content regardless of styling

Implementation:
Provide text alternatives for images
Use alt text descriptively
Include text versions of image content
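
Images without a text alternative can be listed with a short script. The sketch below flags `<img>` elements whose `alt` attribute is missing or empty; the class name, helper name, and sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser


class ImageAltAuditor(HTMLParser):
    """Record the src of every <img> with a missing or empty alt attribute."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr = dict(attrs)
            alt = (attr.get("alt") or "").strip()
            if not alt:
                self.missing_alt.append(attr.get("src"))


def images_without_alt(html: str):
    auditor = ImageAltAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.missing_alt


sample = """
<img src="/pricing-table.png">
<img src="/hours.png" alt="Opening hours: Monday to Friday, 9am to 5pm">
<img src="/logo.png" alt="">
"""
for src in images_without_alt(sample):
    print(f"No text alternative: {src}")
```

An empty `alt` is legitimate for purely decorative images, so treat the output as a review list rather than a list of errors. For images that carry critical information, such as the pricing table in the sample, consider repeating the content as regular text on the page as well.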

6. Ensure Search Engine Indexing

Medium

Even JavaScript-dependent content can be discoverable if it's properly indexed by search engines

Implementation:
Focus on search engine crawlability
Use server-side rendering when possible
Implement proper SEO practices
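
As one small part of checking crawlability, you can confirm that a page does not ask search engines to skip it. The sketch below looks for `noindex` (or `none`) in robots meta tags. It covers only this one signal and does not check `robots.txt` or HTTP headers; the crawler names, helper name, and sample markup are assumptions for illustration.

```python
from html.parser import HTMLParser

# Meta tag names checked here; an assumed, non-exhaustive set.
CRAWLER_NAMES = {"robots", "googlebot", "bingbot"}
BLOCKING = {"noindex", "none"}


class RobotsMetaChecker(HTMLParser):
    """Collect robots meta tags that tell crawlers not to index the page."""

    def __init__(self):
        super().__init__()
        self.blocking = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        directives = {d.strip() for d in content.split(",")}
        if name in CRAWLER_NAMES and directives & BLOCKING:
            self.blocking.append((name, content))


def blocking_directives(html: str):
    checker = RobotsMetaChecker()
    checker.feed(html)
    checker.close()
    return checker.blocking


page_ok = '<head><meta name="robots" content="index, follow"></head>'
page_blocked = '<head><meta name="robots" content="noindex, nofollow"></head>'

print(blocking_directives(page_ok))
print(blocking_directives(page_blocked))
```

An empty result means the page carries no blocking meta directive; it does not by itself prove the page has been indexed, which is best confirmed in each search engine's webmaster tools.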

Conclusion: Preparing for an AI-First World

Our audit of ChatGPT's web retrieval capabilities revealed both strengths and limitations in how it accesses and retrieves web content. ChatGPT reliably retrieves static content on main pages, including text that is visually hidden with CSS. However, it does not execute JavaScript to reveal content in tabs and accordions, does not follow links that lack explicit relevance signals, and cannot access content that loads dynamically after a user interaction.

These findings have significant implications for website owners, content creators, and marketers who want to ensure their content remains discoverable in an AI-first world. By implementing the actionable insights from our research, you can optimize your website for AI retrieval and ensure that your valuable content doesn't become invisible to users who rely on AI assistants to find information.

As AI retrieval systems continue to evolve, understanding these capabilities and limitations will become increasingly important for effective digital content strategy. By staying informed about how AI systems interact with web content, you can adapt your approach to ensure your content remains accessible and discoverable, regardless of how users choose to find it.

Final Thought

The rise of AI assistants represents a fundamental shift in how users discover and consume web content. Just as websites had to adapt to mobile devices and search engine algorithms in the past, they must now adapt to AI retrieval systems to remain visible and relevant in the evolving digital landscape.

👨‍💻About the Author

Ben Tannenbaum

Ben Tannenbaum is the founder of Aiso, a marketing tech company helping brands be visible in AI responses. With expertise in AI search optimization and content strategy, Ben helps businesses adapt to the evolving landscape of AI-powered search.