Technical Research

What ChatGPT Can (and Cannot) See on Your Website: A Comprehensive Audit

Ben Tannenbaum
15 min read

Comprehensive technical audit revealing ChatGPT's web retrieval capabilities and limitations, with actionable insights for optimizing your website for AI discovery.

As AI tools like ChatGPT become increasingly integrated into how people find and consume information online, understanding what these systems can and cannot "see" on your website is crucial for content creators, marketers, and developers.

With the rise of AI-powered search and retrieval systems like ChatGPT's Search and Bing-powered search features, the way users discover and interact with web content is fundamentally changing. Instead of browsing through search results and visiting websites directly, users increasingly ask AI assistants to find and summarize information for them.

Why This Matters

If ChatGPT can't "see" certain content on your website, that content effectively becomes invisible to users who rely on AI assistants to find information. Understanding these limitations is crucial for ensuring your content remains discoverable in an AI-first world.

Critical Questions We Set Out to Answer

Can AI systems like ChatGPT effectively "see" and retrieve all the content on my website?

What types of content structures or technical implementations might prevent AI systems from accessing my content?

How can I optimize my website to ensure it's fully accessible to AI retrieval systems?

Research Methodology

We conducted a methodical audit of ChatGPT's web retrieval capabilities across a series of controlled test websites with varying levels of complexity:

1. Built seven test websites with controlled content structures

2. Gave each site both general content and specific "marker content" with unique identifiers

3. Deployed all sites using Vercel connected to GitHub repositories

4. Tested ChatGPT's ability to retrieve specific information using the "Browse with Bing" feature

5. Documented results to understand what ChatGPT could and couldn't access
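
The marker-content idea in the steps above can be sketched in a few lines. This is a minimal illustration, not the tooling used in the audit: the marker strings and the sample page are hypothetical, and the sketch assumes that a retriever which does not execute JavaScript only sees text outside `<script>` and `<style>` elements.

```python
from html.parser import HTMLParser


class StaticTextExtractor(HTMLParser):
    """Collect the text present in raw HTML, ignoring script and style bodies."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def markers_in_static_html(html, markers):
    """Map each marker string to True if it appears in the static text."""
    parser = StaticTextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return {marker: marker in text for marker in markers}


# Hypothetical test page: one plain marker, one marker hidden with CSS,
# and one marker that only exists after a script runs.
page = """
<html><body>
  <p>Our office code is MARKER-STATIC-001.</p>
  <p style="color:#fff;background:#fff">MARKER-HIDDEN-002</p>
  <div id="tab"></div>
  <script>
    document.getElementById("tab").textContent = "MARKER-JS-003";
  </script>
</body></html>
"""

result = markers_in_static_html(
    page, ["MARKER-STATIC-001", "MARKER-HIDDEN-002", "MARKER-JS-003"]
)
print(result)
```

Under these assumptions the static and CSS-hidden markers are reported as present and the script-only marker as absent, which mirrors the pattern the audit reports for Sites 1, 5, and 7.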

Test Sites Overview

Site 1: Text-Only Website

Result: Success

Purpose: Baseline test for simple content retrieval

Description: Basic static HTML with plain text content to establish retrieval baseline

Site 2: Blog Article Website

Result: Failed

Purpose: Test link following with generic titles

Description: Main page with links to blog articles using non-descriptive titles

Site 3: Blog Title Hint Website

Result: Success

Purpose: Test link following with descriptive titles

Description: Blog articles with explicit, keyword-rich titles in link text

Site 4: Blog Content Hint Website

Result: Failed

Purpose: Test partial title matching

Description: Articles with titles containing partial query matches

Site 5: JavaScript-Hidden Content

Result: Mixed

Purpose: Test JavaScript execution capabilities

Description: Content hidden in tabs, accordions, and dynamically loaded sections

Site 6: Image-Based Content (OCR)

Result: Failed

Purpose: Test text extraction from images

Description: Critical information embedded only in image files

Site 7: White-on-White Text

Result: Success

Purpose: Test HTML parsing vs visual rendering

Description: Text hidden with CSS styling but present in HTML

Key Findings

Static Content Accessibility

High Impact

ChatGPT reliably retrieves static HTML content on main pages

Action Required: Ensure critical information is in static HTML on primary pages

Link Following Behavior

Critical Impact

Only follows links with explicit, relevant titles - ignores generic titles

Action Required: Use descriptive, keyword-rich anchor text for internal links

JavaScript Limitations

High Impact

Cannot execute JavaScript for dynamic content loading

Action Required: Make critical content available without user interactions

Search Engine Workaround for JavaScript Content

Medium Impact

OpenAI uses Bing and Google searches as a workaround to access JavaScript-rendered pages that have been indexed by search engines

Action Required: Ensure JavaScript pages are indexed by Bing and Google; move critical content (especially FAQ answers) above the JavaScript fold

CSS vs HTML Processing

Medium Impact

Processes HTML content regardless of CSS styling/visibility

Action Required: Text in HTML is accessible even when visually hidden

Image Text Extraction

High Impact

Cannot extract text from images and may hallucinate a response instead

Action Required: Provide text alternatives for image-based content

Important Caveat: Search Engine Workarounds

Good news for JavaScript-heavy sites: While our direct testing shows ChatGPT cannot execute JavaScript, OpenAI uses Bing and Google searches as workarounds to access pages that are behind JavaScript but have been indexed by search engines.

What This Means for Your Site

1. Don't panic if you have JavaScript content - first ensure these pages are properly indexed by Bing and Google

2. Optimize further by moving critical content above the JavaScript fold - especially parts that answer questions users commonly ask ChatGPT (Aiso can help identify these)

3. Monitor developments - we expect OpenAI to improve their crawlers over time

Future Outlook

Experts like Elie Berreby expect AI crawlers to improve their JavaScript rendering capabilities over time. However, the timeline remains uncertain and this doesn't appear to be a top priority on OpenAI's current roadmap.

Complete Audit Results

Content Type | ChatGPT Can Access | Notes
Static content on main page | Yes ✓ | Reliably retrieved across all test sites
Content on linked pages (generic titles) | No ✗ | Does not follow links without explicit relevance signals
Content on linked pages (explicit titles) | Yes ✓ | Will follow links when titles clearly indicate relevance
Content on linked pages (vague titles) | No ✗ | Partial matches in titles are insufficient
JavaScript tab content | No ✗ | Cannot execute JavaScript to reveal tabbed content
JavaScript accordion content | No ✗ | Cannot access content in collapsed accordions
Dynamically loaded content (after user action) | No ✗ | Cannot access content loaded via JavaScript or external files
Text embedded in images | No ✗ | Cannot extract text from images, may hallucinate responses instead
Text hidden with CSS (white-on-white) | Yes ✓ | Processes raw HTML content regardless of visual styling

6 Key Actions to Optimize Your Website for AI Retrieval

1. Use Explicit, Descriptive Link Text

Critical

Ensure links to important content contain explicit keywords that clearly indicate what information can be found on the linked page

Implementation:
• Include target keywords in anchor text
• Avoid generic 'Click here' or 'Read more'
• Make link text descriptive and specific
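
A quick way to find links that need attention is to scan a page's anchors for generic text. The sketch below is one possible approach under stated assumptions: the `GENERIC_ANCHORS` list and the sample links are illustrative only and should be adapted to your own site.

```python
from html.parser import HTMLParser

# Illustrative list of anchor texts that carry no relevance signal.
GENERIC_ANCHORS = {"click here", "read more", "learn more", "here", "more", "details"}


class AnchorCollector(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML document."""

    def __init__(self):
        super().__init__()
        self._href = None  # set while inside an <a> element
        self._text = []
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())  # normalize whitespace
            self.links.append((self._href, text))
            self._href = None


def generic_links(html):
    """Return links whose anchor text is empty or in GENERIC_ANCHORS."""
    parser = AnchorCollector()
    parser.feed(html)
    parser.close()
    return [
        (href, text)
        for href, text in parser.links
        if not text or text.lower().strip(" .!") in GENERIC_ANCHORS
    ]


sample = """
<a href="/blog/post-1">Read more</a>
<a href="/blog/returns">Return policy for international orders</a>
<a href="/blog/post-3">Click here</a>
"""

print(generic_links(sample))
# -> [('/blog/post-1', 'Read more'), ('/blog/post-3', 'Click here')]
```

Each flagged link is a candidate for rewriting so that the anchor text names what the linked page contains.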

2. Optimize Page Titles

High

Explicit page titles significantly improve content discovery, while vague titles result in content being missed entirely

Implementation:
• Use specific, keyword-rich page titles
• Include query-relevant terms in titles
• Avoid vague or generic titles
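
One rough way to review titles is to measure how many terms of a likely user question appear in each title. This is a heuristic sketch only: it is not how ChatGPT scores relevance, and the stopword list, query, and titles are assumptions made for illustration.

```python
import re

# Small illustrative stopword list; extend as needed.
STOPWORDS = {"a", "an", "the", "of", "for", "to", "in", "on", "and", "is", "what", "how"}


def terms(text):
    """Lowercase alphanumeric terms in text, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def title_coverage(title, query):
    """Fraction of query terms that also appear in the title (0.0 to 1.0)."""
    query_terms = terms(query)
    if not query_terms:
        return 0.0
    return len(query_terms & terms(title)) / len(query_terms)


query = "what is the warranty period for solar panels"
for title in ["Warranty period for solar panels", "Solar news", "Blog post 3"]:
    print(f"{title_coverage(title, query):.2f}  {title}")
```

With this query the explicit title covers every term, the partial match covers one of four, and the generic title covers none, which lines up with the explicit, vague, and generic title cases in the audit table.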

3. Keep Critical Content in Static HTML

High

ChatGPT reads text that is present in the raw HTML, even when it is visually hidden, but content that JavaScript must reveal (tabs, accordions) or load after a user interaction is invisible to it

Implementation:
• Place important info in static HTML
• Avoid user-interaction dependencies
• Ensure content is crawlable
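
One way to keep collapsible sections without hiding their text behind JavaScript is to render them on the server with the native HTML `<details>` and `<summary>` elements, so the answer text ships in the initial HTML. The sketch below is an assumption-laden example: the audit did not test `<details>` specifically, the `render_faq` helper and the FAQ entry are hypothetical, and the expectation that this content is retrievable rests on the finding that text present in raw HTML is read regardless of visibility.

```python
from html import escape


def render_faq(items):
    """Render (question, answer) pairs as static, collapsible HTML.

    No JavaScript is involved: the answers are part of the markup itself.
    """
    parts = []
    for question, answer in items:
        parts.append(
            "<details>\n"
            f"  <summary>{escape(question)}</summary>\n"
            f"  <p>{escape(answer)}</p>\n"
            "</details>"
        )
    return "\n".join(parts)


# Hypothetical FAQ entry used only for illustration.
faq_html = render_faq([("Do you ship abroad?", "Yes, to 30 countries.")])
print(faq_html)
```

The same idea applies to tabs: render every panel's text into the page and use CSS or a small script only to toggle visibility, rather than fetching the panel content on click.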

4. Place Important Information on Main Pages

Medium

Content on main pages is reliably accessed, while secondary pages are only discovered under specific conditions

Implementation:
• Feature key content prominently
• Include summaries on main pages
• Use clear navigation structure

5. Avoid Image-Only Content for Critical Information

High

ChatGPT cannot reliably extract text from images, while it processes HTML content regardless of styling

Implementation:
• Provide text alternatives for images
• Use alt text descriptively
• Include text versions of image content
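
To locate images that carry no text alternative, a page can be scanned for `<img>` elements with a missing or empty `alt` attribute. This is a minimal sketch with hypothetical image paths. Note that an empty `alt` is legitimate for purely decorative images, so the output is a review list rather than a list of errors.

```python
from html.parser import HTMLParser


class ImageAltAuditor(HTMLParser):
    """Record the src of every <img> whose alt attribute is missing or empty."""

    def __init__(self):
        super().__init__()
        self.missing_alt = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr_map = dict(attrs)
        alt = (attr_map.get("alt") or "").strip()
        if not alt:
            self.missing_alt.append(attr_map.get("src") or "")


def images_missing_alt(html):
    """Return the src values of images without usable alt text."""
    auditor = ImageAltAuditor()
    auditor.feed(html)
    auditor.close()
    return auditor.missing_alt


sample = """
<img src="/img/pricing-table.png">
<img src="/img/hours.png" alt="">
<img src="/img/store.png" alt="Store hours: Mon-Fri 9am to 5pm" />
"""

print(images_missing_alt(sample))
# -> ['/img/pricing-table.png', '/img/hours.png']
```

For images that hold critical information, such as a pricing table, alt text alone may be too short; repeating the information as regular text on the page is the safer option.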

6. Ensure Search Engine Indexing

Medium

Even JavaScript-dependent content can be discoverable if it's properly indexed by search engines

Implementation:
• Focus on search engine crawlability
• Use server-side rendering when possible
• Implement proper SEO practices
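
A basic crawlability check is to confirm that robots.txt does not block the crawlers you depend on. The sketch below uses Python's standard `urllib.robotparser` on an inline, hypothetical robots.txt. The agent names are assumptions to verify against each vendor's documentation: `Googlebot` and `bingbot` are the commonly used tokens for those crawlers, and `OAI-SearchBot` is assumed here to be the token for OpenAI's search crawler.

```python
from urllib.robotparser import RobotFileParser


def blocked_agents(robots_txt, url, agents):
    """Return the agents that robots_txt disallows from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that accidentally blocks Bing from the FAQ section.
robots = """\
User-agent: bingbot
Disallow: /faq/

User-agent: *
Allow: /
"""

agents = ["Googlebot", "bingbot", "OAI-SearchBot"]
blocked = blocked_agents(robots, "https://example.com/faq/shipping", agents)
print(blocked)
# -> ['bingbot']
```

Passing this check only shows that crawling is permitted. Whether a page is actually indexed still needs to be confirmed in each search engine's webmaster tools.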

Conclusion: Preparing for an AI-First World

Our audit of ChatGPT's web retrieval capabilities revealed both strengths and limitations in how AI systems access web content. ChatGPT reliably retrieves static content, including text that is present in the HTML but visually hidden with CSS. It does not execute JavaScript, however, so content revealed through tabs and accordions or loaded dynamically after user interaction is inaccessible unless the page has been indexed by a search engine. It also does not follow links that lack explicit relevance signals, and it cannot extract text from images.

These findings have significant implications for website owners, content creators, and marketers who want to ensure their content remains discoverable in an AI-first world. By implementing the actionable insights from our research, you can optimize your website for AI retrieval and ensure that your valuable content doesn't become invisible to users who rely on AI assistants to find information.

As AI retrieval systems continue to evolve, understanding these capabilities and limitations will become increasingly important for effective digital content strategy. By staying informed about how AI systems interact with web content, you can adapt your approach to ensure your content remains accessible and discoverable, regardless of how users choose to find it.

Final Thought

The rise of AI assistants represents a fundamental shift in how users discover and consume web content. Just as websites had to adapt to mobile devices and search engine algorithms in the past, they must now adapt to AI retrieval systems to remain visible and relevant in the evolving digital landscape.

Optimize Your Website for AI Discovery

Ready to ensure your website is fully optimized for AI retrieval systems? Implement these findings to maintain visibility in an AI-powered world.

About the Author

Ben Tannenbaum

Ben Tannenbaum is the founder of Aiso, a marketing tech company helping brands be visible in AI responses. With expertise in AI search optimization and content strategy, Ben helps businesses adapt to the evolving landscape of AI-powered search.