🔬Research Study

Testing ChatGPT's Non-Ranking Nature: Espresso Machine Brand Visibility Analysis

BTBen Tannenbaum
12 min read

Comprehensive experiment testing whether ChatGPT maintains consistent rankings across 400 queries, challenging traditional notions of AI search rankings.

💡Our Hypothesis

Unlike Google's search results which provide clear, consistent rankings, we hypothesized that ChatGPT doesn't actually "rank" brands in the traditional sense. A brand might be mentioned first in one answer but appear later (or not at all) in others, suggesting that ChatGPT's responses are more about contextual relevance than strict ranking.

📚Inspiration: Elie Berreby's Analysis

This experiment was inspired by Elie Berreby's article "Rankings Don't Apply to AI Search", which argues that traditional SEO ranking concepts don't apply to AI-powered search. Berreby suggests that AI responses are inherently variable and context-dependent, challenging the notion of consistent rankings in AI-generated content.

Methodology

To test this hypothesis, we conducted a comprehensive experiment with 100 runs across four distinct rounds:

1Round 1 (Baseline)

100 runs with search mode disabled, using the exact same question to establish a baseline for ChatGPT's response patterns

2Round 2: Question Variations

100 runs with slight variations in question phrasing to test Elie Berreby's point about response consistency

2.2Round 2.2: Contextual Variations

100 runs with more significant question variations, focusing on different aspects like reliability, best taste, and quality to understand how context affects brand visibility

3Round 3: Search Mode Enabled

100 runs with search mode enabled to analyze how web search capabilities impact response variability

Key Findings

Round 1: Baseline Analysis

Round 1 Rankings Comparison showing consistent ChatGPT brand ordering
Round 1 analysis revealing surprising consistency in ChatGPT's espresso machine brand rankings

🎯Key Finding: Unexpected Consistency

Our baseline analysis of 100 identical queries asking "what's the best espresso machines under $250" revealed a surprising level of consistency in ChatGPT's responses. De'Longhi maintained the #1 position with almost no exceptions, while positions 2–5 showed minor variations in their relative ordering.

This pattern of consistency in top positions, with some flexibility in lower rankings, is actually quite similar to what we observe in Google search results. There were only a few instances (3–4 times) where ChatGPT provided only the top 3 instead of the top 5, but even then, the top 3 remained consistent.

📊Analysis Implications

This finding suggests that ChatGPT does maintain a form of ranking consistency for certain queries, particularly when the question is specific and the product category has clear market leaders. The consistency in De'Longhi's #1 position across 100 runs indicates that ChatGPT's responses aren't entirely random or context-dependent for well-established product categories.

Round 2: Question Variation Analysis

Round 2 Brand Visibility showing response patterns with question variations
Round 2 results demonstrating stability despite question phrasing variations

🔄Phrasing Variation Results

In Round 2, we tested Elie Berreby's challenge about question phrasing by introducing subtle variations to the original question. Despite these variations, the results remained remarkably consistent, suggesting that with minor phrasing changes, it's still reasonable to discuss rankings in ChatGPT's responses. The core brands maintained their relative positions, indicating a level of stability in ChatGPT's recommendations for this product category.

Round 2.2: Contextual Variation Analysis

Brand Visibility Heatmap showing contextual impact on brand mentions
Brand visibility heatmap revealing how context affects brand prominence
Brand Visibility Comparison across different query contexts
Comparative analysis of brand visibility across different contextual queries

🎨Contextual Impact on Rankings

Round 2.2 introduced stronger contextual variations by incorporating terms like "most reliable," "best taste," and "most professional." This round better aligns with Elie's fundamental challenge about question phrasing, as it demonstrates how different aspects of the same product category can lead to significantly different brand recommendations. While the original brands maintained roughly similar positions, we observed a substantial increase in the total number of distinct brands mentioned.

Round 3: Search Mode Impact

🌐Web Search Integration Effects

Round 3 with search mode enabled revealed how real-time web data affects ChatGPT's brand recommendations. The integration of current web information introduced more variability in responses, suggesting that when ChatGPT has access to recent data, its recommendations become more dynamic and less predictable than the baseline patterns observed in earlier rounds.

Conclusions and Implications

Ranking Consistency

ChatGPT does maintain ranking-like consistency for specific, well-defined product categories with clear market leaders.

🔄

Context Sensitivity

Significant contextual changes in queries can dramatically alter brand visibility and recommendation patterns.

🌐

Search Mode Impact

Real-time web access increases response variability, making rankings less predictable than baseline models.

📊

Measurement Importance

Understanding AI search patterns requires systematic measurement and analysis rather than assumptions.

🔬Research Takeaways

This experiment challenges both the assumption that AI search lacks rankings and the assumption that it maintains traditional ranking structures. The reality is more nuanced: ChatGPT exhibits ranking-like behavior under certain conditions while showing significant variability in others.