Is your portfolio company winning AI search? 4 metrics that matter (and 1 that lies)

Why investors are asking before marketing teams are

ChatGPT crossed 800 million weekly active users in October 2025 and reached roughly 900 million by February 2026. Ahrefs estimates it now handles about 12% of Google's search-style query volume. Similarweb counted 1.13 billion AI referral visits to the top 1,000 sites in June 2025 alone - up 357% year over year.

Those numbers are now showing up inside investment memos. Across the portfolio reviews we've been asked to help with at Aiso, the pattern is consistent: a partner reads an analyst report, asks a portfolio CEO “what's our ChatGPT story?” - and the marketing team is suddenly building a measurement framework on a two-week deadline.

The problem is that most of the metrics being reported back use the wrong unit, use the wrong denominator, or are systematically biased in one direction. Here is what to look at instead.

The 4 metrics that matter (and 1 that lies)

Mentions on commercial-intent prompts that match the company's positioning

Not share-of-voice across every prompt - share on the small set of prompts a buyer would actually run before a purchase decision.

Declared AI referrals (treat as a floor, not the truth)

Useful directionally, but systematically understated: roughly 1 in 5 ChatGPT outbound clicks goes to Google first, and 70%+ of AI visits arrive with no referrer at all.

Visibility per dollar spent

Mentions on the right prompts, normalized by what was spent to earn them. The portfolio-level comparable for AI search investment.

Revenue per dollar spent

The only metric that ultimately matters. AI referrals convert at a premium on most benchmarks - but at least one peer-reviewed study disagrees, so insist on the company's own numbers, not vendor averages.

Metric 1 - Mentions on commercial-intent prompts that match the company's positioning

The most common AI visibility report we see at the investor level is some flavor of “share of voice across 200 prompts.” This is the wrong unit. Two prompt lists with identical share-of-voice numbers can imply opposite business outcomes.

The right prompt list passes two filters:

Commercial intent. The prompt is something a buyer would plausibly ask before a purchase, renewal, or vendor-shortlist decision - not an informational query that feeds the top of a funnel that doesn't exist.
Positioning alignment. The prompt sits inside the category the company is actually trying to own. A fintech focused on mid-market AP automation shouldn't be measuring share-of-voice on “best accounting software” - that category is owned by incumbents and a win there doesn't change the business.
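As a minimal sketch, the two filters can be written as a screen over a candidate prompt list. It assumes each prompt has already been tagged with an intent label and a category by some upstream process; the `Prompt` fields, the intent labels, and the category names are all hypothetical, not Aiso's schema.

```python
from dataclasses import dataclass

# Hypothetical intent labels treated as "commercial": purchase, renewal,
# and vendor-shortlist decisions.
COMMERCIAL_INTENTS = {"purchase", "renewal", "shortlist"}


@dataclass
class Prompt:
    text: str
    intent: str    # hypothetical label from an upstream intent classifier
    category: str  # hypothetical category tag


def qualifies(prompt: Prompt, target_categories: set[str]) -> bool:
    """Keep a prompt only if it passes both filters."""
    is_commercial = prompt.intent in COMMERCIAL_INTENTS      # filter 1
    is_aligned = prompt.category in target_categories        # filter 2
    return is_commercial and is_aligned


def build_prompt_list(candidates: list[Prompt], target_categories: set[str]) -> list[Prompt]:
    return [p for p in candidates if qualifies(p, target_categories)]


if __name__ == "__main__":
    # Illustrative mid-market AP automation company; categories are made up.
    targets = {"ap_automation"}
    candidates = [
        Prompt("best AP automation tool for a 200-person company", "shortlist", "ap_automation"),
        Prompt("best accounting software", "shortlist", "accounting_suite"),
        Prompt("what is three-way invoice matching", "informational", "ap_automation"),
    ]
    for p in build_prompt_list(candidates, targets):
        print(p.text)  # only the first prompt survives both filters
```

The second prompt fails positioning alignment and the third fails commercial intent, which mirrors the two examples in the text.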

Here's why the first filter matters more than people think. We pulled the intent breakdown from Aiso's consent-based panel of millions of real, anonymized AI assistant prompts - across dozens of models, hundreds of countries, and every major AI assistant - and the shape of the distribution is the punchline:

Brand-mention rate by intent

Share of AI assistant prompts that name at least one brand, by user intent.

Source: Aiso consent-based panel of millions of real, anonymized AI assistant prompts. Commercial intent is just 5.8% of total volume but carries the highest brand-mention pressure of the major intents.

Three things to read off that chart. (1) Commercial-intent prompts are only 5.8% of total volume - a small slice that determines an outsized share of revenue. (2) When an AI assistant fields a commercial-intent prompt, it names a brand ~1.85x as often as on informational prompts (32.4% vs. 17.5%). (3) Inside the panel, when a commercial-intent answer names any brand, it names 2.47 distinct brands on average - a shortlist, not a ranking. That's the most investor-legible framing: in the panel data, AI commercial answers read as shortlists. How that shortlist gets compiled is the harder question, and the replay below shows it depends on whether retrieval fires.
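All three readings come from simple per-intent tallies. A sketch, assuming each panel record reduces to an intent label plus the list of brands named in the answer (a hypothetical record shape; the toy records below are not panel data). The last line reproduces the ~1.85x ratio from the two published mention rates.

```python
from collections import Counter


def mention_stats(records: list[tuple[str, list[str]]]) -> dict[str, dict[str, float]]:
    """Per-intent volume share, brand-mention rate, and mean distinct brands
    among answers that name at least one brand.

    Each record is (intent_label, brands_named_in_answer).
    """
    volume = Counter()       # prompts per intent
    named = Counter()        # answers naming at least one brand
    brand_total = Counter()  # distinct brands, summed over those answers
    for intent, brands in records:
        distinct = len(set(brands))
        volume[intent] += 1
        if distinct:
            named[intent] += 1
            brand_total[intent] += distinct

    total = sum(volume.values())
    return {
        intent: {
            "share_of_volume": n / total,
            "mention_rate": named[intent] / n,
            "mean_brands_when_named": (
                brand_total[intent] / named[intent] if named[intent] else 0.0
            ),
        }
        for intent, n in volume.items()
    }


if __name__ == "__main__":
    # Toy records for illustration only.
    records = [
        ("commercial", ["Brand A", "Brand B"]),
        ("commercial", []),
        ("informational", ["Brand A"]),
        ("informational", []),
        ("informational", []),
        ("informational", []),
    ]
    stats = mention_stats(records)
    print(stats["commercial"]["mention_rate"])     # 0.5
    print(stats["informational"]["mention_rate"])  # 0.25
    # Ratio implied by the published rates: 32.4% vs. 17.5%.
    print(round(0.324 / 0.175, 2))                 # 1.85
```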

What happens when we replay these prompts on the live model

To pressure-test the panel observation against today's behavior, we replayed 20 commercial-intent prompts from the panel through OpenAI's gpt-5.3-chat-latest with web search enabled, 5 runs each (100 successful calls, zero failures). The distribution diverged sharply from the panel:

Mean distinct brands per answer (all 100 calls): 4.90
Answers in the 2–3 brand band the panel implies: 27%
Answers that named 5+ brands (long lists): 39%
Answers that named zero brands: 18%

Top brands across the 100-call run: Nvidia (10/100), Samsung (9), Meta (8), YouTube (7), then OpenAI, AMD, Sony, Kdenlive, Shotcut, and Mikuni at 5 each.

Distinct brands named per answer - live replay

Count of distinct brands named per answer across 100 live calls (20 commercial-intent prompts × 5 runs).

Source: Aiso replay, May 2026. gpt-5.3-chat-latest via OpenAI Responses API with web_search enabled, default temperature. Parent-brand-level extraction; product SKUs collapsed. The 2–3 brand band (shaded) - the panel-implied shortlist - accounted for only 27% of answers.

Two things explain the gap between 2.47 and 4.90. First, the web_search tool actually fired on only 10% of calls - most answers came from the model's priors, not live retrieval. When a model answers from priors it tends to enumerate broadly (every brand it remembers in the category); when it retrieves, it tends to cite a smaller, fresher set. Second, our extraction pipeline here counts parent brands (Nvidia, Samsung, Meta) rather than the product-level mentions the panel field captures (RTX 4090, Galaxy S24), so part of the divergence is measurement, not behavior.
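The unit-of-count point is easy to see in code. A sketch, assuming answers have already been reduced to lists of brand or product strings; the product-to-parent mapping is a hypothetical stand-in, since the replay's extraction rules aren't published. The same answer scores 3 at product level and 2 at parent-brand level, and the summary reports the same bands as the replay figures.

```python
from statistics import mean

# Hypothetical product-to-parent mapping (illustrative only).
PARENT_BRAND = {
    "RTX 4090": "Nvidia",
    "RTX 4080": "Nvidia",
    "Galaxy S24": "Samsung",
}


def parent_brands(mentions: list[str]) -> set[str]:
    """Collapse product-level mentions to distinct parent brands."""
    return {PARENT_BRAND.get(m, m) for m in mentions}


def summarize(answers: list[list[str]]) -> dict[str, float]:
    """Replay-style summary: mean brands per answer and share of answers per band."""
    counts = [len(parent_brands(a)) for a in answers]
    n = len(counts)
    return {
        "mean_brands": mean(counts),
        "share_2_to_3": sum(2 <= c <= 3 for c in counts) / n,
        "share_5_plus": sum(c >= 5 for c in counts) / n,
        "share_zero": sum(c == 0 for c in counts) / n,
    }


answer = ["RTX 4090", "RTX 4080", "Galaxy S24"]
print(len(set(answer)), len(parent_brands(answer)))  # 3 2
print(summarize([answer, [], ["Nvidia", "Samsung", "Meta", "YouTube", "Sony"]]))
```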

The honest investor takeaway is that the shape of the AI answer depends on whether retrieval fires. When it doesn't, the brands that get enumerated are the ones with the heaviest training footprint - Wikipedia, large review sites, dominant press coverage. When it does, fresh and well-grounded sources get pulled in. A portfolio company is winning AI search only when it's present on both surfaces. Reporting share-of-voice on one of them looks like progress and isn't.

Synthesized prompt lists - the kind most agencies generate from keyword tools - drift toward what marketers think buyers ask. Panel data shows what they actually do. The question to ask a portfolio company is not “what's your AI share of voice?” - it's “show me your prompt list, tell me why each one is on it, and show me both your training-data footprint and your retrieval-time visibility separately.” If those answers don't come back cleanly, the headline number doesn't matter.

Metric 2 - Declared AI referrals (treat as a floor, not the truth)

This is the metric that lies - not because it's wrong, but because everyone reads it as a ceiling when it's a floor. Three findings together explain why every published AI referral number understates the real influence:

21.6% of ChatGPT outbound clicks go to Google itself. Semrush analyzed over a billion lines of US clickstream data from a 200-million-user panel between October 2024 and February 2026 (Search Engine Land, April 2026). Roughly one in five clicks out of ChatGPT routes through Google first - typically a user verifying a brand name they just saw. Every resulting site visit gets attributed to Google organic in the destination's analytics.
70.6% of AI visits arrive without referrer data. Loamly's February 2026 analysis of 20,428 AI visits found seven in ten had no usable referrer because of mobile WKWebView, Chrome Custom Tabs, and privacy-stripping defaults. ChatGPT only started attaching utm_source=chatgpt.com to citation links in June 2025; Google AI Overviews and AI Mode still pass nothing.
93% of AI search sessions never click out at all. Conductor's 2026 benchmark dubs this the “AI dark funnel” - the AI answers the question, the user gets what they need, and no website ever sees the visit. The influence is real; the analytics signal isn't.
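To see how far below the real number the floor can sit, here is a back-of-envelope sketch. It assumes the company's AI visits lose their referrer at the same 70.6% rate Loamly measured, which is an assumption to replace with the company's own estimate. Even the scaled-up figure remains a floor, because it ignores Google-routed clicks and zero-click sessions entirely.

```python
def referrer_adjusted_visits(declared: float, no_referrer_rate: float = 0.706) -> float:
    """Scale declared AI referrals for referrer stripping.

    Assumption: a fraction `no_referrer_rate` of AI visits arrives without a
    referrer, so the declared count covers only the remaining (1 - rate) share.
    Still a floor: clicks routed through Google and zero-click sessions are
    not in this number at all.
    """
    if not 0 <= no_referrer_rate < 1:
        raise ValueError("no_referrer_rate must be in [0, 1)")
    return declared / (1 - no_referrer_rate)


# Sensitivity check on 1,000 declared AI referrals.
for rate in (0.5, 0.706):
    print(rate, round(referrer_adjusted_visits(1_000, rate)))
# 0.5 2000
# 0.706 3401
```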

The workaround the data actually supports is to read AI influence through a shadow signal: branded search volume. Ahrefs studied 75,000 brands in August 2025 and found the correlation between brand mentions and AI visibility was 0.664 - versus 0.218 for backlinks. Brand-level signals, not link-level ones, are what track AI visibility. In plain English: when an AI assistant recommends a brand, the most reliable downstream observable isn't a referral hit, it's a spike in people Googling the brand name.

For investor reporting, this means three things. (1) Take declared AI referrals as a directional floor, not a ceiling. (2) Pair them with branded-search-volume trend lines from Google Search Console - those move in lockstep with AI visibility but show up sooner and at higher magnitude. (3) Be skeptical of any board deck that reports flat AI referrals as proof that AI search isn't working; the leak through Google is structural and well documented.

Metric 3 - Visibility per dollar spent

Visibility on its own is vanity. The metric that lets you compare across portfolio companies - and across content, technical, and PR investments inside one company - is qualified mentions per dollar spent: how many mentions on the commercial-intent prompt list did the company earn for each dollar of AI-search-attributable spend?

Public benchmarks are starting to land. First Page Sage's 2026 dataset put the average generative engine optimization (GEO) CAC at $559 across industries - a 14.4% premium over traditional SEO CAC, offset by 27% higher conversion rates and 9.2% higher reported lead quality. B2B SaaS came in lowest at $249; Higher Ed highest at $1,014. The same dataset notes GEO CAC has fallen roughly 37.5% from initial 2024 levels as practitioner skill has matured.

Two things to flag when this number gets quoted at you:

The 14.4% premium over SEO is the right initial expectation; AI search optimization carries an early-mover tax in most categories, paid back through the conversion lift in metric 4.
Schema and grounding work has an outsized payback. Aiso's controlled experiments measured roughly a 30% improvement in AI retrieval accuracy when identical content was served with versus without JSON-LD, and websites ranking on Bing page one are roughly 3x more likely to be cited by ChatGPT. Both are essentially free compared to producing more content, and both compound. If a portfolio company's AI search budget is going entirely to net-new writing before either lever has been pulled, that's a flag.
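Schema work of this kind usually means embedding schema.org structured data in the page as a JSON-LD `<script type="application/ld+json">` block. A minimal Python sketch that renders an `Organization` block; the company name, URL, and description are placeholders, and this is not the markup used in Aiso's experiments.

```python
import json


def organization_jsonld(name: str, url: str, description: str, same_as: list[str]) -> str:
    """Render a schema.org Organization as an embeddable JSON-LD script tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "description": description,
        "sameAs": same_as,  # external profiles that corroborate the entity
    }
    # Escape "</" so a string value can never close the <script> tag early;
    # "\/" is a valid JSON escape for "/".
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


print(organization_jsonld(
    name="Example AP",  # placeholder company
    url="https://www.example.com",
    description="Accounts-payable automation for mid-market finance teams.",
    same_as=["https://www.linkedin.com/company/example"],
))
```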

The portfolio question is the same as for any other channel: what did you spend, what did you get, and how does it compare quarter on quarter? If a company can't name the denominator, the numerator is theater.
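The quarter-on-quarter comparison reduces to one ratio. A sketch, assuming "qualified mentions" means answers on the commercial-intent prompt list that name the company, and normalizing per $1,000 of spend for readability. Both the definition and the figures are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Quarter:
    label: str
    qualified_mentions: int  # mentions on the commercial-intent prompt list
    spend_usd: float         # AI-search-attributable spend


def mentions_per_1k_usd(q: Quarter) -> float:
    """Qualified mentions earned per $1,000 of AI-search-attributable spend."""
    if q.spend_usd <= 0:
        raise ValueError(f"{q.label}: no spend recorded, so there is no denominator")
    return q.qualified_mentions * 1_000 / q.spend_usd


# Illustrative quarters, not real data.
quarters = [Quarter("Q1", 120, 40_000), Quarter("Q2", 180, 45_000)]
for prev, cur in zip(quarters, quarters[1:]):
    before, after = mentions_per_1k_usd(prev), mentions_per_1k_usd(cur)
    print(f"{cur.label}: {after:.2f} per $1k ({after / before - 1:+.0%} vs {prev.label})")
    # Q2: 4.00 per $1k (+33% vs Q1)
```

Raising on zero spend is deliberate: it turns "can't name the denominator" into an error instead of a silently inflated number.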

Metric 4 - Revenue per dollar spent

Ultimately, referrals don't matter - conversion does. Three independent 2025–2026 datasets agree directionally that AI search traffic converts at a premium:

Visibility Labs analyzed 9.46M non-branded organic sessions and 135K ChatGPT sessions across 94 seven-to-eight-figure ecommerce brands and found ChatGPT traffic converted at a 31% higher rate than non-branded organic (1.81% vs. 1.39%, full-year 2025, GA4-measured, homepage and blog excluded).
Shopify's Q1 2026 enterprise data showed AI-referred shoppers converting at a ~50% higher rate than organic search visitors, with 14% higher average order value on PDP-start sessions.
Similarweb's cross-site 2026 analysis put the gap at 2 percentage points: 7% AI-referral conversion vs. 5% organic, a 40% relative premium.
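Relative and absolute gaps are easy to conflate when these benchmarks get quoted. A small sketch that returns both, using the Similarweb figures as the worked case: 7% vs. 5% is a 2-point absolute gap but a 40% relative premium. Rates are passed as fractions, and the function name is illustrative.

```python
def conversion_gap(ai_rate: float, organic_rate: float) -> tuple[float, float]:
    """Return (absolute gap in percentage points, relative premium).

    Rates are fractions, e.g. 0.07 for 7%.
    """
    if organic_rate <= 0:
        raise ValueError("organic_rate must be positive")
    absolute_pp = (ai_rate - organic_rate) * 100
    relative = ai_rate / organic_rate - 1
    return absolute_pp, relative


pp, rel = conversion_gap(0.07, 0.05)
print(f"{pp:.1f} pp absolute, {rel:+.0%} relative")  # 2.0 pp absolute, +40% relative
```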

The contrarian datapoint worth knowing - and worth raising in a portfolio review if the company hasn't - is a 2025 peer-reviewed study in Marketing Science covering 973 ecommerce sites that found ChatGPT-referred traffic underperformed organic by roughly 13% on conversion. The study's explanation is that ChatGPT users tend to arrive earlier in the consideration journey than organic searchers, so the same session converts less but seeds more downstream branded traffic.

Both readings can be true at once if AI traffic converts at a premium on commercial-intent prompts and at a discount on informational ones. That's exactly why metric 1's prompt-list quality matters so much. If a portfolio company reports flat conversion from AI referrals, the first diagnostic is to check how much of its visibility sits on actual commercial-intent prompts vs. top-of-funnel ones.
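A toy example of how both readings can coexist: two companies with identical per-intent conversion rates report very different blended AI conversion, purely because of their intent mix. All numbers are invented, and the sketch assumes sessions can be attributed to a prompt intent, which in practice is the hard part.

```python
def rate(sessions: int, conversions: int) -> float:
    return conversions / sessions if sessions else 0.0


def blended_rate(segments: dict[str, tuple[int, int]]) -> float:
    """segments maps intent -> (sessions, conversions)."""
    sessions = sum(s for s, _ in segments.values())
    conversions = sum(c for _, c in segments.values())
    return rate(sessions, conversions)


# Same per-intent rates (3% commercial, 1% informational), different mix.
company_a = {"commercial": (1_000, 30), "informational": (1_000, 10)}
company_b = {"commercial": (200, 6), "informational": (1_800, 18)}

for name, segments in (("A", company_a), ("B", company_b)):
    per_intent = {k: f"{rate(*v):.1%}" for k, v in segments.items()}
    print(name, per_intent, f"blended {blended_rate(segments):.1%}")
# A {'commercial': '3.0%', 'informational': '1.0%'} blended 2.0%
# B {'commercial': '3.0%', 'informational': '1.0%'} blended 1.2%
```

Against a hypothetical 1.5% organic baseline, company A looks like a premium and company B like a discount, even though neither converts any differently within a given intent.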

The right ask from the investor seat is for the company's own conversion numbers, segmented by AI referral source and prompt intent, alongside the public benchmarks above. Vendor averages are useful as sanity checks; they aren't a substitute for the company's own data.

5 questions for your next portfolio review

Show me your prompt list. Which prompts are you measuring share-of-voice on, and what makes each one a commercial-intent prompt that matches your positioning? How was the list assembled - synthesized, or from real-user panel data?
What's your branded-search trend? Pull Google Search Console's branded-query volume for the last 12 months alongside declared AI referrals. The two should move together; if branded search is rising faster than AI referrals, that gap is the AI-to-Google leak at work.
What did you spend, and on what? Break out AI-search-attributable spend into schema and grounding work, net-new content, third-party citation work (G2, Reddit, YouTube), and PR. Ask why the mix is what it is. If schema and grounding haven't been pulled before content, that's the cheaper lever skipped.
What's your conversion rate on AI referrals? Segmented by prompt intent if possible. Benchmark against the Visibility Labs / Shopify / Similarweb numbers above, but treat the company's own data as the truth.
What are you not measuring yet, and why? The honest answer is more informative than the dashboard. The AI dark funnel is real; pretending otherwise is the warning sign.

The bigger picture

AI search is at the stage paid social was in 2010 - the channel exists, the attribution is half-built, and the companies that learn to measure it correctly compound for years before the late majority catches up. The four metrics above aren't the final word; they're the floor for a portfolio conversation that doesn't embarrass either side. The first portfolio that systematically gets this right will run roughly two years ahead of the ones reporting share-of-voice on the wrong prompt list.

If you want to pressure-test a portfolio company's AI search readiness on the technical side before the metrics conversation, our Grounding Readiness Checker grades a URL across the five dimensions Microsoft Bing now publishes for AI-era indexing - it takes about ten seconds and surfaces the cheap fixes first.