
How often should you run your prompts?

Ask ChatGPT the same question twice and you can get two different answers. That makes every AI-visibility measurement a sample, not a fact - and how many samples you need depends on how noisy your brand’s mention rate is. Here is the benchmark, and the statistics behind it.

Ben Tannenbaum · June 19, 2026 · 8 min read

Once you have settled how many prompts to track, the next question is cadence: how many times a day should you actually run each one? The instinct is “as often as I can afford.” The better answer is “exactly enough to make the number stop wobbling,” and that amount is different for a market leader than for a challenger brand.

Every answer is a coin flip, not a fact

Large language models sample their output. Run “what are the best running shoes for marathon training?” a hundred times and you will get a hundred slightly different answers, each naming a slightly different set of brands. So when you measure “Brand X appears in 40% of answers,” that 40% is an estimate from a handful of draws - with its own margin of error, exactly like a poll.

That reframes the question. You are not deciding how often to refresh a dashboard. You are deciding how big a sample you need each day to trust the percentage you report.
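To see the polling analogy in action, here is a minimal simulation sketch in Python. It assumes a hypothetical brand with a true mention rate of 40% and treats every run as an independent yes/no draw; the function name, the seed, and the rate are illustrative choices, not part of the benchmark.

```python
import random


def simulate_mention_rate(true_rate: float, runs: int, rng: random.Random) -> float:
    """Return the observed mention rate across `runs` simulated answers."""
    # Each run "mentions" the brand with probability `true_rate`.
    mentions = sum(1 for _ in range(runs) if rng.random() < true_rate)
    return mentions / runs


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the sketch is reproducible
    true_rate = 0.40         # assumed "true" mention rate (hypothetical)

    # One simulated week of daily readings at three sample sizes.
    for runs in (5, 10, 40):
        week = [simulate_mention_rate(true_rate, runs, rng) for _ in range(7)]
        print(f"{runs:>3} runs/day:", "  ".join(f"{rate:.0%}" for rate in week))
```

Each row is a simulated week of daily readings. The underlying rate never changes, yet the small-sample rows will tend to swing much more from day to day than the 40-run row.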

The definitive answer

  • Category leader - ~1 run/day. A brand like Nike shows up in almost every commercial answer about running shoes. Its mention rate is near 100%, so the result barely varies. A single daily run per prompt is representative.
  • Established mid-size brand - ~10 runs/day. A recognized but non-dominant brand sits in the middle, where answers genuinely vary. Ten runs a day pins the number down well enough to act on.
  • Long-tail / challenger brand - ~40 runs/day. A smaller brand that appears occasionally has the noisiest signal of all. Around 40 runs a day is what it takes to measure a low, volatile mention rate with confidence.

[Figure: Recommended daily runs per prompt, by brand authority. The noisier your mention rate, the more samples you need per day. Source: Aiso benchmark across client AI-visibility projects, 2025–2026.]

The counter-intuitive part: the biggest brands need the fewest runs. Not because they are big, but because their outcome is nearly certain. Uncertainty is highest for brands whose true mention rate sits in the messy middle.

The statistics, briefly

Whether a brand is named in a given answer is a yes/no outcome, so a mention rate behaves like a proportion. The uncertainty on a measured proportion p from n runs is its standard error, √(p(1−p)/n). Two things fall out of that formula, and they are the whole argument:

  • Variance peaks at 50% and collapses at the edges. The p(1−p) term is largest when p = 0.5 and approaches zero as p nears 0 or 1. A brand mentioned ~95% of the time (the leader) is almost deterministic; a brand mentioned ~40% of the time (the challenger) sits close to the 50% peak and is nearly as noisy as a rate can get. That is why leaders need one run and challengers need many.
  • Precision improves with √n, not n. Error shrinks with the square root of the number of runs, so each extra run helps less than the last. Going from 1 to 10 runs cuts your margin of error by about two-thirds; going from 40 to 80 trims it by less than a third, which at a 40% mention rate is only about four percentage points.
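Both points can be checked numerically. The sketch below, in Python, implements the standard-error formula above. The helper names are our own, and 1.96 is the usual multiplier for a 95% interval under the normal approximation, which is rough for very small n.

```python
import math

Z_95 = 1.96  # multiplier for a 95% interval (normal approximation)


def standard_error(p: float, n: int) -> float:
    """Standard error of a proportion p measured from n runs."""
    return math.sqrt(p * (1 - p) / n)


def margin_of_error(p: float, n: int, z: float = Z_95) -> float:
    """Half-width of the approximate confidence interval around p."""
    return z * standard_error(p, n)


if __name__ == "__main__":
    # Point 1: the p(1-p) term peaks at 0.5 and shrinks toward the edges.
    for p in (0.95, 0.50, 0.40):
        print(f"p = {p:.2f}  ->  p(1-p) = {p * (1 - p):.4f}")

    # Point 2: the margin shrinks with the square root of n.
    for n in (10, 40, 80, 400):
        print(f"n = {n:>3}  ->  +/- {margin_of_error(0.40, n):.1%}")
```

Under these assumptions the p(1−p) term is 0.0475 at 95%, 0.25 at 50%, and 0.24 at 40%, and the margin on a 40% rate works out to roughly ±30, ±15, ±11, and ±5 points at 10, 40, 80, and 400 runs.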

[Figure: Why more runs sharpen the number, with diminishing returns. Margin of error on a measured 40% mention rate, by daily runs per prompt (95% confidence). Source: standard error of a proportion, 1.96 × √(p(1−p)/n), p = 0.4.]

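That curve is why 40 is our default ceiling, not 400. At 40 runs a day, the 95% margin of error on a noisy 40% mention rate is about ±15 percentage points for a single day and about ±6 points once a week of runs (280 in total) is pooled, which is tight enough to spot a real change week over week. Spending ten times as much, 400 runs a day, narrows the daily margin only to about ±5 points. For brands whose true rate is further from 50%, you reach “good enough” far sooner - hence 10 for mid-size and 1 for the leader.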
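The same formula can be turned around to ask how many runs a target margin requires: n = z² · p(1−p) / E², where E is the margin you are willing to accept. A sketch in Python, assuming the normal approximation; `required_runs` and its parameters are hypothetical names used for illustration.

```python
import math


def required_runs(p: float, target_margin: float, z: float = 1.96) -> int:
    """Smallest n whose approximate margin of error is at most target_margin.

    p             -- assumed true mention rate (0..1)
    target_margin -- acceptable half-width of the interval (0.15 = 15 points)
    z             -- interval multiplier (1.96 for roughly 95% confidence)
    """
    n = (z ** 2) * p * (1 - p) / (target_margin ** 2)
    return math.ceil(n)


if __name__ == "__main__":
    for margin in (0.15, 0.10, 0.05):
        runs = required_runs(0.40, margin)
        print(f"+/- {margin:.0%} on a 40% rate needs about {runs} runs")
```

Because the target margin is squared in the denominator, halving the margin costs about four times the runs. That quadratic cost is the diminishing return the curve describes.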

Run frequency is a precision dial, not a freshness dial. You are not running 40 times because the answer changes 40 times a day. You are running 40 times because that is how many samples it takes to measure a volatile percentage you can stand behind.

How to set your own cadence

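  1. Estimate your rough mention rate first. Run a prompt 10–20 times once. If you appear nearly always, you are near an edge and need few runs. If you land somewhere in the 30–70% band, you are in the noisy middle and need more. If you appear almost never, the absolute error is small but large relative to the rate itself, so treat the prompt as sparse and volatile.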
  2. Match cadence to that band. Near-certain outcome: ~1/day. Recognized but contested: ~10/day. Sparse and volatile: up to ~40/day.
  3. Tune per prompt, not per brand. The same brand can be near-certain on its branded comparisons and noisy on unbranded category queries. Spend your runs where the variance actually is.
  4. Remember the multiplier. Daily runs multiply by your tracked prompt count and by the number of assistants you cover. A lean, well-targeted prompt set is what lets you afford enough runs on the prompts that matter.
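The budgeting arithmetic in steps 2–4 can be sketched in a few lines of Python. The band labels, the example prompt mix, and the function name are hypothetical; only the per-band cadences (1, 10, and 40 runs a day) come from the benchmark above.

```python
# Daily runs per prompt for each band, taken from the cadences above.
# The band labels themselves are illustrative names.
RUNS_PER_DAY = {
    "near_certain": 1,      # near-certain outcome
    "contested": 10,        # recognized but contested
    "sparse_volatile": 40,  # sparse and volatile (upper end)
}


def total_daily_runs(prompt_bands: list[str], assistants: int) -> int:
    """Total runs per day: sum of per-prompt cadence, times assistants covered."""
    per_assistant = sum(RUNS_PER_DAY[band] for band in prompt_bands)
    return per_assistant * assistants


if __name__ == "__main__":
    # Hypothetical prompt set: cadence is assigned per prompt, not per brand.
    prompts = (
        ["near_certain"] * 20
        + ["contested"] * 20
        + ["sparse_volatile"] * 10
    )
    print(total_daily_runs(prompts, assistants=3))  # (20 + 200 + 400) * 3 = 1860
```

In this made-up mix, the ten sparse prompts account for 400 of the 620 runs per assistant, which is why trimming the prompt set tends to free up more budget than trimming the cadence.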

Get cadence right and your share-of-answer trend lines reflect real movement instead of sampling jitter. Get it wrong and you will either chase ghosts in the noise or, with too few runs on a volatile brand, miss the shifts entirely.