Methodology

How we test.
And how we try not to fool ourselves.

Aiso's research is grounded in falsificationism and hypothesis-driven experimental design. We pre-register hypotheses, run paired tests, and publish complete results — including the ones that don't support our priors.

“The first principle is that you must not fool yourself, and you are the easiest person to fool.” (Richard Feynman, Caltech commencement address, 1974)

In a fast-moving field like AI search, it's tempting to cherry-pick data, ignore inconvenient results, or frame findings to support a business goal. We made a different choice. The ten commitments below shape every experiment we publish.

Our ten commitments

  1. Falsifiability over confirmation

    Karl Popper's falsificationism · The Logic of Scientific Discovery (1959)

    What this means: Design experiments that can prove us wrong, not just confirm our beliefs.

    How we practice it: We create control groups and test sites that could disprove our hypotheses. Every experiment includes at least one possible outcome that would show we were wrong.

    Example: In our schema markup experiment, we created identical test sites — one with schema, one without — specifically to test whether schema actually matters for AI extraction.

  2. Hypothesis first, data second

    Henri Poincaré's scientific method · Science and Hypothesis (1902)

    What this means: State clear hypotheses before testing, not after seeing the data.

    How we practice it: We document hypotheses, predicted outcomes, and success criteria before running experiments. No post-hoc rationalization.

    Example: Before testing ChatGPT's response patterns, we predicted specific outcomes for 400 espresso machine queries. We then compared actual results against those predictions.

  3. Reproducible methods

    Scientific method · Standard scientific practice

    What this means: Document everything so others can replicate our work.

    How we practice it: We publish detailed methodologies, share test prompts, and provide step-by-step instructions for replicating our experiments.

    Example: Our schema markup study includes the exact ChatGPT queries used, the test site URLs, and the methodology so anyone can replicate the experiment.

  4. Controlled variables

    Experimental design · Statistical experimental design

    What this means: Isolate what we're testing — change one variable at a time.

    How we practice it: We control for confounding variables and test single hypotheses with proper controls. We don't change multiple things at once.

    Example: When testing schema markup impact, we kept content, design, and structure identical — only the schema markup differed between test sites.

  5. Statistical significance

    Statistical inference · Statistical hypothesis testing

    What this means: Use sample sizes large enough to draw meaningful conclusions.

    How we practice it: We test with appropriate sample sizes and avoid drawing conclusions from a handful of cases. We measure consistency, not just averages.

    Example: Our espresso machine ranking study tested 400 identical queries to measure ChatGPT's response variability — a sample size large enough to detect patterns reliably.

  6. Transparent limitations

    Scientific honesty · Cargo Cult Science (1974)

    What this means: Acknowledge what we don't know and what our methods can't measure.

    How we practice it: We list every limitation, caveat, and uncertainty in our findings. We don't oversell results or hide weaknesses.

    Example: When publishing on schema markup, we explicitly noted the AI Overviews period and the limited timeframe of our test, even though it weakened the headline finding.

  7. Quantifiable results

    Measurement principle · Scientific measurement standards

    What this means: Define success metrics upfront and measure them objectively.

    How we practice it: We establish clear, measurable criteria before running experiments. No subjective assessments — only quantifiable outcomes.

    Example: Our schema experiment measured specific improvements in three areas: information extraction, retrieval of additional structured data, and source attribution.

  8. No cherry-picking data

    Statistical integrity · Nature: How to fight cherry-picking

    What this means: Report all results, not just the ones that support our hypotheses.

    How we practice it: We show complete responses, including ones that don't support our hypotheses. We report null results and unexpected findings.

    Example: In our ranking experiment, we published all 400 query results — including cases where ChatGPT was inconsistent, not just the ones that supported our hypothesis.

  9. Iterate based on evidence

    Poincaré's convention · The Value of Science (1905)

    What this means: Update our beliefs when data contradicts them, not the other way around.

    How we practice it: When experiments contradict our assumptions, we change our recommendations and methodology — not our interpretation of the data.

    Example: Our findings about ChatGPT's non-ranking nature, namely that it does not return consistent rankings across identical queries, led us to revise our approach and move away from traditional ranking metrics.

  10. Subject to peer review

    Community validation · Nature on the peer review process

    What this means: Welcome scrutiny, feedback, and challenges from the community.

    How we practice it: We publish methodologies publicly and invite others to test, challenge, and improve our work. We respond to criticism constructively.

    Example: This page itself is an invitation for peer review — we share our methodology openly so it can be examined and pushed back on.
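Commitments 5 and 8 call for measuring consistency rather than just averages, and for reporting every response. As a minimal sketch of one way to do that for repeated identical queries, the Python below assumes each ChatGPT response has already been parsed into an ordered list of brand names (parsing is not shown). The function names and brand names are hypothetical illustrations, not Aiso's actual tooling. It reports how often each brand is listed first, with a normal-approximation 95% margin of error, and what share of responses match the most common full ordering.

```python
from collections import Counter
from math import sqrt


def top1_shares(rankings: list[list[str]], z: float = 1.96) -> dict[str, tuple[float, float]]:
    """Share of responses that list each brand first, plus a normal-approximation
    margin of error (z=1.96 corresponds to roughly 95% confidence)."""
    if not rankings:
        raise ValueError("need at least one response")
    n = len(rankings)
    # Empty responses still count toward n but name no brand first.
    firsts = Counter(r[0] for r in rankings if r)
    return {
        brand: (count / n, z * sqrt((count / n) * (1 - count / n) / n))
        for brand, count in firsts.items()
    }


def modal_order_share(rankings: list[list[str]]) -> float:
    """Share of responses whose full ordering equals the most common ordering.
    1.0 would mean every response ranked the brands identically."""
    if not rankings:
        raise ValueError("need at least one response")
    counts = Counter(tuple(r) for r in rankings)
    return counts.most_common(1)[0][1] / len(rankings)


def worst_case_margin(n: int, z: float = 1.96) -> float:
    """Largest possible margin of error for a share estimated from n responses (p = 0.5)."""
    return z * sqrt(0.25 / n)


if __name__ == "__main__":
    # Hypothetical parsed responses to one repeated prompt.
    responses = [
        ["BrandA", "BrandB", "BrandC"],
        ["BrandB", "BrandA", "BrandC"],
        ["BrandA", "BrandC", "BrandB"],
        ["BrandA", "BrandB", "BrandC"],
    ]
    for brand, (share, moe) in sorted(top1_shares(responses).items()):
        print(f"{brand}: listed first in {share:.0%} of responses (±{moe:.0%})")
    print(f"Matches most common ordering: {modal_order_share(responses):.0%}")
    print(f"Worst-case margin at n=400: ±{worst_case_margin(400):.1%}")
```

With 400 responses, the worst-case margin of error (at a 50% share) works out to 1.96 × √(0.25 / 400) ≈ ±4.9 percentage points, which is one way to reason about whether a sample is large enough. The normal approximation is rough for small samples or for shares near 0% or 100%; a Wilson score interval is a common alternative in those cases.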

Why this matters for our customers

Trustworthy insights

Scientific rigor produces insights you can act on. When we publish a finding, it comes from controlled experiments, not cherry-picked anecdotes.

The cost of bad data

In AI optimization, bad data leads to wasted resources and missed opportunities. Our approach is designed to help you avoid both.

Experiments in practice

Three experiments that show the methodology applied end-to-end.

  • Schema markup vs no schema: a real ChatGPT experiment

    Controlled A/B test with identical sites — one with comprehensive schema markup, one without.

    Key findings

    • Schema markup provides additional structured data (ratings, certifications)
    • Better professional boundaries in AI responses
    • Enhanced source attribution and verification
    • Improved information categorization

    Methodology

    Created two test sites that were identical except for schema markup, asked ChatGPT the same questions about each, and analyzed response quality objectively.

  • Testing ChatGPT's non-ranking nature: 400 espresso machine queries

    Comprehensive experiment testing whether ChatGPT maintains consistent rankings across 400 identical queries.

    Key findings

    • ChatGPT does not maintain consistent rankings the way Google search results do
    • Response order varies significantly between identical queries
    • Traditional ranking metrics don't apply to AI search
    • Brand visibility requires different measurement approaches

    Methodology

    Tested 400 identical queries about espresso machine brands, analyzed response consistency and ranking patterns.

  • The ChatGPT funnel: impressions, clicks, and conversions

    Methodology for tracking real ChatGPT performance using server logs and OAI-SearchBot detection.

    Key findings

    • Server log analysis provides accurate ChatGPT traffic data
    • OAI-SearchBot user agent enables precise detection
    • Traditional analytics miss AI-driven traffic
    • Conversion tracking requires specialized methodology

    Methodology

    Analyzed server logs for the OAI-SearchBot user agent, correlated with ChatGPT responses, and validated through controlled testing.

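The detection step in the funnel study can be sketched roughly as follows. This minimal Python example assumes access logs in the common Apache/Nginx "combined" format and counts requests per path whose user agent contains the token OAI-SearchBot. The function name and the sample log lines are hypothetical. User-agent strings can be spoofed, so a production pipeline would likely also check the source IP against the ranges OpenAI publishes for its crawlers; correlating hits with ChatGPT responses, as the study describes, is not shown.

```python
import re
from collections import Counter
from collections.abc import Iterable

# Apache/Nginx "combined" log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)
BOT_TOKEN = "OAI-SearchBot"


def searchbot_hits(lines: Iterable[str]) -> Counter:
    """Count OAI-SearchBot requests per path, skipping lines that don't parse."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and BOT_TOKEN in match.group("agent"):
            hits[match.group("path")] += 1
    return hits


if __name__ == "__main__":
    # Illustrative log lines; the user-agent strings are simplified placeholders.
    sample = [
        '203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /espresso-machines HTTP/1.1" '
        '200 5120 "-" "Mozilla/5.0 (compatible; OAI-SearchBot/1.0)"',
        '198.51.100.7 - - [10/Oct/2024:13:56:01 +0000] "GET /espresso-machines HTTP/1.1" '
        '200 5120 "-" "Mozilla/5.0 (Windows NT 10.0)"',
    ]
    for path, count in searchbot_hits(sample).most_common():
        print(f"{count:>5}  {path}")
```

Matching on a substring rather than the full user-agent string keeps the check working if the version number in the string changes.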

Recent research using this methodology

The methodology above is the engine. These are the published pieces it has produced.

The full list, including external coverage, is on our press and research page.

Academic references

The methodology draws on these foundational thinkers in the philosophy of science.

  • The Logic of Scientific Discovery

    Karl Popper · 1959

    Foundational work on falsificationism and the demarcation between science and pseudoscience.

  • Conjectures and Refutations

    Karl Popper · 1963

    Further development of Popper's philosophy of science and critical rationalism.

  • Science and Hypothesis

    Henri Poincaré · 1902

    Classic work on the nature of scientific hypothesis and the role of convention in science.

  • The Value of Science

    Henri Poincaré · 1905

    Exploration of the philosophical foundations of science and mathematics.

  • Cargo Cult Science

    Richard Feynman · 1974

    Caltech commencement address on scientific integrity and avoiding self-deception.

About the author

Ben Tannenbaum

Founder & CEO, Aiso

Ben founded Aiso to help brands optimize their visibility in AI assistants. He has a background in B2B SaaS and data platform architecture and a previous exit in commercial real estate analytics.

Run the same methodology on your brand

Start a free trial of the platform, or book a call to discuss a custom research engagement.