How We Test and Run Experiments: Our Commitment to Try Not to Fool Ourselves

January 25, 2025 • 15 min read • Methodology

Our Scientific Commitment

At Aiso, we believe that rigorous scientific methodology is essential for trustworthy insights in AI visibility tracking. This post outlines our ten core commitments to scientific rigor, inspired by Karl Popper's falsificationism and Henri Poincaré's principles of hypothesis-driven research. We're committed to trying not to fool ourselves.

The First Principle: Don't Fool Yourself

"The first principle is that you must not fool yourself, and you are the easiest person to fool."

— Richard Feynman, Caltech Commencement Address (1974)

In the rapidly evolving field of AI visibility tracking, it's tempting to cherry-pick data that confirms our assumptions, ignore inconvenient results, or present findings in ways that support our business goals. But at Aiso, we've made a different choice.

We've committed to rigorous scientific methodology because we believe that trustworthy insights require trustworthy methods. When companies make decisions about AI optimization based on our data, those decisions affect real businesses, real customers, and real outcomes.

This isn't just about academic rigor for its own sake. It's about ensuring that our customers can trust our insights enough to act on them. Bad data leads to bad decisions, and in the competitive world of AI search optimization, bad decisions can be costly.

Our methodology is inspired by two giants of scientific philosophy: Karl Popper, who taught us that science advances by falsification, not confirmation, and Henri Poincaré, who emphasized the importance of clear hypotheses and reproducible methods.

Our Ten Commitments to Scientific Rigor

These commitments guide every experiment we run and every insight we publish. They're not just principles—they're practical guidelines that ensure our work meets the highest standards of scientific integrity.

Falsifiability Over Confirmation

Karl Popper's Falsificationism • The Logic of Scientific Discovery (1959)

What this means: Design experiments that can prove us wrong, not just confirm our beliefs.

How we practice this:

We create control groups and test sites that could disprove our hypotheses. Every experiment includes scenarios where we might be wrong.

Real example:

In our schema markup experiment, we created two otherwise identical sites, one with schema and one without, specifically to test whether schema actually matters for AI retrieval.

Hypothesis First, Data Second

Henri Poincaré's Scientific Method • Science and Hypothesis (1902)

What this means: State clear hypotheses before testing, not after seeing the data.

How we practice this:

We pre-register our expectations and hypotheses before running experiments. This prevents us from retrofitting explanations to match results.

Real example:

Before testing ChatGPT's ranking consistency, we explicitly stated our hypothesis: 'ChatGPT does not maintain consistent rankings like traditional search engines.'
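
One lightweight way to make a pre-registered hypothesis tamper-evident is to fingerprint it, together with its timestamp, before any data is collected. The sketch below is illustrative only and does not describe Aiso's actual tooling: the `preregister` and `verify` helpers, the record fields, and the example metric are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from typing import Optional


def preregister(hypothesis: str, metric: str, registered_at: Optional[str] = None) -> dict:
    """Build a pre-registration record and fingerprint it with SHA-256."""
    record = {
        "hypothesis": hypothesis,
        "metric": metric,
        "registered_at": registered_at or datetime.now(timezone.utc).isoformat(),
    }
    # Sorted keys make the serialization, and therefore the hash, deterministic.
    payload = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(payload).hexdigest()
    return record


def verify(record: dict) -> bool:
    """Return True if the record still matches its stored fingerprint."""
    body = {key: value for key, value in record.items() if key != "sha256"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest() == record["sha256"]


record = preregister(
    hypothesis="ChatGPT does not maintain consistent rankings like traditional search engines.",
    # Hypothetical metric, chosen only for illustration.
    metric="share of responses that repeat the most common brand order",
)
```

Publishing the `sha256` value before the experiment starts lets anyone confirm later that the hypothesis text was not edited after the results came in.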

Control for Confounding Variables

Scientific Method Principle • General scientific methodology

What this means: Isolate the variable you're testing by controlling everything else.

How we practice this:

We ensure our test sites are identical except for the single variable we're testing. Same content, same design, same hosting - only the test variable differs.

Real example:

Our schema experiment used identical HTML content, CSS, and hosting - only the schema markup differed between test and control sites.
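
A claim like "only the schema markup differed" can be checked mechanically. The sketch below assumes the schema is delivered as JSON-LD `<script>` blocks, which the experiment description does not specify; microdata or RDFa attributes would need a different approach. The helper names and the two sample pages are hypothetical.

```python
import re

# Matches <script type="application/ld+json"> ... </script> blocks.
JSON_LD_BLOCK = re.compile(
    r"<script[^>]*type=[\"']application/ld\+json[\"'][^>]*>.*?</script>",
    flags=re.IGNORECASE | re.DOTALL,
)


def strip_schema(html: str) -> str:
    """Remove JSON-LD script blocks and collapse runs of whitespace."""
    without_schema = JSON_LD_BLOCK.sub("", html)
    return " ".join(without_schema.split())


def differs_only_in_schema(test_html: str, control_html: str) -> bool:
    """True if the two pages are identical once JSON-LD blocks are removed."""
    return strip_schema(test_html) == strip_schema(control_html)


# Made-up pages standing in for the test and control sites.
with_schema = (
    "<html><head><title>Clinic</title>\n"
    '<script type="application/ld+json">{"@type": "Dentist"}</script>\n'
    "</head><body><h1>Clinic</h1></body></html>"
)
without_schema = (
    "<html><head><title>Clinic</title>\n"
    "</head><body><h1>Clinic</h1></body></html>"
)

same_apart_from_schema = differs_only_in_schema(with_schema, without_schema)
```

Running a check like this against both pages before each test run guards against an unnoticed content edit becoming a confounding variable.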

Transparent Methodology

Open Science Movement • Reproducible research principles

What this means: Share exactly how we test so others can verify and build upon our work.

How we practice this:

We publish live test sites, exact prompts used, and detailed methodologies. Nothing is hidden or proprietary about our testing approach.

Real example:

Our schema experiment includes live URLs that anyone can test: schema-markup-experiment.vercel.app/with-schema and /without-schema

Reproducible Experiments

Scientific Reproducibility • Scientific method standards

What this means: Others should be able to replicate our experiments and get similar results.

How we practice this:

We provide exact prompts, URLs, and step-by-step instructions. Anyone should be able to reproduce our findings independently.

Real example:

We published the exact ChatGPT prompts used in our ranking experiment, along with the test sites, so others can verify our results.

Acknowledge Limitations

Intellectual Honesty • Scientific integrity principles

What this means: Be clear about what we don't know and the limitations of our data.

How we practice this:

We explicitly state sample biases, methodological limitations, and areas where our data might not be representative.

Real example:

In our demographics analysis, we clearly stated that our ChatGPT sample is biased toward tech-savvy users and may not represent all ChatGPT users.

Quantifiable Results

Measurement Principle • Scientific measurement standards

What this means: Define success metrics upfront and measure them objectively.

How we practice this:

We establish clear, measurable criteria before running experiments. No subjective assessments - only quantifiable outcomes.

Real example:

Our schema experiment measured specific improvements: 30% better information extraction, additional structured data retrieval, and enhanced source attribution.
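
To show what "define the metric upfront" can look like, here is a minimal sketch of one possible scoring rule: fix a checklist of facts before the test, score each response by the fraction of facts it mentions, then compare test against control. This is an assumed scoring scheme, not necessarily the one used in the schema experiment, and the facts and responses below are invented.

```python
def fact_coverage(response: str, expected_facts: list[str]) -> float:
    """Fraction of pre-defined facts that appear (case-insensitively) in a response."""
    text = response.lower()
    hits = sum(1 for fact in expected_facts if fact.lower() in text)
    return hits / len(expected_facts)


def relative_improvement(test_score: float, control_score: float) -> float:
    """Relative change of test over control; 0.25 means 25% better."""
    if control_score == 0:
        raise ValueError("control score is zero; relative improvement is undefined")
    return (test_score - control_score) / control_score


# Hypothetical checklist, fixed before any response is collected.
expected_facts = ["open monday to friday", "board certified", "free parking", "4.8"]

# Hypothetical responses for the page with schema and the page without.
test_response = "The clinic is open Monday to Friday, is board certified, and is rated 4.8."
control_response = "The clinic is open Monday to Friday and offers free parking."

test_score = fact_coverage(test_response, expected_facts)        # 3 of 4 facts
control_score = fact_coverage(control_response, expected_facts)  # 2 of 4 facts
improvement = relative_improvement(test_score, control_score)
```

Substring matching is deliberately crude; the point is that the checklist and the formula are written down before the responses are read, so the score cannot drift toward the result one hopes for.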

No Cherry-Picking Data

Statistical Integrity • Statistical best practices

What this means: Report all results, not just the ones that support our hypotheses.

How we practice this:

We show complete ChatGPT responses, including responses that don't support our hypotheses. We report null results and unexpected findings.

Real example:

In our ranking experiment, we published all 400 query results, including the cases where ChatGPT's ordering stayed consistent and so cut against our hypothesis, not just the examples that supported it.

Iterate Based on Evidence

Poincaré's Convention • The Value of Science (1905)

What this means: Update our beliefs when data contradicts them, not the other way around.

How we practice this:

When experiments contradict our assumptions, we change our recommendations and methodology, not our interpretation of the data.

Real example:

Our findings about ChatGPT's non-ranking nature led us to completely revise our approach to AI visibility tracking, moving away from traditional ranking metrics.

Subject to Peer Review

Community Validation • Scientific peer review process

What this means: Welcome scrutiny, feedback, and challenges from the community.

How we practice this:

We publish our methodologies publicly and invite others to test, challenge, and improve upon our work. We respond to criticism constructively.

Real example:

This blog post itself is an invitation for peer review - we're sharing our methodology openly for community examination and feedback.

Why This Matters for Our Customers

Trustworthy Insights

Scientific rigor leads to insights you can trust enough to act on. When we say "schema markup improved AI information retrieval by 30% in our test," you know that's based on a controlled experiment, not cherry-picked anecdotes.

The Cost of Bad Data

In AI optimization, bad data leads to wasted resources, missed opportunities, and competitive disadvantages. Our scientific approach helps you avoid these costly mistakes.

Examples in Practice

Here are three real experiments that demonstrate our scientific methodology in action. Each one follows our commitments and provides actionable insights for AI optimization.

Schema Markup vs No Schema: A Real ChatGPT Experiment

Controlled A/B test with identical sites - one with comprehensive schema markup, one without. Results showed a 30% improvement in AI information retrieval.

Key Findings:

Schema markup provides additional structured data (ratings, certifications)
Better professional boundaries in AI responses
Enhanced source attribution and verification
Improved information categorization

Methodology:

Created identical test sites with/without schema, asked ChatGPT identical questions, analyzed response quality objectively

Testing ChatGPT's Non-Ranking Nature: Espresso Machine Brand Visibility Analysis

Comprehensive experiment testing whether ChatGPT maintains consistent rankings across 400 queries, challenging traditional notions of AI search rankings.

Key Findings:

ChatGPT does not maintain consistent rankings like Google
Response order varies significantly between identical queries
Traditional ranking metrics don't apply to AI search
Brand visibility requires different measurement approaches

Methodology:

Tested 400 identical queries about espresso machine brands, analyzed response consistency and ranking patterns
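
One simple way to quantify "response consistency" is the share of responses that repeat the single most common brand order: a system with stable rankings would score 1.0, and lower values indicate more shuffling. The sketch below is an assumed metric, not necessarily the one used in the experiment, and the placeholder brand names and orderings are invented.

```python
from collections import Counter


def modal_order_share(orderings: list[tuple[str, ...]]) -> float:
    """Share of responses whose brand order equals the most common order."""
    counts = Counter(orderings)
    _, top_count = counts.most_common(1)[0]
    return top_count / len(orderings)


def first_place_counts(orderings: list[tuple[str, ...]]) -> Counter:
    """How often each brand is listed first across all responses."""
    return Counter(order[0] for order in orderings if order)


# Hypothetical brand orders extracted from four responses to the same query.
observed = [
    ("Brand A", "Brand B", "Brand C"),
    ("Brand A", "Brand B", "Brand C"),
    ("Brand B", "Brand A", "Brand C"),
    ("Brand C", "Brand A", "Brand B"),
]

share = modal_order_share(observed)      # 0.5: two of four responses share the top order
leaders = first_place_counts(observed)   # Brand A leads twice, B and C once each
```

Reporting the full distribution from `first_place_counts` alongside the single summary number keeps the inconvenient cases visible.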

The ChatGPT Funnel: Accurately Determining Impressions, Clicks and Conversions

Proven methodology for tracking real ChatGPT performance data using server logs and OAI-SearchBot detection. No estimates, just accurate metrics.

Key Findings:

Server log analysis provides accurate ChatGPT traffic data
OAI-SearchBot user agent enables precise detection
Traditional analytics miss AI-driven traffic
Conversion tracking requires specialized methodology

Methodology:

Analyzed server logs for OAI-SearchBot user agent, correlated with ChatGPT responses, validated through controlled testing
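
The detection step can be sketched as a scan of web server access logs for the OAI-SearchBot user agent. The example below assumes logs in the common Combined Log Format and covers only detection, not the correlation or validation steps. The regular expression, the `searchbot_hits` helper, and the sample log lines are illustrative assumptions rather than Aiso's implementation.

```python
import re
from collections import Counter

# Combined Log Format: ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def searchbot_hits(lines) -> Counter:
    """Count requests per path whose user agent mentions OAI-SearchBot."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        # Lines that do not parse are skipped rather than guessed at.
        if match and "oai-searchbot" in match["agent"].lower():
            hits[match["path"]] += 1
    return hits


# Made-up log lines using documentation-reserved IP addresses.
sample_log = [
    '203.0.113.5 - - [25/Jan/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot"',
    '198.51.100.7 - - [25/Jan/2025:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows NT 10.0; Win64; x64)"',
    '203.0.113.5 - - [25/Jan/2025:10:02:00 +0000] "GET /blog HTTP/1.1" 200 2048 "-" '
    '"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko); compatible; OAI-SearchBot/1.0; +https://openai.com/searchbot"',
]

hits = searchbot_hits(sample_log)
```

User-agent strings can be spoofed, so a production version would likely also want to validate the source IP address before counting a request.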

About the Author

Ben Tannenbaum

Founder & CEO, Aiso

Ben Tannenbaum is the founder of Aiso, a marketing tech company helping brands optimize their visibility in AI responses. With expertise in B2B SaaS development and data platform architecture, Ben brings a unique perspective to AI search optimization. His previous experience building data exchange platforms in commercial real estate taught him how to create "give-to-get" solutions that solve industry-wide problems while maintaining trust and data security.

Academic References

Our methodology is grounded in the work of these foundational thinkers in the philosophy of science. We encourage you to explore their original works to deepen your understanding of scientific methodology.

The Logic of Scientific Discovery

Karl Popper (1959)

Foundational work on falsificationism and the demarcation between science and pseudoscience.

Conjectures and Refutations: The Growth of Scientific Knowledge

Karl Popper (1963)

Further development of Popper's philosophy of science and critical rationalism.

Science and Hypothesis

Henri Poincaré (1902)

Classic work on the nature of scientific hypothesis and the role of convention in science.

The Value of Science

Henri Poincaré (1905)

Exploration of the philosophical foundations of science and mathematics.

Cargo Cult Science

Richard Feynman (1974)

Famous Caltech commencement address on scientific integrity and avoiding self-deception.

Our Commitment to You

Scientific Rigor in Practice

These commitments aren't just theoretical principles—they're practical guidelines that shape every experiment we run and every insight we publish. When you use Aiso's data to make decisions about AI optimization, you can trust that it's based on rigorous, reproducible science.

Controlled experiments with proper controls

Transparent methodologies you can verify

Honest reporting of limitations and uncertainties

Reproducible results you can test yourself

Join the Conversation

We believe that scientific methodology benefits from community scrutiny and feedback. If you have questions about our methods, suggestions for improvement, or want to challenge our findings, we welcome the conversation.

Questions or feedback?

We're committed to continuous improvement in our scientific methodology. Reach out if you have questions or suggestions.
