51 Variants, 8 Minutes, Zero Live Traffic: How I Pre-Test Landing Pages with a Simulated Expert Panel

Matt Danese

Demand gen leader. AI builder. 8+ years at Meta, Webflow, Medely, Octus, and Regal.ai.

The standard advice on A/B testing is: run one test at a time, isolate your variable, wait for statistical significance, then implement the winner and run the next test. This process, done correctly, takes 14–21 days per test. If you're testing 4 elements on a landing page — headline, subhead, CTA, hero image — at one test at a time, you're looking at 8–12 weeks before you've run through even a basic optimization cycle.

Most demand gen teams I've talked to don't have 12 weeks. They have a campaign launching in two weeks and a landing page that hasn't been touched since it was built by an agency 18 months ago. The standard advice is useless for them. It's designed for large-traffic consumer e-commerce, not B2B campaigns with 200–400 conversions per month.

So I built something different. Not better — different. For a specific type of problem that most B2B teams actually face.

The insight: pre-traffic validation

If you can't run a statistically valid live test in your timeline, the next best thing is to use human judgment at scale before any traffic hits the page. The question is: whose judgment, and how do you get it fast?

The answer I landed on was a simulated expert panel. I built a two-phase system. Phase 1 generates copy variants and scores them using five distinct evaluator personas, including a CMO, a Skeptical Buyer, and a CRO Specialist.

Each variant gets scored 0–100 by each evaluator. The composite score is the average. Variants above a threshold advance to a second round where the top performers compete head-to-head with more detailed feedback. The process runs entirely through the Claude API.
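For anyone who wants to see the shape of it, here's a minimal sketch of that Phase 1 scoring loop using the Anthropic Python SDK. The persona wording, the threshold value, the model id, and the helper names (score_variant, round_one) are illustrative stand-ins, not the production system:

```python
import json
import anthropic

client = anthropic.Anthropic()  # expects ANTHROPIC_API_KEY in the environment

# Illustrative mandates -- the real panel uses five fully written-out briefs
PERSONAS = {
    "CMO": "Evaluate strategic framing and message-market fit only.",
    "Skeptical Buyer": "Evaluate trust and credibility signals only.",
    "CRO Specialist": "Evaluate friction and clarity of the conversion path only.",
    # ...plus two more personas in the full panel
}

ADVANCE_THRESHOLD = 75  # hypothetical cutoff for the head-to-head round


def score_variant(variant_copy: str) -> dict:
    """Ask each persona for a 0-100 score plus a one-sentence justification."""
    per_persona = {}
    for name, mandate in PERSONAS.items():
        msg = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder model id
            max_tokens=300,
            system=(
                f"You are a {name} reviewing B2B landing page copy. {mandate} "
                'Reply with JSON only: {"score": <0-100 integer>, "justification": "<one sentence>"}'
            ),
            messages=[{"role": "user", "content": variant_copy}],
        )
        per_persona[name] = json.loads(msg.content[0].text)
    composite = sum(p["score"] for p in per_persona.values()) / len(per_persona)
    return {"composite": composite, "per_persona": per_persona}


def round_one(variants: list[str]) -> list[tuple[str, dict]]:
    """Score every variant; only those above the threshold advance to round two."""
    scored = [(v, score_variant(v)) for v in variants]
    return [(v, r) for v, r in scored if r["composite"] >= ADVANCE_THRESHOLD]
```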

62 → 87: score improvement on a live landing page — evaluating 51 copy variants in 8 minutes before a single dollar of traffic was spent.

What actually happened

The first real test of this system was on a landing page for a webinar campaign. The original page had a composite panel score of 62/100. The CMO persona liked the strategic framing. The Skeptical Buyer hated the headline — said it used three different B2B buzzwords in one sentence and that any real buyer would stop reading immediately. The CRO Specialist flagged that the primary CTA was below the fold on mobile.

I generated 51 variants targeting those three specific failure modes. The system scored all 51 in 8 minutes. The winning variant — new headline, restructured first paragraph, CTA moved above the fold — scored 87/100. The Skeptical Buyer's score went from 48 to 79. That single persona's feedback, surfaced in 8 minutes, probably saved two weeks of a live test that would have told me the same thing with real ad budget behind it.

Does this replace live testing? No. Phase 2 of the system deploys the pre-validated winner to real traffic and runs ongoing mutations week-over-week, compressing a 14-day A/B cycle into hours.

The prompt architecture matters more than the model

The temptation when building something like this is to prompt Claude generically: "Score this landing page copy on a scale of 1 to 100." Generic prompts produce generic scores. Every variant ends up in a 65–75 band and you can't differentiate between them.

The panel works because each evaluator has a distinct, opinionated point of view with a specific axis of evaluation. The Skeptical Buyer doesn't evaluate everything — they evaluate trust and credibility signals. The CRO Specialist doesn't evaluate messaging strategy — they evaluate friction and clarity. When evaluators have narrow mandates, their scores become meaningful signals rather than average opinions.
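To make that concrete, here's a hedged sketch of the difference between a generic scoring instruction and a narrow-mandate brief. The field names and the Skeptical Buyer wording below are my illustration, not the exact briefs from the system:

```python
# Generic prompt -- tends to cluster every variant in the same 65-75 band
GENERIC_PROMPT = "Score this landing page copy on a scale of 1 to 100."

# Narrow-mandate brief -- one axis, explicit things to ignore and to penalize
# (field names and wording are illustrative)
SKEPTICAL_BUYER = {
    "role": "Skeptical Buyer",
    "axis": "trust and credibility signals",
    "ignore": ["visual design", "brand voice", "messaging strategy"],
    "penalize": ["buzzword stacking", "unverifiable claims", "vague social proof"],
}


def build_system_prompt(brief: dict) -> str:
    """Turn a narrow-mandate brief into an opinionated evaluator system prompt."""
    return (
        f"You are a {brief['role']}. Score ONLY on {brief['axis']}. "
        f"Ignore {', '.join(brief['ignore'])}. "
        f"Penalize {', '.join(brief['penalize'])}. "
        "Use the full 0-100 range; scores above 90 should be rare."
    )
```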

The other critical piece is requiring brief justification for every score. A score without justification is useless for iteration. A score with a specific critique — "this headline buries the value prop behind a question; buyers don't want to think, they want to understand immediately" — gives you the exact edit to make on the next variant. Justifications are what turn a scoring system into a feedback loop.
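One simple way to wire that feedback loop, assuming the per-persona score-plus-justification shape from the earlier sketch: take the weakest persona's critique and make it the instruction for the next variant. The helper name and prompt wording are, again, illustrative:

```python
def next_round_brief(variant_copy: str, panel_result: dict) -> str:
    """Turn the weakest persona's critique into the instruction for the next variant."""
    worst_name, worst = min(
        panel_result["per_persona"].items(), key=lambda kv: kv[1]["score"]
    )
    return (
        "Rewrite the landing page copy below. Keep what already works, but fix this "
        f"critique from the {worst_name} (scored {worst['score']}/100): "
        f"{worst['justification']}\n\n---\n{variant_copy}"
    )
```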

Want the full system PRD?

Subscribe to The Demand Engine(er) — free — and get instant access to all 5 system PRDs.

Get the PRDs →