Most A/B testing programs die because the hypothesis backlog runs dry. You test the button color, see no clear result, and move on to a different priority while the program quietly stalls. The testing itself isn't what kills it; the missing piece is a systematic method for generating well-formed hypotheses grounded in actual performance data. This stack gives you that method: pull performance data from your ad platforms and landing pages, run it through the A/B Test Hypothesis Generator prompt in Claude, and build a structured test backlog of prioritized hypotheses with effort estimates and expected impact, so you always know what to test next.
The Stack
The Prompt
This stack is built around the A/B Test Hypothesis Generator Prompt. Here's the abbreviated version; the full prompt with all variables and usage notes is on its own page.
You are a B2B conversion specialist building a structured A/B test backlog. Review the ad platform and landing page performance data below. For each underperforming element (CTR below 0.5%, CVR below 2%, or CPL above target), generate a specific, testable hypothesis with: the variable being tested, the expected direction of impact, the success metric, and a brief for the test variant. Rank all hypotheses by expected impact relative to implementation effort. [ ... continued, see full prompt ]
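If you want to pre-screen an export before pasting it into Claude, the three thresholds in the prompt are simple enough to sketch in Python. Treat this as a rough illustration, not part of the prompt: the `AdVariant` fields and the `underperformance_reasons` function are hypothetical names, conversions are assumed to mean leads, and the CPL target is whatever number you set for the campaign.

```python
from dataclasses import dataclass
from typing import List

# Thresholds quoted from the prompt. The CPL target is campaign-specific,
# so it is passed in rather than hard-coded.
CTR_FLOOR = 0.005  # 0.5%
CVR_FLOOR = 0.02   # 2%


@dataclass
class AdVariant:
    """One row of an ad export. Field names are assumptions, not a platform schema."""
    name: str
    impressions: int
    clicks: int
    conversions: int  # assumed to be leads
    spend: float


def underperformance_reasons(v: AdVariant, target_cpl: float) -> List[str]:
    """Return which of the prompt's three rules this variant trips."""
    reasons = []
    ctr = v.clicks / v.impressions if v.impressions else 0.0
    cvr = v.conversions / v.clicks if v.clicks else 0.0
    if ctr < CTR_FLOOR:
        reasons.append("CTR below 0.5%")
    if cvr < CVR_FLOOR:
        reasons.append("CVR below 2%")
    # CPL is undefined with zero conversions; here any spend with no
    # leads is treated as above target (an assumption of this sketch).
    if v.conversions == 0:
        if v.spend > 0:
            reasons.append("CPL above target")
    elif v.spend / v.conversions > target_cpl:
        reasons.append("CPL above target")
    return reasons
```

A variant with an empty list of reasons is performing above all three thresholds. You would likely still include it in the data you paste, since the winners give Claude a baseline to compare against.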
The Workflow
- Export campaign performance data from ad platforms
  Pull CTR, CVR, and CPL by ad variant from Google Ads and LinkedIn Ads for the last 90 days. Include creative performance broken down by audience segment, because the same ad often performs very differently by audience.
- Pull landing page conversion data from GA4
  Export conversion rate, bounce rate, and scroll depth for the destination pages tied to those campaigns. The ad and the page are a system, and underperformance often lives at the handoff between them.
- Paste both into the A/B Test Hypothesis Generator prompt
  Frame the analysis as backlog generation, not just performance review. Claude needs to know what you're trying to improve, not just what's performing poorly.
- Review Claude's ranked hypothesis backlog
  Claude generates specific, testable hypotheses with the variable, expected direction, success metric, and test brief for each, ranked by expected impact relative to implementation effort. Start with high-impact, low-effort tests.
- Build the Notion test backlog and assign owners
  Export Claude's output to Notion as a running backlog. Assign each test to an owner, add an estimated start date, and track results in the same document so you build institutional knowledge over time.
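The data-handling parts of the workflow above can be roughed out in Python: joining each ad row to its landing page metrics, and keeping the backlog sorted by impact relative to effort. This is a sketch under stated assumptions. The column names (`destination_url`, `page_url`, `conversion_rate`, and so on) are hypothetical and will not match your Google Ads, LinkedIn Ads, or GA4 exports exactly. The 1 to 5 scoring scales are also an assumption, and nothing here calls Notion or any ad platform.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, List, Optional


def join_ads_to_pages(ad_rows: List[dict], page_rows: List[dict]) -> List[dict]:
    """Attach landing page metrics to each ad row via its destination URL.

    Column names are hypothetical; rename them to match your exports.
    """
    pages: Dict[str, dict] = {p["page_url"]: p for p in page_rows}
    joined = []
    for ad in ad_rows:
        # Ads whose page is missing from the page export get None metrics.
        page = pages.get(ad["destination_url"], {})
        joined.append({
            **ad,
            "page_cvr": page.get("conversion_rate"),
            "bounce_rate": page.get("bounce_rate"),
            "scroll_depth": page.get("scroll_depth"),
        })
    return joined


@dataclass
class BacklogItem:
    """One hypothesis in the running backlog."""
    hypothesis: str
    success_metric: str
    impact: int  # assumed scale: 1 (low) to 5 (high)
    effort: int  # assumed scale: 1 (low) to 5 (high)
    owner: Optional[str] = None
    start_date: Optional[date] = None
    result: Optional[str] = None  # filled in once the test concludes

    @property
    def priority(self) -> float:
        # Impact relative to effort: high-impact, low-effort tests score highest.
        return self.impact / self.effort


def rank_backlog(items: List[BacklogItem]) -> List[BacklogItem]:
    """Sort the backlog so the next test to run comes first."""
    return sorted(items, key=lambda item: item.priority, reverse=True)
```

Keeping `owner`, `start_date`, and `result` on the same record mirrors the advice to track results alongside the hypothesis rather than in a separate spreadsheet.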
What This Replaces
- A/B test programs that die when the obvious test ideas run out after the first two weeks
- Hypotheses generated from opinion rather than from patterns in actual performance data
- Test results that live in isolated spreadsheets and never compound into a repeatable system