Claude Prompt for Ad Copy Variant Generation with Expert Panel Scoring

Generate 10 copy variants per element and score each against a simulated 5-persona expert panel — before any traffic hits the page.

Matt Danese

Senior Demand Generation Manager · 8+ years building B2B demand gen programs at Meta, Webflow, Medely, and Regal.ai. Specializes in AI automation for paid media, lead scoring, attribution, and marketing ops.

Ad copy variant generator: Generate and pre-validate B2B ad copy variants by feeding Claude a copy element, target audience, and channel. The prompt produces 10 variants across 10 distinct angles, scores each against a 5-persona expert panel (CMO, Skeptical Buyer, CRO Specialist, Senior Copywriter, ROI-Focused CEO), and returns the top 3 winners with a recommendation, so a 10-minute autoresearch loop replaces weeks of blind live A/B testing.

The Prompt

Production Prompt — Copy and use verbatim
You are an autoresearch system for B2B SaaS ad copy. You generate copy variants and score them with a simulated 5-person expert panel before any traffic hits the page. You know that traditional A/B testing limits a team to 20-30 variants per year because of the 14-day wait period — and that pre-launch panel scoring breaks that limit by generating and validating 50+ variants in under 10 minutes.

INPUTS

I will paste the copy element to test below. Could be: a landing page headline, a CTA button, ad copy for a specific channel, a form field label, a thank-you page message. Also include the target audience and the channel.

{PASTE_COPY_ELEMENT_HERE}

{PASTE_TARGET_AUDIENCE_HERE}
(Example: "Senior decision-makers in [function] at [company stage / size] companies evaluating [product category].")

{PASTE_CHANNEL_HERE}
(Example: "LinkedIn Sponsored Content," "Google RSA headline," "Enterprise landing page hero.")

WHAT I NEED FROM YOU

Run the autoresearch loop and produce the output in this exact order:

1. Generate 10 Variants
Produce 10 distinct variants of the copy element. Each variant should take a different angle:
- Pain-agitate-solve
- Before / after
- Provocative question
- Social proof
- Specific outcome / number
- Authority / credibility
- Contrarian / counter-conventional
- Urgency / scarcity (only if appropriate to the brand)
- Direct / no-frills
- Problem-first

Each variant: 1 line of copy.

2. Expert Panel Scoring
Score every variant 0-100 against each of the five panel personas. Each persona scores against their specific question:

- CMO / VP Marketing (senior marketing leader at a mid-market or enterprise B2B company): "Would this make me stop scrolling?"
- Skeptical Buyer (budget owner who has seen every pitch): "Do I believe this claim?"
- CRO Specialist (conversion expert reviewing the page): "Is this clear and action-driving?"
- Senior Copywriter (B2B SaaS copywriting expert): "Is this compelling and differentiated?"
- ROI-Focused CEO (enterprise decision-maker evaluating vendors): "Would I put this on my site?"

Output as a table: Variant | CMO | Skeptical Buyer | CRO | Copywriter | CEO | Average.

3. Top 3 Winners
The three highest-scoring variants by average. For each: state the score, the strongest panel score (which persona scored it highest and why), and the weakest panel score (which persona scored it lowest and why).

4. Cross-Bred Combinations (optional, second round)
If multiple elements are being tested in parallel (e.g., headline AND CTA), generate 6-9 cross-bred combinations from the top 3 winners of each element. Score the combinations as complete units.

5. Recommendation
Single recommended variant to deploy as the autoresearch winner. State why this one over the second-place finisher.

JUDGMENT RULES

- The five-persona panel is calibrated to surface different failure modes. A variant that scores 90 from the CMO but 50 from the Skeptical Buyer is too punchy and not credible. A variant that scores 90 from the Copywriter but 50 from the CRO is clever but not conversion-driving. The average is informative; the spread is more informative.
- Generic CTAs ("Get Started," "Learn More") score systematically lower than specific CTAs ("Book My Enterprise Demo," "See Pricing for 500+ Seats"). The CTA specificity effect is approximately 14 points on the panel scale.
- High-friction form fields (phone, company size) score lower when placed early in a form. The CRO panelist will mark these down 20-30 points relative to the same fields placed later.
- "Human language" in thank-you copy outperforms corporate language. "Our team reviews every submission personally — expect a direct reply within one business day, not a nurture sequence" outscores "Your request has been received."
- Do not score every variant in the 80-90 range. The panel is a forcing function; if every variant scores high, the panel is broken. Real distributions show 40-90 ranges with clear winners.
- If you don't have enough context to score a variant honestly, say so. "Insufficient context on target audience pain points to score this variant against the Skeptical Buyer persona" is the right answer when you're not sure.

OUTPUT FORMAT

Return as {OUTPUT_FORMAT}.

If "markdown": variants list, scoring table, top 3 detail, recommendation.
If "html": styled report with the scoring leaderboard prominently displayed.

Begin.

How to Use It

Traditional A/B testing limits a team to 20–30 variants per year because of minimum traffic requirements and 14-day wait periods per test. The autoresearch approach in this prompt breaks that limit by using a simulated expert panel to pre-validate variants before any traffic hits the page. In practice, the workflow is: run the autoresearch prompt to identify the top 3 variants from 10 generated options, then run those 3 as live tests. You're testing validated winners against each other rather than testing blind.
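Running the prompt starts with filling in its four placeholders. The sketch below shows one way to do that programmatically. The `fill_prompt` helper, the shortened `template`, and the sample inputs are all hypothetical stand-ins for illustration; only the placeholder names come from the prompt itself.

```python
# Placeholder tokens exactly as they appear in the prompt text.
PLACEHOLDERS = (
    "{PASTE_COPY_ELEMENT_HERE}",
    "{PASTE_TARGET_AUDIENCE_HERE}",
    "{PASTE_CHANNEL_HERE}",
    "{OUTPUT_FORMAT}",
)


def fill_prompt(template: str, copy_element: str, audience: str,
                channel: str, output_format: str = "markdown") -> str:
    """Hypothetical helper: replace each placeholder with real input."""
    if output_format not in ("markdown", "html"):
        raise ValueError("output_format must be 'markdown' or 'html'")
    values = (copy_element, audience, channel, output_format)
    filled = template
    for token, value in zip(PLACEHOLDERS, values):
        if token not in filled:
            raise ValueError(f"template is missing {token}")
        filled = filled.replace(token, value.strip())
    return filled


# Shortened stand-in for the full prompt, for illustration only.
template = (
    "{PASTE_COPY_ELEMENT_HERE}\n"
    "{PASTE_TARGET_AUDIENCE_HERE}\n"
    "{PASTE_CHANNEL_HERE}\n"
    "Return as {OUTPUT_FORMAT}."
)

prompt = fill_prompt(
    template,
    copy_element="Get Started",  # sample CTA taken from the prompt's rules
    audience="Senior decision-makers in marketing at mid-market companies",
    channel="LinkedIn Sponsored Content",
)
print(prompt)
```

In practice, `template` would hold the full prompt text, and the filled result is what gets pasted into Claude.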

Claude (Sonnet or Opus) is the right model for this task. The 5-persona scoring requires holding distinct critical perspectives simultaneously: the CMO scores for attention-stopping, the Skeptical Buyer for believability, the CRO Specialist for action orientation, the Senior Copywriter for craft, and the ROI-Focused CEO for enterprise credibility. Claude maintains those distinct voices more reliably than GPT-4-class models, which tend to score most variants in the 80–90 range. The "if every variant scores high, the panel is broken" rule in the prompt is there because GPT-4 does exactly this.

The cross-bred combinations section (Step 4) is the highest-leverage feature. When you're testing multiple elements simultaneously (for example, headline AND CTA), generate variants of each separately, identify the top 3 winners per element, then generate cross-bred combinations. A top-3 headline × top-3 CTA grid produces 9 combinations, each built from components that have already been panel-validated, and Step 4 scores every combination again as a complete unit. This is where the efficiency of the autoresearch approach compounds.
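The 3 × 3 grid is a plain Cartesian product. As a minimal sketch, assuming the winners from each element's scoring round are available as two lists (the values below are placeholders, not real output), the 9 combinations can be enumerated like this:

```python
from itertools import product

# Placeholder winners: substitute the top 3 from each element's scoring round.
top_headlines = ["Headline A", "Headline B", "Headline C"]
top_ctas = ["CTA A", "CTA B", "CTA C"]

# Pair every headline with every CTA: 3 x 3 = 9 combinations.
combinations = [
    {"headline": headline, "cta": cta}
    for headline, cta in product(top_headlines, top_ctas)
]

for number, combo in enumerate(combinations, start=1):
    print(f"{number}. {combo['headline']} | {combo['cta']}")
```

The printed list can be pasted back into the prompt so the panel scores each pairing as a complete unit.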

Example Output

Live Example

Example output coming soon — currently running this prompt against live data and will publish the redacted output once it's ready.

Common Failure Modes

Failure modes will be added as this prompt is run in production.

Variations

Two variations of this prompt are worth knowing.

Landing Page Headline Focus

Scoped specifically to landing page hero headlines, with the expert panel calibrated for B2B SaaS landing page conversion rather than ad engagement. Includes additional CRO-specific scoring criteria. Use this version before running your hypotheses through the A/B Test Hypothesis Generator.

Coming soon


LinkedIn-Specific Ad Copy

Adapted for LinkedIn Sponsored Content ad copy, accounting for LinkedIn's specific engagement patterns — professional context, news feed placement, character limits, and the expectation that the reader is in work mode. Links to the Ad Creative Audit prompt for scoring the live results.

Coming soon



Frequently Asked Questions

Does this work with ChatGPT or only Claude?

Use Claude. The 5-persona panel scoring requires holding genuinely distinct critical perspectives — not just variations on "this is good." GPT-4 class models tend to score variants in a compressed range (80–90 for everything) because they default to positive framing. Claude scores more honestly, including low scores for weak variants, which means the rank order is actually informative. If every variant scores 85 or above, the panel is broken — check your model choice first.
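The broken-panel check can be automated once the Average column is copied out of the scoring table. This is a rough sketch: the function name is hypothetical, the default `floor` of 80 reflects the 80–90 range named in the prompt (pass 85 to match the stricter rule above), and both score lists are made-up illustrations rather than real results.

```python
def panel_looks_compressed(averages: list[float], floor: float = 80.0) -> bool:
    """Return True when every variant's average is at or above `floor`.

    A True result suggests the panel is not discriminating between variants.
    """
    return bool(averages) and min(averages) >= floor


# Illustrative averages only, not output from a real run.
healthy = [44.0, 52.5, 61.0, 58.0, 73.5, 88.0, 67.0, 49.5, 79.0, 83.5]
compressed = [84.0, 86.5, 85.0, 88.0, 87.5, 85.5, 89.0, 86.0, 84.5, 88.5]

print(panel_looks_compressed(healthy))
print(panel_looks_compressed(compressed))
```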

How do I use the panel scores to pick what to test live?

Focus on the average score AND the spread. A variant that averages 85 with consistent scores across all five personas is a strong candidate. A variant that averages 85 because the CMO scored it 98 and the Skeptical Buyer scored it 65 is a polarizing variant — it might work if your audience skews toward senior marketers, but it may fail if the skeptical buyer persona is more representative. Use the spread to understand which audience risks each variant carries.
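To make the average-versus-spread distinction concrete, here is a small sketch that computes both for two variants with the same average. The scores are invented to mirror the example above (CMO at 98, Skeptical Buyer at 65), and the `summarize` helper is a hypothetical name. Spread is taken here as highest minus lowest persona score, which is one reasonable definition among several.

```python
PERSONAS = ("CMO", "Skeptical Buyer", "CRO", "Copywriter", "CEO")


def summarize(scores: dict[str, int]) -> dict:
    """Average, spread (max minus min), and the persona at each extreme."""
    strongest = max(scores, key=scores.get)
    weakest = min(scores, key=scores.get)
    return {
        "average": round(sum(scores.values()) / len(scores), 1),
        "spread": scores[strongest] - scores[weakest],
        "strongest": strongest,
        "weakest": weakest,
    }


# Illustrative scores only, not output from a real run.
consistent = dict(zip(PERSONAS, (86, 84, 85, 86, 84)))
polarizing = dict(zip(PERSONAS, (98, 65, 88, 87, 87)))

for name, scores in (("consistent", consistent), ("polarizing", polarizing)):
    print(name, summarize(scores))
```

Both variants average 85.0, but the spread is 2 points for the first and 33 for the second, which is the signal that the second variant carries a believability risk with skeptical buyers.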

The prompt mentions a 14-point delta between "Get Started" and "Book My Enterprise Demo" — is that real?

Yes, as an approximate gap on the panel scale. Specific CTAs consistently outscore generic ones because the CRO Specialist and ROI-Focused CEO personas rate them significantly higher. "Get Started" fails on "Is this clear and action-driving?" because it doesn't specify what starting entails. "Book My Enterprise Demo" passes that test. The specificity effect holds in real A/B tests as well: this isn't a panel artifact, it's a real conversion pattern.

Can I adapt this for B2C copy or email subject lines?

Yes for email subject lines — the panel scoring logic translates cleanly to email engagement prediction. For B2C copy, swap out the expert panel personas for ones calibrated to your B2C audience. The CMO and Skeptical Buyer personas can stay; replace the ROI-Focused CEO with a persona appropriate to your product category. The structural logic — generate variants, score against distinct perspectives, identify winners by average and spread — works for any copy optimization task.