A/B Testing: How to Run Tests That Actually Improve Conversions
- Sezer DEMİR

- Jan 18, 2025
- 5 min read
A/B testing (also called split testing) is a method of comparing two versions of a webpage or element — the original (control) and a modified version (variant) — by measuring which produces higher conversion rates when shown to equivalent traffic samples. It is the most reliable method for determining whether a change to your website actually improves performance or just appears to.
Without A/B testing, website changes are made on intuition or based on what other companies have reported — neither of which is reliable for predicting what will work on your specific site with your specific audience.

Why Most A/B Tests Fail

Before describing how to run A/B tests correctly, it's worth understanding why most tests produce unreliable results:
Insufficient sample size: The most common error. A test that runs until you "see a winner" rather than until a statistically valid sample is reached will produce false positives frequently. Small improvements (2–5% lift) require large sample sizes to detect reliably.
Testing too many elements simultaneously: If you change the headline, the CTA button color, and the hero image at the same time, you can't attribute the result to a specific change. Multivariate testing can account for this, but requires substantially larger sample sizes.
Stopping tests early: Checking results daily and stopping when a variant appears to lead is called "peeking" and produces false positives at high rates. Tests must run until the predetermined sample size is reached.
Testing low-impact elements: Testing button colors on a page with 50 visits per week will require months to reach significance. A/B testing is most productive when focused on high-traffic pages and high-impact elements (headlines, CTAs, offer framing, form length).
Ignoring seasonality: A test that runs across a promotional period or holiday season may produce results that reflect the unusual traffic composition rather than the change being tested.

Designing a Valid A/B Test

Step 1 — Identify the problem
A/B tests should solve a diagnosed problem, not test arbitrary changes. Start with analytics data (high exit rate on a key page) or qualitative data (heatmaps showing users miss the CTA, recordings showing form abandonment) that identifies a specific friction point.
Step 2 — Form a specific hypothesis
A testable hypothesis follows the format: "If we [change X], then [metric Y] will improve by [estimated amount], because [reasoning based on diagnostic data]."
Example: "If we reduce the contact form from 8 fields to 4 fields, the form completion rate will increase by at least 15%, because session recordings show users abandoning at the phone and budget fields."
A specific hypothesis ensures you're testing for a clear reason and defines what result would confirm or refute the hypothesis.
Step 3 — Calculate required sample size
Use an A/B test sample size calculator (freely available online) with:
- Baseline conversion rate (current rate from analytics)
- Minimum detectable effect (the smallest improvement worth detecting — typically 5–15%)
- Statistical significance threshold (95% is standard)
- Statistical power (80% is standard)

This produces the minimum number of visitors needed per variant before you can trust the result.
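For readers who want to see what such a calculator is doing under the hood, the sketch below reproduces the standard two-proportion formula in plain Python. The function name and example numbers are illustrative, not taken from any specific tool.

```python
# A minimal sketch of the standard sample size formula behind most A/B test
# calculators (normal approximation for comparing two conversion rates).
from statistics import NormalDist

def sample_size_per_variant(baseline_rate, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect a relative lift of `relative_mde`
    over `baseline_rate` at the given significance level and statistical power."""
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_mde)        # rate if the variant truly wins
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided 95% threshold -> 1.96
    z_beta = NormalDist().inv_cdf(power)           # 80% power -> 0.84
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ((z_alpha + z_beta) ** 2 * variance) / (p1 - p2) ** 2

# Example: 3% baseline conversion rate, 15% relative minimum detectable effect
print(round(sample_size_per_variant(0.03, 0.15)))  # roughly 24,000 visitors per variant
```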
Step 4 — Run the test to completion
Set the test to run until each variant reaches the required sample size. Do not stop the test early if one variant appears to lead before the required sample is reached.

A/B Testing Tools

Google Optimize was discontinued in 2023. Current options:
VWO (Visual Website Optimizer): Full-featured A/B testing platform with visual editor, multivariate testing, and heatmaps. Most commonly used for mid-market businesses. Starts at $199/month.
Optimizely: Enterprise A/B testing platform with advanced statistical models and server-side testing capability. Better suited for large-scale operations with development resources.
AB Tasty: Mid-market A/B testing with a visual editor and good native reporting. European-based tool with strong GDPR compliance features.
Convert: Clean, straightforward A/B testing platform with good GA4 integration. More affordable than VWO and Optimizely.
Unbounce and Instapage: Landing page builders with built-in A/B testing. Limited to pages built in their own builder, but a practical option for landing page testing without a separate tool.

Reading A/B Test Results Correctly

Statistical significance measures how unlikely the observed difference between control and variant would be if the change had no real effect. Reaching significance at the 95% level means that, if the variant truly performed the same as the control, there is only a 5% chance of seeing a difference this large by random variation. In practice: if you run 20 tests at 95% confidence on changes that do nothing, roughly one of them will still look like a winner. (A minimal example of this check appears at the end of this section.)
Effect size matters more than significance alone: A statistically significant 0.5% improvement in conversion rate may not be practically meaningful. A 15% improvement that barely reaches significance may be exactly what you need. Always evaluate both statistical significance and the magnitude of the effect.
Segment the results: A test where the overall result is neutral may contain a significant improvement for one user segment (mobile users, organic search visitors) and a significant decline for another. Segment analysis often reveals the most actionable insights.
Understand what you're measuring: A conversion rate improvement in a test that uses add-to-cart as the conversion event may not translate to a purchase rate improvement. Test for the outcome that matters, or include multiple metrics in the analysis.
Most tests won't produce clear winners: Approximately 10–20% of properly designed and executed A/B tests produce statistically significant improvements. This is normal. Neutral or inconclusive results are still valuable — they tell you the change you tested didn't help, saving you from implementing it permanently.
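As a concrete illustration of the points above, the sketch below runs a two-sided two-proportion z-test on a finished test and reports both the p-value and the relative lift. The traffic and conversion counts are invented for the example, and it assumes the statsmodels library is available.

```python
# Hypothetical results from a completed test: check significance AND effect size.
from statsmodels.stats.proportion import proportions_ztest

conversions = [172, 214]    # control, variant (invented numbers)
visitors = [5400, 5388]

z_stat, p_value = proportions_ztest(count=conversions, nobs=visitors)
control_rate = conversions[0] / visitors[0]
variant_rate = conversions[1] / visitors[1]

print(f"p-value: {p_value:.3f}")                                 # below 0.05 -> significant at 95%
print(f"relative lift: {variant_rate / control_rate - 1:+.1%}")  # magnitude of the effect
```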
Blakfy designs and runs A/B testing programs for clients — building hypothesis frameworks from diagnostic data, configuring tests correctly, and interpreting results in ways that compound into meaningful conversion rate improvements over time.

Frequently Asked Questions

How much traffic do I need to run meaningful A/B tests?
The traffic requirement depends on your baseline conversion rate and the size of the improvement you're trying to detect. As a rough guide: if your landing page converts at 3% and you want to detect a 15% relative improvement (from 3% to 3.45%) at 95% significance and 80% power, you need roughly 24,000 sessions per variant. For lower conversion rates or smaller improvements, you need more. Sites with fewer than 1,000 monthly conversions often cannot reach a valid sample size in a reasonable timeframe — for these, focus on usability research instead.
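If you would rather reproduce that estimate than take it on faith, the same calculation is a few lines with the statsmodels library; the numbers below mirror the example in the answer.

```python
# Sessions per variant for a 3% baseline, 15% relative MDE, 95% significance, 80% power.
from statsmodels.stats.proportion import proportion_effectsize
from statsmodels.stats.power import NormalIndPower

effect = proportion_effectsize(0.03 * 1.15, 0.03)   # 3.45% vs 3.0%, as Cohen's h
n_per_variant = NormalIndPower().solve_power(effect_size=effect, alpha=0.05, power=0.80)
print(round(n_per_variant))                          # on the order of 24,000
```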
How long should an A/B test run?
A test should run until both the required sample size is reached AND at least two full business cycles have passed (typically 2+ weeks). Running for less than one full week over-weights the behavior of whichever days the test happens to cover; stopping before two full weeks misses week-to-week variation in traffic mix.
What should I test first?
Test elements that have the biggest impact on conversions and are on your highest-traffic pages. The typical priority order: headline/value proposition on the main landing page → CTA copy and placement → form length and fields → hero image or product photography → pricing page structure. Small changes to low-traffic pages or secondary design elements produce slow, unreliable results.
What's the difference between A/B testing and multivariate testing?
A/B testing compares two complete versions of a page or a single element. Multivariate testing simultaneously tests multiple elements (headline × CTA × hero image) in all combinations, allowing you to identify the best-performing combination of elements. Multivariate testing requires much larger traffic volumes to reach significance — it's appropriate only for very high-traffic pages where you want to optimize multiple elements efficiently.



