A/B Testing
A/B testing is an experiment where you show two versions of something to different groups of users to see which performs better.
What it means
An A/B test is the cleanest way to know if a change actually helps. You take two versions (A and B), randomly show each to half your traffic, measure the result, and pick the winner. The randomization is what makes it work: it removes confounding variables like time of day, traffic source, or user type.
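In practice the random split can be as simple as hashing a user ID into a bucket. A minimal Python sketch (the function name and experiment key are illustrative, not taken from any particular tool):

```python
import hashlib

def assign_variant(user_id: str, experiment: str = "cta-color") -> str:
    """Deterministically assign a user to variant 'A' or 'B'.

    Hashing (experiment, user_id) gives a stable, effectively random
    50/50 split: the same user always sees the same variant, and the
    assignment is independent of time of day, traffic source, or
    user type.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # uniform bucket in 0-99
    return "A" if bucket < 50 else "B"
```

Keying the hash on the experiment name as well as the user ID means each experiment gets its own independent split, so one test's assignment doesn't correlate with another's.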
A/B tests are useful for headlines, button colors, layout changes, pricing, and onboarding flows. They're not useful for changes you can't roll back, big strategic decisions, or anything where the difference is too small to detect with your traffic volume.
The trickiest part of A/B testing isn't running the test; it's interpreting it. You need a large enough sample to detect the effect you care about, you need to wait long enough for the result to stabilize, and you need to resist 'peeking' early and calling a winner before the test reaches statistical significance.
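The cost of peeking can be shown with a small simulation: run many A/A tests (both variants identical, so every "winner" is a false positive) and compare the error rate when you check repeatedly against checking once at the end. A hedged sketch using a standard two-proportion z-test; the numbers (5,000 visitors, 4% base rate, five checkpoints) are arbitrary illustration values:

```python
import math
import random

def z_significant(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-proportion z-test: True if the difference is significant."""
    p = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    if se == 0:
        return False
    z = abs(conv_a / n_a - conv_b / n_b) / se
    p_value = math.erfc(z / math.sqrt(2))  # two-sided
    return p_value < alpha

random.seed(42)
trials, n, rate = 500, 5000, 0.04  # A/A test: both variants identical
peek_fp = final_fp = 0

for _ in range(trials):
    a = [random.random() < rate for _ in range(n)]
    b = [random.random() < rate for _ in range(n)]
    # Peeking: declare a winner the first time any checkpoint looks significant.
    checkpoints = [1000, 2000, 3000, 4000, 5000]
    peeked = any(
        z_significant(sum(a[:k]), k, sum(b[:k]), k) for k in checkpoints
    )
    final = z_significant(sum(a), n, sum(b), n)  # one look, at the end
    peek_fp += peeked
    final_fp += final

print(f"false positive rate, peeking: {peek_fp / trials:.1%}")
print(f"false positive rate, waiting: {final_fp / trials:.1%}")
```

Checking once keeps the false positive rate near the nominal 5%; checking at every interim point inflates it well above that, which is exactly why early winners are untrustworthy.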
Why it matters
A/B testing replaces opinions with evidence. Without it, you make changes based on what you think will work, then attribute any change in metrics to your edit (when it might just be normal variance). With it, you know whether your change actually moved the metric or just rode the noise.
Example with real numbers
A concrete example showing how an A/B test plays out in practice.
Scenario
You want to know if a green CTA button converts better than your current orange one. You set up an A/B test: 50% of visitors see orange, 50% see green.
What it means
After 10,000 visitors per variant, the orange button converts at 4.2% (420 conversions) and the green at 4.8% (480). That 0.6-point difference is statistically significant at the conventional 0.05 level, so you roll out green to 100% of traffic.
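You can check that result yourself with a standard two-proportion z-test. A generic sketch, not tied to any testing tool:

```python
import math

def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for the difference between two conversion rates."""
    p = (conv_a + conv_b) / (n_a + n_b)  # pooled conversion rate
    se = math.sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))
    z = abs(conv_a / n_a - conv_b / n_b) / se
    return math.erfc(z / math.sqrt(2))

# Orange: 420 conversions of 10,000; green: 480 of 10,000.
p_value = two_proportion_p_value(420, 10_000, 480, 10_000)
print(f"p-value: {p_value:.3f}")  # → p-value: 0.041, below the 0.05 threshold
```

A p-value just under 0.05 clears the conventional bar, but only barely; with a smaller sample the same observed rates would not have been significant.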
Common mistakes
Things people get wrong when running A/B tests.
Mistake 01
Calling a winner too early. With small samples, normal variance can look like a winner.
Mistake 02
Running too many tests at once. Overlapping experiments make it hard to attribute results.
Mistake 03
Testing changes too small to matter. Going from one shade of blue to another rarely produces detectable differences.
Mistake 04
Forgetting about novelty effects. A new design might lift conversion temporarily because it's new, then return to baseline.
How to track it
Use a dedicated A/B testing tool (PostHog or Optimizely; Google Optimize was discontinued in 2023) for proper randomization and statistical handling. For simple tests on landing pages, you can serve two URLs and compare conversion rates over a set period.
Common questions about A/B testing
What is A/B testing?
A/B testing is an experiment where you show two versions of something to randomly split groups of users to determine which performs better on a chosen metric.
How long should an A/B test run?
Long enough to reach statistical significance with your traffic. For most sites, that's at least 1-2 weeks per test. Shorter tests are usually unreliable.
How many visitors does an A/B test need?
That depends on the size of the effect you're measuring. Small effects (lifting conversion from 4% to 4.5%) need tens of thousands of visitors per variant; large effects (4% to 8%) need far fewer, often under a thousand. Use a sample size calculator before starting.
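Those orders of magnitude come from the standard two-proportion sample-size approximation. A sketch fixed at the common defaults of a 0.05 two-sided significance level and 80% power (the function name is illustrative):

```python
import math

def sample_size_per_variant(p_base, p_target):
    """Visitors needed per variant, by the standard approximation
    n = (z_alpha + z_beta)^2 * (var_base + var_target) / delta^2,
    with z_alpha = 1.96 (alpha = 0.05, two-sided) and
    z_beta = 0.84 (80% power) hard-coded.
    """
    z_alpha, z_beta = 1.96, 0.84
    variance = p_base * (1 - p_base) + p_target * (1 - p_target)
    delta = p_target - p_base
    return math.ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

print(sample_size_per_variant(0.04, 0.045))  # → 25520 (small lift)
print(sample_size_per_variant(0.04, 0.08))   # → 549 (large lift)
```

The denominator is the squared effect size, which is why halving the lift you want to detect roughly quadruples the traffic you need.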
When should you skip A/B testing?
When you don't have enough traffic to reach significance, when the change is too small to matter, or when you can't roll back if the variant turns out worse. Sometimes user research is faster than A/B testing.