The Statistics of A/B Testing

Whether or not you believe this is (as Joel Spolsky does) the “best post […] about A/B testing, ever”, it is definitely one of the easiest to understand and one of the few posts on split testing that is statistically sound (i.e. useful).

Is [a given A/B test] conclusive? Has [variant] A won? Or should you let the test run longer? Or should you try completely different text?

The answer matters. If you wait too long between tests, you’re wasting time. If you don’t wait long enough for statistically conclusive results, you might think a variant is better and use that false assumption to create a new variant, and so forth, all on a wild goose chase! That’s not just a waste of time, it also prevents you from doing the correct thing, which is to come up with completely new text to test against.

Normally a formal statistical treatment would be too difficult, but I’m here to rescue you with a statistically sound yet incredibly simple formula that will tell you whether or not your A/B test results really are indicating a difference:

  1. Define N as “the number of trials.”
  2. Define D as “half the difference between the ‘winner’ and the ‘loser’.”
  3. The test result is statistically significant if D² is bigger than N.
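The three steps above translate directly into code. Here is a minimal sketch in Python; the function name `is_significant` and the parameter names `winner` and `loser` (the trial counts for the better- and worse-performing variants) are illustrative choices, not from the original post:

```python
def is_significant(winner, loser):
    """Rule of thumb from the post: the result is statistically
    significant if D² > N, where N is the total number of trials
    and D is half the difference between winner and loser."""
    n = winner + loser          # N: total number of trials
    d = (winner - loser) / 2    # D: half the difference
    return d * d > n            # significant iff D² > N
```

For example, 80 trials for the winner against 50 for the loser gives N = 130 and D = 15, and since 15² = 225 > 130 the difference is significant; 60 against 50 gives D² = 25 < 110, so you should keep the test running.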

Update: Now even easier, thanks to the online split test calculator.