Whether or not you believe this to be (as Joel Spolsky does) the “best post […] about A/B testing, ever”, it definitely is one of the easiest to understand and one of the few posts on split testing that is statistically sound (i.e. useful).

Is [a given A/B test] conclusive? Has [variant] A won? Or should you let the test run longer? Or should you try completely different text?

The answer matters. If you wait too long between tests, you’re wasting time. If you don’t wait long enough for statistically conclusive results, you might think a variant is better and use that false assumption to create a new variant, and so forth, all on a wild goose chase! That’s not just a waste of time, it also prevents you from doing the correct thing, which is to come up with completely new text to test against.

Normally a formal statistical treatment would be too difficult, but I’m here to rescue you with a statistically sound yet incredibly simple formula that will tell you whether or not your A/B test results really are indicating a difference:

  1. Define N as “the number of trials.”
  2. Define D as “half the difference between the ‘winner’ and the ‘loser’.”
  3. The test result is statistically significant if D² is bigger than N.
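
The three steps above can be sketched in a few lines of Python. This is only an illustration, not code from the quoted post: the function name `is_significant` and the parameters `a` and `b` are hypothetical, and it assumes that `a` and `b` are the raw counts (for example, clicks) recorded for each variant and that N is their sum.

```python
def is_significant(a: int, b: int) -> bool:
    """Rule of thumb from the list above: significant if D^2 > N.

    Assumption: a and b are the raw counts for the two variants,
    and N is the total of both.
    """
    n = a + b            # N: the number of trials
    d = abs(a - b) / 2   # D: half the difference between winner and loser
    return d * d > n     # significant if D squared is bigger than N


print(is_significant(70, 40))  # D = 15, D^2 = 225, N = 110 -> True
print(is_significant(60, 40))  # D = 10, D^2 = 100, N = 100 -> False
```

Under these assumptions, D² > N is the same as (a − b)² / (a + b) > 4, which is slightly stricter than the usual 3.84 cutoff for a one-degree-of-freedom chi-squared test at the 95% level.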

Update: Now even easier, thanks to the online split test calculator.