If you haven’t read the House of Commons Library’s statistical literacy guides recently (or you need a refresher on what, exactly, statistical significance means), then you could do much worse than student Warren Davies’ short rundown on the meaning of statistical significance:

In science we’re always testing hypotheses. We never conduct a study to ‘see what happens’, because there’s always at least one way to make any useless set of data look important. We take a risk; we put our idea on the line and expose it to potential refutation. Therefore, all statistical tests in psychology test the probability of obtaining your given set of results (and all those that are even more extreme) if the hypothesis were incorrect, i.e. if the null hypothesis were true. […]

This is what statistical significance testing tells you: the probability that the result (and all those that are even more extreme) would have come about if the null hypothesis were true. […] It’s given as a value between 0 and 1, and labelled p. So p = .01 means a 1% chance of getting the results if the null hypothesis were true; p = .5 means a 50% chance, p = .99 means 99%, and so on.

In psychology we usually look for p values lower than .05, or 5%. That’s what you should look out for when reading journal papers. If there’s less than a 5% chance of getting the result if the null hypothesis were true, a psychologist will be happy with that, and the result is more likely to get published.

Significance testing is not perfect, though. Remember this: ‘Statistical significance is not psychological significance.’ You must look at other things too: the effect size, the power, the theoretical underpinnings. Combined, they tell a story about how important the results are, and with time you’ll get better and better at interpreting this story.

To get a real feel for this, Davies provides a simple-to-follow example (a loaded die) in the post.
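In the same spirit, here is a minimal sketch in Python of the kind of calculation involved. The numbers are made up for illustration and are not Davies’ own; the test is a one-sided exact binomial tail, asking how often a fair die would produce at least this many sixes.

```python
# A minimal sketch with made-up numbers (not Davies' example data):
# is a die loaded towards sixes? Under the null hypothesis the die is
# fair, so each roll shows a six with probability 1/6. The p-value is
# the probability of the observed count of sixes *or anything more
# extreme* if that null hypothesis were true.
from math import comb

def one_sided_p_value(sixes: int, rolls: int, p_null: float = 1 / 6) -> float:
    """P(X >= sixes) for X ~ Binomial(rolls, p_null): the chance of a
    result at least this extreme if the die were fair."""
    return sum(
        comb(rolls, k) * p_null**k * (1 - p_null) ** (rolls - k)
        for k in range(sixes, rolls + 1)
    )

# Hypothetical data: 18 sixes in 60 rolls (a fair die averages 10).
p = one_sided_p_value(sixes=18, rolls=60)
print(f"p = {p:.4f}")  # roughly 0.007, well under the usual .05 cut-off
```

With numbers like these a psychologist would call the result ‘statistically significant’, though, as the excerpt above stresses, that alone says nothing about how large or how important the effect is.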

via @sandygautam