If there is just one piece of knowledge that the reader of this statistics primer should take away, I hope it is to think about statistics before rather than after conducting a study, and to ask for expert advice if necessary. Doing this will make it more likely that your study can properly test the hypotheses you raise. General points when designing a study include:
- An accurate and reproducible measurement is important: it minimises measurement error, so that the standard deviation reflects the true variability between subjects rather than the true variability plus random variability due to inaccurate measurement.
- A measurement that is normally distributed may be helpful, as it allows us to use parametric tests, which are generally less conservative than their non-parametric alternatives and therefore more likely to identify a real difference as significant.
- The larger the sample, the smaller the standard error for a given standard deviation, and so the easier it is to tell whether a difference between means is significant. A power calculation focuses the mind on the sample size required to keep the risk of a type 2 (false negative) error acceptably low, which matters most if the study fails to demonstrate a significant difference between samples.
- A single simple hypothesis is best. Performing too many tests on one data set increases the chance that at least one of them will throw up a false positive. With a significance threshold of p = 0.05, each test has a 1 in 20 chance of coming up positive by chance alone, so if we do 20 tests we would expect about one to pop up positive at random even if no real effect exists. There are statistical corrections for this, a simple one being the Bonferroni correction, where the required p value is divided by the number of tests being performed on the same data set; thus with five tests, each would need an uncorrected p value below 0.01 to be significant at the 0.05 level.
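
The arithmetic behind the last two points can be sketched in a few lines of Python. This is only an illustration, not a substitute for a proper power calculation: the function names are my own, the standard deviation of 10 is an arbitrary example value, and the familywise error figure assumes the tests are independent of one another, which is rarely exactly true of tests run on the same data set.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean for a sample of size n with standard deviation sd."""
    return sd / math.sqrt(n)


def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Chance of at least one false positive across n_tests tests.

    Assumes the tests are independent, which is a simplification.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p value threshold after a Bonferroni correction."""
    return alpha / n_tests


# Quadrupling the sample size halves the standard error (sd = 10 is an example value).
for n in (10, 40, 160):
    print(f"n = {n:3d}: standard error = {standard_error(10.0, n):.2f}")

# Twenty tests at p = 0.05: chance that at least one is a false positive.
print(f"familywise error rate for 20 tests: {familywise_error_rate(0.05, 20):.2f}")

# Five tests at an overall 0.05 level: threshold each individual test must meet.
print(f"Bonferroni threshold for 5 tests: {bonferroni_threshold(0.05, 5):.3f}")
```

Under the independence assumption, 20 tests at p = 0.05 give roughly a 64% chance of at least one false positive, and the Bonferroni threshold for five tests comes out at 0.01, matching the figure quoted above.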
That concludes this primer. I hope it was helpful. If you have spotted any errors, take pity on a poor non-mathematician and point them out kindly so that I may correct them. While I have included some formulae and their derivations, the main aim was to impart general principles rather than to be a rigorous and comprehensive account.