When the sample size is relatively small, the normal distribution is actually not a very good approximation to the true distribution of the variability of a sample mean. This means that Z-tests are unreliable for small sample sizes. An improvement is Student’s t-test, which was designed not by a student but by a brewer writing under the pseudonym “Student”.
The t-test recognises the inaccuracies of smaller samples by using a distribution that becomes wider as the sample size falls. For very large sample sizes, the t-distribution becomes practically identical to the normal distribution. A wider distribution means that a difference has to be greater to reach the same probability of error; in other words, false positive errors are less likely than they would be if the Z-table were used on the same data. This is also called being more conservative.
The formulae for determining the t-score for the t-test take a similar form to those for the Z-scores. In the case of a standard error of the mean plot, the Z-score comparing the means of two samples a and b is the difference between the means divided by a standard error term. As we saw earlier, the denominator, the standard error term, when expressed in terms of SD, was:
$$ \mathrm{SE} = \sqrt{\frac{\mathrm{SD}_a^2}{n_a} + \frac{\mathrm{SD}_b^2}{n_b}} $$
The t-test uses the identical formula. The difference lies in the table that matches the t-score to a probability value. In fact for the t-test, as for the binomial distribution, there is a different table for every value of n. The value used in the table is not quite the number of subjects, but instead what is known as the number of degrees of freedom (df). In this situation, the df is the sum of n for both groups minus 2.
With a separate table for every number of degrees of freedom, computers are a particularly attractive alternative for t-tests.
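As a rough illustration of how a computer can stand in for the tables, the sketch below integrates the t-distribution's density numerically to turn a t-score and its df into a two-sided probability, and compares the result with the normal distribution. The function names (`t_pdf`, `t_two_sided_p`, `z_two_sided_p`) and the choice of Simpson's rule are assumptions made for this example only; statistical packages use their own, more refined routines.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids overflow of the gamma function for large df
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t, df, steps=2000):
    """Probability of a t-score at least as extreme as t, in either direction."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    # Simpson's rule (steps must be even) for the area between 0 and |t|
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    # The density is symmetric, so the two tails hold 1 - 2 * central_half
    return max(0.0, 1.0 - 2.0 * central_half)


def z_two_sided_p(z):
    """Two-sided probability from the normal distribution, for comparison."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # The same score of 2.0 is less convincing when the df is small
    for df in (4, 10, 30, 1000):
        print(f"df = {df:4d}: p = {t_two_sided_p(2.0, df):.4f}")
    print(f"normal   : p = {z_two_sided_p(2.0):.4f}")
```

Running it for a fixed score shows the probability shrinking towards the normal value as the df grows, which is the widening of the t-distribution described above.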
Pooled Standard Deviations
We mentioned earlier in the proportion comparisons section that sometimes an intermediate calculation is performed to derive a “pooled” proportion. A similar process is used in a t-test to derive a pooled SD; whenever we assume that the population variances, and hence SDs, are equal, we derive this “pooled” SD as the best estimate of the true population SD.
We first calculate the two actual SDs for the two samples. A simple way to estimate the population value would be to take the average of the two. A better way would be to average the two corresponding variances, since these are the more fundamental quantities. If the sample sizes are different, it would be better still to take a weighted average, giving more weight to the SD derived from a sample with a greater number of subjects.
So the general form for assuming equal variances, and therefore taking a single SD (denoted SD’) or variance value, with n subjects in each group, is:
$$ t = \frac{(\mu_a - \mu_b)\sqrt{n}}{\mathrm{SD}'\sqrt{2}} $$
(Exactly the same can apply to a Z-score calculation.)
If the sample sizes are the same, we just average the variances, so the SD term is:
$$ \mathrm{SD}' = \sqrt{\tfrac{1}{2}\left(\mathrm{SD}_a^2 + \mathrm{SD}_b^2\right)} $$
If the sample sizes are different, the weighted average SD makes the whole equation long-winded, so it is expressed in two parts. The overall t-score equation becomes:
$$ t = \frac{\mu_a - \mu_b}{\mathrm{SD}'\sqrt{\frac{1}{n_a} + \frac{1}{n_b}}} $$
And the SD’ term is:
$$ \mathrm{SD}' = \sqrt{\frac{(n_a - 1)\,\mathrm{SD}_a^2 + (n_b - 1)\,\mathrm{SD}_b^2}{n_a + n_b - 2}} $$
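As a quick consistency check (a sketch added for illustration, not a step in the original argument), setting both sample sizes to the same value n in the two-part formula recovers the simpler equal-size expressions given earlier:

$$
\begin{aligned}
\mathrm{SD}' &= \sqrt{\frac{(n-1)\,\mathrm{SD}_a^2 + (n-1)\,\mathrm{SD}_b^2}{2n-2}} = \sqrt{\tfrac{1}{2}\left(\mathrm{SD}_a^2 + \mathrm{SD}_b^2\right)} \\
t &= \frac{\mu_a - \mu_b}{\mathrm{SD}'\sqrt{\frac{1}{n} + \frac{1}{n}}} = \frac{\mu_a - \mu_b}{\mathrm{SD}'\sqrt{2/n}} = \frac{(\mu_a - \mu_b)\sqrt{n}}{\mathrm{SD}'\sqrt{2}}
\end{aligned}
$$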
The “n-1” terms are, I think, why degrees of freedom are used for t-test calculations. If a variable is measured in two subjects, it can vary in only one dimension, with the two values getting further apart or closer together; hence n-1 = 1. With three subjects there is an extra degree of freedom of variation, like the three points of a triangle being pulled about in 2D. Four points are like a tetrahedron in 3D, and so on. The degrees of freedom for one sample are the number of subjects minus 1, hence na-1 and nb-1. When two samples are combined their degrees of freedom are added, giving the number in both groups minus 2, hence na + nb - 2.
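The pooled calculation and its degrees of freedom can be sketched in a few lines of Python. The function names `pooled_sd` and `pooled_t` and the summary statistics in the usage lines are made up for illustration; the arithmetic follows the weighted-variance formula and the na + nb - 2 rule described above.

```python
import math


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Weighted average of the two sample variances, returned as an SD."""
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / df
    return math.sqrt(pooled_var)


def pooled_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Return (t, df) for the equal-variance two-sample t-test."""
    sd_pooled = pooled_sd(sd_a, n_a, sd_b, n_b)
    t = (mean_a - mean_b) / (sd_pooled * math.sqrt(1 / n_a + 1 / n_b))
    return t, n_a + n_b - 2


if __name__ == "__main__":
    # Made-up summary statistics, for illustration only
    t, df = pooled_t(mean_a=130.0, sd_a=12.0, n_a=10,
                     mean_b=120.0, sd_b=10.0, n_b=15)
    print(f"t = {t:.3f}, df = {df}")
```

The returned df is the value to look up in the t-table alongside the t-score.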
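Finally, if we do not assume equal variances, the t-score again takes the same form as the equivalent Z-score, where Δ is the difference between the means expected under the null hypothesis (usually zero):
$$ t = \frac{\mu_a - \mu_b - \Delta}{\sqrt{\frac{\mathrm{SD}_a^2}{n_a} + \frac{\mathrm{SD}_b^2}{n_b}}} $$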
But unlike for the Z-score, we also need a degrees of freedom term for the t-table, some kind of amalgamated n value.
So we use the above formula as is, and calculate the degrees of freedom (df) as below:
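$$ df = \frac{\left(\frac{\mathrm{SD}_a^2}{n_a} + \frac{\mathrm{SD}_b^2}{n_b}\right)^2}{\frac{\left(\mathrm{SD}_a^2/n_a\right)^2}{n_a - 1} + \frac{\left(\mathrm{SD}_b^2/n_b\right)^2}{n_b - 1}} $$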
Considering unequal variances was actually beyond Student’s remit, so the unequal-variance version is known as Welch’s t-test. The df term above gives a good approximation, so ordinary t-tables can still be used.
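A minimal Python sketch of the unequal-variance calculation is given below. The function name `welch_t` and the `delta` argument (the difference expected under the null hypothesis, assumed here to default to zero) are choices made for this example only.

```python
import math


def welch_t(mean_a, sd_a, n_a, mean_b, sd_b, n_b, delta=0.0):
    """Return (t, df) for Welch's unequal-variance t-test."""
    # Each sample's variance divided by its size; both formulas reuse these
    var_term_a = sd_a ** 2 / n_a
    var_term_b = sd_b ** 2 / n_b
    t = (mean_a - mean_b - delta) / math.sqrt(var_term_a + var_term_b)
    # Approximate degrees of freedom; usually not a whole number
    df = (var_term_a + var_term_b) ** 2 / (
        var_term_a ** 2 / (n_a - 1) + var_term_b ** 2 / (n_b - 1)
    )
    return t, df


if __name__ == "__main__":
    # Made-up summary statistics, for illustration only
    t, df = welch_t(130.0, 12.0, 10, 120.0, 10.0, 15)
    print(f"t = {t:.3f}, df = {df:.2f}")
```

When the two SDs and the two sample sizes are equal, this df works out to exactly na + nb - 2, matching the pooled test; otherwise it is smaller, which makes the test a little more conservative.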
Independent vs dependent samples
The type of t-test above, and the Z-tests described earlier, are for two independent samples. In broad terms, this means that the samples consist of different subjects: no one subject is in both samples. When measuring bp, one subject cannot be in both company Nice and company Nasty!
The next section describes what to do with dependent samples.