In the blood pressure example discussed earlier, there was one sample, and the question was whether it matched up to a required true population mean value.
Sometimes this is the situation in medical studies, for example when testing whether the mean change in a parameter is significantly different from zero. More often we are comparing two samples, testing whether they belong to the same population. We might ask, “Do employees of company Nice have significantly lower bp than those of company Nasty?” In other words, are the means of two samples, one from each group, different enough that they would have to be considered to belong to two separate populations?
We cannot simply treat the mean of one sample as the population reference (or desired) mean and plot the mean of the other sample on its SE curve. This is because both samples have variability.
I used to have a vague notion that one could plot the two SE curves together; if their overlap fell outside their critical values for p=0.025 (for a two-tailed test) then they were significantly different. This is equivalent to checking whether the 95% confidence interval error bars on a histogram overlap, and calling the samples significantly different only if the bars do not overlap. This notion is wrong.
Instead, we should think in terms of comparing the two samples, and when we compare things mathematically we generally subtract them and look at the difference. We have seen that, broadly speaking, the Z score on an SE plot is the distance of a mean from its reference value, divided by the SE.
But to subtract two sample SE plots, we have to think in terms of variances, of which the SEs are simply the square roots. This is because variance, being defined in terms of the sum of squared differences, has some mathematical properties that are useful to us.
If we describe each SE plot by its variance (the square of its SE) rather than by its SE, subtracting one plot from the other becomes mathematically simple.
When we subtract the two plots, the centres simply subtract: the new curve is centred on the difference between the two means. The variances, however, do not subtract. As discussed before in this primer, when we subtract two independent quantities their variances add. This makes common sense: combining two sources of randomness increases the overall randomness, whether the mathematical operation on the values was addition or subtraction.
So if the standard deviations and the sizes of the two samples are equal, the “subtracted” plot has its peak at the difference between the means, and a variance double that of the individual sample means. But we need to return to the units of measurement. So we take the square root; the “combined” SE becomes not double but √2 times the individual SEs.
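The claim that subtracting two independent sample means adds their variances can be checked by simulation. The sketch below is only an illustration: the function name, the population means of 100 and 90, the SD of 10 and the sample size of 20 are all made-up values, not figures from this primer. It repeatedly draws two samples, records the difference of their means, and compares the spread of those differences with the SE of a single sample mean.

```python
import random
import statistics
from math import sqrt

def simulate_difference(n=20, sd=10.0, mean_a=100.0, mean_b=90.0,
                        trials=10_000, seed=1):
    """Draw two independent samples many times and return the mean and SD
    of the (sample mean A - sample mean B) values. All defaults are
    hypothetical illustration values."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(trials):
        sample_a = [rng.gauss(mean_a, sd) for _ in range(n)]
        sample_b = [rng.gauss(mean_b, sd) for _ in range(n)]
        diffs.append(statistics.fmean(sample_a) - statistics.fmean(sample_b))
    return statistics.fmean(diffs), statistics.stdev(diffs)

centre, spread = simulate_difference()
se_single = 10.0 / sqrt(20)  # SE of one sample mean: SD / sqrt(n)

print(f"centre of the difference curve: {centre:.2f}")
print(f"spread / single SE: {spread / se_single:.3f}")
print(f"sqrt(2):            {sqrt(2):.3f}")
```

The centre should come out close to the difference between the two population means, and the ratio close to √2 rather than 2.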
If the means of the two samples are μa and μb, and each sample contains n values, our Z score (the difference of the means divided by the combined SE) is now:
Z = (μa-μb)√n /(SD√2)
We see that the only difference from the single-sample Z score formula is the √2 in the denominator.
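Returning to my naive assumption about overlapping confidence intervals, one can now see where I went wrong. Requiring the two 95% intervals not to overlap amounts to adding the SEs, which for equal SEs demands a difference between the means of 2 × 1.96 SEs. But it is the variances that add, not the SEs, giving a doubled variance; returning to SE we multiply by √2, not 2, so a difference of only √2 × 1.96 SEs is needed.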
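To make the gap between the two rules concrete, here is a small sketch comparing the difference in means each rule would demand. The function names are hypothetical, and the SD of 22.5 mmHg and sample size of 20 are used purely as illustrative inputs.

```python
from math import sqrt
from statistics import NormalDist

# Two-tailed critical Z for p = 0.05 (about 1.96)
z_crit = NormalDist().inv_cdf(0.975)

def overlap_threshold(se):
    """Smallest difference in means at which two 95% CIs (equal SE) stop overlapping."""
    return 2 * z_crit * se

def z_test_threshold(se):
    """Smallest difference in means that the two-sample Z test calls significant."""
    return sqrt(2) * z_crit * se

se = 22.5 / sqrt(20)  # illustrative SE of each sample mean

print(f"non-overlap rule needs a difference of {overlap_threshold(se):.1f}")
print(f"two-sample Z test needs a difference of {z_test_threshold(se):.1f}")
```

With these inputs, a difference of 15 lies between the two thresholds: the error bars still overlap, yet the Z test finds the difference significant.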
We can derive the combined Z-score formula more generally and a little more formally below:
The Z score is in general related to SD by:
Z = (μ-μ0)/SD,
Squaring this to make it easier to convert to variances:
Z² = (μ-μ0)²/SD²,
In the case of a composite variance plot of two samples of means μa and μb, and null hypothesis difference of the means of Δ (usually this is zero), we essentially subtract the means and add the variances of the individual variance plots, Va and Vb. So:
Z² = (μa-μb – Δ)²/(Va + Vb)
The square root of the variance of a sample mean is its SE, and this is equal to the SD of the parameter divided by √n, where n is the sample size. Each variance is therefore the SD squared divided by n. So:
Z² = (μa-μb – Δ)²/(SDa²/na + SDb²/nb)
Taking the square root to get back to Z, so that we can use the standard table, we finally have:
Z = (μa-μb – Δ)/√(SDa²/na + SDb²/nb)
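As a minimal sketch, the general formula translates directly into a short function. The function and parameter names are my own choices for illustration, and the numbers in the usage line are invented so that the arithmetic is easy to follow by hand.

```python
from math import sqrt

def two_sample_z(mean_a, mean_b, sd_a, sd_b, n_a, n_b, delta=0.0):
    """Z score for the difference of two independent sample means.

    delta is the difference assumed under the null hypothesis (usually zero).
    """
    # Variances of the two sample means add, then we take the square root
    combined_se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b - delta) / combined_se

# Invented example: variances of the means are 9/9 = 1 and 16/16 = 1,
# so the combined SE is sqrt(2) and Z = 2 / sqrt(2)
z = two_sample_z(mean_a=12, mean_b=10, sd_a=3, sd_b=4, n_a=9, n_b=16)
print(f"Z = {z:.3f}")
```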
Sometimes the assumption is made that the SDs of the two samples are the same, and the number of values in each sample may also be the same (n). Then, when the hypothesised difference between the means is zero, the equation simplifies to:
Z = (μa-μb)√n/(SD√2)
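For readers who want to see the intermediate step, the simplification can be written out as follows, substituting a common SD and a common n into the general formula and setting Δ to zero. The notation is the same as in the text, only typeset in LaTeX.

```latex
\begin{align*}
Z &= \frac{\mu_a - \mu_b - \Delta}{\sqrt{SD_a^2/n_a + SD_b^2/n_b}} \\
  &= \frac{\mu_a - \mu_b}{\sqrt{2\,SD^2/n}}
     \qquad (SD_a = SD_b = SD,\; n_a = n_b = n,\; \Delta = 0) \\
  &= \frac{(\mu_a - \mu_b)\sqrt{n}}{SD\sqrt{2}}
\end{align*}
```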
Assuming equal or unequal SDs is an important statistical distinction; it is often described as assuming equal or unequal variances, which obviously means the same thing. If we are indeed assuming equal variances, and we determine the SDs in both samples, they are likely nevertheless to be a little different. So we take a “weighted average” SD. This is described in more detail in the section on Student’s t-test for independent samples.
Let us return to the blood pressure example. We take a sample of 20 employees from company Nasty and find a mean systolic bp of 180 mmHg. We take a sample of 20 from company Nice and find a mean bp of 165 mmHg.
In drawing up the null hypothesis we assume that the two samples really belong to the same population, with a single mean bp and a single SD; for the SD we use the earlier figure of 22.5 mmHg.
Using the formula that assumes equal SDs, the Z score is 15 × √20 / (22.5 × √2) = 2.1. We were interested a priori in any difference in bp, not just in whether company Nice had lower bp, so we use a two-tailed p value, which from the table is p=0.036. So we reject the null hypothesis that employees of companies Nice and Nasty have the same mean bp.
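The arithmetic of this example can be reproduced with Python's standard library in place of a printed Z table. This is only a sketch, and the helper name `two_tailed_p` is my own. Note that the unrounded Z score is about 2.11, which gives a p value of about 0.035; the 0.036 quoted above comes from rounding Z to 2.1 before consulting the table.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p(z):
    """Two-tailed p value for a Z score under the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

diff = 180 - 165   # difference in mean systolic bp, mmHg
sd = 22.5          # assumed common SD, mmHg
n = 20             # employees sampled from each company

z = diff * sqrt(n) / (sd * sqrt(2))
p = two_tailed_p(z)

print(f"Z = {z:.2f}, two-tailed p = {p:.3f}")
print(f"p for Z rounded to 2.1: {two_tailed_p(2.1):.3f}")
```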
Note that if we had a difference of 10 mmHg rather than 15 mmHg, as we had in our one-sample example earlier, we would now get a Z score of about 1.4, below the two-tailed critical value of 1.96, and the two-tailed p value would not be significant. Because of the extra variability inherent in two independent samples, a difference in means that was significant for one sample is no longer significant when comparing two samples.
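The comparison can be sketched side by side. The one-sample example is not repeated in this section, so the code assumes it used the same sample size of 20 and the same SD of 22.5 mmHg; treat that as an assumption rather than a quotation. The helper name is again hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p(z):
    """Two-tailed p value for a Z score under the standard normal curve."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

diff, sd, n = 10.0, 22.5, 20   # n and sd assumed equal to the one-sample example

z_one = diff * sqrt(n) / sd                # sample mean against a fixed reference mean
z_two = diff * sqrt(n) / (sd * sqrt(2))    # two independent samples

print(f"one sample:  Z = {z_one:.2f}, p = {two_tailed_p(z_one):.3f}")
print(f"two samples: Z = {z_two:.2f}, p = {two_tailed_p(z_two):.3f}")
```

Under these assumptions the one-sample p value falls just below 0.05 while the two-sample p value is well above it, because the second Z score is smaller by exactly the factor √2.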