Comparing Two Sample Proportions

This is the situation where we are comparing the number of positive vs negative outcomes in two samples. The same provisos apply to using a Z-score normal distribution calculation as applied to the one sample proportion calculation. We take pa as the proportion found in one sample and pb as the proportion found in the other. As usual, na and nb are the numbers of subjects in the two samples.

Here the formula becomes a little long-winded unless we perform a preliminary calculation of the weighted pooled mean proportion (p) across the two samples. We divide the sum of the positives by the sum of the totals. For example, if we were comparing two samples, one which scored a mean of 50 out of 90 on a test, and the other which scored a mean of 55 out of 95 on another test, pa = 0.56, pb = 0.58 and the pooled proportion, p, would be (50+55)/(90+95) = 0.57.

The number of positives, eg correct answers, is the proportion multiplied by the sample size (here, the number of questions in the test), so:

p = (napa + nbpb)/(na + nb)

Once we have p, the formula for the Z-score is:

Z = (pa – pb – Δ)/√(p(1 – p)(1/na + 1/nb))

Again no standard deviation is required; the data and variability are already circumscribed by the proportion itself. For example, the proportion of correct answers in a test might vary from 0.5 to 0.7, but the actual numbers behind it might vary from 0.5 to 0.7 on one scale or from 5 million to 7 million on another.
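
To make the calculation concrete, here is a minimal sketch in Python (standard library only) of the pooled proportion and the Z-score, applied to the worked figures above (50 out of 90 vs 55 out of 95) with Δ = 0. The function name two_proportion_z is just for illustration.

from math import sqrt, erf

def two_proportion_z(xa, na, xb, nb, delta=0.0):
    # observed proportions and the weighted pooled proportion
    pa, pb = xa / na, xb / nb
    p = (xa + xb) / (na + nb)   # same as (na*pa + nb*pb)/(na + nb)
    z = (pa - pb - delta) / sqrt(p * (1 - p) * (1/na + 1/nb))
    # two-sided p-value from the standard normal CDF
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

print(two_proportion_z(50, 90, 55, 95))   # z ≈ -0.32, p ≈ 0.75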

Power calculation for two sample proportions

We might well want to do a power calculation to determine the sample size required to minimise false negative errors when comparing two proportions. Happily, because we are dealing with proportions, we don’t need a pilot study to estimate the SD. We determine the pooled proportion, p, as before:

 p = (napa + nbpb)/(na + nb)

A simplified version of the formula assumes that the hypothesised difference between the mean proportions is zero. If the largest difference between the mean proportions that we would accept as unimportant is denoted da, the formula for the recommended size of each of the two samples, n, is:

n = (Za√(2p(1 – p)) + Zb√(2p(1 – p) – 0.5da²))²/da²

As always with proportion Z-calculations, we need to check that n × p >= 5 and n × (1 – p) >= 5, or we would have to use the binomial distribution itself instead of the normal approximation.
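
As a sketch of how that sample size formula might be coded (again standard library Python; the function name is just for illustration), using the conventional Za = 1.96 for a two-sided 5% significance level and Zb = 0.84 for 80% power as defaults, and including the normal-approximation check:

from math import sqrt, ceil

def sample_size_two_proportions(p, da, z_alpha=1.96, z_beta=0.84):
    # p: pooled proportion, da: largest unimportant difference
    pooled_var = 2 * p * (1 - p)
    n = (z_alpha * sqrt(pooled_var) + z_beta * sqrt(pooled_var - 0.5 * da**2))**2 / da**2
    n = ceil(n)
    # normal-approximation check from the text
    if n * p < 5 or n * (1 - p) < 5:
        print("warning: use the binomial distribution instead")
    return n

# e.g. pooled p = 0.57 as in the earlier example, difference of interest 0.1
print(sample_size_two_proportions(0.57, 0.1))   # ≈ 384 per group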

Using % values as proportions

If we take the proportion of subjects with a certain discrete yes:no variable, the n value is the number of subjects sampled. Percentages are proportions, so as long as the number of subjects used to derive the % is available, we can convert them to proportions. For example, 1000 people, equal numbers of men and women, are polled: 60% of women favour one candidate and 50% of men favour that candidate. The proportions are 0.6 and 0.5 and the number in each sample is 500.
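
A quick self-contained check of that poll example in Python, treating each % as a proportion with n = 500 per group:

from math import sqrt

na, nb = 500, 500          # 500 women, 500 men
pa, pb = 0.60, 0.50        # 60% vs 50% favouring the candidate
p = (na * pa + nb * pb) / (na + nb)    # pooled proportion = 0.55
z = (pa - pb) / sqrt(p * (1 - p) * (1/na + 1/nb))
print(p, z)                # 0.55, z ≈ 3.18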

However, I am not sure about comparing variables like exam results expressed as a %. If we want to compare proportions of pass marks, ie a yes:no outcome, between two samples, that is clearly fine. For example, comparing the 30% of 55 males who achieved 50% or better on a test with the 35% of 60 females who did so is an appropriate use of proportions.

But is it OK to compare a mean score of 30% in 55 males vs a mean score of 35% in 60 females? Our n values would clearly be the number of males and females, not the number of questions in the test. If it was the same test, with obviously the same standard deviation of marks, I wonder if we can simply compare proportions of 0.3 and 0.35 on the basis that a % is already circumscribed. For the pooled proportion value, we should again use a weighting based on the number of subjects in each group, so that the pooled % sits closer to the mean % of the larger group.
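
As an illustration of that weighting with the figures above, the pooled value would be (55 × 0.30 + 60 × 0.35)/(55 + 60) = 37.5/115 ≈ 0.33, which does indeed sit slightly closer to the 0.35 of the larger female group.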

The alternative would be to treat 30 and 35 as mean values, find the standard deviation of marks in the test and use the formula for comparison of means rather than comparison of proportions. I think it boils down to weighing the assumption of a normal distribution if treating % as values against the assumption that it is the same exam and that the normal approximation to the binomial distribution holds if treating % as proportions.
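
For what it’s worth, a sketch of that means-based alternative, writing s for the (as yet unknown) common standard deviation of marks, would be the usual two sample formula Z = (30 – 35)/(s × √(1/55 + 1/60)), with the proviso that a t-test might be preferred when s has to be estimated from samples of this size.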
