# Comparing a Sample Proportion with an Expected Proportion

Sometimes, when looking at a single sample, the quantity we analyse is a proportion rather than a measured number, e.g. whether the incidence of a positive (versus a negative) result in a sample is significantly different from zero or from some other baseline.

We would also use this test to compare two proportions drawn from the same sample. For example, voters are polled on two candidates together to determine if one is more popular than the other. The proportion who vote for Candidate A is 0.2 and the proportion who vote for Candidate B is 0.25. (It doesn’t matter if there were other candidates that some voters chose or if there were some “don’t know” responses, as long as we are only interested in a comparison between two of the options.)

On the other hand, if we wanted to compare the same candidate’s popularity across two samples, e.g. men and women, we would use the two-sample test of proportions described on the next page.

We define the baseline or required proportion according to the null hypothesis as p0 and the measured proportion as p (not to be confused with the p-value!). For two options chosen from the same sample we would simply designate them pa and pb.

Sample proportions actually follow a binomial probability distribution rather than a normal probability distribution, but the binomial distribution has a different shape for each sample size (n), so there would have to be a different table for each value of n.

However, if the proportions in question are fairly near 50% and the sample size is reasonably large, one can use a normal distribution as an approximation to the binomial. In practice, we can make this assumption if n × p ≥ 5 and n × (1 – p) ≥ 5 for both the measured sample proportion p and the null hypothesis proportion p0. Therefore we can never use the normal approximation if we are comparing a sample with a null hypothesis of zero proportion (because n × p0 = 0), e.g. determining if the incidence of a clinical sign in a certain population is significantly different from zero. We can use it if, for example, we have 20 subjects with p = 0.25 and p0 = 0.75, since all four products are then at least 5.
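The rule of thumb above can be checked mechanically. The following is a minimal Python sketch; the function name `normal_approx_ok` and the `threshold` parameter are illustrative choices made here, not part of any library.

```python
def normal_approx_ok(n, p, p0, threshold=5):
    """Return True if n*q and n*(1 - q) are both at least `threshold`
    for the measured proportion p and the null proportion p0."""
    return all(
        n * q >= threshold and n * (1 - q) >= threshold
        for q in (p, p0)
    )


# The two cases discussed in the text.
ok_example = normal_approx_ok(20, 0.25, 0.75)   # 20 subjects, p = 0.25, p0 = 0.75
zero_null = normal_approx_ok(20, 0.25, 0.0)     # null hypothesis of zero proportion
print(ok_example, zero_null)
```

The first call satisfies the condition and the second does not, because n × p0 is zero whatever the sample size.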

If we can use a normal approximation, we use the Z-score. The variance of a single yes/no observation is simply p(1 – p), so the variance of a sample proportion is p(1 – p)/n. Evaluated under the null hypothesis, the Z-score formula takes the form:

Z =  (p – p0)/√(p0(1 – p0)/n)
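As a rough illustration, the Z-score formula can be computed with the Python standard library alone. The function names and the data (60 positives in a sample of 100, tested against p0 = 0.5) are hypothetical and chosen only for this sketch; the two-tailed probability is taken from the standard normal distribution.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_proportion_z(p, p0, n):
    """Z-score for a measured proportion p against a null proportion p0."""
    return (p - p0) / sqrt(p0 * (1 - p0) / n)


def two_tailed_p_value(z):
    """Two-tailed probability of a Z-score at least this extreme."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical data: p = 0.6 measured in n = 100 subjects, null p0 = 0.5.
z = one_sample_proportion_z(0.6, 0.5, 100)
p_value = two_tailed_p_value(z)
print(f"Z = {z:.2f}, two-tailed p-value = {p_value:.4f}")
```

Here the standard error is √(0.25/100) = 0.05, so a difference of 0.1 gives a Z-score of about 2.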

When comparing two options in the same sample, pa and pb, I would choose the proportion that is closer to 0.5 as my p0 and the other one as my p. This makes the denominator larger, which gives a smaller Z-score (in absolute value), a greater probability value and therefore a more conservative test (less likely to reject the null hypothesis erroneously).
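The effect of this choice can be seen with the candidate example from earlier (pa = 0.2, pb = 0.25). The sample size of 400 below is an assumption made purely for illustration, since the text does not give one, and `conservative_z` is a hypothetical helper name.

```python
from math import sqrt


def z_score(p, p0, n):
    """Z-score for a measured proportion p against a null proportion p0."""
    return (p - p0) / sqrt(p0 * (1 - p0) / n)


def conservative_z(pa, pb, n):
    """Use whichever proportion is closer to 0.5 as p0, the other as p."""
    p0, p = sorted((pa, pb), key=lambda q: abs(q - 0.5))
    return z_score(p, p0, n)


n = 400  # assumed sample size, not from the text
z_conservative = conservative_z(0.2, 0.25, n)   # p0 = 0.25, p = 0.2
z_alternative = z_score(0.25, 0.2, n)           # p0 = 0.2, p = 0.25
print(f"|Z| with p0 = 0.25: {abs(z_conservative):.2f}")
print(f"|Z| with p0 = 0.20: {abs(z_alternative):.2f}")
```

With p0 = 0.25 the absolute Z-score is smaller than with p0 = 0.2, so the first choice is the more conservative one.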

Note that a separately measured standard deviation has dropped out of the equation. When we are dealing with proportions the range is always 0 to 1, so the data and their variability are already circumscribed: the variability is fully determined by p0 and n, and the normal approximation is already defined with a mean somewhere reasonably close to 0.5 and extremes towards 0 and 1.

## Power calculation for comparing a sample proportion with an expected proportion

We may similarly want to perform a power calculation to determine the sample size required to minimise false negative errors. Suppose Zα and Zβ are, as before, the Z-scores corresponding to the critical probability values for false positive and false negative errors (e.g. 0.05 for each), p0 is the null hypothesis proportion, and pa is the alternative proportion, i.e. the proportion at the limit of being close enough to p0 for the difference to be considered unimportant (this is not the same pa as the option within a sample described above). The corresponding equation is:

n = (Zα√(p0(1 – p0)) + Zβ√(pa(1 – pa)))² / (pa – p0)²

Note that unless it is a one-tailed test where the direction is pre-set, one should choose the direction of acceptable difference that makes pa closer to 50:50. In other words, if a difference of 5 percentage points in the proportion is acceptable and the null hypothesis proportion is 80%, choose a pa value of 75% not 85%, or rather 0.75 not 0.85.

Again, even if we use a two-tailed value for Zα (the false positive error), we use a one-tailed value for Zβ (the false negative error).
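The sample size equation can be sketched in Python as follows. The function name and its parameters are illustrative assumptions; the example uses the figures from the text (p0 = 0.8, acceptable difference of 0.05, critical probabilities of 0.05 for both error types), with a two-tailed Zα and a one-tailed Zβ, and rounds the result up to a whole subject.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_size(p0, pa, alpha=0.05, beta=0.05, two_tailed=True):
    """Sample size for testing a proportion against p0 with alternative pa."""
    nd = NormalDist()
    # Z-alpha is two-tailed unless the direction is pre-set.
    z_alpha = nd.inv_cdf(1 - alpha / 2) if two_tailed else nd.inv_cdf(1 - alpha)
    # Z-beta is always taken from a one-tailed test.
    z_beta = nd.inv_cdf(1 - beta)
    numerator = (z_alpha * sqrt(p0 * (1 - p0)) + z_beta * sqrt(pa * (1 - pa))) ** 2
    return ceil(numerator / (pa - p0) ** 2)


n_075 = required_sample_size(0.8, 0.75)  # pa closer to 50:50
n_085 = required_sample_size(0.8, 0.85)  # pa further from 50:50
print(n_075, n_085)
```

Choosing pa = 0.75 rather than 0.85 gives the larger of the two sample sizes, which is why it is the safer choice when the direction is not pre-set.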