Let us take a medical example of a parameter that has an approximately normal distribution in the adult human population, namely systolic blood pressure. In order to reduce the risk of heart disease and stroke, the chief medical officer of a multinational company wants the mean systolic blood pressure (bp) of its members to be no higher than 180 mmHg. The company nurses are worried they might not hit this target, and want to get a “preview” before the time when everyone’s blood pressure is to be recorded. Then they might have time to act if necessary by prescribing antihypertensives or de-stress programmes to the employees before the official measurement.
They do not have the resources to measure everyone’s blood pressure to get this preview, so they take a sample. The question is, when they get the mean value from the sample, how confident can they be that this sample mean reflects the true population mean? If the sample mean is a little higher than required, might that reflect random variation in sampling while the overall population mean is really at a safe level? Or, if it is a little lower, might the overall mean really be too high and the sample an underestimate?
Determining a Significant Difference
Statistics is all about estimation to a certain level of confidence. It is best to think about the above uncertainty in a backwards manner. Instead of asking (hypothesising), “Is the actual true population mean higher than the required population mean?”, we hypothesise that it is not higher, secretly wondering if we are going to end up rejecting this hypothesis – i.e. concluding that it is higher. Hypothesising something to see if we are going to reject it is called making the null hypothesis.
By convention, and somewhat arbitrarily, the cut-off is often set at one in twenty (5%, or probability p=0.05, where 0 is impossible and 1 is certain): if a result at least as extreme as the one observed would occur less than one time in twenty were the null hypothesis true, we reject the hypothesis. So we consider the curve describing how sample means are distributed about the population mean, with a width given by the standard error; the mean according to the null hypothesis is the same as the desired mean bp, namely 180 mmHg. Then we look at where the measured mean for the sample lies on this curve. If it lies higher than the point above which only 5% of the area under the curve falls, it is likely that the actual population mean is in fact higher than the desired mean, and we reject the null hypothesis at the p=0.05 level of significance. Usually we quote the actual p value to show how strong the evidence against the null hypothesis is.
The way we do this calculation is as follows. The normal distribution curve is described by a complex formula, so what was done in the good old days before computers was to standardise the curve to be independent of its width, which here is quantified by the standard error. Each point on the x-axis is then described in terms of how many standard errors it is away from the mean. Once this is done, no matter what the data set, the curve always has the same shape. This means that if we take the difference between the observed sample mean (μ) and the desired population mean (σ), which is here the same as the null hypothesis mean (μ0), and convert this difference into standard error units by dividing by the standard error, the result will always correspond to a certain probability value.
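The point that standardising removes the dependence on curve width can be checked numerically. The following is only a minimal sketch using Python's standard library; the two curves and their numbers are hypothetical, chosen purely so that each test point sits exactly one standard error above its own mean.

```python
from statistics import NormalDist


def upper_tail(mean, se, x):
    """Area under a normal curve (given mean and width) that lies above x."""
    return 1.0 - NormalDist(mean, se).cdf(x)


# Two hypothetical curves with different centres and widths (illustrative only).
narrow = upper_tail(mean=180.0, se=5.0, x=185.0)  # x is 1 SE above the mean
wide = upper_tail(mean=50.0, se=20.0, x=70.0)     # x is also 1 SE above the mean

# The standardised curve (mean 0, width 1) evaluated at 1 SE.
standard = 1.0 - NormalDist().cdf(1.0)

# All three areas agree: about 0.1587.
print(round(narrow, 4), round(wide, 4), round(standard, 4))
```

Whatever the original units, a point one standard error above the mean always leaves the same area above it, which is why a single table of standardised values can serve every data set.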
The Z-Score
This standardised x-axis value on the normal distribution curve is called the Z-score.
In a plot of normal distribution of sample means about the true population mean, which is what we want in this example, to find the Z-score of a sample mean we take the difference between the sample mean and the true population mean and then divide it by the standard error of the mean.
The Z-score is also a useful number to describe in a standardised manner how far a single value is away from the mean value. For example, in expressing bone density to describe severity of osteoporosis, one could give an actual value in terms of X-ray attenuation. It is more useful to a clinician or patient, however, to consider how the value deviates from typical density. The Z-score describes how many standard deviations (SD) a subject’s bone density is away from average. Here we consider a plot of normal distribution of values about a mean value rather than a plot of sample means about a population mean; to find the Z-score of a value we take the difference between the value and the mean and then divide by the SD.
So in a normal distribution plot of values about a mean value, μ, we calculate the Z-score of a value by:
Z = (value – μ)/SD
In a normal distribution plot of sample means about the true population mean, σ, we calculate the Z-score of a particular sample mean value, μ, by:
Z = (μ – σ)/SE
We described in the previous section how the standard error of a mean of n values relates to the standard deviation of the values:
SE=SD/√n
Therefore the Z-score of a sample mean of n values is:
Z = (μ – σ)√n/SD
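The three formulas above can be written as small functions. This is a sketch rather than a definitive implementation: the function names are our own, and the numbers in the usage lines are hypothetical values in arbitrary units, not data from this tutorial.

```python
import math


def z_score_value(value, mean, sd):
    """Z-score of a single value: how many SDs it lies from the mean."""
    return (value - mean) / sd


def standard_error(sd, n):
    """Standard error of the mean of n values: SD / sqrt(n)."""
    return sd / math.sqrt(n)


def z_score_sample_mean(sample_mean, population_mean, sd, n):
    """Z-score of a sample mean: how many SEs it lies from the population mean."""
    return (sample_mean - population_mean) / standard_error(sd, n)


# Hypothetical numbers for illustration only.
single = z_score_value(value=70.0, mean=100.0, sd=15.0)
of_mean = z_score_sample_mean(sample_mean=105.0, population_mean=100.0, sd=15.0, n=9)

print(single)   # -2.0: the value is 2 SD below the mean
print(of_mean)  # 1.0: the sample mean is 1 SE above the population mean
```

Dividing by SD/√n is the same as multiplying by √n/SD, so `z_score_sample_mean` matches the final formula above.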
Determining if a Sample Mean is Greater than a Desired Mean
Returning to our example of blood pressure, our null hypothesis was that the true population mean was no greater than a certain desired mean value. We denote this upper limit, which the null hypothesis takes as the population mean, as μ0. So:
Z = (μ – μ0)√n/SD
We can estimate the SD of the population by calculating the SD of the sample, and hope that they are approximately the same. We now have enough information to calculate the Z-score, and hence the p-value associated with a certain sample mean value.

If a sample of 20 people has a mean blood pressure (bp) of 185 mmHg and an SD of 22.5 mmHg, this is approximately one SE greater than a desired bp of 180 mmHg. The one-tailed p value, the area of the section of the plot greater than the 1 SE line, is 0.16. (Note that this encompasses the 0.14 probability of the segment between 1 and 2 SE plus the 0.02 probability of the segment above 2 SE.) If the sample’s mean bp was 190 mmHg, this would be approximately 2 SE greater than the desired mean, and the p-value is only 0.02. The bp would now be considered significantly greater than the desired bp.
Thus, if the desired bp is 180 mmHg, and a sample of 20 employees has a mean bp of 185 mmHg with an SD of 22.5 mmHg, the sample mean is about one standard error, or one SD/√n (here 22.5/√20 ≈ 5.0 mmHg), greater than the null hypothesis mean, so the Z-score is approximately 1. When we look up this Z-score on a table that someone laboriously calculated from the complex formula (not shown here!) that describes the normal distribution curve, we get the corresponding p-value. On the table (or in computer software), a Z-score of 1 corresponds to a one-tailed probability of 0.16. This means that if the population mean really were at the desired level according to the null hypothesis, there would be a 16% chance of drawing a sample with a mean at least this high. This chance is fairly low, but not low enough to fall below the commonly used critical value of 0.05, or 5%. So we cannot conclude from the sample mean that the true population mean is higher than the desired mean.
On the other hand, if the same sample had a mean bp of 190 mmHg, the Z-score would be about 2, i.e. the sample mean would be 2 SE higher than the null hypothesis mean. On the table, the probability of this is only 0.02. This would satisfy our criterion for rejecting the null hypothesis at the 0.05 level. Here we would say that the sample indicates that the mean bp of the whole population of employees is significantly higher than the required mean level, with a p value of 0.02. The chief medical officer will be unhappy when he gets the results of the official measurement of the whole company!
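The two cases above can be reproduced without a printed table. The sketch below uses the figures from this example (desired mean 180 mmHg, sample SD 22.5 mmHg, n = 20) and Python's standard library in place of the table; the function and constant names are our own, and, as in the text, the sample SD is assumed to stand in for the population SD.

```python
import math
from statistics import NormalDist

DESIRED_MEAN = 180.0  # mmHg, the null hypothesis mean
SAMPLE_SD = 22.5      # mmHg, assumed to approximate the population SD
N = 20                # number of employees sampled
ALPHA = 0.05          # conventional significance level

SE = SAMPLE_SD / math.sqrt(N)  # standard error of the mean, about 5.0 mmHg


def one_tailed_test(sample_mean):
    """Return (Z-score, one-tailed p-value, whether the null hypothesis is rejected)."""
    z = (sample_mean - DESIRED_MEAN) / SE
    p = 1.0 - NormalDist().cdf(z)  # area of the standard normal curve above z
    return z, p, p < ALPHA


for mean_bp in (185.0, 190.0):
    z, p, reject = one_tailed_test(mean_bp)
    print(f"mean {mean_bp:.0f} mmHg: Z = {z:.2f}, p = {p:.3f}, reject null: {reject}")

# Smallest sample mean that would be significant at the 0.05 level (one-tailed).
critical_mean = DESIRED_MEAN + NormalDist().inv_cdf(1.0 - ALPHA) * SE
print(f"critical sample mean: {critical_mean:.1f} mmHg")
```

Because the standard error is slightly more than 5 mmHg, the computed Z-scores come out just under 1 and 2, and the p-values are roughly 0.16 and 0.02, in line with the rounded figures quoted above. The critical sample mean works out at about 188 mmHg, which is why 185 mmHg is not significant but 190 mmHg is.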