In our example where we were sampling the blood pressure of the employees of a company, one point worth mentioning is that in this specific case we were only interested in whether or not the bp was too high. We only considered the right side of the probability curve. This is called the one-tailed p-value. In general a one-tailed test is where the hypothesis considers only the right (too high) or left (too low) probability extremes, but not both.
But what if the question was instead whether the bp was too high or too low? In other words, did it significantly differ from the required level of 180 mmHg in either direction? In this case we must consider both sides of the probability curve, called a two-tailed p value. Because the curve is symmetrical, a Z score of 1, i.e. one standard error from the mean, now corresponds to a p value of 0.32, i.e. double 0.16.
Even though the mean might be higher or lower, for this hypothesis it could be either, so to reject the hypothesis it could be too low or too high. Just because a sample mean happens be higher than the desired level, a priori it could have been lower and possibly too low to be consistent with the population or desired mean.
In medicine, we more usually perform two-tailed tests than one-tailed tests. For example, does a sample drug dose in a pill significantly differ from the desired dose? Is a change in walking speed after an intervention significantly different from no change (either better or worse)? We only perform a one-tailed test when we really aren’t interested in the other direction, or it makes no sense. But in the latter case one then often wonders if the distribution is really normal.
Some kind of quality-control test, where a parameter just has to be at least a certain value and it doesn’t matter how much more, is the main example of a one-tailed test I can think of, e.g. “Is an intervention at least cost-effective by a certain amount to warrant funding?”. This is sometimes called a non-inferiority comparison.