Biostatistics for the Clinician

Copyright (C) 1996

Oser, G., Johnson, C.W. & Abedor, A. J.

Biostatistics for the Clinician: Why and How

Lesson 1: Summary Measures of Data (Descriptive Statistics)

Why Important?
1. Types of Variables, Quality of Measurements
2. Central Tendency
3. Variability
4. Exploratory Data Analysis (EDA)
5. Standard Scores
6. Distributions

Lesson 2: Inferential Statistics + Review of Lesson 1

Why Important?
1. Sampling: Distribution of Means
2. Performing Tests
- 2.1 Parametric Tests
  - 2.1.1 t-tests
- 2.2 Non-parametric Tests

Lesson 3: Clinical Decision Making in a Multi-variable Environment + Review of Lesson 2

1. Correlation & Regression
2. Decision Analysis

Biostatistics for the Clinician: Why and How

Why Important?

A critical distinction between the scientific approach and other methods of inquiry lies in the emphasis placed on real world validation. Where research has shown that particular approaches are appropriate and effective for specific applications, those approaches are to be selected by the practicing physician. Physicians are called upon to defend decisions on the basis of empirical research evidence. Consequently, the clinician must be an intelligent consumer of medical research outcomes, able to understand, interpret, critically evaluate and apply valid results of the latest medical research.

There is an old story (Norman & Streiner, 1986) about three little French boys who happened to see a man and woman naked on a bed in a basement apartment. The four year old said, "Look, that man and woman are wrestling!" The five year old said, "You silly, they're not wrestling, they're making love!" The six year old said, "Yes! And very poorly too!!" The four year old did not understand. The five year old had achieved a conceptual understanding. The six year old understood it well enough, presumably without actual experience, to be a critical evaluator. The intent of the following instruction is to make you a critical evaluator of medical research, a "six year old biostatistician".

The instruction provides a brief overview of the most frequently used and most important descriptive and inferential biostatistical methods as they are relevant for the clinician. The goal is that the student will appreciate how the application of the theories of measurement, statistical inference, and decision trees contributes to better clinical decisions and ultimately to improved patient care and outcomes. Conceptual understanding, rather than computational ability, will be the focus. Development of an adequate vocabulary, an examination of fundamental principles and a survey of the widely used procedures or tools to extract information from data, will form a basis for fruitful collaboration with a professional biostatistician when appropriate. The needs of practicing physicians, not the skills for sophisticated medical research, will inform the presentations.

Goal of Experimental Method:

To prove that the treatment and only the treatment caused the effect.

Usefulness of Biostatistics:

Place limits on effects of chance in small sample experiments (Alpha or False Positives).
Determine sample size needed to detect clinically relevant effects (Beta or False Negatives, 1-Power).
Control for effects of one or more confounding variables.
Assist in developing alternative designs for human experiments.
Use maximum information content measurement.
Measure intangibles such as intelligence, depression, and well-being.

Lesson 1: Summary Measures of Data
(Descriptive Statistics)

1: Types of Variables and Measures

Now, you might ask, why do I need to know about levels of measure and types of variables? Good Question! You need to know, in order to judge whether appropriate statistical techniques have been used, and consequently whether conclusions are valid. In other words, you can't tell whether the results in a particular medical research study are credible unless you know what types of measuring scales have been used in obtaining the data.

1.1 Levels of Measure

Four Types of Scales

There are four types of measurement scales (nominal, ordinal, interval and ratio; see Figure). Each of the four scales, respectively, provides higher information levels about variables measured with the scale.

Nominal Scales

Nominal scales name and that is all that they do. Some examples of nominal scales are sex (male, female), race (black, hispanic, oriental, white, other), political party (democrat, republican, other), blood type (A, B, AB, O), and pregnancy status (pregnant, not pregnant; see Figure).

[Insert Figure 1 about here]

Ordinal Scales

Ordinal scales both name and order. Some examples of ordinal scales are rankings (e.g., football top 20 teams, pop music top 40 songs), order of finish in a race (first, second, third, etc.), cancer stage (stage I, stage II, stage III), and hypertension categories (mild, moderate, severe; see Figure).

[Insert Figure 1 about here]

Interval Scales

Interval scales name, order and have the property that equal intervals in the numbers on the scale represent equal quantities of the variables being measured. Some examples of interval scales are Fahrenheit and Celsius temperature, SAT, GRE and MAT scores, and IQ scores. The zero on an interval scale is arbitrary. On the Celsius scale, 0 is the freezing point of water. On the Fahrenheit scale, 0 is 32 degrees below the freezing point of water (see Figure).

[Insert Figure 1 about here]

Ratio Scales

Ratio scales have all the properties of interval scales plus a meaningful, absolute zero. That is, zero represents the total absence of the variable being measured. Some examples of ratio scales are length measures in the English or metric systems, time measures in seconds, minutes, hours, etc., blood pressure measured in millimeters of mercury, age, and our common measures of mass, weight, and volume (see Figure).

[Insert Figure 1 about here]

They are called ratio scales because ratios are meaningful with this type of scale. It makes sense to say 100 feet is twice as long as 50 feet because length measured in feet is a ratio scale. Likewise it makes sense to say a Kelvin temperature of 100 is twice as hot as a Kelvin temperature of 50 because it represents twice as much thermal energy (unlike Fahrenheit temperatures of 100 and 50).
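The point about ratios can be checked with a short calculation. The sketch below is only illustrative: the helper name `fahrenheit_to_kelvin` is introduced here and is not part of the lesson. It converts the two Fahrenheit readings to Kelvin, where zero is absolute, and compares the ratios.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert a Fahrenheit reading to Kelvin (a ratio scale with an absolute zero)."""
    return (deg_f - 32) * 5 / 9 + 273.15

# On the Fahrenheit scale the numbers suggest "twice as hot" ...
ratio_fahrenheit = 100 / 50

# ... but on the Kelvin scale the same two temperatures differ by only about 10%.
ratio_kelvin = fahrenheit_to_kelvin(100) / fahrenheit_to_kelvin(50)

print(f"ratio of Fahrenheit numbers: {ratio_fahrenheit:.2f}")
print(f"ratio of Kelvin temperatures: {ratio_kelvin:.3f}")
```

The ratio of the Fahrenheit numbers is 2, yet the ratio of the underlying thermal quantities is close to 1.1, which is why ratios are not meaningful on an interval scale.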

[Insert Figure 1 about here]

1.2 Qualitative vs. Quantitative Variables

Nominal and ordinal scales are called qualitative measures. Interval and ratio scales are called quantitative measures (see Figure).

[Insert Figure 1 about here]

Now, when statistical analyses are applied, the statistics must take into account the nature of the underlying measurement scale, because there are fundamental differences in the types of information imparted by the different scales (see Figure). Consequently, nominal and ordinal scales must be analyzed using what are called non-parametric or distribution free statistics. On the other hand, interval and ratio scales are analyzed using parametric statistics. Parametric statistics typically require that the interval or ratio variables have distributions shaped like bell (normal) curves, a reasonable assumption for many of the variables frequently encountered in medical practice.

[Insert Figure 1 about here]

1.3 C.R.A.P. Detector #1.1

Dependent variables should be sensible. Ideally, they should be clinically important, but also related to the independent variable.

1.4 C.R.A.P. Detector #1.2

In general, the amount of information increases as one goes from nominal to ratio. Classifying good ratio measures into large categories is akin to throwing away data.

2: Central Tendency

Why Important?

Why do you need to know about measures of central tendency? You need to be able to understand summaries of large amounts of data that use simple measures to best represent the location of the data as a whole. Collectively, such measures or values are referred to as measures of central tendency. Measures of central tendency are ubiquitous in the medical research literature. The most frequently used measures of central tendency are the mean, median and mode.

2.1 Mean

The most frequently used measure of central tendency is the mean. The mean, or more formally, the arithmetic mean, is simply the average of the group. That is, the mean is obtained by summing all the numbers for the subjects in the group and dividing by the number of subjects in the group. The mean is useful only for quantitative variables (see Figure).

[Insert Figure 2.3 about here]

2.2 Median

The median is the middle score. That is, the median is the score for which half the subjects have lower scores and half have higher scores. Another way to say this is that the median is the score at the fiftieth percentile in the distribution of scores (see Figure).

[Insert Figure 2.3 about here]

2.3 Mode

The mode is the most frequent score. Another way to say this is that the mode is the score that occurs most often (see Figure).

[Insert Figure 2.3 about here]

2.4 Summary Principle

In a symmetric distribution with one mode, like the normal distribution, the mean, median and mode all have the same value. But in a non-symmetric distribution their values will be different. In general, as the distribution becomes more lopsided the mean and the median move away from the mode. With extremely skewed distributions the mean will be somewhat misleading as a measure of central tendency, because it is heavily influenced by extreme scores. So, for example, if we take a distribution of doctors' incomes, some doctors make huge sums of money, and the median or the mode is more representative of doctors' incomes as a whole than the mean, because the very high incomes of some doctors inflate the average, making it less representative of doctors as a whole (see Figure 2.3).
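A small numerical sketch may make the income example concrete. The income figures below are invented purely for illustration; the calculation uses Python's standard `statistics` module.

```python
import statistics

# Hypothetical annual incomes (in thousands of dollars) for nine doctors.
# The values are invented for illustration; one very high income skews the data.
incomes = [120, 130, 130, 140, 150, 160, 170, 200, 950]

mean_income = statistics.mean(incomes)      # pulled upward by the 950 value
median_income = statistics.median(incomes)  # middle value of the sorted list
mode_income = statistics.mode(incomes)      # most frequent value

print(f"mean={mean_income:.1f}  median={median_income}  mode={mode_income}")
```

In this made-up sample the single extreme income pulls the mean well above both the median and the mode, which is the pattern the summary principle describes for positively skewed data.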

[Insert Figure 2.3 about here]

3: Variability

Why Important?

Why do you need to know about measures of variability? You also need to be able to understand summaries of large amounts of data that use simple measures to best represent the variability in the data. Measures of variability also occur very frequently in the medical research literature. If all data values are the same, then, of course, there is zero variability. If all the values lie very close to each other there is little variability. If the numbers are spread out all over the place there is more variability. Again there are many measures of variability. Some of the most frequently used measures of variability are the standard deviation, interquartile range and the range.

3.1 Standard Deviation

The standard deviation can be thought of, roughly, as the average distance of the values from the mean of the distribution (see Figure). This means that you must be able to compute a meaningful mean to be able to compute a standard deviation. Consequently, computation of the standard deviation requires interval or ratio variables. In a distribution having a bell (normal) curve, approximately 68% of the values lie within 1 standard deviation of the mean. On the other hand, approximately 2.3% of the values lie in each tail of the distribution beyond 2 standard deviations from the mean (see Figure).
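As a hedged illustration of the computation, the sketch below works through the standard deviation on a small set of invented blood pressure readings. It shows both the population form (dividing by N) and the sample form (dividing by N - 1); which form the lesson's formula figure uses is not stated here, so both are shown.

```python
import math
import statistics

# Hypothetical diastolic blood pressures (mm Hg), invented for illustration.
values = [70, 74, 78, 80, 82, 86, 90]

mean = statistics.mean(values)

# Population form: average the squared deviations over N, then take the root.
squared_deviations = [(x - mean) ** 2 for x in values]
sd_population = math.sqrt(sum(squared_deviations) / len(values))

# Sample form: divide by N - 1 instead (the usual choice for sample data).
sd_sample = statistics.stdev(values)

print(f"mean={mean}  SD (N)={sd_population:.2f}  SD (N-1)={sd_sample:.2f}")
```

The two forms differ only in the divisor, so the difference matters mainly for small samples.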

[Insert Figure 2.5 about here]

[Insert Figure SD Formula Figure about here]

3.2 Interquartile Range

Remember that the median is the point in the distribution where 50% of the sample values are below and 50% are above. Quartiles can be defined at the 25th percentile, the 50th percentile, the 75th percentile and the 100th percentile. The interquartile range is the distance between the 25th percentile and the 75th percentile, so it spans the middle 50% of the values in the sample. The interquartile range is a measure of variability that can be appropriately applied with ordinal variables and therefore may be used especially in conjunction with non-parametric statistics (see Figure).

[Insert Interquartile Range & Range Figure about here]

3.3 Range

The range is simply the difference between the highest and lowest value in the sample (see Figure).
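The interquartile range and the range can be sketched on a small invented data set as follows. Note that software packages use slightly different conventions for locating percentiles; this sketch relies on the default ("exclusive") method of Python's `statistics.quantiles`, which is an assumption rather than the method used in the lesson's figure.

```python
import statistics

# Hypothetical sample of 11 values, invented for illustration.
values = [2, 4, 5, 7, 8, 9, 11, 12, 14, 15, 40]

# quantiles(..., n=4) returns the three cut points: 25th, 50th, 75th percentiles.
q1, q2, q3 = statistics.quantiles(values, n=4)

iqr = q3 - q1                             # spread of the middle 50% of values
value_range = max(values) - min(values)   # highest minus lowest value

print(f"Q1={q1}  median={q2}  Q3={q3}  IQR={iqr}  range={value_range}")
```

The single large value (40) stretches the range considerably but leaves the interquartile range untouched, which is one reason the interquartile range is often preferred for skewed data.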

[Insert Interquartile Range & Range Figure about here]

4: Exploratory Data Analysis (EDA)

Why Important?

Why do you need to know about exploratory data analysis (EDA)? The purpose of EDA is to provide a simple way to obtain a big picture look at the data and a quick way to check data for mistakes to prevent contamination of subsequent analyses. Exploratory data analysis can be thought of as a preliminary to a more in depth analysis of the data (see Figure).

A primary tool in exploratory data analysis is the box plot (see Figure). What does a box plot tell you? You can, for example, determine the central tendency, the variability, the quartiles, and the skewness for your data. You can quickly visually compare data from multiple groups. A small rectangular box is drawn with a line representing the median, while the top and bottom of the box represent the 75th and 25th percentiles, respectively. If the median is not in the middle of the box the distribution is skewed. If the median is closer to the bottom, the distribution is positively skewed. If the median is closer to the top, the distribution is negatively skewed. Extreme values and outliers are often represented with asterisks and circles (see Figure).

[Insert Box and Whisker plot about here]

4.1 Hinges

The top and bottom edges of the box plot are referred to as hinges or Tukey's hinges (see Figure).

[Insert Box and Whisker plot about here]

4.2 Ranges

? (see Figure).

[Insert Ranges Figure about here]

[Insert Box and Whisker plot about here]

4.3 Outliers

Outliers and extreme values are often represented with circles and asterisks, respectively. Outliers are values that lie from 1.5 to 3 box lengths (the box length represents the interquartile range) outside the hinges. Extreme values lie more than 3 box lengths outside the hinges. In a box and whisker plot the actual values of the scores will typically lie adjacent to the outlier and extreme value symbols to facilitate examination and interpretation of the data (see Figure).
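The rule just described can be written as a short function. This is a minimal sketch: the function name `classify_value` is hypothetical, the hinges are passed in as plain numbers, and the treatment of values lying exactly on a boundary is an assumption, since the text does not specify it.

```python
def classify_value(value, lower_hinge, upper_hinge):
    """Label a value as 'inside', 'outlier', or 'extreme' relative to the box."""
    box_length = upper_hinge - lower_hinge  # the interquartile range

    # Distance outside the box (zero if the value lies between the hinges).
    if value > upper_hinge:
        distance = value - upper_hinge
    elif value < lower_hinge:
        distance = lower_hinge - value
    else:
        distance = 0

    if distance > 3 * box_length:
        return "extreme"   # more than 3 box lengths outside the hinges
    if distance > 1.5 * box_length:
        return "outlier"   # between 1.5 and 3 box lengths outside the hinges
    return "inside"

# Example with hinges at 5 and 14 (box length 9).
for v in (20, 40, 42, -9):
    print(v, classify_value(v, lower_hinge=5, upper_hinge=14))
```

With hinges at 5 and 14, values beyond 27.5 or below -8.5 are flagged as outliers, and values beyond 41 or below -22 as extreme values.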

[Insert Box and Whisker plot about here]

4.4 Box & Whisker plots

Box and whisker plots represent more completely the range of values in the data by extending vertical lines to the largest and smallest values that are not outliers, extending short horizontal segments from these lines to make more apparent the values beyond which outliers begin (see Figure).
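Locating the whisker ends can be sketched in a few lines. The data are invented, the quartiles come from the default method of Python's `statistics.quantiles` (an assumed convention, standing in for the hinges), and values more than 1.5 box lengths outside the box are treated as outliers, following the rule given under Outliers.

```python
import statistics

# Hypothetical sample, invented for illustration.
values = [2, 4, 5, 7, 8, 9, 11, 12, 14, 15, 40]

q1, _, q3 = statistics.quantiles(values, n=4)  # approximate hinges
box_length = q3 - q1

# Values beyond these fences are plotted individually as outliers or extremes.
low_fence = q1 - 1.5 * box_length
high_fence = q3 + 1.5 * box_length

# Whiskers extend to the smallest and largest values that are not outliers.
non_outliers = [v for v in values if low_fence <= v <= high_fence]
lower_whisker = min(non_outliers)
upper_whisker = max(non_outliers)

print(f"whiskers run from {lower_whisker} to {upper_whisker}")
```

In this example the value 40 lies beyond the upper fence, so the upper whisker stops at 15 and the 40 would be drawn as a separate symbol.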

[Insert Box and Whisker plot about here]

5. Standard Scores

Why Important?

Why do you need to understand standard scores or z-scores? Again, they appear frequently in the medical literature. A natural question to ask about a given value from a sample is, "How many standard deviations is it from the mean?" The z-score answers the question. The question is important because it addresses not only the value itself, but also the relative position of the value. For example, if the value is 3 standard deviations above the mean you know it's three times the average distance above the mean and represents one of the higher scores in the distribution. On the other hand, if the value is one standard deviation below the mean then you know it is on the low end of the midrange of the values from the sample. But there is much more that is important about z-scores.

5.1 z-Scores

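For every value from a sample, a corresponding z-score can be computed. The z-score is simply the signed distance of the sample value from the mean, expressed in standard deviation units. There is a simple formula for computing z-scores: subtract the mean from the value and divide the result by the standard deviation (see Figure):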

[Insert z-score formula here]

5.2 General z-Score Properties

Because every sample value has a corresponding z-score it is possible then to graph the distribution of z-scores for every sample. The z-score distributions share a number of common properties that are valuable to know. The mean of the z-scores is always 0. The standard deviation of the z-scores is always 1. The graph of the z-score distribution always has the same shape as the original distribution of sample values. The sum of the squared z-scores is always equal to the number of z-score values (when the standard deviation is computed with N in the denominator). Furthermore, z-scores above 0 represent sample values above the mean, while z-scores below 0 represent sample values below the mean (see Figure).
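These properties can be verified numerically. The sketch below uses invented data and computes the standard deviation with N in the denominator (`statistics.pstdev`); with the N - 1 form the sum of squared z-scores would equal N - 1 instead.

```python
import statistics

# Hypothetical sample values, invented for illustration.
values = [70, 74, 78, 80, 82, 86, 90]

mean = statistics.mean(values)
sd = statistics.pstdev(values)  # standard deviation with N in the denominator

# z-score: signed distance from the mean in standard deviation units.
z_scores = [(x - mean) / sd for x in values]

z_mean = statistics.mean(z_scores)
z_sd = statistics.pstdev(z_scores)
sum_of_squares = sum(z * z for z in z_scores)

print(f"mean of z = {z_mean:.6f}")
print(f"SD of z = {z_sd:.6f}")
print(f"sum of squared z = {sum_of_squares:.6f}  (N = {len(values)})")
```

Up to floating-point rounding, the mean of the z-scores is 0, their standard deviation is 1, and the sum of their squares equals the number of values.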

[Insert z-score graph about here]

5.3 Gaussian z-Score Properties

If the sample values have a Gaussian (normal) distribution then the z-scores will also have a Gaussian distribution. The distribution of z-scores having a Gaussian distribution has a special name because of its fundamental importance in statistics. It is called the standard normal distribution. All Gaussian or normal distributions can be transformed using the z-score formula to the standard normal distribution. Statisticians know a great deal about the standard normal distribution. Consequently, they also know a great deal about the entire family of normal distributions. All of the previous properties of z-score distributions hold for the standard normal distribution. But, in addition, probability values for all sample values are known and tabled. So, for example, it is known that approximately 68% of values lie within one standard deviation of the mean. Approximately 95% of values lie within 2 standard deviations of the mean. Approximately 2.3% of values lie more than 2 standard deviations below the mean. Approximately 2.3% of values lie more than 2 standard deviations above the mean (see Figure).
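The tabled probabilities mentioned above can be reproduced with the `NormalDist` class in Python's standard `statistics` module. This is simply a convenient way to look up areas under the standard normal curve, offered as a sketch rather than as the method behind any particular printed table.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Area within 1 and within 2 standard deviations of the mean.
within_1_sd = standard_normal.cdf(1) - standard_normal.cdf(-1)
within_2_sd = standard_normal.cdf(2) - standard_normal.cdf(-2)

# Area in each tail beyond 2 standard deviations.
below_minus_2 = standard_normal.cdf(-2)
above_plus_2 = 1 - standard_normal.cdf(2)

print(f"within 1 SD: {within_1_sd:.4f}")
print(f"within 2 SD: {within_2_sd:.4f}")
print(f"below -2 SD: {below_minus_2:.4f}")
print(f"above +2 SD: {above_plus_2:.4f}")
```

The computed areas are about 0.683 and 0.954 for the central regions and about 0.023 for each tail.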

[Insert standard normal z-score graph about here]

6. Distributions

Why Important?

Why do you need to know about distributions? Again the primary answer is that various kinds of distributions occur repeatedly in the medical research literature. Any time a set of values is obtained from a sample, each value may be plotted against the number or proportion of times it occurs in a graph having the values on the horizontal axis and the counts or proportions on the vertical axis. Such a graph is one way in which a frequency distribution may be displayed, since a frequency distribution is simply a table, chart or graph which pairs each different value with the number or proportion of times it occurs.
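A frequency distribution in table form can be sketched as follows. The scores are invented for illustration, and `collections.Counter` is used only as a convenient way to tally how often each value occurs.

```python
from collections import Counter

# Hypothetical scores from a small sample, invented for illustration.
scores = [3, 5, 5, 6, 6, 6, 7, 7, 8]

counts = Counter(scores)
n = len(scores)

# Pair each distinct value with its count and its proportion of the sample.
frequency_table = [(value, counts[value], counts[value] / n)
                   for value in sorted(counts)]

print("value  count  proportion")
for value, count, proportion in frequency_table:
    print(f"{value:>5}  {count:>5}  {proportion:>10.3f}")
```

Plotting the values on the horizontal axis against the counts or proportions on the vertical axis would give the graph form of the same distribution.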

It turns out that some distributions are particularly important because they naturally occur frequently in clinical situations. Some of the most important distributions are Gaussian, Binomial and Poisson distributions.

6.1 Gaussian

The Gaussian distribution or bell curve (also known as the normal distribution) is by far the most important, because it occurs so frequently and is the basis for the parametric statistical tests. When values are obtained by summing over a number of random outcomes the sum tends to assume a Gaussian distribution. The Gaussian distribution gives a precise mathematical formulation to the "law of errors". The idea is that when measurements are made, most of the errors will be small, so most measurements fall close to the actual value. Some measurements will have greater error, but as the size of the errors of measurement increases, the number of such errors decreases (see Figure).

[Insert Gaussian Figure about here]

6.2 Binomial

The family of binomial distributions is relevant whenever independent trials occur which can be categorized as having two possible outcomes and known probabilities are associated with each of the outcomes. For example, without knowing the correct answers for true-false questions there would be equal probabilities of each answer being right or wrong. The binomial distribution would describe the probabilities associated with various numbers of right and wrong answers on such a true-false test. As another example, assume that we want to determine the probability that a genetically based defect will occur in the children of families having various sizes, given the presence of the characteristic in one of the parents. The binomial distribution would describe the probabilities that any number of children from each family would be expected to inherit the defect. These are both examples of dichotomous variables, which when graphed over multiple trials can be expected to assume a binomial distribution (see Figure).
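The two examples can be sketched with the binomial probability formula. The function name `binomial_pmf` is introduced here for illustration. The 10-question test length is an arbitrary choice, and the inheritance probability of 0.5 per child is a hypothetical value chosen for the example, not a figure from the lesson.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Guessing on a 10-question true-false test: p = 0.5 for each question.
guess_probs = [binomial_pmf(k, n=10, p=0.5) for k in range(11)]
print(f"P(exactly 5 right of 10) = {guess_probs[5]:.4f}")

# Family of 4 children, assuming (hypothetically) each child has a 0.5
# chance of inheriting the defect, independently of the others.
p_none_affected = binomial_pmf(0, n=4, p=0.5)
p_at_least_one = 1 - p_none_affected
print(f"P(at least one of 4 affected) = {p_at_least_one:.4f}")
```

Because the probabilities over all possible numbers of successes must account for every outcome, the values in `guess_probs` sum to 1.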

[Insert Binomial Figure about here]

6.3 Poisson

Another important discrete distribution is the Poisson distribution. It is useful to think of the Poisson distribution as a special case of the binomial distribution, where the number of trials is very large and the probability is very small. More specifically, the Poisson is often used to model situations where the number of trials is indefinitely large, but the probability of a particular event at each trial approaches zero. The number of bacteria on a petri plate can be modeled as a Poisson distribution. Tiny areas on the plate can be viewed as trials, and a bacterium may or may not occur in such an area. The probability of a bacterium being within any given area is very small, but there are a very large number of such areas on the plate. A similar case would be encountered when counting the number of red cells that fall in a square on a hemocytometer grid, looking at the distribution of the number of individuals in America killed by lightning strikes in one year, or the occurrence of HIV-associated needle sticks in US hospitals each year. The Poisson approximation to the binomial distribution is good enough to be useful even when N is only moderately large (say N > 50) and p only relatively small (p < .2) (Hayes, 1981) (see Figure).
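The closeness of the Poisson approximation can be checked directly. In the sketch below the number of trials (1000) and the per-trial probability (0.002) are invented values chosen to represent "many trials, small probability"; the Poisson mean is taken as their product.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Exact binomial probability of k events in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mean_count):
    """Poisson probability of k events when the expected count is mean_count."""
    return exp(-mean_count) * mean_count**k / factorial(k)

n_trials = 1000            # hypothetical: many tiny areas on the plate
p_event = 0.002            # hypothetical: small chance of a bacterium per area
expected = n_trials * p_event  # Poisson mean = N * p = 2

print(" k  binomial   Poisson")
for k in range(7):
    print(f"{k:>2}  {binomial_pmf(k, n_trials, p_event):.5f}   "
          f"{poisson_pmf(k, expected):.5f}")

# Largest disagreement between the two over k = 0..10.
max_gap = max(abs(binomial_pmf(k, n_trials, p_event) - poisson_pmf(k, expected))
              for k in range(11))
```

For these values the two columns agree closely, which is why the simpler Poisson formula is often used in place of the binomial in such situations.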

[Insert Poisson Figure about here]

Lesson 2: Inferential Statistics
+ Review of Lesson 1

Why Important?

Inferential statistics is that branch of statistics that has to do with the almost magical ability of statisticians to be able to generalize from small samples to large populations with known probabilities of error. Without inferential statistics the biomedical researcher would be limited to making statements that would only summarize sample data from events that have already occurred. Of course, you want to be able to go far beyond statements about patients that you have data on, to patients you do not yet have data on, or patients for whom you do not have complete data: the much broader populations of patients. This is the job of inferential statistics.

1. Sampling: Distribution of Means

Why Important?

The key concept for inferential statistics, the crucial concept, the one that bridges the gap between the sample and the population, is the sampling distribution. Without it the great overwhelming majority of statistical inferential tests you will encounter in the literature could not be done. It forms the basis of all parametric statistical inferential tests, that is, all inferential tests designed for normally distributed interval or ratio variables. How can it do that? Just as you can make probability statements about sample values when you know the shape of the population distribution (e.g., the probability of an IQ > 130 is approximately .023), when you know the sampling distribution of the mean you can make probability statements about means of samples (e.g., the probability that a sample of 25 will have a mean IQ > 106 is approximately .023).
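The two probability statements can be reproduced numerically. The sketch assumes that IQ is normally distributed in the population with mean 100 and standard deviation 15; those values are not stated in the paragraph but are the ones consistent with its figures.

```python
from math import sqrt
from statistics import NormalDist

# Assumed population distribution of individual IQ scores.
POP_MEAN, POP_SD = 100, 15
population = NormalDist(mu=POP_MEAN, sigma=POP_SD)

# Probability that a single individual has IQ > 130 (2 SDs above the mean).
p_individual_above_130 = 1 - population.cdf(130)

# Sampling distribution of the mean for samples of 25:
# same mean, standard deviation divided by the square root of the sample size.
sample_size = 25
sampling_distribution = NormalDist(mu=POP_MEAN, sigma=POP_SD / sqrt(sample_size))

# Probability that a sample of 25 has mean IQ > 106.
p_mean_above_106 = 1 - sampling_distribution.cdf(106)

print(f"P(individual IQ > 130) = {p_individual_above_130:.3f}")
print(f"P(mean of 25 IQs > 106) = {p_mean_above_106:.3f}")
```

Both probabilities come out at about .023 because 130 is two standard deviations above the mean in the population, and 106 is two standard deviations above the mean in the sampling distribution.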

Inferential statistics is all about being able to make these latter kinds of statements about sample values. More importantly it is about being able to make similar kinds of probability statements about population values, particularly population means.

1.1 Sampling Distribution of Means

A sampling distribution of means is exactly what the name implies. It is just a distribution consisting of sample means obtained by taking a very large number of successive samples from the same population and computing each sample's mean. It is only different from other distributions in the respect that other distributions typically consist of the values contained in a single sample. In a sampling distribution the values contained in the distribution are means. Each mean is computed from a different sample from the same population. For example, a sampling distribution of mean diastolic blood pressures would have mean diastolic blood pressures rather than individual diastolic blood pressures on the horizontal axis (see BP Sampling Distribution Figure). A sampling distribution of mean IQ's would have mean IQ's rather than individual IQ's on the horizontal axis (see IQ Sampling Distribution Figure).

[Insert BP Sampling Distribution Figure about here]

[Insert IQ Sampling Distribution Figure about here]

1.2 Properties of Sampling Distribution of Means

A natural question to ask about sampling distributions of means once you have grasped the concept is, "What do you know about these distributions?" You know a great deal about them and that's part of what makes them so important. In particular, you know that:

- the mean of the sampling distribution equals the mean of the population,
- the standard deviation of the sampling distribution equals the standard deviation of the population divided by the square root of the sample size,
- the sampling distribution is approximately normal,
- the approximation to the normal improves as the sample size increases,
- all the above are true regardless of the shape of the population distribution.

The last statement is particularly profound. It says that no matter what kind of distribution the variable has in the population, whether it is normal, or flat, or peaked, or Poisson, or binomial, or wavy, or whatever, the sampling distribution will be approximately normal once the sample size is reasonably large, and it will have all the rest of the properties. All of these have been mathematically proven to be true. These facts are so important they have been given a special name. Collectively, they make up what is known to statisticians as The Central Limit Theorem. Given sample data, it allows us to make approximate statements about population means obtained from any population.
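A small simulation can illustrate the first two properties. The sketch below is built on stated assumptions: the population is taken to be exponential with mean 1 and standard deviation 1 (a strongly skewed shape chosen only for illustration), and the sample size, number of samples, and random seed are arbitrary.

```python
import random
import statistics
from math import sqrt

rng = random.Random(12345)  # fixed seed so the run is repeatable

SAMPLE_SIZE = 30     # values per sample
NUM_SAMPLES = 5000   # number of samples drawn from the population

def draw_sample_mean():
    """Draw one sample from the skewed population and return its mean."""
    sample = [rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)]
    return statistics.fmean(sample)

# Build an empirical sampling distribution of means.
sample_means = [draw_sample_mean() for _ in range(NUM_SAMPLES)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.pstdev(sample_means)
expected_se = 1.0 / sqrt(SAMPLE_SIZE)  # population SD / sqrt(sample size)

print(f"mean of sample means: {mean_of_means:.3f}  (population mean 1.000)")
print(f"SD of sample means:   {sd_of_means:.3f}  (predicted {expected_se:.3f})")
```

Even though the individual values are far from normal, the simulated sample means cluster around the population mean with a spread close to the predicted standard deviation divided by the square root of the sample size. A histogram of `sample_means` would show the roughly bell-shaped form.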

[Insert CLT Sampling Distribution Figure about here]

1.3 Standard Error

Now, let's suppose you obtain a sample of 25 IQ's from some population and it turns out the sample IQ mean is, say, 107.8. Another natural question to ask would be, "How accurate is that sample mean of 107.8?" In other words, how accurate is it as an estimate of the population mean? The standard error provides a precise way to answer this sort of question. The standard error, more precisely, the standard error of the mean, is just another name for the standard deviation of the sampling distribution. So it measures the variability of the sampling distribution. But the sampling distribution consists of sample means. So, it measures the variability of the sample means about the population mean, since the mean of the sampling distribution is the population mean.

So, just as the standard deviation of a sample tells you about the average distance the values in the sample are from the mean of the sample, the standard error estimates from sample data the average distance sample means are from the population mean. In other words, it gives you a measure of the amount of error to be expected in a sample mean as an estimate of the population mean. In the IQ example above the standard error was 3 (a standard deviation of 15 divided by the square root of the sample size of 25). Since the sampling distribution is approximately normal, this tells you that approximately 68% of the time the sample mean would be within about 3 IQ points of the population mean, about 95% of the time it would be within about 6 IQ points of the population mean, and about 99.7% of the time it would be within about 9 IQ points of the population mean.
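The arithmetic behind the standard error of 3 can be sketched as follows. The helper name `standard_error` is introduced here for illustration, and the standard deviation of 15 is the value implied by the IQ example rather than one computed from actual sample data.

```python
from math import sqrt

def standard_error(standard_deviation, sample_size):
    """Standard error of the mean: SD divided by the square root of n."""
    return standard_deviation / sqrt(sample_size)

sample_mean = 107.8
se = standard_error(standard_deviation=15, sample_size=25)  # 15 / 5 = 3

# Bands of 1, 2 and 3 standard errors around the sample mean, which cover
# roughly 68%, 95% and 99.7% of a normal sampling distribution.
bands = {k: (sample_mean - k * se, sample_mean + k * se) for k in (1, 2, 3)}

print(f"standard error = {se}")
for k, (low, high) in bands.items():
    print(f"+/- {k} SE: {low:.1f} to {high:.1f}")
```

Note how the standard error shrinks as the sample grows: quadrupling the sample size to 100 would halve the standard error to 1.5.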

[Insert Standard Error Figure about here]