Oser, G., Johnson, C.W. & Abedor, A. J.
Lesson 3: Clinical Decision Making in a Multi-variable Environment
+ Review of Lesson 2
A critical distinction between the scientific approach and other methods
of inquiry lies in the emphasis placed on real world validation. Where
research has shown that particular approaches are appropriate and effective
for specific applications, those approaches are to be selected by the
practicing physician. Physicians are called upon to defend decisions
on the basis of empirical research evidence. Consequently, the clinician
must be an intelligent consumer of medical research outcomes, able
to understand, interpret, critically evaluate and apply valid results
of the latest medical research.
There is an old story cited in Norman & Streiner (1986) about three little
French boys who happened to see a man and woman naked on a bed in a basement
apartment. The four year old said, "Look, that man and woman are wrestling!".
The five year old said, "You silly, they're not wrestling they're making
love!" The six year old said, "Yes! And very poorly too!!" The four year old
did not understand. The five year old had achieved a conceptual
understanding. The six year old understood it well enough, presumably
without actual experience, to be a critical evaluator. The intent of the
following instruction is to make you a critical evaluator of medical
research, a "six year old biostatistician". So that's the challenge of
these three lessons: to turn you into a six year old statistician.
The instruction provides a brief overview of the most frequently
used and most important descriptive and inferential biostatistical methods as
they are relevant for the clinician. The goal is that the student will
appreciate how the application of the theories of measurement, statistical
inference, and decision trees contributes to better clinical decisions
and
ultimately to improved patient care and outcomes.
Conceptual understanding,
rather than computational ability, will be the focus. Development of an
adequate vocabulary, an examination of fundamental principles and a survey
of the widely used procedures or tools to extract information from data,
will form a basis for fruitful collaboration with a professional
biostatistician when appropriate.
The object of this, then, is to help you understand the tools and procedures
that are used in statistics and, when you really need to do statistics,
to ask a biostatistician to help you with them. So the objective
here is not to make you into biostatisticians, but into appreciators of
what biostatistics can contribute to the appropriate care of your patients,
able to seek the appropriate help when necessary.
The needs of practicing physicians,
not the skills needed to be a biostatistician or to conduct sophisticated
medical research, will inform the presentations.
In the three lessons here we're going to concentrate in the first hour on
summary measures of data. That is, how you take data from a population or
a group and express it in one or two summary measures, like a mean
or a standard deviation.
(Show transparency of lesson outlines)
We're going to look at several ways of doing that: mainly definitions of variables, what central
tendency is, variability and how it affects information. We're going to take a peek at exploratory
data analysis, which is an area that would be very useful if you were going to be collecting data
yourself. And, lastly, we'll be looking at distributions of information: Gaussian, binomial and
Poisson distributions.
(Show BIOSTATISTICS Transparency)
Now if you had to boil it all down, the goal of biostatistics is very straightforward: to
prove that the treatment and only the treatment caused the effect. That is really the job of
biostatistics. When you have a complex being like a human animal and you're performing
experiments on that human animal, you try to control as many variables as you can. You may take
a sample that has only one gender. You may take a sample that has only a narrow range of ages.
But still, there are many other kinds of things that might affect the outcomes that you'd see if you
apply a treatment to that individual. So when you boil it all down, what biostatistics is trying
to do is to eliminate or to minimize anything that might interfere with your being able to prove
that the treatment and only the treatment caused the effect.
Motivation:
Now, biostatistics is useful in some areas but not in others. And as a little expression of that, what
I'd like us to do is to form groups of 6 people. .... One spokesman.. One question... Identify
groups... spokesman for group... OK.
Question: I come to you and I say I want to know whether there are more women or men in this
room. How do I calculate that?
And, the next question is :
If you find a difference in the number of men and women is the difference statistically significant
and if so what test would you use to try to establish that?
Reports from groups:
Group A: says subset pops and add up numbers.
Group B: Count em ... that's the answer.
Second part of the question: once counted, how do you know it's statistically significant? Say 150
are present and you find out there are 80 women and 70 men. Is that a statistically significant difference?
Group C: One data point - NS - no test.
Group D: Use t-test
Group E: Striped shirt...
Anybody else.. Say it was 70 to 60
Group F: Chi-square
The answer is "no test"; no test is useful there, and the reason is that you have all the data.
Back to the original proposition: take a small group of people here, and for that small group
measure the ratio, and then try to extrapolate to the room. Then you need statistics, because you're
inferring from that small group what the room is like. When you count the entire population,
statistics doesn't mean anything. Statistics is only necessary when you're trying to infer from a
small group to a larger group. So in that case statistics is worth zip, zero ... and you count. So
what I'd like you to think about here is when you think biostatistics don't think complicated.
Don't think the names of the tests: t-test, chi-squared and so on. Try to approach it from the fresh
point of view of a person who's kind of naive, walks in and says, "I think I'll try to figure out how
to do this sort of thing." That's the way you'll develop a conceptual feel for this sort of thing
rather than these little memories of times past (usually not too good a memory) when you bounced
against some biostatistics course.
Second problem. We perform a test of fitness on people. We want to examine whether it affects their
longevity. So we're trying to prove a hypothesis that longevity is associated with the treatment,
fitness. We're trying to say that the treatment caused that effect. We perform this on 300,000 males
and we follow them for 50 years to see how long they live. So we measure the fitness of these
300,000 males, we track them out for 50 years and see when they die. I'll give you the results.
Results are: for those who are fit we find that the mean longevity is 75 years, and for those that
are not fit it is 70 years.
Questions:
Is that result statistically significant?
What test would you use?
Group A: t test didn't calculate
Group B: Clueless
Don't worry about that
Volunteer: Can't control...smoking, eating, treatment defs... good questions,
Picture switches to George
Any other suggestions: Control variability by looking at more homogeneous population.
Outcome measure has variability to it ... take into account
Experimental design question.
Student t-test ...Why does it make a difference
The last time you read the New England Journal of Medicine, how many studies did you read that
had 300,000 males in the trial?
Probably some have tens of thousands, or even hundreds of thousands, but there are probably no
medical studies that involve that many. What happens is, when you get very large numbers like
that, the variability washes out, because you have all kinds of people in those big groups, and the
chance that you're going to get a few oddballs in one group to change the result is very, very
small. So for large numbers, then, you don't need statistical tests. Suppose you're able to go out
and get 300,000 volunteers. You all talked about experimental design: measuring the treatment,
making sure that people comply, and other kinds of things. Those are all good things and very
important aspects of experimental design. But once you get big numbers
you don't need statistics, because the numbers are there and they are overwhelming, and the
variability is taken into account by the fact that you have great big groups.
OK, same problem as number 2. This time we have 50 males. We get the same result. What test
should we use, and is it statistically significant?
Let me be more precise:
Does increased fitness increase longevity in the study? That's my question.
OK
Group A: Clueless...come on now
Group B: May be too small to find the difference ... need a way to find out if the group is large
enough to measure the effect you're looking for... Important; we'll look at it.
Group C volunteer: didn't give us all the info... Need variability; more overlap vs. less overlap is
important.
What kind of test would you use if I did give you a measure of variability? .....
Probably a t-test: two groups, comparing the means of two groups with variation in them. You don't
have to worry about that sort of thing really. What you have to realize is that variability is a
problem here, and it is particularly a problem for small groups.
(How would we know discussion)
Hoping that some have had some stat in the past..
Last question: Same results as last. We've applied a test. We know the variability, and I've found
that the result is statistically significant. Is that useful to you as physicians, and what measure
would you apply, or what do you call the measure of usefulness to physicians when you look at
results like that? So I've told you the statistics are significant. Does that mean something to you?
And if it does, what do you call that kind of analysis you go through? In other words, I've combed
the journal articles and I've only picked out the journal articles that show the results are
statistically significant, so you know by that the results were probably not due to chance. That's
one of the mistakes that biostatistics prevents you from making: concluding from a result that you
got just by chance, from having a funny group or something like that, OK. So given a result that's
statistically significant, is that the end of your analysis? Do you care about anything else?
(Question: reproducible? ...yes)
What kinds of questions do you ask as a physician once you have results from a statistician that have
been validated as statistically significant?
Group A: Is it relevant? Can we implement it to make patients' lives better? Is it feasible?
Another: Can you apply it to your population?
So in this case we have 50 males; we've left out one group in the population anyway. So the
question is, should we go back and look at another group or something like that, so we can apply it
to the population that may be in our practice?
Another suggestion: Measure of fitness? Return to basal rate after 5 min step test?
Other suggestions for what a physician should look at: am I going to be applying this to patients as
a whole or to individual patients, because if I apply it to individuals it's not appropriate? Can
you apply results that have some statistical importance to individual patients?
Does anyone have an alternative?
Modern medicine is based upon this leap of faith: that the best evidence you have is
epidemiological evidence. And the one distraction physicians have to be careful about is that,
having seen results in 10 patients of their own, they
tend to neglect the population kinds of statistics.
Now appropriateness is certainly a question. Can you apply everything you know to every
person? You'll have to answer that based upon individual characteristics. But knowing nothing
else about a person other than that you had this result, I would suggest that medicine as a
discipline would strongly suggest that you have no alternative but to use this. Otherwise you're
distracted by all kinds of biases about patients that you have your own thoughts about, etc.
Quality of tests, significance level of test.
Is it clinically significant? And does it make sense to my patients? Is this result worth
whatever it requires to be fit throughout my life? You look at things like smoking cessation.
There are many countries around the world where smoking cessation is nothing like it is in the
United States. They have decided that their lifestyle, and that includes smoking, is more important
than heart and lung disease, cost to the general population and all the terrible
things that smoking causes. Now, I wouldn't suggest that's a very wise clinical decision. The
point is, though, that you have to make a decision based upon a broader array of things than the
numbers that the statistician gives you. If, for example, you have a treatment with a
small effect, but the sample size is large enough for you to establish statistical
significance, it may not be worth it. The change in your patients' lives may be so small that it may
not be worth the burdens of the treatment. So fundamentally what you should do is go to
the literature, find out whether somebody has applied the appropriate tests to show the results are
statistically significant, and if they have, that's where your job begins: to say whether or not this
is clinically significant or appropriate for my patients. So collaborate with a biostatistician if
you're doing the analysis, but in the final go-round it is the physician who determines whether or
not to apply this result to your particular patients, taking into account all the individual aspects
that you know about them and whether or not the result is worth it.
OK, we're going to go back and look at what biostatistics is going to do for us here. It's going to
limit the effects of chance, and that's the main thing that it does. We talked about this with small
samples in particular, so it's going to take care of things like the false positives, for example,
that we might get in a small sample.
It helps us determine sample size... someone mentioned you have to have a big enough sample to
find effects. So it's going to help you figure out whether the sample size is big enough to help you
detect the results that you think are clinically important. To give you an example, what if you have
a new formula that increases the weight gain per week of newborns, and you want to know whether
this particular formula is useful or not in increasing the weights of newborns? Well, you
have to answer a question for the statistician first of all: what do you consider a useful
increase in weight per week as a clinician? So you're going to be asked questions by a statistician,
if they're going to help you with these things, as to what you think is clinically important. In order
to design the experiment to have enough people in the sample to find small effects, you'll have to
define what is the smallest useful effect to you.
Is 5 years useful in longevity? Is 1 pound more than formula X important in this formula for
newborns? We're now going to try and control for confounding variables: gender, other kinds of
things that might confound results. We're going to try and design alternative ways of measuring
humans, because humans are not laboratory chemicals; the controls and things are very difficult in
humans, and I hope that in your epidemiology lectures you'll have talked about other ways than
randomized trials. Randomized trials have tremendous flaws: they're a perfect experimental
design, but humans don't always agree to be in a randomized trial. So you have special kinds of
people who say "I'm going to be in a randomized trial" and others who don't. So you already have a
special kind of population that's skewed a bit. The question is whether you can design alternative
ways of measuring effects without going to the extreme of a randomized clinical trial.
Use the maximum information content. Measure intangibles: measures like IQ and general
well-being.
Goal of Experimental Method:
To prove that the treatment and only the treatment caused the effect.
Usefulness of Biostatistics:
Place limits on effects of chance in small sample experiments
(Alpha or False Positives).
Determine sample size needed to detect clinically relevant effects
(Beta or False Negatives, 1-Power).
Control for effects of one or more confounding variables.
Assist in developing alternative designs for human experiments.
Use maximum information content measurement.
Measure intangibles such as intelligence, depression, and well-being.
Now, you might ask, why do I need to know about levels of measure
and types of variables?
Good Question! You need to know, in order to judge whether appropriate
statistical techniques have been used, and consequently whether conclusions
are valid. In other words, you can't tell whether the results in a
particular medical research study are credible unless you know what types
of measuring scales have been used in obtaining the data.
(Show levels of measure transparency)
Now we'll move into some more familiar territory. When you start to measure the impact of a
treatment, you have to ask yourself: what variables am I dealing with here? What are my choices
of variables? On the left hand side you see that there are two classifications of variables. There
are qualitative variables and there are quantitative variables. Now one isn't necessarily better than
the other. One we're a little more used to doing stuff with. With quantitative variables, for
example, we can do averages and things like that; we know they are numbers, so you can add them
up and divide and things like that. It's a little trickier sometimes with qualitative variables. But in
human experiments there's no way you can get around them. There are two classes of those: nominal
and ordinal.
Four Types of Scales
There are four types of measurement scales (nominal, ordinal, interval and
ratio; see Figure). Each successive scale provides a higher level of
information about the variables measured with it.
Nominal Scales
Where does nominal come from? Name. So nominal comes from name, and the important thing is
there is no measure of distance between the categories. You're either married or not married, gender
is one or the other, yes or no, and so there is no question of how far apart in a quantitative sense
those categories are; they're just names.
Nominal scales name and that is all that they do. Some examples of nominal
scales are sex (male, female), race (black, hispanic, oriental, white, other),
political party (democrat, republican, other), blood type (A, B, AB, O),
and pregnancy status (pregnant, not pregnant; see Figure).
[Insert Figure 1 about here]
Ordinal Scales
In the next group we have a little more sophistication than naming. What does ordinal imply?
Ranking. So they're in some order: higher and lower. We don't rank gender as higher and lower.
But we do rank stages of cancer, for example, as higher and lower. We have pain ratings that are
higher and lower. So we're now at a more sophisticated level of measure, a finer tuned level of
measurement. But we've added only one element. We know that something is higher than
something or lower than something, or more painful than something or less painful than
something.
Ordinal scales both name and order. Some examples of ordinal scales are
rankings (e.g., football top 20 teams, pop music top 40 songs), order
of finish in a race (first, second, third, etc.), cancer stage (stage I,
stage II, stage III), and hypertension categories (mild, moderate, severe;
see Figure).
In the next group we have a quantitative group and people divide these into interval and ratio
variables. And, I wouldn't worry about the division so much.
[Insert Figure 1 about here]
Interval Scales
What about interval variables?
Why is that called an interval variable like temperature for example? What is the difference
between 36 degrees and 37 degrees compared to the difference between 40 degrees and 41
degrees? Is the difference the same? So the intervals are the same on the scale. So now we know
not only is one higher than the other but that the distances or the intervals on the scales are the
same. So again it's a higher level of information that we have.
Interval scales name, order and have the property that equal intervals
in the numbers on the scale represent equal quantities of the variables
being measured. Some examples of interval scales are fahrenheit and celsius
temperature, SAT, GRE and MAT scores, and IQ scores. The zero on an
interval scale is arbitrary. On the celsius scale, 0 is the freezing point
of water. On the fahrenheit scale, 0 is 32 degrees below the freezing
point of water (see Figure).
[Insert Figure 1 about here]
Ratio Scales
Ratio scales have all the properties of interval scales plus a meaningful,
absolute zero. That is, zero represents the total absence of the variable
being measured. Some examples of ratio scales are length measures in
the english or metric systems, time measures in seconds, minutes, hours, etc.,
blood pressure measured in millimeters of mercury, age, and our common
measures of mass, weight, and volume (see Figure).
[Insert Figure 1 about here]
They are called ratio scales because ratios are
meaningful with this type of scale. It makes sense to say 100 feet
is twice as long as 50 feet because length measured in feet is a ratio
scale. Likewise it makes sense to say a Kelvin temperature of 100 is
twice as hot as a Kelvin temperature of 50 because it represents twice
as much thermal energy (unlike fahrenheit temperatures of 100 and 50).
[Insert Figure 1 about here]
Nominal and ordinal scales are called qualitative measures. Interval and
ratio scales are called quantitative measures (see Figure).
[Insert Figure 1 about here]
With the ratio variables the only difference that we have is there is a true zero so that you can
actually talk about ratios. That is a person's lung capacity can be twice somebody else's lung
capacity. In order to do that you have to have a true zero to develop ratios like that. But really
for all statistical purposes it makes no difference. The important thing is, if you have measures like
these interval measures, you should keep them at the finest level of measure you have. Don't put,
say, temperature measures into categories like "the temperature was less than this or greater than
that" or "in another group less than this and greater than that" and so on. Don't cluster or group
those and make them into ordinal variables. If you do, then you're throwing away information. So
if you have information at the interval level, record it at the interval level. If it's at the ordinal level,
record it at that level. And of course if you're at the nominal level you're stuck with recording it
at that level. So never cluster your variables together when you begin your experiments in a way
that you lose information.
Now, when statistical analyses are applied,
the statistics must take into account the nature of the underlying
measurement scale, because there are fundamental differences in the
types of information imparted by the different scales (see Figure).
Consequently, nominal and ordinal scales must be analyzed using what are
called non-parametric or distribution free statistics. On the other hand,
interval and ratio scales are analyzed using parametric statistics.
Parametric statistics typically require that the interval or ratio variables
have distributions shaped like bell (normal) curves, a reasonable assumption
for many of the variables frequently encountered in medical practice.
(Show kinds of variables transparency)
Dependent variables and independent variables.
Basically what you're generally trying to do is say that we're looking at an outcome, like gastric
ulcers, and you determine other variables that may or may not affect that outcome. The
independent variables are the ones you manipulate, the treatments that you manipulate, and the
dependent ones are the measures that are the outcomes of that.
[Insert Figure 1 about here]
There are a couple of little C.R.A.P. detectors you can use here.
C.R.A.P. detectors find circular reasoning and other kinds of stuff.
Dependent variables should be sensible. Ideally, they should be
clinically important, but also related to the independent variable.
In general, the amount of information increases as one goes from nominal
to ratio. Classifying good ratio measures into large categories is akin to
throwing away data.
Why do you need to know about measures of central tendency? You
need to be able to understand summaries of large amounts of data that
use simple measures to best represent the location of the data as a whole.
Collectively, such measures or values are referred to as measures of
central tendency. Measures of central tendency are ubiquitous in the medical
research literature. The most frequently used measures of central tendency
are the mean, median and mode.
(Show Central Tendency transparency)
We do try to display information in some graphical form. And this is a display of physicians'
salaries in 1999, after all the health plans have come forward and... No, this was old data actually.
But the point of this slide is to show you that there are various ways to represent a distribution of
data. The mode is the most frequent value, the median has equal numbers above and below it, and the
mean is the average value. And if the distribution is a nice symmetric distribution, all three of
those collapse into one.
The most frequently used measure of central tendency is the mean.
The mean, or more formally, the arithmetic mean, is simply the average
of the group. That is, the mean is obtained by summing all the numbers
for the subjects in the group and dividing by the number of subjects
in the group. The mean is useful only for quantitative variables (see
Figure).
[Insert Figure 2.3 about here]
The median is the middle score. That is, the median is the score for
which half the subjects have lower scores and half have higher scores.
Another way to say this is that the median is the score at the fiftieth
percentile in the distribution of scores (see Figure).
[Insert Figure 2.3 about here]
The mode is the most frequent score. Another way to say this is that
the mode is the score that occurs most often (see Figure).
[Insert Figure 2.3 about here]
In a symmetric distribution with one mode like the normal distribution
the mean, median and mode all have the same value. But, in a non-symmetric
distribution their values will be different. In general, as the distribution
becomes more lopsided the mean and the median move away from the
mode. With extremely skewed distributions the mean will be
somewhat misleading as a measure of central tendency, because it is
heavily influenced by extreme scores. So for example, if we take a
distribution of doctors' incomes, some doctors make huge sums of money, and
the median or the mode is more representative of doctors' incomes as a
whole than the mean, because the very high incomes of some
doctors inflate the average, making it less representative of
doctors as a whole (see Figure 2.3).
[Insert Figure 2.3 about here]
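As a small illustration of that point, here is a minimal Python sketch (the income figures are invented purely for illustration) showing how a few very large values pull the mean well above the median and the mode:

    # Hypothetical, skewed sample of annual incomes (in $1000s), for illustration only
    incomes = [90, 95, 100, 100, 105, 110, 115, 120, 450, 900]

    mean = sum(incomes) / len(incomes)                  # arithmetic average

    ranked = sorted(incomes)
    n = len(ranked)
    median = (ranked[n // 2 - 1] + ranked[n // 2]) / 2  # middle score (even n: average the two middle values)

    mode = max(set(incomes), key=incomes.count)         # most frequent score

    print(mean, median, mode)                           # mean 218.5, median 107.5, mode 100

The two very high incomes drag the mean to 218.5, while the median (107.5) and mode (100) stay close to the bulk of the values.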
Why do you need to know about measures of variability? You also
need to be able to understand summaries of large amounts of data that
use simple measures to best represent the variability in the data.
Measures of variability also occur very frequently in the medical research
literature. If all data values are the same, then, of course, there is
zero variability. If all the values lie very close to each other there is
little variability. If the numbers are spread out all over the place
there is more variability. Again there are many measures of variability.
Some of the most frequently used measures of variability are the standard
deviation, interquartile range and the range.
(Show standard deviation formula transparency)
We can measure the spread of information or data by looking at the standard deviation, and it is
built around the mean value which we extract from the information. We subtract the mean from each
individual value, square the differences, sum them, divide by the number of values, and take the
square root. The reason that we subtract and square is pretty clear: whether the value is
above the mean or below the mean, it comes out the same when we square it, so positive and
negative make no difference here. We divide by the number of values to get an average, and
we take the square root of the whole thing, because we squared the differences, to get back to the
original measures. So by squaring to get rid of the negative and positive values we get squared
measures, and we take the square root to get back to the original kinds of measures, like feet or
cubic inches or whatever else it might be.
The standard deviation can be thought of as the average distance
that values are from the mean of the distribution (see Figure). This
means that you must be able to compute a meaningful mean to be able to
compute a standard deviation. Consequently, computation of the
standard deviation requires interval or ratio variables. In a distribution
having a bell (normal) curve, approximately 68% of the values lie within
1 standard deviation of the mean. On the other hand, approximately
2.3% of the values lie in each tail of the distribution beyond 2 standard
deviations from the mean (see Figure).
[Insert Figure 2.5 about here]
[Insert Figure SD Formula Figure about here]
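As a concrete sketch of the computation just described, in Python (this follows the lesson's description and divides by the number of values; many packages divide by n - 1 for a sample estimate, so check which convention is used):

    import math

    values = [4.0, 7.0, 6.0, 5.0, 8.0]           # any interval or ratio data

    mean = sum(values) / len(values)             # the arithmetic mean

    # subtract the mean from each value, square, sum, divide by N, take the square root
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / len(values)
    standard_deviation = math.sqrt(variance)

    print(mean, standard_deviation)              # 6.0 and about 1.41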
Remember that the median was the point in
the distribution where 50% of the sample were below and 50% were
above. Quartile boundaries can be defined at the 25th percentile, the 50th
percentile, the 75th percentile and the 100th percentile. The
interquartile range is the distance between the 25th percentile and the
75th percentile, and it therefore includes the middle 50% of the values
in the sample. The interquartile range is a measure of variability that can be
appropriately applied with ordinal variables and therefore may
be used especially in conjunction with non-parametric statistics (see Figure).
[Insert Interquartile Range & Range Figure about here]
(Show EDA transparency)
Another way to display data that's been proposed by exploratory data analysis is to rank the data
from low to high, then find the median and then the quartile values, that is, the values between
which one half of the data resides. The rest of the data is out in the wings here. And here you
find the interquartile range, which is the range from the lower to the upper quartile. And the range
is the difference between the extreme values (max - min).
The range is simply the difference between the highest and lowest
value in the sample (see Figure).
[Insert Interquartile Range & Range Figure about here]
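A brief sketch of the quartiles, interquartile range, and range in Python (quartile conventions differ slightly between packages, so the exact cut points here are an assumption of Python's statistics module):

    import statistics

    data = [3, 5, 7, 8, 9, 11, 13, 14, 15, 18, 22, 40]

    q1, q2, q3 = statistics.quantiles(data, n=4)   # 25th, 50th and 75th percentiles
    iqr = q3 - q1                                  # spans the middle 50% of the values
    data_range = max(data) - min(data)             # highest value minus lowest value

    print(q1, q2, q3, iqr, data_range)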
Why do you need to know about exploratory data analysis (EDA)?
The purpose of EDA is to provide a simple way
to obtain a big picture look at the data and a quick way to check
data for mistakes to prevent contamination of subsequent analyses.
Exploratory data analysis can be thought of as a preliminary to a more
in-depth analysis of the data (see Figure).
A primary tool in exploratory data analysis is the box plot (see figure).
What does a box plot tell you? You can, for example, determine the central
tendency, the variability, the quartiles, and the skewness for your data.
You can quickly visually compare data from multiple groups.
A small rectangular box is drawn with a line representing the median,
while the top and bottom of the box represent the 75th and 25th percentiles,
respectively. If the median is not in the middle of the box the distribution
is skewed. If the median is closer to the bottom, the distribution is
positively skewed. If the median is closer to the top, the distribution
is negatively skewed. Extreme values and outliers are often represented
with asterisks and circles (see Figure).
[Insert Box and Whisker plot about here]
The top and bottom edges of the box plot are referred to as hinges or Tukey's
hinges (see Figure).
[Insert Box and Whisker plot about here]
[Insert Ranges Figure about here]
[Insert Box and Whisker plot about here]
Outliers and extreme values are often represented with circles and
asterisks, respectively. Outliers are values that lie from 1.5 to 3
box lengths (the box length represents the interquartile range) outside
the hinges. Extreme values lie more than 3 box lengths outside the hinges.
In a box and whisker plot the actual values of the scores will typically
lie adjacent to the outlier and extreme value symbols to facilitate
examination and interpretation of the data (see Figure).
[Insert Box and Whisker plot about here]
Box and whisker plots represent more completely the range of values
in the data by extending vertical lines to the largest and smallest values
that are not outliers, extending short horizontal segments from these
lines to make more apparent the values beyond which outliers begin
(see Figure).
[Insert Box and Whisker plot about here]
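A minimal sketch of how such a box and whisker plot might be produced, assuming the matplotlib plotting library is available (the data are invented for illustration):

    import matplotlib.pyplot as plt

    # two illustrative groups of measurements to compare side by side
    group_a = [62, 65, 66, 68, 70, 71, 72, 73, 75, 90]   # one unusually high value
    group_b = [58, 60, 63, 64, 66, 67, 69, 70, 72, 74]

    # each box spans the interquartile range, the line inside is the median,
    # whiskers extend to the most extreme values that are not outliers,
    # and points beyond the whiskers are drawn individually
    plt.boxplot([group_a, group_b], labels=["Group A", "Group B"])
    plt.ylabel("Measurement value")
    plt.show()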
(Show infant mortality transparency)
I have an example here, which is a plot of infant mortality in Houston, Texas in 1978, and this shows
two congressional districts, 15 & 18. And what you see here is the median (about 20 deaths per
thousand live births). And here you see some census tracts (on the left) and the number of live
births (on the right) in those census tracts, and this is the ratio, the infant mortality.
So there were a hundred deaths per thousand live
births in census tract 126 (apparently 176 live births). Here in Houston, Texas we have infant
mortality rates that approach those in Bangladesh and other undeveloped countries. And this kind
of diagram shows you those extremes very dramatically.
Why do you need to understand standard scores or z-scores?
Again, they appear frequently in the medical literature.
A natural question to ask about a given value from a sample is,
"How many standard deviations is it from the mean?". The z-score
answers the question. The question
is important because it addresses not only the value itself, but also
the relative position of the value. For example, if the value is 3
standard deviations above the mean you know it's three times the
average distance above the mean and represents one of
the higher scores in the distribution. On the other hand, if the
value is one standard deviation below the mean then you know it is
on the low end of the midrange of the values from the sample.
But, there is much more that is important about z-scores.
For every value from a sample, a corresponding z-score can be
computed. The z-score is simply the signed distance, in standard deviation
units, that the sample value is from the mean. There is a simple formula for
computing z-scores (see Figure):
[Show z-score formula transparency here]
And lastly we want to talk about how we take a whole bunch of data... Earlier, you know, we talked
about the mean value of how long people are going to live, and we
had to somehow be able to take that value and compare it to other values from other experiments.
How do we collapse all that stuff? Well, we have a way of normalizing or collapsing it, and
generally what you do is you take something like this mean value, subtract it from each one of the
values, and divide by the standard deviation. That standardizes this distribution, so this
distribution has a mean of 0 all the time and a standard deviation of 1, and so you can build tables
for it. So no matter what your data look like, no matter what the mean value is, you can reduce
them to a table if you reformulate your data like this. So we can take all kinds of experiments then
and build tables for them, because we can normalize or reduce the data by doing things like forming
a z value.
OK, that's it for today. Tomorrow we'll start with the other concepts in biostatistics.
Because every sample value has a corresponding z-score, it is
possible then to graph the distribution of z-scores for every
sample. The z-score distributions share a number of common
properties that are valuable to know. The mean of the z-scores
is always 0. The standard deviation of the z-scores is always
1. The graph of the z-score distribution always has the same shape as the
original distribution of sample values. The sum of the squared z-scores
is always equal to the number of z-score values. Furthermore, z-scores
above 0 represent sample values above the mean, while z-scores below
0 represent sample values below the mean (see Figure).
[Insert z-score graph about here]
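A short Python sketch of computing z-scores for a small sample and checking two of the properties listed above (the values are arbitrary illustration data, and the standard deviation divides by N as in this lesson's formula):

    import math

    values = [4.0, 7.0, 6.0, 5.0, 8.0]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((x - mean) ** 2 for x in values) / len(values))

    # z-score: the signed distance from the mean, in standard deviation units
    z_scores = [(x - mean) / sd for x in values]

    print(sum(z_scores) / len(z_scores))     # mean of the z-scores is 0
    print(sum(z ** 2 for z in z_scores))     # sum of squared z-scores equals N (here 5)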
If the sample values have a Gaussian (normal) distribution then
the z-scores will also have a Gaussian distribution. The distribution
of z-scores having a Gaussian distribution has a special name because
of its fundamental importance in statistics. It is called the standard
normal distribution. All Gaussian or normal distributions can be
transformed using the z-score formula to the standard normal distribution.
Statisticians know a great deal about the standard
normal distribution. Consequently, they also know a great deal about
the entire family of normal distributions. All of the previous properties
of z-score distributions hold for the standard normal distribution.
But, in addition, probability values for all sample values are known
and tabled. So, for example, it is known that approximately 68% of
values lie within one standard deviation of the mean. Approximately
95% of values lie within 2 standard deviations of the mean. Approximately
2.3% of values lie more than 2 standard deviations below the mean.
Approximately 2.3% of values lie more than 2 standard deviations above the
mean (see Figure).
[Insert standard normal z-score graph about here]
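For example, the tabled probabilities mentioned above can be reproduced with the scipy library (assuming scipy is available; the same figures can be read from any standard normal table):

    from scipy.stats import norm

    # area within one standard deviation of the mean (about 0.68)
    print(norm.cdf(1) - norm.cdf(-1))

    # area within two standard deviations of the mean (about 0.95)
    print(norm.cdf(2) - norm.cdf(-2))

    # area in each tail beyond two standard deviations (about 0.023)
    print(norm.cdf(-2))
    print(1 - norm.cdf(2))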
Why do you need to know about distributions? Again the
primary answer is that various kinds of distributions occur
repeatedly in the medical research literature.
Any time a set of values is obtained from a sample, each value
may be plotted against the number or proportion of times it occurs
in a graph having the values on the horizontal axis and the counts
or proportions on the vertical axis. Such a graph is one way
in which a frequency distribution may be displayed, since a frequency
distribution is simply a table, chart or graph which pairs each
different value with the number or proportion of times it occurs.
It turns out that some distributions are particularly important because
they naturally occur frequently in clinical situations. Some of the
most important distributions are Gaussian, Binomial and Poisson
distributions.
The Gaussian distribution or bell curve (also known as the normal
distribution) is by far the most important, because it
occurs so frequently and is the basis for the parametric statistical tests.
When values are obtained by summing over a number of random outcomes
the sum tends to assume a Gaussian distribution. The Gaussian
distribution gives a precise mathematical formulation to the
"law of errors". The idea is that when measurements are made,
most of the errors will be small, so the measurements lie close to the actual
value, while some measurements will have greater error; but
as the size of the errors of measurement increases, the number of
such errors decreases (see Figure).
[Insert Gaussian Figure about here]
The family of binomial distributions is relevant whenever independent trials
occur which can be categorized as having two possible outcomes and known
probabilities are associated with each of the outcomes. For example,
without knowing the correct answers for true-false questions there would be
equal probabilities of each answer being right or wrong. The binomial
distribution would describe the probabilities associated with various
numbers of right and wrong answers on such a true-false test.
As another example, assume that we want to determine the probability
that a genetically based defect will occur in the children of families of
various sizes, given the presence of the characteristic in one of the parents.
The binomial distribution would describe the probabilities that any number
of children from each family would be expected to inherit the defect.
These are both examples of dichotomous variables, which when graphed
over multiple trials can be expected to assume a binomial distribution
(see Figure).
[Insert Binomial Figure about here]
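As a sketch of the true-false example, assuming scipy is available and a test of 10 questions with a 0.5 chance of guessing each answer correctly:

    from scipy.stats import binom

    n_questions = 10      # independent trials
    p_correct = 0.5       # probability of guessing any one answer correctly

    # probability of exactly 7 correct answers by guessing alone
    print(binom.pmf(7, n_questions, p_correct))

    # probability of 7 or more correct answers by guessing alone
    print(1 - binom.cdf(6, n_questions, p_correct))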
Another important discrete distribution is the Poisson distribution.
It is useful to think of the Poisson distribution as a special case
of the binomial distribution, where the number of trials is very large and the
probability is very small. More specifically, the Poisson is often used to
model situations where the number of trials is indefinitely large, but the
probability of a particular event at each trial approaches zero. The number
of bacteria on a petri plate can be modeled as a Poisson distribution. Tiny
areas on the plate can be viewed as trials, and a bacterium may or may
not occur in such an area. The probability of a bacterium being within
any given area is very small, but there are a very large number of such
areas on the plate. A similar case would be encountered
when counting the number of red cells that fall in a square on a
hemocytometer grid, looking at the distribution of the number of
individuals in America killed by lightning strikes in one year, or
the occurrence of HIV associated needle sticks in US hospitals each
year. The Poisson approximation to the binomial distribution is good
enough to be useful even when N is only moderately large (say N > 50) and
p only relatively small (p < .2) (Hayes, 1981) (see Figure).
[Insert Poisson Figure about here]
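A brief sketch of the hemocytometer example, assuming scipy is available and an average of 4 cells per grid square (the rate is invented for illustration):

    from scipy.stats import poisson

    mean_cells_per_square = 4.0   # illustrative average count per grid square

    # probability of observing exactly 0, 2, or 8 red cells in one square
    for k in (0, 2, 8):
        print(k, poisson.pmf(k, mean_cells_per_square))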
Inferential statistics is that branch of statistics that has to
do with the almost magical ability of statisticians to be able to generalize
from small samples to large populations with known probabilities of error.
Without inferential statistics the
biomedical researcher would be limited to making statements that would
only summarize sample data from events that have already occurred.
Of course, you want to be able to go far beyond statements about patients
that you have data on, to patients you do not yet have data on, or
patients for which you do not have complete data -- the much broader
populations of patients. This is the job of inferential
statistics.
The key concept for inferential statistics,
the crucial concept, the one that bridges the gap between the
sample and the population, is the sampling distribution. Without it
the great overwhelming majority of statistical inferential tests you
will encounter in the literature could not be done. It forms the
basis of all parametric statistical inferential tests, that is, all
inferential tests designed for normally distributed interval or ratio
variables. How can it do that? Just as you can make probability
statements about sample values when you know the shape of the
population distribution (e.g., the probability of an IQ > 130 is
approximately .023), when you know the sampling distribution of the
mean you can make probability statements about means of samples
(e.g., the probability that a sample of 25 will have a mean IQ > 106
is approximately .023).
Inferential statistics is all about being
able to make these latter kinds of statements about sample values.
More importantly it is about being able to make similar kinds of
probability statements about population values, particularly population
means.
A sampling distribution of means is exactly what the name implies. It
is just a distribution consisting of sample means obtained by taking a
very large number of successive samples from the same population and
computing each sample's mean. It is only
different from other distributions in the respect that other distributions
typically consist of the values contained in a single sample. In a
sampling distribution the values contained in the distribution are means.
Each mean is computed from a different sample from the same population.
For example, a sampling distribution of mean diastolic blood pressures
would have mean diastolic blood pressures rather than individual diastolic
blood pressures on the horizontal axis (see BP Sampling Distribution Figure).
A sampling distribution of mean IQ's would have mean IQ's rather than
individual IQ's on the horizontal axis (see IQ Sampling Distribution Figure).
[Insert BP Sampling Distribution Figure about here]
[Insert IQ Sampling Distribution Figure about here]
A natural question to ask about sampling distributions of means once
you have grasped the concept is, "What do you know about these
distributions"? You know a great deal about them and that's part of what
makes them so important. In particular, you know that:
- the mean of the sampling distribution equals the mean of the population,
- the standard deviation of the sampling distribution equals the
standard deviation of the population divided by the square root of the
sample size,
- the sampling distribution is approximately normal,
- the approximation to the normal improves as the sample size increases,
- all the above are true regardless of the shape of the population distribution.
The last statement is particularly profound. It says that no matter what
kind of distribution the variable has in the population, whether it is
normal, or flat, or peaked, or Poisson, or binomial or wavy, or whatever,
the sampling distribution will not only always be approximately normal, it
will have all the rest of the properties.
All of these have been mathematically proven to be true. These facts are
so important they have been given a special name. Collectively, they
comprise what is known to statisticians as The Central Limit Theorem.
Given sample data, it allows us to make approximate statements about
population means obtained from any population.
[Insert CLT Sampling Distribution Figure about here]
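One way to see the Central Limit Theorem at work is a small simulation, not part of the original lesson: draw many samples from a decidedly non-normal (flat) population, compute each sample's mean, and examine the resulting sampling distribution. A minimal Python sketch:

    import random
    import statistics

    random.seed(1)

    # a flat (uniform) population, clearly not bell shaped
    population = [random.uniform(0, 100) for _ in range(100_000)]
    pop_mean = statistics.mean(population)
    pop_sd = statistics.pstdev(population)

    sample_size = 25
    sample_means = [statistics.mean(random.sample(population, sample_size))
                    for _ in range(5_000)]

    # the mean of the sampling distribution is close to the population mean,
    # and its standard deviation is close to pop_sd / sqrt(sample_size)
    print(pop_mean, statistics.mean(sample_means))
    print(pop_sd / sample_size ** 0.5, statistics.pstdev(sample_means))

A histogram of sample_means would look approximately normal even though the population itself is flat.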
Now, let's suppose you obtain a sample of 25 IQ's from some population
and it turns out the sample IQ mean is, say 107.8. Another natural
question to ask would be, "How accurate is that sample mean of 107.8"?
In other words, how accurate is it as an estimate of the population
mean? The standard error provides a precise way to answer this sort of
question. The standard error, more precisely, the standard error of
the mean, is just another name for the standard deviation of the
sampling distribution. So it measures the variability of the sampling
distribution. But, the sampling distribution consists of sample means.
So, it measures the variability of the sample means about the population
mean since the mean of the sampling distribution is the population mean.
So, just as the standard deviation of a sample tells you about the average
distance the values in the sample are from the mean of the sample, the
standard error estimates from sample data the average distance sample means
are from the population mean. In other words, it gives you a measure
of the amount of error to be expected in a sample mean as an estimate of
the population mean. In the IQ example above the standard error was
3. Since the sampling distribution is approximately normal this tells
you then that approximately 68% of the time the sample mean would be
within about 3 IQ points of the population mean, likewise about 95% of
the time it would be within about 6 IQ points of the population mean,
and about 99% of the time it would be within about 9 IQ points of the
population mean.
[Insert Standard Error Figure about here]
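The arithmetic of the IQ example can be checked directly; the sketch below assumes the usual IQ standard deviation of 15, which is what makes the standard error 3 for a sample of 25:

    import math

    population_sd = 15        # assumed standard deviation of IQ scores
    sample_size = 25
    sample_mean = 107.8       # the observed sample mean from the example

    standard_error = population_sd / math.sqrt(sample_size)   # 15 / 5 = 3.0
    print(standard_error)

    # about 68% of the time the sample mean falls within one standard error
    # of the population mean, and about 95% of the time within two
    print(sample_mean - standard_error, sample_mean + standard_error)          # 104.8 to 110.8
    print(sample_mean - 2 * standard_error, sample_mean + 2 * standard_error)  # 101.8 to 113.8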