Biostatistics for the Clinician
University of Texas-Houston
Health Science Center
Lesson 1.5
Exploratory Data Analysis
Lesson 1: Summary Measures of Data
1.5 Exploratory Data Analysis (EDA)
1.5.1 Why Important?
Exploratory data analysis (EDA) provides a simple way to get a big-picture look at the data and a quick way to check the data for mistakes, so that errors do not contaminate subsequent analyses. EDA can be thought of as a preliminary step before more in-depth statistical analysis.
1.5.2 Box Plots
A primary tool in exploratory data analysis is the box plot (see the figure below).
[Figure: Exploratory Data Analysis]
A small rectangular box is drawn with a line through it representing the median; the top and bottom of the box represent the 75th and 25th percentiles (the 3rd and 1st quartiles), respectively. If the median is not in the middle of the box, the distribution is skewed. If the median is closer to the bottom of the box, the distribution is positively skewed. If the median is closer to the top, the distribution is negatively skewed. Extreme values and outliers are often represented with asterisks and circles (again, see the figure).
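The box described above can be sketched in a few lines of Python. This is only an illustration, not the method used to draw the lesson's figure: the function names `box_summary` and `skew_from_box` and the length-of-stay values are invented for the example, and the quartiles use the "inclusive" interpolation rule of Python's `statistics.quantiles`, which is one of several common conventions (statistical packages may give slightly different quartiles for small samples).

```python
import statistics


def box_summary(values):
    """Return the 25th percentile, median, and 75th percentile of values."""
    # With n=4, statistics.quantiles returns the three quartile cut points.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q1, median, q3


def skew_from_box(q1, median, q3):
    """Classify skew from where the median sits inside the box."""
    lower = median - q1  # distance from the bottom of the box to the median
    upper = q3 - median  # distance from the median to the top of the box
    if lower < upper:
        return "positive"  # median closer to the bottom of the box
    if lower > upper:
        return "negative"  # median closer to the top of the box
    return "symmetric"


# Hypothetical lengths of stay in days, invented for illustration.
stays = [1, 2, 2, 3, 3, 4, 5, 7, 12]
q1, median, q3 = box_summary(stays)
print(q1, median, q3, skew_from_box(q1, median, q3))
```

With these invented values the median lies nearer the bottom of the box, so the sketch reports positive skew, matching the rule in the text. In real data the two distances are rarely exactly equal, so a small difference should not be over-interpreted.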
What does a box plot tell you? You can, for example, quickly determine the central tendency, the variability, the quartiles, and the skewness of your data. You can also quickly compare data from multiple groups visually.
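As a rough numerical counterpart to comparing box plots side by side, the following sketch tabulates the same quantities for two groups. The group names, the blood pressure values, and the helper `describe` are all hypothetical, and the interquartile range (IQR, the height of the box) is used here as the measure of variability on the assumption that it is the spread a box plot shows most directly.

```python
import statistics


def describe(values):
    """Summarize one group with the quantities a box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}


# Hypothetical systolic blood pressures (mmHg) for two invented groups.
groups = {
    "treatment": [118, 120, 122, 125, 127, 130, 134],
    "control": [126, 130, 133, 135, 138, 142, 150],
}

for name, values in groups.items():
    s = describe(values)
    print(f"{name:>9}: median={s['median']}, "
          f"box={s['q1']} to {s['q3']}, IQR={s['iqr']}")
```

Reading the printed medians and box limits row by row gives the same comparison that two adjacent box plots give at a glance: which group sits higher and which is more spread out.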