7. Boxplot


A boxplot is a compact plot that summarizes the “essentials” of your sample: its center, its spread and any unusual values. In R, you can build one in seconds with the function boxplot().

Not sure what this plot means? Check out the illustration to the right. Here is a quick explanation of what the plot displays. It consists of a box with whiskers and a thick line somewhere inside the box. That thick line represents the median (which we previously calculated with median(...)). The bottom and top of the box are defined by the first and third quartiles of the data, respectively (strictly speaking, boxplot() uses Tukey's “hinges”, which can differ slightly from the values returned by quantile()). The box therefore frames the middle 50% of your sample, and its height is called the interquartile range (IQR). When the data contain no outliers, the whiskers below and above the box extend to the minimum and maximum values of your sample. If an outlier is detected, it is drawn as an open circle, and the whiskers instead end at the minimum and maximum of the remaining, “outlier-free” values.
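If you want to check the numbers behind the picture, base R's boxplot.stats() returns the five values that boxplot() draws, plus any outliers. The minimal sketch below runs it on the same ten-value sample used in the example further down and compares the result with median() and quantile(). Note that the hinges (3 and 8) differ slightly from the default type-7 quartiles returned by quantile() (3.25 and 7.75).

```r
my.dataset <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)

# The five numbers boxplot() draws: lower whisker end, lower hinge,
# median, upper hinge, upper whisker end
s <- boxplot.stats(my.dataset)
print(s$stats)    # 1.0 3.0 5.5 8.0 10.0
print(s$out)      # numeric(0): no outliers in this sample

# The thick line is the ordinary median
print(median(my.dataset))    # 5.5

# quantile() uses a different definition by default, so its quartiles
# can differ slightly from the hinges drawn by boxplot()
print(quantile(my.dataset, c(0.25, 0.75)))    # 3.25 7.75
```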


Note: boxplot() flags a value as an outlier if it lies more than 1.5 times the interquartile range beyond the box, that is, if it is greater than the third quartile plus 1.5 × IQR or smaller than the first quartile minus 1.5 × IQR. The factor 1.5 is the default value of the range argument of boxplot().
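The rule in the note can be reproduced by hand. The sketch below uses a hypothetical sample (the values 1 to 10 plus one extreme value, 30), computes the two “fences” from the hinges returned by fivenum(), and checks that boxplot.stats() flags the same value. It assumes the default factor of 1.5.

```r
# Hypothetical sample: the values 1..10 plus one extreme value (30)
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 30)

h <- fivenum(x)          # min, lower hinge, median, upper hinge, max
spread <- h[4] - h[2]    # height of the box (the IQR used by boxplot())
lower.fence <- h[2] - 1.5 * spread
upper.fence <- h[4] + 1.5 * spread

# A value is an outlier if it lies strictly outside the fences
manual.out <- x[x < lower.fence | x > upper.fence]
print(c(lower.fence, upper.fence))   # -4 16
print(manual.out)                    # 30

# boxplot.stats() applies the same rule (coef = 1.5 by default)
print(boxplot.stats(x)$out)          # 30
# The upper whisker now stops at 10, the largest non-outlier
print(boxplot.stats(x)$stats)        # 1.0 3.5 6.0 8.5 10.0
```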


Here is an example of the code in R:

# Create a small sample containing the numbers 1 to 10
my.dataset <- c(1,2,3,4,5,6,7,8,9,10)
# Draw its boxplot
boxplot(my.dataset)

[Screenshot: the resulting boxplot of my.dataset]
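To see how an outlier changes the picture, the sketch below (an illustration, not part of the original example) draws my.dataset next to a hypothetical copy with one extra value, 30. boxplot() accepts a list of vectors and draws one box per element; it also invisibly returns the numbers it drew, which are stored here in b.

```r
my.dataset <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
with.outlier <- c(my.dataset, 30)   # hypothetical extra value, for illustration only

# One box per list element; the returned list holds the drawn statistics
b <- boxplot(list(original = my.dataset, with.outlier = with.outlier))

print(b$stats)   # one column per box: whisker end, hinge, median, hinge, whisker end
print(b$out)     # 30, drawn as an open circle above the second box
print(b$group)   # 2: the outlier belongs to the second box
```

In the second box the upper whisker stops at 10 rather than 30, exactly as described above.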