# Comparing two variables – Pearson’s product-moment correlation

The Pearson product-moment correlation (often called Pearson’s r) is a parametric measure of the strength and direction of the linear relationship between two variables. In brief, picture the line of best fit drawn through the data points: the coefficient, which ranges from -1 to 1, tells you how tightly the data are scattered around that line.
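As a quick illustration of what the coefficient measures, here is a minimal sketch using made-up values (the vectors `x`, `y_up`, `y_down` and `y_noisy` are hypothetical and are not part of the example that follows). Points lying exactly on a straight line give a coefficient of 1 or -1 depending on the direction of the slope, while points scattered around the line give a value in between.

```r
# Hypothetical toy data, for illustration only
x <- 1:10
y_up    <- 2 * x + 1    # lies exactly on an increasing line
y_down  <- -3 * x + 5   # lies exactly on a decreasing line
y_noisy <- y_up + c(3, -2, 4, -5, 1, -3, 5, -1, 2, -4)  # scattered around the line

cor(x, y_up)     # perfect positive linear relationship
cor(x, y_down)   # perfect negative linear relationship
cor(x, y_noisy)  # still positive, but weaker than the perfect case
```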

This test comes with assumptions that must be checked before going further:

• this is a parametric test, so both variables must be normally distributed (run the Shapiro-Wilk test),
• the variables are continuous,
• the observations come in pairs (one value of each variable per individual),
• there are no marked outliers,
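• the spread of the data is “relatively” similar across the range (homoscedasticity); here the variances of the two variables are compared with Fisher’s F-test as a rough check, bearing in mind that this comparison depends on the units of measurement.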

Let’s see this with an example. Here, we consider the weight and height of 16 individuals. Both weight and height are continuous variables, arranged in pairs (one weight entry and one height entry per individual).

We need to check that both variables are normally distributed:

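```r
# Paired measurements for the 16 individuals
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

# Histograms side by side, then the Shapiro-Wilk normality test on each variable
par(mfrow = c(1, 2))
hist(weight, col = "red", prob = TRUE)
hist(height, col = "green", prob = TRUE)
shapiro.test(weight)
shapiro.test(height)
```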

As the histograms and the Shapiro-Wilk tests suggest, both samples are compatible with a normal distribution. Let’s draw the boxplots and check for similar variance:
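The output of `shapiro.test()` can also be read programmatically rather than by eye. The sketch below assumes the conventional 5% threshold (`alpha` and `p_values` are names introduced here for illustration). A p-value above the threshold means that normality is not rejected, which is weaker than proving that the data are normal.

```r
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

alpha <- 0.05  # assumed significance threshold

# Collect the Shapiro-Wilk p-value of each variable in a named vector
p_values <- c(weight = shapiro.test(weight)$p.value,
              height = shapiro.test(height)$p.value)

p_values          # the p-values themselves
p_values > alpha  # TRUE where normality is not rejected at the chosen threshold
```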

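```r
# Boxplots side by side to look for outliers
par(mfrow = c(1, 2))
boxplot(weight, main = "weight")
boxplot(height, main = "height")

# Fisher's F-test comparing the two variances
var.test(weight, height)
```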
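As a complement to reading the plots, this sketch lists the points that would be drawn as outliers (via `boxplot.stats()`, which uses the same default whisker rule as `boxplot()`) and pulls the variance ratio and p-value out of the `var.test()` result. The names `outliers` and `f_test` are introduced here for illustration.

```r
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

# Values lying beyond the whiskers of each boxplot
outliers <- list(weight = boxplot.stats(weight)$out,
                 height = boxplot.stats(height)$out)
lengths(outliers)  # number of flagged points per variable

# Fisher's F-test: ratio of the sample variances and its p-value
f_test <- var.test(weight, height)
f_test$estimate
f_test$p.value
```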

According to Fisher’s F-test, the variances are not significantly different, and no outlier shows up on the boxplots. We can proceed…

We may now visualize these two variables in a scatter plot to which we add a line of best fit:

```r
# Scatter plot of weight against height
plot(weight ~ height)
# Overlay the least-squares line of best fit
abline(lm(weight ~ height))
```
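The fitted line and the correlation coefficient are closely linked: in a simple linear regression, the slope equals r multiplied by the ratio of the two standard deviations, and the R-squared of the fit equals r squared. A minimal sketch of that relationship (the names `fit`, `slope` and `r` are introduced here):

```r
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

fit   <- lm(weight ~ height)     # same model as the line drawn by abline()
slope <- coef(fit)[["height"]]   # slope of the line of best fit
r     <- cor(weight, height)     # Pearson's correlation coefficient

slope
r * sd(weight) / sd(height)      # same value as the slope
summary(fit)$r.squared           # equals r^2 when there is a single predictor
```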

Now that the assumptions are checked and we have a rough idea of the linear relationship, let’s run the test for Pearson’s product-moment correlation. The function is `cor.test()`, which is also used for Spearman’s rho and Kendall’s tau. The parameter `method` defines which correlation coefficient the test uses (choose between `"pearson"`, `"spearman"` and `"kendall"`; if `method` is omitted, the default is Pearson’s r).
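To make the role of `method` concrete, here is a small sketch that runs `cor.test()` with each of the three options on the example data and collects the estimates. The names `methods`, `results` and `default_test` are introduced here; `exact = FALSE` is an assumption made for this sketch, requesting the asymptotic p-value for the rank-based tests because the data contain ties.

```r
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

methods <- c("pearson", "spearman", "kendall")

# Run the same test with each coefficient
results <- lapply(methods, function(m) {
  cor.test(height, weight, method = m, exact = FALSE)
})
names(results) <- methods

sapply(results, function(res) unname(res$estimate))  # r, rho and tau
sapply(results, function(res) res$p.value)           # corresponding p-values

# Omitting `method` gives the Pearson test
default_test <- cor.test(height, weight)
all.equal(default_test$estimate, results$pearson$estimate)
```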

In this test, the null hypothesis H0 states that there is no linear relationship between the variables (the true correlation is zero).

```r
cor.test(height, weight, method = "pearson")
```

The p-value is below 0.05, so the null hypothesis of no relationship is rejected at the 5% level in favor of the alternative hypothesis (there is a linear relationship between height and weight).
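To see where these numbers come from, the sketch below recomputes the coefficient from its definition (the covariance divided by the product of the standard deviations) and the two-sided p-value from the t statistic with n - 2 degrees of freedom, then compares both with the output of `cor.test()`. The variable names are introduced here for illustration.

```r
weight <- c(84, 64, 73, 78, 70, 79, 74, 68, 73, 63, 62, 69, 54, 64, 66, 70)
height <- c(183, 174, 179, 174, 164, 184, 179, 154, 167, 170, 168, 164, 166, 163, 154, 174)

n <- length(height)

# Pearson's r: covariance scaled by both standard deviations
r_manual <- cov(height, weight) / (sd(height) * sd(weight))

# Test statistic and two-sided p-value under H0 (true correlation is zero)
t_manual <- r_manual * sqrt(n - 2) / sqrt(1 - r_manual^2)
p_manual <- 2 * pt(-abs(t_manual), df = n - 2)

res <- cor.test(height, weight, method = "pearson")

c(manual = r_manual, cor.test = unname(res$estimate))  # same coefficient
c(manual = p_manual, cor.test = res$p.value)           # same p-value
res$conf.int                                           # 95% confidence interval for r
```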