# Linear regression

Linear regression helps you simplify a dataset by modelling it with a straight line. It is often used to describe the relationship between a continuous response variable and a continuous independent (predictor) variable. Examples are numerous; the relationship between bodyweight and height is one of them.

Let’s use the following dataset as an example:

```r
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)     # kg
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)  # cm
dataset.df <- data.frame(bodyweight, size)
```

Everything starts with a plot. A scatter plot of the dataset is usually a good beginning.

```r
plot(bodyweight ~ size, ylab = "bodyweight (kg)", xlab = "size (cm)", col = "blue", pch = 10)
```

Then we fit a linear model with the function `lm()`, which we have already encountered when performing analysis of variance (ANOVA). The formula `bodyweight ~ size` reads "bodyweight as a function of size".

```r
lm(bodyweight ~ size)
```

Note that this output contains everything you need to draw the line: the intercept (-56.2716) is followed by the slope (0.7661), which is listed under the name of the predictor, `size`. Let's pass these values to `abline()`, whose syntax is `abline(intercept, slope)`, to add the regression line to the existing plot:

```r
abline(-56.2716, 0.7661)
```

Note also that we can pass the result of `lm()` directly to `abline()` to obtain the same line:

```r
abline(lm(bodyweight ~ size))
```
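The intercept and slope printed by `lm()` are ordinary least-squares estimates, so they can be checked by hand: the slope equals the covariance of size and bodyweight divided by the variance of size, and the intercept makes the line pass through the point (mean size, mean bodyweight). The sketch below recomputes both and compares them with `coef()`. The names `slope`, `intercept` and `fit` are our own, and the model is fitted from `dataset.df` through the `data` argument of `lm()`, which gives the same coefficients as using the vectors directly.

```r
# Same data as above
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)
dataset.df <- data.frame(bodyweight, size)

# Least-squares slope: cov(x, y) / var(x)
slope <- cov(size, bodyweight) / var(size)
# Least-squares intercept: the fitted line passes through the two means
intercept <- mean(bodyweight) - slope * mean(size)

# Fit the same model from the data frame and extract the unrounded coefficients
fit <- lm(bodyweight ~ size, data = dataset.df)
print(coef(fit))  # (Intercept) about -56.2716, size about 0.7661
print(c(intercept = intercept, slope = slope))

# The hand-computed values agree with lm() up to floating-point error
print(all.equal(unname(coef(fit)), c(intercept, slope)))
```

This is also why passing the fitted model to `abline()` is slightly more accurate than typing in the rounded values from the printed output.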

Of course, the result of `lm()` can also be stored in an object (of class `lm`) for later use; here we simply call it `lin.mod`. The function `summary()` then gives more information about the model:

```r
lin.mod <- lm(bodyweight ~ size)
summary(lin.mod)
```

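This output provides several useful values: at the top, the minimum, first quartile, median, third quartile and maximum of the residuals; in the middle, the coefficient estimates with their standard errors and p-values; and at the bottom, the R-squared (R2) and adjusted R-squared, which measure the proportion of the variance in bodyweight that the model explains (NB: be careful when interpreting R-squared, see this blogpost for some info).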
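To make these numbers less of a black box, the residual summary and both R-squared values can be recomputed from the model. This is a minimal sketch using the standard definitions R2 = 1 - RSS/TSS and adjusted R2 = 1 - (1 - R2)(n - 1)/(n - p - 1), where p is the number of predictors (here 1); the names `rss`, `tss`, `r2` and `adj_r2` are our own.

```r
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)
lin.mod <- lm(bodyweight ~ size)

# Min, 1st quartile, median, 3rd quartile and max of the residuals
print(quantile(residuals(lin.mod)))

# R-squared: share of the variance in bodyweight explained by the line
rss <- sum(residuals(lin.mod)^2)               # residual sum of squares
tss <- sum((bodyweight - mean(bodyweight))^2)  # total sum of squares
r2 <- 1 - rss / tss

# Adjusted R-squared penalises the number of predictors p
n <- length(bodyweight)
p <- 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Both should match the values reported by summary()
print(all.equal(r2, summary(lin.mod)$r.squared))
print(all.equal(adj_r2, summary(lin.mod)$adj.r.squared))
```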

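Finally, it is good practice to check the model with the diagnostic plots that R draws for an `lm` object: residuals vs fitted values, a normal Q-Q plot of the residuals, a scale-location plot and residuals vs leverage. They show how well the model fits the actual data and whether the assumptions of linear regression look reasonable: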

```r
par(mfrow = c(2, 2))  # show the four diagnostic plots in one window
plot(lin.mod)
```
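The first of these plots, residuals against fitted values, can also be rebuilt by hand, which makes it clearer what it shows: if a straight line is appropriate, the points should scatter around zero without an obvious curve or funnel shape. A minimal sketch with the base functions `fitted()` and `residuals()` (the names `fit.vals` and `res` are our own):

```r
bodyweight <- c(70, 75, 72, 58, 80, 80, 48, 56, 103, 51)
size <- c(177, 178, 167, 153, 174, 177, 152, 134, 191, 136)
lin.mod <- lm(bodyweight ~ size)

fit.vals <- fitted(lin.mod)  # predicted bodyweight for each observation
res <- residuals(lin.mod)    # observed minus predicted bodyweight

# Residuals vs fitted: look for curvature or a funnel shape
plot(fit.vals, res, xlab = "fitted bodyweight (kg)", ylab = "residual (kg)",
     col = "blue", pch = 10)
abline(h = 0, lty = 2)  # dashed reference line at zero

# With an intercept in the model, the residuals sum to zero (up to rounding)
print(sum(res))
```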