2. How to import data into R


There are several ways and commands to import data from a file into R. Depending on whether your data are in an Excel file (.xls, .xlsx), a text file (.txt) or a comma-separated values file (.csv), you will have to choose the appropriate function.

Before going further, take a look at the function file.choose():

file.choose()


This simple command opens a file browser window which allows you to find and select the file in which your data are saved. Once the file is selected, the R console displays its path between quotation marks, in the syntax that R recognizes. If you copy/paste this path into functions and operations in R, you save yourself a lot of work and trouble!
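Instead of copy/pasting, the path can also be stored in an object and reused. The lines below are a minimal sketch of that idea, not part of the original example: the object names demo.path, my.path and my.chosen.data are made up for illustration, and a small temporary file is created so that the code also runs when no file browser is available. read.delim() is described further down in this section.

```r
# Create a small demo file so the sketch runs anywhere (made-up content).
demo.path <- tempfile(fileext = ".txt")
writeLines(c("id\tvalue", "1\t10", "2\t20"), demo.path)

# In an interactive session, file.choose() opens the file browser;
# when the script runs non-interactively, fall back to the demo file.
my.path <- if (interactive()) file.choose() else demo.path

# Reuse the stored path instead of retyping it.
my.chosen.data <- read.delim(my.path, header = TRUE)
print(my.chosen.data)
```

Storing the path this way means the same object can be passed to any of the import functions shown below.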

 

Now, let’s talk about importing data from .xlsx files

It is possible to retrieve the data stored in a .xlsx file (the format produced by MS Excel 2007 and later) directly via the xlsx package. Provided that the xlsx package is already installed in R, you must activate it using the command library(xlsx). If the xlsx package has never been installed, refer to the section Installing Packages in the page First steps with R. In brief, the following code installs and activates the xlsx package (NB: this may take a couple of minutes, and the package relies on Java, so a Java runtime must be available on your machine):

install.packages('xlsx')
library(xlsx)

Once done, we can proceed with importing data. Let’s take the following example from the file called myexcelfile.xlsx:

[Screenshot: the worksheet in myexcelfile.xlsx]

Now, type in the following command:

my.xlsx.data <- read.xlsx("d:\\myexcelfile.xlsx", sheetName = "Sheet1")

This retrieves the content of the worksheet named "Sheet1" in the file myexcelfile.xlsx, located in this example at the root of the D: drive. Once retrieved, the data are stored in the object my.xlsx.data. Let’s look at it:

my.xlsx.data

[Screenshot: the content of my.xlsx.data in the R console]

There are several other functions that can read data from MS Excel files. Among them are read.xls() (gdata package), read.excel() and read_excel() (readxl package). Check this useful article posted at codeRclub for more info on these functions and on how to import Excel data into R.
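As a minimal sketch of one of these alternatives, the code below uses read_excel() from the readxl package, assuming that this package is installed (it is not part of base R). To stay self-contained, it reads the demo workbook datasets.xlsx that ships with readxl, located with readxl_example(); with your own data you would pass the path to your file instead. The object names demo.xlsx and my.readxl.data are made up for illustration.

```r
# Sketch only: the import runs if the readxl package is available.
my.readxl.data <- NULL
if (requireNamespace("readxl", quietly = TRUE)) {
  # Demo workbook bundled with readxl; replace with the path to your own file.
  demo.xlsx <- readxl::readxl_example("datasets.xlsx")
  # sheet = 1 reads the first worksheet; a sheet name also works.
  my.readxl.data <- readxl::read_excel(demo.xlsx, sheet = 1)
  print(head(my.readxl.data))
}
```

Unlike the xlsx package, readxl does not need Java, which can make it easier to set up on some machines.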

 
 
Working with .txt files in which your data/entries are separated with tabs (tab-delimited files)

Data may be nicely arranged in simple text files (.txt) using tabs. Importing them into R can be done simply with the command read.delim(). Let’s look at the following example in the file called mydata.txt:

[Screenshot: the content of mydata.txt]

Here is the code to import the data into the object called my.imported.data:

my.imported.data <- read.delim("d:\\mydata.txt", header=TRUE)

And the resulting object looks like this:
[Screenshot: the content of my.imported.data in the R console]

Alternatively, the function read.table() may be used, with the tab character \t given as the separator:

my.imported.data <- read.table("d:\\mydata.txt", header=TRUE, sep="\t")
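To check that the two approaches agree, here is a small self-contained sketch. It does not use the file from the example above: it writes a made-up tab-delimited file to a temporary location (the names demo.txt, from.delim and from.table are only for illustration) and imports it both ways.

```r
# Write a small tab-delimited file to a temporary location (made-up values).
demo.txt <- tempfile(fileext = ".txt")
writeLines(c("id\tgroup\tvalue",
             "1\tA\t2.5",
             "2\tB\t3.1",
             "3\tA\t4.0"), demo.txt)

# read.delim() already assumes header = TRUE and sep = "\t".
from.delim <- read.delim(demo.txt)

# read.table() needs both arguments spelled out.
from.table <- read.table(demo.txt, header = TRUE, sep = "\t")

# Inspect the column types, then compare the two imports.
str(from.delim)
print(identical(from.delim, from.table))
```

For a simple file like this one, both calls should give the same data frame; read.delim() is mostly a shortcut with defaults suited to tab-delimited files.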

 
 

What about data stored in .csv files?

Now, if your data are in a .csv file, here is the way to go. First, have a look at the original file mydata.csv, where data are separated by semicolons:

[Screenshot: the content of mydata.csv]

Here is how to import it with read.table():

my.imported.data <- read.table("d:\\mydata.csv", header=TRUE, sep=";")

This code fetches the file called mydata.csv at the root of the D: drive and reads it, taking into account that the data elements are separated by semicolons (sep=";"). The result is stored in the object my.imported.data, which now looks like this:

[Screenshot: the content of my.imported.data in the R console]
 

There are two functions, read.csv() and read.csv2(), which do the same job. In fact, they are built on read.table() but differ in their default settings, which are preset for .csv files: both assume header=TRUE, read.csv() expects a comma as field separator and a period as decimal point, and read.csv2() is the variant used in countries that use a comma as decimal point and a semicolon as field separator. As the latter is the case with the machine that I’m using for making this tutorial, here is the example using read.csv2():

my.imported.data <- read.csv2("d:\\mydata.csv", header=TRUE)

And the resulting object is:
[Screenshot: the content of my.imported.data in the R console]

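Note that we would have obtained the same result with read.csv() by setting the separator and the decimal mark explicitly:

my.imported.data <- read.csv("d:\\mydata.csv", header=TRUE, sep=";", dec=",")

Without dec=",", read.csv() keeps the period as decimal point, so any number written with a decimal comma would be imported as text rather than as a number.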
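The effect of the separator and decimal settings can be checked with a small self-contained sketch. It does not use mydata.csv: it writes a made-up semicolon-separated file with decimal commas to a temporary location, and the names demo.csv, with.csv2, with.csv and with.both are only for illustration.

```r
# Write a small semicolon-separated file with decimal commas (made-up values).
demo.csv <- tempfile(fileext = ".csv")
writeLines(c("id;value",
             "1;2,5",
             "2;3,1"), demo.csv)

# read.csv2() defaults: sep = ";" and dec = ",".
with.csv2 <- read.csv2(demo.csv)

# read.csv() with only the separator changed: dec stays ".".
with.csv <- read.csv(demo.csv, sep = ";")

# read.csv() with both settings changed to match read.csv2().
with.both <- read.csv(demo.csv, sep = ";", dec = ",")

# Is the value column numeric in each case?
print(is.numeric(with.csv2$value))
print(is.numeric(with.csv$value))

# Do read.csv2() and the fully adjusted read.csv() agree?
print(identical(with.csv2, with.both))
```

If your file contains no decimal commas, the dec setting makes no visible difference; it only matters as soon as decimal numbers are involved.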