1. One-sample analysis? What for?


You do not need more than one group to perform statistical tests! In fact, there is a (short) list of tests used to verify hypotheses on a single group. Here are a few examples.

As you will soon find out (but you might have already realized that…), many statistical tests, such as Student’s t-test or ANOVA, assume that the data originate from a normal distribution. Very often, you will have to check for normality of distribution in each of the tested groups before performing the test of your choice. The Shapiro-Wilk test checks this for you! It is performed on one group at a time and is thus one of the first, and most important, one-sample statistical tests that you will learn to use.

There are a few other single-sample tests, which compare the known mean of a population to the mean of a specific subpopulation. In other words, you can check whether a group of individuals exhibits the same “properties” as the population it originates from, simply by comparing the mean and standard deviation of that subgroup to the mean (and, if available, the standard deviation) of the population. Not sure you see the point? Let’s take an example:

During the year 2016, a university teacher got the feeling that his students were particularly clever and efficient. Wondering whether this was “just a feeling” or a fact, he organized an IQ test and obtained the scores of 40 of these students. Knowing only the average IQ of the students at that precise university (which is 120), he performed a one-sample test to verify whether the average IQ score of his group of “clever students” differed from 120.

 

Let’s have a look at this single sample of data with stat.desc():

Now that we know how to get descriptive statistics, let’s have a quick look at the sample using the function stat.desc() from the package pastecs (note that you might need to load pastecs first, either by running library(pastecs) or by selecting Packages>Load package…>pastecs and OK). Retrieve the IQ scores contained in the IQ.txt file by clicking here, copy the values and import them into R as the variable y, then run stat.desc():

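library(pastecs)     # load the package that provides stat.desc()
options(scipen=100)  # avoid scientific notation in the output
options(digits=2)    # limit the number of significant digits printed
y<-c(123, 135, 142, 127, 117, 134, 123, 133, 111, 130, 113, 118, 105, 126, 125, 122, 147, 106, 117, 129, 144, 112, 126, 140, 129, 115, 118, 129, 125, 137, 135, 138, 134, 124, 128, 135, 124, 137, 121, 120)
stat.desc(y)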

[Screenshot: output of stat.desc(y)]
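If pastecs is not available, several of the figures reported by stat.desc() can be reproduced with base R functions. The lines below are only a minimal sketch; the variable names n, avg, s and se are chosen here for illustration and do not come from pastecs.

```r
# IQ scores of the 40 students (same values as above)
y <- c(123, 135, 142, 127, 117, 134, 123, 133, 111, 130, 113, 118, 105, 126,
       125, 122, 147, 106, 117, 129, 144, 112, 126, 140, 129, 115, 118, 129,
       125, 137, 135, 138, 134, 124, 128, 135, 124, 137, 121, 120)

n   <- length(y)     # number of observations
avg <- mean(y)       # sample mean
s   <- sd(y)         # sample standard deviation
se  <- s / sqrt(n)   # standard error of the mean

# Collect the values in a named vector, rounded to two decimals
round(c(n = n, min = min(y), max = max(y), median = median(y),
        mean = avg, std.dev = s, SE.mean = se), 2)
```

The mean obtained this way should match the one shown by stat.desc().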

As you can see, the average IQ score in the group of 40 students is 126.35, which is above the university average of 120. Judging from the maximum value and the top of the boxplot below, some individuals score high on the IQ scale, but there are clearly also individuals below the reference IQ of 120:

boxplot(y, ylab="IQ score")

[Screenshot: boxplot of the IQ scores]
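What the boxplot suggests can also be checked by simple counting. This is a small sketch; the names reference, below and above are introduced here for illustration only.

```r
# IQ scores of the 40 students (same values as above)
y <- c(123, 135, 142, 127, 117, 134, 123, 133, 111, 130, 113, 118, 105, 126,
       125, 122, 147, 106, 117, 129, 144, 112, 126, 140, 129, 115, 118, 129,
       125, 137, 135, 138, 134, 124, 128, 135, 124, 137, 121, 120)

reference <- 120               # university average given in the example

below <- sum(y < reference)    # students scoring below the reference
above <- sum(y > reference)    # students scoring above the reference

range(y)                       # lowest and highest score in the sample
c(below = below, above = above)
```

With these data, 10 students score below 120 and 29 score above it (one scores exactly 120), so a higher sample mean does not mean that every student is above the university average.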

 

Note also that the distribution of individual IQ scores seems “rather normal” looking at the histogram:

hist(y)

[Screenshot: histogram of the IQ scores]
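A histogram only gives a visual impression. The Shapiro-Wilk test mentioned earlier provides a more formal check, and a minimal sketch using shapiro.test() from base R (package stats) could look like this; the object name sw is just an illustration.

```r
# IQ scores of the 40 students (same values as above)
y <- c(123, 135, 142, 127, 117, 134, 123, 133, 111, 130, 113, 118, 105, 126,
       125, 122, 147, 106, 117, 129, 144, 112, 126, 140, 129, 115, 118, 129,
       125, 137, 135, 138, 134, 124, 128, 135, 124, 137, 121, 120)

# Shapiro-Wilk test: the null hypothesis is that the data come
# from a normal distribution
sw <- shapiro.test(y)

sw            # prints the W statistic and the p-value
sw$p.value    # the p-value alone
```

A p-value above the chosen significance level (commonly 0.05) means that the test finds no evidence against normality; it does not prove that the data are normally distributed.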

 

The question is: is this enough to declare that these 40 students are better than the rest? To answer that we definitely need a test. Click here to find out how to solve this problem.