T-Test mtcars
Note: This article shows an R example on how to conduct a t-test on the dataset mtcars. For general information about t-tests please refer to the following article: T-Test.
In short: In this article a t-test is conducted on the dataset mtcars which examines if the mean fuel efficiency (mpg) differs significantly between automatic and manual cars. The test examines whether the vehicle transmission has an impact on fuel consumption.
The Dataset
The dataset mtcars ships with base R (in the datasets package) and contains information on fuel consumption and 10 aspects of automobile design and performance for 32 automobiles.
The variables that will be included in the t-test are:
- mpg: Miles/(US) gallon, a measure of fuel efficiency (continuous data type)
- am: Transmission (0 = automatic, 1 = manual)
Inspecting the Dataset
The command head() displays the first rows of the data frame, which gives an initial overview of the data. The command str() displays the internal structure of the data.
# Inspect the dataset
head(mtcars)
str(mtcars)
Output
> head(mtcars)
mpg cyl disp hp drat wt qsec vs am gear carb
Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
> str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
We can see that the transmission types are represented by 0 (automatic) and 1 (manual) in the am column. Furthermore, the am and mpg variables are stored as numeric data.
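Before comparing the two groups it can help to know how many cars fall into each one. The following is a minimal sketch using table(); the names group_sizes and am_label are chosen here for illustration and are not part of the original analysis.

```r
# Count the cars per transmission type (0 = automatic, 1 = manual)
group_sizes <- table(mtcars$am)
print(group_sizes)

# Optional: a labelled copy of am, which makes printed output easier to read.
# mtcars itself is left unchanged.
am_label <- factor(mtcars$am,
                   levels = c(0, 1),
                   labels = c("automatic", "manual"))
print(table(am_label))
```

The dataset contains 19 automatic and 13 manual cars, so both groups are smaller than 30 and the normality assumption should be checked explicitly.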
Splitting the Values
# Split mpg values by transmission type
mpg_auto <- mtcars$mpg[mtcars$am == 0]
mpg_manual <- mtcars$mpg[mtcars$am == 1]
To perform the t-test we split the mpg values by transmission type so that we have two groups to compare with each other. To do so, we create two new variables, mpg_auto and mpg_manual. In R, a variable is created by storing the data on the right side of the arrow under the name on the left side of the arrow. Let's have a closer look at the variable mpg_auto. The code on the right-hand side translates as: take the values of the mpg column of the mtcars dataset, but only for rows where the value in the am column is 0. This means that only the values for automatic cars are stored in the new variable mpg_auto. We proceed in the same way with the variable mpg_manual, except that the value in the am column must now be 1.
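The same split can also be done in one step with split(), which returns a list with one element per value of am. This is only an alternative sketch; the names mpg_by_am, mpg_auto_alt and mpg_manual_alt are illustrative and are not used elsewhere in this article.

```r
# Split mpg into a list of two vectors, named "0" and "1" after the am values
mpg_by_am <- split(mtcars$mpg, mtcars$am)

# Extract the two groups by name
mpg_auto_alt   <- mpg_by_am[["0"]]   # automatic cars
mpg_manual_alt <- mpg_by_am[["1"]]   # manual cars

# Check that this gives the same values as the logical indexing used above
print(identical(mpg_auto_alt, mtcars$mpg[mtcars$am == 0]))
print(identical(mpg_manual_alt, mtcars$mpg[mtcars$am == 1]))

# Number of observations per group
print(lengths(mpg_by_am))
```

Both approaches yield the same vectors; logical indexing is more explicit, while split() scales more easily to a grouping variable with more than two levels.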
Checking Assumptions
Before conducting the t-test we can examine the dataset and check for the different assumptions of a t-test to make sure that it produces valid results. The assumptions of a t-test are:
- the response variable should be continuous or ordinal
- the response variable should be normally distributed in each group, or the sample size should be sufficiently large (≥ 30)
- the data must be a random, representative sample
- Student's t-test requires equal variances in the two groups; Welch's t-test can deal with unequal variances
# Check for normality in each group
shapiro.test(mpg_auto)
shapiro.test(mpg_manual)
# Check for equal variance between groups
var.test(mpg_auto, mpg_manual)
boxplot(mpg ~ am, data = mtcars)
We have already seen that the mpg values are continuous data, and we assume that the data were drawn randomly from a representative sample. We can test the other assumptions with the Shapiro-Wilk test and the F-test. We use shapiro.test() to test whether the values in each of the two groups are normally distributed. var.test() is used to check whether the variances of the two groups are equal.
Output
> shapiro.test(mpg_auto)
Shapiro-Wilk normality test
data: mpg_auto
W = 0.97677, p-value = 0.8987
> shapiro.test(mpg_manual)
Shapiro-Wilk normality test
data: mpg_manual
W = 0.9458, p-value = 0.5363
> var.test(mpg_auto,mpg_manual)
F test to compare two variances
data: mpg_auto and mpg_manual
F = 0.38656, num df = 18, denom df = 12, p-value = 0.06691
alternative hypothesis: true ratio of variances is not equal to 1
95 percent confidence interval:
0.1243721 1.0703429
sample estimates:
ratio of variances
0.3865615
Both Shapiro-Wilk tests return non-significant p-values (0.8987 and 0.5363), which means we cannot reject the null hypothesis that the data are normally distributed. We thus assume that mpg is normally distributed in both groups. The p-value of the F-test (0.06691) is above 0.05, so the variances do not differ significantly.
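The F statistic reported by var.test() is simply the ratio of the two sample variances. The sketch below recomputes it by hand to make that connection explicit; the names var_auto, var_manual and f_by_hand are illustrative.

```r
# Recreate the two groups so that the example runs on its own
mpg_auto   <- mtcars$mpg[mtcars$am == 0]
mpg_manual <- mtcars$mpg[mtcars$am == 1]

# Sample variances of the two groups
var_auto   <- var(mpg_auto)
var_manual <- var(mpg_manual)

# Ratio of variances, in the same order as var.test(mpg_auto, mpg_manual)
f_by_hand <- var_auto / var_manual
print(f_by_hand)

# Compare with the statistic stored in the var.test() result
f_test_result <- var.test(mpg_auto, mpg_manual)
print(all.equal(f_by_hand, unname(f_test_result$statistic)))
```

The ratio of about 0.39 shows that the manual group has the larger variance. The difference is not significant at the 5 % level, but the p-value is close to the threshold, which is one reason Welch's t-test is a reasonable alternative here.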
T-Test
# Perform an independent two-sample t-test
t_test_result <- t.test(mpg_manual, mpg_auto,
                        var.equal = TRUE) # Student's t-test; without var.equal, R runs Welch's t-test (default)
# Print the test result
print(t_test_result)
To perform the t-test, the command t.test() is used. The two groups to be compared, mpg_manual and mpg_auto, are passed as arguments inside the parentheses. Because this is a two-tailed test, the order of the two groups does not affect the p-value; it only flips the sign of the t statistic and of the confidence interval. The argument alternative = "two.sided" requests a two-tailed test, but since this is R's default it does not need to be written out. With var.equal = TRUE the variances of the two groups are treated as equal, so R runs Student's t-test. Without this argument R runs Welch's t-test by default, which does not assume equal variances and could also be used here. The result of the test, on the right side of the arrow, is again stored in the variable on the left side and is then displayed with the print() command.
Output
Two Sample t-test
data: mpg_manual and mpg_auto
t = 4.1061, df = 30, p-value = 0.000285
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
3.64151 10.84837
sample estimates:
mean of x mean of y
24.39231 17.14737
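To see where these numbers come from, the t statistic of Student's t-test can be recomputed by hand from the pooled variance of the two groups. The following is a sketch for illustration only; all variable names except mpg_auto and mpg_manual are chosen here and do not appear in the original analysis.

```r
# Recreate the two groups so that the example runs on its own
mpg_auto   <- mtcars$mpg[mtcars$am == 0]
mpg_manual <- mtcars$mpg[mtcars$am == 1]

n_manual <- length(mpg_manual)
n_auto   <- length(mpg_auto)

# Degrees of freedom of the pooled (equal-variance) test
deg_free <- n_manual + n_auto - 2

# Pooled variance: weighted average of the two sample variances
pooled_var <- ((n_manual - 1) * var(mpg_manual) +
               (n_auto - 1) * var(mpg_auto)) / deg_free

# Standard error of the difference in means
std_error <- sqrt(pooled_var * (1 / n_manual + 1 / n_auto))

# t statistic and two-sided p-value
t_by_hand <- (mean(mpg_manual) - mean(mpg_auto)) / std_error
p_by_hand <- 2 * pt(abs(t_by_hand), df = deg_free, lower.tail = FALSE)

# Compare with the values computed by t.test()
reference <- t.test(mpg_manual, mpg_auto, var.equal = TRUE)
print(all.equal(t_by_hand, unname(reference$statistic)))
print(all.equal(p_by_hand, reference$p.value))
```

The degrees of freedom (13 + 19 - 2 = 30) and the t statistic match the output above, which confirms that var.equal = TRUE selects the pooled-variance test.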
The hypotheses being tested are:
H0: There is no difference in mean fuel efficiency (mpg) between manual and automatic cars (μ1 = μ2)
H1: There is a difference in mean fuel efficiency (mpg) between manual and automatic cars (μ1 ≠ μ2)
The p-value of 0.000285 is less than 0.05. Thus, the null hypothesis can be rejected, which means there is a significant difference between the two groups: manual cars have a higher mean fuel efficiency (24.39 mpg) than automatic cars (17.15 mpg). It can be concluded that the vehicle transmission has an impact on fuel consumption.
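As a cross-check, the same comparison can be run with Welch's t-test, which R uses by default and which does not assume equal variances. The sketch below uses the formula interface of t.test(), so no manual splitting is needed; the name welch_result is illustrative.

```r
# Welch's t-test via the formula interface: mpg grouped by am.
# var.equal is left at its default (FALSE), so the variances
# are not assumed to be equal.
welch_result <- t.test(mpg ~ am, data = mtcars)
print(welch_result)

# Individual values can be extracted from the result object
print(welch_result$p.value)    # two-sided p-value
print(welch_result$parameter)  # Welch-adjusted degrees of freedom
print(welch_result$conf.int)   # 95 % confidence interval
```

With the formula interface the groups are compared in the order of the am values (0 minus 1, i.e. automatic minus manual), so the t statistic is negative here. The Welch test gives a p-value of about 0.0014 with fewer than 30 degrees of freedom, so the conclusion is the same as with Student's t-test.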
Visualise the difference
# Basic boxplot to compare mpg distributions
boxplot(mpg ~ am, data = mtcars,
names = c("Automatic", "Manual"),
col = c("skyblue", "lightgreen"),
main = "MPG by Transmission Type",
xlab = "Transmission Type",
ylab = "Miles Per Gallon")
# Add mean points
points(x = c(1, 2),
y = tapply(mtcars$mpg, mtcars$am, mean),
pch = 19, col = "red")
Output
The mean values are marked as red dots in the box plot. As can be seen, they differ clearly from one another, which is consistent with the result of the t-test.
The author of this entry is Hauke Haese. Last edited: 06.02.2026