Two-way ANOVA CO2

Note: This article shows an R example on how to conduct a two-way analysis of variance (ANOVA) on the dataset CO2. For general information about ANOVA please refer to the following article: ANOVA.

In short: In this example a two-way ANOVA is used to analyse the uptake of CO2 of a plant based on its origin and its treatment as well as the interaction of those factors. This examines if chilling the plants overnight and the origin of the respective plant have an influence on its ability to take up CO2. The effect is then visualized in an interaction plot.

To run the script you can open RStudio and paste this code into a script or console.

If needed, you can install the ggplot2 package by running:

install.packages("ggplot2")

The Dataset

The dataset CO2 is included in R's datasets package and contains the CO2 uptake of six plants from Quebec and six plants from Mississippi at several levels of ambient CO2 concentration. Half the plants of each type were chilled overnight before the experiment was conducted.

The variables we will look at in our analysis are:

  • uptake: a numeric vector containing the carbon dioxide uptake rates of the examined plants.
  • Type: a factor containing the origin of the respective plant, with the factor levels “Quebec” and “Mississippi”.
  • Treatment: a factor containing the overnight treatment applied before the experiment was conducted, with the factor levels “nonchilled” and “chilled”.

Examining the Dataset

Before conducting the analysis, we can examine the dataset and check for the different assumptions of a two-way ANOVA to make sure that the ANOVA produces valid results. The assumptions of the ANOVA are:

  • Normality of the residuals (this can be roughly approximated by checking the distribution of the dependent variable uptake within each group)
  • Equal variance for each combination of factor levels.
  • Independent measurements
  • Equal size in each treatment group (the combination of Type and Treatment)

# Load the necessary libraries
library(ggplot2)
library(datasets)

# View the structure of the CO2 dataset
str(CO2)

# Boxplot to check for equal variance (Fig. 1)
boxplot(uptake ~ Type * Treatment, data = CO2)

# Table to check for a balanced design
table(CO2$Treatment, CO2$Type)

Output

The analysis delivers the following results:

> str(CO2)
Classes ‘nfnGroupedData’, ‘nfGroupedData’, ‘groupedData’ and 'data.frame':	84 obs. of  5 variables:
 $ Plant    : Ord.factor w/ 12 levels "Qn1"<"Qn2"<"Qn3"<..: 1 1 1 1 1 1 1 2 2 2 ...
 $ Type     : Factor w/ 2 levels "Quebec","Mississippi": 1 1 1 1 1 1 1 1 1 1 ...
 $ Treatment: Factor w/ 2 levels "nonchilled","chilled": 1 1 1 1 1 1 1 1 1 1 ...
 
> table(CO2$Treatment, CO2$Type)

             Quebec Mississippi
  nonchilled     21          21
  chilled        21          21

Fig. 1: Boxplot showing the interaction between the factors Type and Treatment and their effect on uptake.

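Independence of the measurements is a property of the study design and is taken as given in this example. Note, however, that each of the 12 plants was measured at seven CO2 concentrations, so the 84 observations are repeated measurements per plant; this simplified example ignores that structure. Normality and equality of variance across the factor combinations can be roughly assessed with the boxplots (Fig. 1), which show uptake for each combination of Type and Treatment. The variances are roughly similar, and no distribution appears strongly skewed. The table shows a balanced design with 21 observations per combination. The assumptions are thus met sufficiently for this example.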
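The visual check can be complemented by looking at the residuals directly. The following is a minimal sketch, not part of the original analysis: it fits the same model that is used later in this article and then draws a Q-Q plot of the residuals, runs a Shapiro-Wilk test for normality, and runs Bartlett's test for equal variances across the four Type and Treatment combinations. The choice of these two tests and the object names (fit, res, shapiro_res, bartlett_res) are assumptions made for illustration.

```r
library(datasets)

# Fit the two-way model so that its residuals can be inspected
fit <- aov(uptake ~ Type * Treatment, data = CO2)
res <- residuals(fit)

# Q-Q plot: points close to the line suggest roughly normal residuals
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value indicates a departure from normality
shapiro_res <- shapiro.test(res)
print(shapiro_res)

# Bartlett's test: a small p-value indicates unequal variances
# across the four combinations of Type and Treatment
bartlett_res <- bartlett.test(uptake ~ interaction(Type, Treatment), data = CO2)
print(bartlett_res)
```

Formal tests are sensitive to sample size, so they are best read together with the plots rather than as a replacement for them.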

Two-way ANOVA

An ANOVA is performed where

uptake ~ Type * Treatment

tests the effects of Type, Treatment, and their interaction on uptake. An interaction means that the effect of one variable depends on the level of another variable. In this case we test whether the effect of Treatment on uptake depends on Type.

summary(anova_result)

will show the ANOVA table with F-values, p-values, etc., to determine the significance of each factor.
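The * operator in the formula is shorthand for both main effects plus their interaction. The sketch below illustrates this by fitting the model with both spellings and comparing the coefficients. It also compares the full model against a purely additive one, which is another way to look at the interaction term. The object names (fit_star, fit_long, fit_additive, cmp) are chosen here for illustration only.

```r
library(datasets)

# Shorthand: main effects and interaction
fit_star <- aov(uptake ~ Type * Treatment, data = CO2)

# Expanded form: the same model written out term by term
fit_long <- aov(uptake ~ Type + Treatment + Type:Treatment, data = CO2)

# Both spellings yield the same coefficients
print(all.equal(coef(fit_star), coef(fit_long)))

# Additive model without the interaction term
fit_additive <- aov(uptake ~ Type + Treatment, data = CO2)

# F-test of the additive model against the full model;
# this tests whether adding the interaction improves the fit
cmp <- anova(fit_additive, fit_star)
print(cmp)
```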

# Perform a two-way ANOVA of CO2 uptake on Type, Treatment and their interaction
anova_result <- aov(uptake ~ Type * Treatment, data = CO2)

# Display the ANOVA table
summary(anova_result)

# Optional: visualize the interaction between Type and Treatment (Fig. 2)
# The points show the raw data; the lines connect the group means of each Type
ggplot(CO2, aes(x = Treatment, y = uptake, color = Type)) +
  geom_point() +
  stat_summary(aes(group = Type), fun = mean, geom = "line") +
  labs(title = "CO2 Uptake by Treatment and Type", x = "Treatment", y = "CO2 Uptake") +
  theme_minimal()

Output

The analysis delivers the following results:

               Df Sum Sq Mean Sq F value   Pr(>F)
Type            1   3366    3366  52.509 2.38e-10 ***
Treatment       1    988     988  15.416 0.000182 ***
Type:Treatment  1    226     226   3.522 0.064213 .
Residuals      80   5128      64
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

  • Df: The degrees of freedom are the number of independent values that can vary in the design. For each factor they are the number of factor levels minus one (here 2 - 1 = 1). The interaction has (2 - 1) × (2 - 1) = 1 degree of freedom, and the residuals have 84 - 4 = 80.
  • Sum Sq: The sum of squares measures how much of the variation in uptake is attributed to each term. The Sum Sq of all terms plus the residual Sum Sq equals the total variation (3366 + 988 + 226 + 5128 = 9708). Dividing the Sum Sq of one term by the total variation gives the share of variation explained by that term. The factor Type explains 3366 out of 9708 (about 35 %) of the variation. The interaction explains 226 out of 9708 (about 2 %) of the variation.
  • Pr(>F): The p-value is the probability of obtaining an F value at least as large as the observed one if the null hypothesis (no effect on uptake) were true. For the main effects of Type and Treatment, the p-value is below 0.05, so we can reject the null hypothesis and assume a significant effect of each independent variable on the uptake of CO2. For the interaction of both factors, the p-value (0.064) is above 0.05, so the null hypothesis cannot be rejected, and we do not assume a significant interaction effect between Type and Treatment. However, the p-value is only marginally above 0.05. To evaluate the results, the other values, such as the share of explained variation, should therefore also be taken into account.
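
The share of variation explained by each term can also be computed directly from the ANOVA table instead of by hand. This is a minimal sketch: the model is refitted so that the block runs on its own, and the helper names (tab, ss, prop) are chosen here for illustration.

```r
library(datasets)

# Refit the model so that this block is self-contained
anova_result <- aov(uptake ~ Type * Treatment, data = CO2)

# summary() returns a list; its first element is the ANOVA table
tab <- summary(anova_result)[[1]]

# Extract the sums of squares; row names are padded with spaces, so trim them
ss <- tab[["Sum Sq"]]
names(ss) <- trimws(rownames(tab))

# Share of the total variation attributed to each term and to the residuals
prop <- ss / sum(ss)
print(round(prop, 3))
```

The resulting shares correspond to the ratios described above, for example the Sum Sq of Type divided by the total variation.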
Fig. 2: Interaction plot that shows the effect of Type and Treatment on uptake and their interaction.

The interaction effect can be visualized in an interaction plot (Fig. 2). This type of plot displays the values of the dependent variable uptake on the y-axis. The x-axis shows the levels of the predictor Treatment. The lines represent the levels of the second predictor Type. Parallel lines indicate no interaction effect. Different slopes indicate an interaction effect. The lines are fairly parallel. This visualizes the result of the ANOVA, which indicates no significant interaction effect.
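
The lines in an interaction plot are defined by the mean uptake of each Type and Treatment combination. As a rough sketch, these cell means can be computed with tapply(), and the change from nonchilled to chilled within each Type gives the slope of the corresponding line. Base R's interaction.plot() draws the same kind of figure without ggplot2. The object names (cell_means, chill_effect) are chosen here for illustration.

```r
library(datasets)

# Mean uptake for each combination of Treatment (rows) and Type (columns)
cell_means <- with(CO2, tapply(uptake, list(Treatment, Type), mean))
print(round(cell_means, 1))

# Change in mean uptake from nonchilled to chilled, separately for each Type;
# similar values correspond to roughly parallel lines
chill_effect <- cell_means["chilled", ] - cell_means["nonchilled", ]
print(round(chill_effect, 1))

# Base R alternative to the ggplot2 interaction plot
with(CO2, interaction.plot(x.factor = Treatment, trace.factor = Type,
                           response = uptake, fun = mean,
                           xlab = "Treatment", ylab = "Mean CO2 uptake",
                           trace.label = "Type"))
```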

Further Resources

CO2 dataset: RDocumentation

ANOVA: Cookbook for R

Q-Q plot: Cookbook for R



The author of this entry is Jana Simon