In this post, I want to discuss a concept that is often mistaken by some of my collegues. With all crops, we are used to repeating experiments across years to obtain multi-year data; the structure of the resulting dataset is always the same and it is exemplified in the box below, that refers to a multi-year genotype experiment with winter wheat.

rm(list = ls())
library(tidyverse)
library(nlme)
library(emmeans)
filePath <- "https://www.casaonofri.it/_datasets/WinterWheat.csv"
dataset <- read.csv(filePath)
dataset <- dataset %>%
  mutate(across(c(1:3, 5), .fns = factor))
head(dataset)
##   Plot Block Genotype Yield Year
## 1    2     1 COLOSSEO  6.73 1996
## 2  110     2 COLOSSEO  6.96 1996
## 3  181     3 COLOSSEO  5.35 1996
## 4    2     1 COLOSSEO  6.26 1997
## 5  110     2 COLOSSEO  7.01 1997
## 6  181     3 COLOSSEO  6.11 1997

We can see that we have a column for the blocks, a column for the experimental factor (the genotype, in this instance), a column for the year and a column for the response variable (the yield, in this instance).

If we intend to take the genotype, year and block effects as fixed, we can fit the correct model with the lm() function and the resulting ANOVA table looks like the following one.

mod1 <- lm(Yield ~ Year/Block + Genotype*Year, data = dataset)
anova(mod1)
## Analysis of Variance Table
## 
## Response: Yield
##               Df  Sum Sq Mean Sq  F value    Pr(>F)    
## Year           6 159.279 26.5466 178.3996 < 2.2e-16 ***
## Genotype       7  11.544  1.6491  11.0824 2.978e-10 ***
## Year:Block    14   3.922  0.2801   1.8826   0.03738 *  
## Year:Genotype 42  27.713  0.6598   4.4342 6.779e-10 ***
## Residuals     98  14.583  0.1488                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As we have made different experiments in different years (and with a different randomisation in each year, hopefully), the block effect can only be considered as nested within each year (‘Year/Block’ and not ’Year*Block’). As for the rest, the ANOVA table is very close to that for any types of two-factors factorial experiments.

The things change dramatically if we have a perennial crop, or a crop with a cycle lasting for more than one year, because yield measurements are taken repeatedly in the same plots across years. For this reason, even though the dataset looks very much like the previous one, it must be analysed in a totally different manner. Based on my experience as an editor of International Journals, I can say that several authors, still today, tend to forget this, resulting in several rejections or, at least, delays in publication.

Let’s consider the dataset below, that refers to the yield of lucerne genotypes in three consecutive years, taken from the same plots in a single experiment lasting for three years.

filePath <- "https://www.casaonofri.it/_datasets/alfalfa3years.csv"
dataset <- read.csv(filePath)
dataset <- dataset %>%
  mutate(across(c(1:3), .fns = factor))
head(dataset)
##   Plot Block Genotype Year    Yield
## 1    1     1 4cascine 2006 6.631775
## 2   25     2 4cascine 2006 6.705397
## 3   60     3 4cascine 2006 6.499588
## 4   63     4 4cascine 2006 7.087686
## 5    7     1 casalina 2006 8.033558
## 6   33     2 casalina 2006 8.265165

If we analyse tha data as in the winter wheat example, we neglect that the observations are grouped within the same plots and, therefore, they are correlated. Consequently, the independence assumption is broken and it is no wonder that the manuscript must be rejected. What should we do, then?

First of all, we should build a new variable to uniquely identify each plot; it is easy, if we think that yield values taken for the same genotype in the same block must have been taken in the same plot.

# Refernce the plots
dataset$Plot <- dataset$Block:dataset$Genotype

Next, we can follow the usual rules to build the model (Piepho et al., 2004):

Consider one single year and build the treatment model
Consider one single year and build the block model
Include the year factor into the model and combine the year with all the effects in the treatment model, by crossing or nesting as appropriate.
Consider that the ‘plot’ factor in the block model references the randomisation units, i.e. those units which received the the genotypes by a randomisation process. Assign to this plot factor a random effect.
Excluding the terms for randomisation units, nest the year in all the other terms in the block model.
Combine random effects for randomisation units with the repeated factor, by using the colon operator, in order to derive the correct error terms to accommodate correlation structures.

The models at the different steps are as follows (with R notation):

treatment model: YIELD ~ GENOTYPE
block model: YIELD ~ BLOCK + BLOCK:PLOT
treatment model: YIELD ~ GENOTYPE * YEAR
block model: YIELD ~ BLOCK + (1|BLOCK:PLOT)
block model: YIELD ~ BLOCK + BLOCK:YEAR + (1|BLOCK:PLOT)
block model: YIELD ~ BLOCK + BLOCK:YEAR + (1|BLOCK:PLOT) + (1|BLOCK:PLOT:YEAR)

For analogy with the winter wheat experiment we take the block, year and genotype effects as fixed (but you can change this), so that the final model is:

YIELD ~ BLOCK + BLOCK:YEAR + GENOTYPE * YEAR + (1|BLOCK:PLOT) + (1|BLOCK:PLOT:YEAR)

where the last term does not need to be fitted in R, as it is the residual term, that is fitted by default. The resulting analysis (with ‘lme’) is:

mod2 <- lme(Yield ~ Block + Block:Year + Genotype*Year, 
           random = ~1|Plot,
           data = dataset)
anova(mod2)
##               numDF denDF  F-value p-value
## (Intercept)       1   137 3449.090  <.0001
## Block             3    57    0.272  0.8450
## Genotype         19    57    0.586  0.9009
## Year              1   137  102.171  <.0001
## Block:Year        3   137    0.008  0.9990
## Year:Genotype    19   137    0.078  1.0000
emmeans(mod2, ~Genotype)
##  Genotype          emmean    SE df lower.CL upper.CL
##  4cascine           11.59 0.883 57     9.82     13.4
##  casalina           12.46 0.883 57    10.69     14.2
##  classe             11.59 0.883 57     9.83     13.4
##  costanza            9.89 0.883 57     8.13     11.7
##  dimitra            11.75 0.883 57     9.99     13.5
##  FDL0101            12.22 0.883 57    10.45     14.0
##  garisenda          11.76 0.883 57     9.99     13.5
##  LaBellaCampagnola  12.17 0.883 57    10.40     13.9
##  LaTorre            11.36 0.883 57     9.59     13.1
##  linfa              11.23 0.883 57     9.46     13.0
##  marina             11.76 0.883 57    10.00     13.5
##  Palladiana         10.89 0.883 57     9.12     12.7
##  PicenaGR           12.12 0.883 57    10.35     13.9
##  PR56S82            11.56 0.883 57     9.79     13.3
##  PR57Q53            11.70 0.883 57     9.93     13.5
##  prosementi         11.79 0.883 57    10.02     13.6
##  RivieraVicentina    9.98 0.883 57     8.21     11.8
##  robot              12.11 0.883 57    10.34     13.9
##  Selene             12.11 0.883 57    10.34     13.9
##  Zenith             11.94 0.883 57    10.17     13.7
## 
## Results are averaged over the levels of: Block 
## Degrees-of-freedom method: containment 
## Confidence level used: 0.95

If we compare the above analyses with a ‘naive’ (wrong) analysis that neglects the repeated measures, we see big differences and, especially, we see that the SE for genotype means is much higher in the correct analysis.

mod3 <- lm(Yield ~ Year/Block + Genotype*Year, 
           data = dataset)
anova(mod3)
## Analysis of Variance Table
## 
## Response: Yield
##                Df  Sum Sq Mean Sq  F value Pr(>F)    
## Year            1  956.50  956.50 103.7377 <2e-16 ***
## Genotype       19  104.27    5.49   0.5952 0.9072    
## Year:Block      3    7.65    2.55   0.2766 0.8422    
## Year:Genotype  19   13.95    0.73   0.0796 1.0000    
## Residuals     197 1816.41    9.22                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans(mod3, ~Genotype)
##  Genotype          emmean    SE  df lower.CL upper.CL
##  4cascine           11.59 0.877 197     9.86     13.3
##  casalina           12.46 0.877 197    10.73     14.2
##  classe             11.59 0.877 197     9.87     13.3
##  costanza            9.89 0.877 197     8.17     11.6
##  dimitra            11.75 0.877 197    10.03     13.5
##  FDL0101            12.22 0.877 197    10.49     14.0
##  garisenda          11.76 0.877 197    10.03     13.5
##  LaBellaCampagnola  12.17 0.877 197    10.44     13.9
##  LaTorre            11.36 0.877 197     9.63     13.1
##  linfa              11.23 0.877 197     9.50     13.0
##  marina             11.76 0.877 197    10.04     13.5
##  Palladiana         10.89 0.877 197     9.16     12.6
##  PicenaGR           12.12 0.877 197    10.39     13.8
##  PR56S82            11.56 0.877 197     9.83     13.3
##  PR57Q53            11.70 0.877 197     9.97     13.4
##  prosementi         11.79 0.877 197    10.06     13.5
##  RivieraVicentina    9.98 0.877 197     8.25     11.7
##  robot              12.11 0.877 197    10.38     13.8
##  Selene             12.11 0.877 197    10.38     13.8
##  Zenith             11.94 0.877 197    10.21     13.7
## 
## Results are averaged over the levels of: Block 
## Confidence level used: 0.95

I just want to conclude by saying that the above mixed model analyses regards the design as a sort of ‘split-plot in time’ and it is not necessarily correct, as it assumes that the within-plot correlation is the same for all pairs of observations, regardless of their distance in time. Further analyses might be necessary to assess whether serial correlation structures are necessary. But we’ll consider this in a future post.

Thanks for reading and happy coding! And … don’t forget to check out my new book!

Prof. Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
Send comments to: andrea.onofri@unipg.it

Reference

Piepho, H.-P., Büchse, A., Richter, C., 2004. A Mixed Modelling Approach for Randomized Experiments with Repeated Measures. Journal of Agronomy and Crop Science 190, 230–247.

Repeated measures with perennial crops

Reference

Tags

Recent posts

Dealing with correlation in designed field experiments: part II

Dealing with correlation in designed field experiments: part I

Testing for interactions in nonlinear regression

Fitting time-to-event models with an environmental covariate

Correcting for multiplicity in the 'emmeans' package

Archives