Subsampling is very common in field experiments in agriculture. It happens when we collect several random samples from each plot and we submit them to some sort of measurement process. Some examples? Let’s imagine that we have randomised field experiments with three replicates and, either,:

- we collect the whole grain yield in each plot, select four subsamples and measure, in each subsample, the oil content or some other relevant chemical property, or
- we collect, from each plot, four plants and measure their heights, or
- we collect a representative soil sample from each plot and perform chemical analyses in triplicate.

For all the above examples, we end up with 3 by 4 equal 12 data for each treatment level. The question is: do we have 12 replicates? This is exactly the point: **subsamples should never be mistaken for true-replicates, as the experimental treatments were not independently allocated to each one of them**. In literature, subsamples are usually known as sub-replicates or pseudo-replicates: for the above examples, we have three true-replicates and four pseudo-replicates per true-replicate. Let’s see how to handle pseudo-replicates in data analysis. But, first of all, do not forget that: **experiments with pseudo-replicates are valid only when we also have true-replicates!** If we only have pseudo-replicates… well, there is nothing we can do in data analysis that transforms our experiment into a valid one…

# A motivating example

Let’s consider a dataset from an experiment where we had 30 genotypes in three blocks and recorded the Weight of Thousand Kernels (TKW) in three subsamples per plot, which were labelled by using the ‘Sample’ variable. In the box below, we load the data and all the necessary packages.

```
rm(list=ls())
library(tidyverse)
library(nlme)
library(emmeans)
filePath <- "https://www.casaonofri.it/_datasets/TKW.csv"
TKW <- read.csv(filePath)
TKW <- TKW %>%
mutate(across(1:4, .fns = factor))
head(TKW)
## Plot Block Genotype Sample TKW
## 1 1 1 Meridiano 1 28.6
## 2 2 1 Solex 1 33.3
## 3 3 1 Liberdur 1 22.3
## 4 4 1 Virgilio 1 28.1
## 5 5 1 PR22D40 1 26.7
## 6 6 1 Ciccio 1 34.2
```

# The wrong analysis

A ‘naive’ analysis would be to perform an ANOVA on all data, without making any distinction between true-replicates and sub-replicates. Let’s do this by using the code shown in the box below.

```
# Naive analysis
mod <- lm(TKW ~ Block + Genotype, data=TKW)
anova(mod)
## Analysis of Variance Table
##
## Response: TKW
## Df Sum Sq Mean Sq F value Pr(>F)
## Block 2 110.3 55.169 7.510 0.0006875 ***
## Genotype 29 7224.7 249.129 33.913 < 2.2e-16 ***
## Residuals 238 1748.4 7.346
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
muG <- emmeans(mod, ~Genotype)
muG
## Genotype emmean SE df lower.CL upper.CL
## Achille 34.7 0.903 238 32.9 36.4
## Alemanno 41.6 0.903 238 39.8 43.4
## AncoMarzio 32.7 0.903 238 31.0 34.5
## Arnacoris 24.5 0.903 238 22.7 26.3
## Casanova 38.6 0.903 238 36.9 40.4
## Chiara 30.0 0.903 238 28.2 31.8
## Ciccio 33.5 0.903 238 31.7 35.2
## Ciclope 37.4 0.903 238 35.6 39.1
## Claudio 40.7 0.903 238 38.9 42.5
## Creso 37.8 0.903 238 36.0 39.6
## Dario 30.9 0.903 238 29.1 32.7
## Duilio 32.1 0.903 238 30.3 33.9
## Dylan 36.5 0.903 238 34.8 38.3
## Imhotep 34.8 0.903 238 33.1 36.6
## Iride 30.8 0.903 238 29.0 32.6
## Isildur 26.0 0.903 238 24.2 27.8
## K26 33.5 0.903 238 31.7 35.2
## Latinur 37.5 0.903 238 35.7 39.3
## Liberdur 22.8 0.903 238 21.1 24.6
## Meridiano 30.8 0.903 238 29.0 32.6
## Neolatino 36.9 0.903 238 35.1 38.7
## Normanno 36.8 0.903 238 35.0 38.6
## PR22D40 26.2 0.903 238 24.4 28.0
## PR22D89 44.0 0.903 238 42.3 45.8
## Principe 37.2 0.903 238 35.4 39.0
## Saragolla 35.9 0.903 238 34.1 37.6
## Sfinge 40.5 0.903 238 38.8 42.3
## Simeto 41.1 0.903 238 39.3 42.9
## Solex 34.6 0.903 238 32.9 36.4
## Virgilio 29.5 0.903 238 27.7 31.2
##
## Results are averaged over the levels of: Block
## Confidence level used: 0.95
pairwise <- as.data.frame(pairs(emmeans(mod, ~Genotype)))
sum(pairwise$p.value < 0.05)
## [1] 225
```

We see that the SE for genotype means is 0.903, the F test for the genotypes is highly significant and there are 225 significant pairwise comparisons among the 30 genotypes.

As we said, this is simple, but it is also **terribly wrong**. By putting true-replicates and pseudo-replicates on an equal footing, we have forgotten that the 270 observations are grouped by plot and that the observations in the same plot are more alike than the observations in different plots, because they share the same ‘origin’. We say that the observations in each plot are correlated and, therefore, the basic assumption of independence of residuals is broken. Our analysis is invalid and our manuscript can be very likely rejected.

But, why are the editors so critical when we mistake pseudo-replicates for true-replicates? We’ll see this in a few minutes.

# The correct way to go

A fully correct model for our dataset is:

\[ y_{ijks} = \mu + \alpha_i + \beta_j + \gamma_{k} + \varepsilon_{s}\]

where \(y\) is the thousand kernel weight for the i^{th} genotype, j^{th} block, k^{th} plot and s^{th} subsample, \(\alpha\) is the effect of the i^{th} genotype, \(\beta\) is the effect of the j^{th} block, \(\gamma\) is the effect of the the k^{th} plot and \(\varepsilon\) is the effect of the s^{th} subsample. The presence of the \(\gamma\) element accounts for the effects of plots as a grouping factor and restores the independence of model residuals.

Obviously, the difference between plots (for a given genotype and block) must be regarded as a random effect, as well as the difference between subsamples, within each plot. Indeed. we have two random effects and, therefore, this is a mixed model. These two random effects are assumed to be normal, independent from each other, with mean equal to 0 and variances respectively equal to \(\sigma^2_p\) and \(\sigma^2_e\). (BTW: I am regarding the block effect as fixed! You may not agree, but this does not change what I am going to say later…).

We can fit this mixed model by using the `lme()`

function in the `nlme`

package, including the ‘Plot’ (i.e. the grouping factor) as a random effect. Obviously, we need to have a variable in the dataset that uniquely identifies each plot.

```
# Mixed model fit
mod.mix <- lme(TKW ~ Block + Genotype,
random = ~1|Plot, data=TKW)
anova(mod.mix)
## numDF denDF F-value p-value
## (Intercept) 1 180 11563.536 <.0001
## Block 2 58 2.005 0.1439
## Genotype 29 58 9.052 <.0001
emmeans(mod.mix, ~Genotype)
## Genotype emmean SE df lower.CL upper.CL
## Achille 34.7 1.75 58 31.2 38.2
## Alemanno 41.6 1.75 58 38.1 45.1
## AncoMarzio 32.7 1.75 58 29.2 36.2
## Arnacoris 24.5 1.75 58 21.0 28.0
## Casanova 38.6 1.75 58 35.1 42.1
## Chiara 30.0 1.75 58 26.5 33.5
## Ciccio 33.5 1.75 58 30.0 37.0
## Ciclope 37.4 1.75 58 33.9 40.9
## Claudio 40.7 1.75 58 37.2 44.2
## Creso 37.8 1.75 58 34.3 41.3
## Dario 30.9 1.75 58 27.4 34.4
## Duilio 32.1 1.75 58 28.6 35.6
## Dylan 36.5 1.75 58 33.0 40.0
## Imhotep 34.8 1.75 58 31.3 38.3
## Iride 30.8 1.75 58 27.3 34.3
## Isildur 26.0 1.75 58 22.5 29.5
## K26 33.5 1.75 58 30.0 37.0
## Latinur 37.5 1.75 58 34.0 41.0
## Liberdur 22.8 1.75 58 19.3 26.3
## Meridiano 30.8 1.75 58 27.3 34.3
## Neolatino 36.9 1.75 58 33.4 40.4
## Normanno 36.8 1.75 58 33.3 40.3
## PR22D40 26.2 1.75 58 22.7 29.7
## PR22D89 44.0 1.75 58 40.5 47.5
## Principe 37.2 1.75 58 33.7 40.7
## Saragolla 35.9 1.75 58 32.4 39.4
## Sfinge 40.5 1.75 58 37.0 44.0
## Simeto 41.1 1.75 58 37.6 44.6
## Solex 34.6 1.75 58 31.1 38.1
## Virgilio 29.5 1.75 58 26.0 33.0
##
## Results are averaged over the levels of: Block
## Degrees-of-freedom method: containment
## Confidence level used: 0.95
pairwise <- as.data.frame(pairs(emmeans(mod.mix, ~Genotype)))
sum(pairwise$p.value < 0.05)
## [1] 91
```

We see several differences with respect to the previous fit:

- in the ‘naive’ model, the error term for the genotype effect (MS
_{e}) is 7.346 with 238 DF and it represents the whole subsample-to-subsample variability. In the mixed model, the error term for the genotype effect is not shown, but it has only 58 degrees of freedom. - The SE for genotype means is 1.75 and it is much higher than that from the ‘naive’ fit.
- The number of significant pairwise comparisons between genotypes has dropped to 91.

Let’s concentrate on the correct error term for the genotype effect; we know that the *error term must be obtained by a comparison of plots treated alike* (see Fisher, 1937. The design of experiments). From this, it is immediately clear that the MS_{e} from te ‘naive’ analysis compares the subsamples treated alike (and not the plots). Now, if we take the ‘naive’ analysis and include the ‘plot’ among the experimental factors, we get:

```
mod2 <- lm(TKW ~ Block + Genotype + Plot, data=TKW)
anova(mod2)
## Analysis of Variance Table
##
## Response: TKW
## Df Sum Sq Mean Sq F value Pr(>F)
## Block 2 110.3 55.169 65.269 < 2.2e-16 ***
## Genotype 29 7224.7 249.129 294.736 < 2.2e-16 ***
## Plot 58 1596.2 27.521 32.559 < 2.2e-16 ***
## Residuals 180 152.1 0.845
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

We see that the ‘plot’ effect, once the genotype and block effects have been removed, takes 58 DFs and leave us with 180 DFs in the residuals. Indeed, we have decomposed the MS_{e} from the ‘naive’ analysis in two parts, one measuring the plot-to-plot variability and the other one measuring the subsample-to-subsample variability, within each plot. The MS for this latter element is equal to \(\sigma_e\) from mixed model analysis, while the MS for the former element is the correct error term to test for the genotype effect, because it *compares the plots treated alike*. It is equal to \(3 \times \sigma^2_p + \sigma^2_e\) and it should be used to calculate the SEs for genotype means, as \(\sqrt{27.521/9} = 1.748689\) and not as \(\sqrt{7.346/9} = 0.90345\).

We can now understand why the editors reject our manuscripts if we do not analyse the data properly: we may strongly overestimate the precision of our experiment and, consequently, commit a lot of false positive errors!

# A simpler alternative

In presence of subsampling, we strongly recommend the previous method of data analysis. But, a simpler alternative exists, which is feasible whenever the number of subsamples is the same for all plots: we can proceed in two-steps. In the first step, we calculate the means of subsamples for each plot and, in the second step, we submit the plot means to ANOVA, by considering the genotypes and the blocks as fixed factors.

```
# First step
TKWm <- aggregate(TKW ~ Block + Genotype, data = TKW, mean)
#Second step
mod2step <- lm(TKW ~ Genotype + Block, data = TKWm)
anova(mod2step)
## Analysis of Variance Table
##
## Response: TKW
## Df Sum Sq Mean Sq F value Pr(>F)
## Genotype 29 2408.24 83.043 9.0522 9.943e-13 ***
## Block 2 36.78 18.390 2.0046 0.1439
## Residuals 58 532.08 9.174
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
emmeans(mod2step, ~Genotype)
## Genotype emmean SE df lower.CL upper.CL
## Achille 34.7 1.75 58 31.2 38.2
## Alemanno 41.6 1.75 58 38.1 45.1
## AncoMarzio 32.7 1.75 58 29.2 36.2
## Arnacoris 24.5 1.75 58 21.0 28.0
## Casanova 38.6 1.75 58 35.1 42.1
## Chiara 30.0 1.75 58 26.5 33.5
## Ciccio 33.5 1.75 58 30.0 37.0
## Ciclope 37.4 1.75 58 33.9 40.9
## Claudio 40.7 1.75 58 37.2 44.2
## Creso 37.8 1.75 58 34.3 41.3
## Dario 30.9 1.75 58 27.4 34.4
## Duilio 32.1 1.75 58 28.6 35.6
## Dylan 36.5 1.75 58 33.0 40.0
## Imhotep 34.8 1.75 58 31.3 38.3
## Iride 30.8 1.75 58 27.3 34.3
## Isildur 26.0 1.75 58 22.5 29.5
## K26 33.5 1.75 58 30.0 37.0
## Latinur 37.5 1.75 58 34.0 41.0
## Liberdur 22.8 1.75 58 19.3 26.3
## Meridiano 30.8 1.75 58 27.3 34.3
## Neolatino 36.9 1.75 58 33.4 40.4
## Normanno 36.8 1.75 58 33.3 40.3
## PR22D40 26.2 1.75 58 22.7 29.7
## PR22D89 44.0 1.75 58 40.5 47.5
## Principe 37.2 1.75 58 33.7 40.7
## Saragolla 35.9 1.75 58 32.4 39.4
## Sfinge 40.5 1.75 58 37.0 44.0
## Simeto 41.1 1.75 58 37.6 44.6
## Solex 34.6 1.75 58 31.1 38.1
## Virgilio 29.5 1.75 58 26.0 33.0
##
## Results are averaged over the levels of: Block
## Confidence level used: 0.95
pairwise <- as.data.frame(pairs(emmeans(mod2step, ~Genotype)))
sum(pairwise$p.value < 0.05)
## [1] 91
```

We see that the results are totally the same as with a mixed model fit, although all Mean Squares in ANOVA are fractions of those obtained by mixed model analysis.

**Please, remember that this simple solution is only feasible when we have the same number of subsamples per each plot**.

Thanks for reading and happy coding!

Andrea Onofri

Department of Agricultural, Food and Environmental Sciences

University of Perugia (Italy)

andrea.onofri@unipg.it

# References

- Fisher, R.A., 1937. The design of experiments, 2nd ed. Oliver and Boyd, Edinburgh, UK.
- Hurlbert, S.H., 1984. Pseudoreplication and the design of ecological experiments. Ecological Monographs 54, 187–211.