Yield stability is a key aspect in the selection of crop genotypes. Its definition is not entirely straightforward (see, for example, Annichiarico, 2002), but, in simple terms, it refers to the ability of a crop to maintain its yield potential across different environments, helping farmers safeguard their income. Several statistical indicators of stability have been proposed (see, e.g., Mohammadi, 2008). In this post, however, I will focus on the so-called environmental variance, which represents the portion of phenotypic variance attributable to environmental (non-genetic) factors and is measured as the overall variance across environments for each genotype.
This index is usually considered a static measure of stability, meaning that low values are typical of genotypes whose yield levels are only slightly affected by changes in the environmental conditions.
One possible way to calculate the environmental variance has been proposed by Piepho (1999), making use of mixed models with ‘unusual’ variance-covariance matrices. In developing this post, I will closely follow the coding proposed by the aforementioned author.
As an example, let’s consider the data of a multi-environment study, wherein eight winter wheat genotypes were compared an a randomised complete block design with three replicates, which was repeated in seven years in Central Italy. The resulting dataset is available as ‘WinterWheat’ in ‘statforbiology’, which is the accompanying package for this blog. Information on how to download and install the ‘statforbiology’ package are given in this page.
The first code snippet loads the ‘statforbiology’ package and other necessary packages. Afterwards, it loads the ‘WinterWheat’ dataset by using the getAgroData() function and transforms the ‘Year’, ‘Genotype’ and ‘Block’ variables into factors. Please, note that this dataset shows the data for eight genotypes, but the model that we want to fit requires that the number of environments is higher than the number of genotypes. Therefore, we have to make a subset, removing some (e.g., three) genotypes.
library(nlme)
library(statforbiology)
library(dplyr)
WinterWheat <- getAgroData("WinterWheat")
WinterWheat <- WinterWheat[WinterWheat$Genotype != "SIMETO" &
WinterWheat$Genotype != "SOLEX" &
WinterWheat$Genotype != "SANCARLO",]
WinterWheat[,c(1:3, 5)] <- lapply(WinterWheat[,c(1:3,5)], factor)
head(WinterWheat, 10)
## Plot Block Genotype Yield Year
## 1 2 1 COLOSSEO 6.73 1996
## 2 110 2 COLOSSEO 6.96 1996
## 3 181 3 COLOSSEO 5.35 1996
## 4 2 1 COLOSSEO 6.26 1997
## 5 110 2 COLOSSEO 7.01 1997
## 6 181 3 COLOSSEO 6.11 1997
## 7 17 1 COLOSSEO 6.75 1998
## 8 110 2 COLOSSEO 6.82 1998
## 9 256 3 COLOSSEO 6.52 1998
## 10 18 1 COLOSSEO 7.18 1999
Dealing with the above dataset, a good candidate model for data analyses is the following:
\[y_{ijk} = \mu + g_i + r_{jk} + h_{ij} + \varepsilon_{ijk}\]
where \(y_{ijk}\) is yield (or other trait) for the \(k\)th block, \(i\)th genotype and \(j\)th environment, \(\mu\) is the intercept, \(g_i\) is the effect for the ith genotype, \(r_{jk}\) is the effect for the \(k\)th block in the \(j\)th environment, \(h_{ij}\) is a random deviation from the expected yield for the \(i\)^th genotype in the \(j\)^th environment and \(\varepsilon_{ijk}\) is the residual variability of yield between plots, within each environment and block.
We usually assume that \(r_{jk}\) and \(\varepsilon_{ijk}\) are independent and normally distributed, with variances equal to \(\sigma^2_r\) and \(\sigma^2_e\), respectively. Such assumptions may be questioned, but we will not do it now, for the sake of simplicity.
Let’s concentrate on \(h_{ij}\), which is the peculiar element of this model. Normally, a model for multi-environment experiments, in addition to the genotype effect, would contain the environmental effect (that is common to all genotypes) and the genotype by environment interaction (that is specific to each genotype by environment combination). In this environmental variance model, these two effects (environment plus interaction) are joined together, to obtain the random element \(h\), which represents the whole variability of genotype yields across environments. We assume this element as normally distributed with variance-covariance matrix equal to \(\Omega\).
It is reasonable to expect that the genotypes will have different variances across environments (heteroscedasticity), which can be interpreted as static stability measures (‘environmental variances’; hence the name ‘environmental variance model’). Furthermore, it is reasonable that, if an environment is good for one genotype, it may also be good for other genotypes, so that yields in each environment are correlated, although the correlations can be different for each couple of genotypes. To reflect our expectations, the \(\Omega\) matrix needs to be totally unstructured, with the only constraint that it is positive definite.
Fitting in one step
Piepho (1999) has shown how the above model can be coded by using SAS and I translated his code into R. Random effects should be specified as a list, by using the ‘Year’ as the nesting factor (Galecki and Burzykowski, 2013). In order to specify a totally unstructured variance-covariance matrix for the genotypes within years, I used the ‘pdMat’ construct pdSymm(). This model is rather complex and may take several minutes to converge, when the number of genotypes is very high.
EnvVarMod <- lme(Yield ~ Genotype,
random = list(Year = pdSymm(~Genotype - 1),
Block = ~ 1),
control = list(opt = "optim", maxIter = 100),
data=WinterWheat)
The environmental variances, together with the correlations, can be easily obtained by using the ‘VarCorr()’ function (see below). In practice, we have five variance components (one per genotype) showing that Colosseo was the most stable genotype, while Duilio was the least stable one. The yield levels of the genotypes in different environments are more or less correlated, with correlation coefficients ranging from 0.977 (Duilio vs Iride) to 0.840 (Creso vs Duilio).
# Environmental Variances and correlations
VarCorr(EnvVarMod)
## Variance StdDev Corr
## Year = pdSymm(Genotype - 1)
## GenotypeCOLOSSEO 0.48863386 0.6990235 GCOLOS GCRESO GDUILI GGRAZI
## GenotypeCRESO 0.70293687 0.8384133 0.969
## GenotypeDUILIO 2.36529182 1.5379505 0.840 0.840
## GenotypeGRAZIA 1.17844507 1.0855621 0.843 0.767 0.945
## GenotypeIRIDE 1.23023386 1.1091591 0.859 0.893 0.977 0.896
## Block = pdLogChol(1)
## (Intercept) 0.03231338 0.1797592
## Residual 0.12952019 0.3598891
Unfortunately, the output of ‘lme()’ is rather difficult top read and standard errors are missing for variance parameters. The ‘lmeInfo’ package can be used to retrieve variance components estimates (via the extract_varcomp() function) and their variance-covariance matrix (via the varcomp_vcov() function). The standard errors can be retrieved as the square root of the diagonal elements of these latter variance-covariance matrix (see code box below).
# Retrieving the variance components from the lme object
library(lmeInfo)
varcomp <- extract_varcomp(EnvVarMod, vector = TRUE)
vcovMat <- varcomp_vcov(EnvVarMod)
SEs <- sqrt(diag(vcovMat))
data.frame(varcomp, SEs)
## varcomp SEs
## Tau.Year.Year.var(GenotypeCOLOSSEO) 0.48863386 0.31341161
## Tau.Year.Year.cov(GenotypeCRESO,GenotypeCOLOSSEO) 0.56818152 0.35265479
## Tau.Year.Year.var(GenotypeCRESO) 0.70293687 0.43709597
## Tau.Year.Year.cov(GenotypeDUILIO,GenotypeCOLOSSEO) 0.90313965 0.59835845
## Tau.Year.Year.cov(GenotypeDUILIO,GenotypeCRESO) 1.08269708 0.71029178
## Tau.Year.Year.var(GenotypeDUILIO) 2.36529182 1.39678126
## Tau.Year.Year.cov(GenotypeGRAZIA,GenotypeCOLOSSEO) 0.64007465 0.42673312
## Tau.Year.Year.cov(GenotypeGRAZIA,GenotypeCRESO) 0.69825554 0.48918816
## Tau.Year.Year.cov(GenotypeGRAZIA,GenotypeDUILIO) 1.57733378 0.95776281
## Tau.Year.Year.var(GenotypeGRAZIA) 1.17844507 0.71158820
## Tau.Year.Year.cov(GenotypeIRIDE,GenotypeCOLOSSEO) 0.66615784 0.43881261
## Tau.Year.Year.cov(GenotypeIRIDE,GenotypeCRESO) 0.83080714 0.52923899
## Tau.Year.Year.cov(GenotypeIRIDE,GenotypeDUILIO) 1.66657461 0.99335782
## Tau.Year.Year.cov(GenotypeIRIDE,GenotypeGRAZIA) 1.07884979 0.67948762
## Tau.Year.Year.var(GenotypeIRIDE) 1.23023386 0.74148575
## Tau.Block.Block.var((Intercept)) 0.03231338 0.02254209
## sigma_sq 0.12952019 0.02447701
If we want to inspect the full 5 \(\times\) 5 variance–covariance matrix for the random effects, we cannot rely on the usual getVarCov() function, as it is not implemented for models with multiple levels of nesting. Instead, we need to reconstruct this matrix from the ‘varcomp’ object shown in the previous box, as illustrated in the box below.
# Rebuilding the variance-covariance matrix
vals <- varcomp[1:15]
res <- matrix(0, 5, 5)
cont <- 1
for(i in 1:5){
for(j in 1:i){
res[i, j] <- vals[cont]
cont <- cont + 1
}
}
res[upper.tri(res)] <- res[lower.tri(res)]
# Variance-covariance matrix
res
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.4886339 0.5681815 0.9031396 0.6661578 0.8308071
## [2,] 0.5681815 0.7029369 0.6400747 1.0826971 1.5773338
## [3,] 0.9031396 1.0826971 2.3652918 0.6982555 1.6665746
## [4,] 0.6400747 0.6982555 1.5773338 1.1784451 1.0788498
## [5,] 0.6661578 0.8308071 1.6665746 1.0788498 1.2302339
r
# Correlation matrix
cov2cor(res)
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1.0000000 0.9694762 0.8400802 0.8778710 1.0715553
## [2,] 0.9694762 1.0000000 0.4963981 1.1895813 1.6961787
## [3,] 0.8400802 0.8396658 1.0000000 0.4182321 0.9769865
## [4,] 0.8434982 0.7671876 0.9447710 1.0000000 0.8960092
## [5,] 0.8591946 0.8934047 0.9769865 0.8960092 1.0000000
Fitting in two steps
In his original paper, Piepho (1999) also gave SAS code to analyse the means of the ‘genotype x environment’ combinations. Indeed, agronomists and plant breeders often adopt a two-steps fitting procedure: in the first step, the means across blocks are calculated for all genotypes in all environments. In the second step, these means are used to fit an environmental variance model. This two-step process is less demanding in terms of computer resources and it is correct whenever the experiments are equireplicated, with no missing ‘genotype x environment’ combinations. Furthermore, we need to be able to assume similar variances within all experiments.
I would also like to give an example of this two-step analysis method. In the first step, we can use the group_by() and summarise() functions in ‘dplyr’:
# First step: calcolate the means of replicates within environments
WinterWheatM <- WinterWheat |>
group_by(Genotype, Year) |>
summarise(Yield = mean(Yield))
## `summarise()` has grouped output by 'Genotype'. You can override using the
## `.groups` argument.
r
head(WinterWheatM)
## # A tibble: 6 × 3
## # Groups: Genotype [1]
## Genotype Year Yield
## <fct> <fct> <dbl>
## 1 COLOSSEO 1996 6.35
## 2 COLOSSEO 1997 6.46
## 3 COLOSSEO 1998 6.70
## 4 COLOSSEO 1999 6.98
## 5 COLOSSEO 2000 6.44
## 6 COLOSSEO 2001 7.07
Once we have retrieved the means for genotypes in all years, we can fit a simplified model:
\[y_{ij} = \mu + g_i + a_{ij}\]
where \(y_{ij}\) is the mean yield for the \(i\)-th genotype in the \(j\)-th environment and \(a_{ij}\) is the residual term, which includes the genotype x environment random interaction, the block x environment random interaction and the residual error term.
In this model we have only one random effect (\(a_{ij}\)) and, therefore, this is a fixed linear model. However, we need to model the variance-covariance matrix of residuals (\(R\)), by adopting a totally unstructured form. Please, note that, when working with raw data, we have modelled \(\Omega\), i.e. the variance-covariance matrix for the random effects. I have used the ‘gls()’ function, together with the ‘weights’ and ‘correlation’ arguments. See the code below.
#Second step
envVarModM <- gls(Yield ~ Genotype,
data = WinterWheatM,
weights = varIdent(form=~1|Genotype),
correlation = corSymm(form=~1|Year))
summary(envVarModM)
## Generalized least squares fit by REML
## Model: Yield ~ Genotype
## Data: WinterWheatM
## AIC BIC logLik
## 84.36176 112.3857 -22.18088
##
## Correlation Structure: General
## Formula: ~1 | Year
## Parameter estimate(s):
## Correlation:
## 1 2 3 4
## 2 0.947
## 3 0.809 0.815
## 4 0.816 0.736 0.921
## 5 0.817 0.866 0.952 0.869
## Variance function:
## Structure: Different standard deviations per stratum
## Formula: ~1 | Genotype
## Parameter estimates:
## COLOSSEO CRESO DUILIO GRAZIA IRIDE
## 1.000000 1.189654 2.143715 1.528849 1.560622
##
## Coefficients:
## Value Std.Error t-value p-value
## (Intercept) 6.413333 0.2742310 23.386613 0.0000
## GenotypeCRESO -0.439524 0.1107465 -3.968739 0.0004
## GenotypeDUILIO 0.178571 0.3999795 0.446451 0.6585
## GenotypeGRAZIA -0.330952 0.2518269 -1.314206 0.1987
## GenotypeIRIDE 0.281905 0.2580727 1.092346 0.2834
##
## Correlation:
## (Intr) GCRESO GDUILI GGRAZI
## GenotypeCRESO 0.312
## GenotypeDUILIO 0.503 0.371
## GenotypeGRAZIA 0.269 -0.095 0.774
## GenotypeIRIDE 0.292 0.545 0.857 0.638
##
## Standardized residuals:
## Min Q1 Med Q3 Max
## -2.0949714 -0.5742607 0.1661351 0.8068883 1.3395017
##
## Residual standard error: 0.7255469
## Degrees of freedom: 35 total; 30 residual
The variance-covariance matrix for residuals can be obtained using the usual getVarCov() function in the ‘nlme’ package.
R <- getVarCov(envVarModM)
R
## Marginal variance covariance matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.52642 0.59280 0.91285 0.65647 0.67116
## [2,] 0.59280 0.74503 1.09440 0.70421 0.84652
## [3,] 0.91285 1.09440 2.41920 1.58850 1.67700
## [4,] 0.65647 0.70421 1.58850 1.23040 1.09160
## [5,] 0.67116 0.84652 1.67700 1.09160 1.28210
## Standard Deviations: 0.72555 0.86315 1.5554 1.1093 1.1323
As the design is perfectly balanced, the diagonal elements of the above matrix correspond to the variances of genotypes across environments:
tapply(WinterWheatM$Yield, WinterWheatM$Genotype, var)
## COLOSSEO CRESO DUILIO GRAZIA IRIDE
## 0.5264185 0.7450275 2.4191624 1.2304397 1.2821143
Strictly speaking, the diagonal elements of the above matrix do not correspond to the environmental variances as calculated with one-step analyses, as they also contain the within-experiment and within block random variability, which would need to be separately estimated during the first step.
Thanks for reading—and don’t forget to check out my new book below!
Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
Send comments to: andrea.onofri@unipg.it
Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
References
- Gałecki, A., Burzykowski, T., 2013. Linear mixed-effects models using R: a step-by-step approach. Springer, Berlin.
- Muhammad Yaseen, Kent M. Eskridge and Ghulam Murtaza (2018). stability: Stability Analysis of Genotype by Environment Interaction (GEI). R package version 0.5.0. https://CRAN.R-project.org/package=stability
- Piepho, H.-P., 1999. Stability Analysis Using the SAS System. Agronomy Journal 91, 154–160.
- Pinheiro, J.C., Bates, D.M., 2000. Mixed-Effects Models in S and S-Plus, Springer-Verlag Inc. ed. Springer-Verlag Inc., New York.
This post was originally written on 20 August 2019; revised on December 2025
