Yield stability is a fundamental aspect of the selection of crop genotypes. Its definition is rather complex (see, for example, Annichiarico, 2002), but, in simple terms, it represents the ability of a crop to maintain its potential yield level across environments, which helps farmers preserve their income. Several statistical indicators of stability exist (see, e.g., Mohammadi, 2008) and, in this post, I would like to concentrate on the so-called stability variance, that is, for a specific genotype, the amount of yield variability across different environments, after correcting for the additive effects of each environment, which are common to all genotypes under investigation (Shukla, 1972).

In order to understand the real meaning of this index and how to calculate it, let’s consider the data of a multi-environment study, wherein eight winter wheat genotypes were compared an a randomised complete block design with three replicates, which was repeated in seven years in Central Italy. The resulting dataset is available as ‘WinterWheat’ in ‘statforbiology’, which is the accompanying package for this blog. Information on how to download and install the ‘statforbiology’ package are given in this page.

The first code snippet loads the ‘statforbiology’ package and other necessary packages. Afterwards, it loads the ‘WinterWheat’ dataset by using the getAgroData() function and transforms the ‘Year’, ‘Genotype’ and ‘Block’ variables into factors.

library(nlme)
library(statforbiology)
library(dplyr)
WinterWheat <- getAgroData("WinterWheat")
WinterWheat[,c(1:3, 5)] <- lapply(WinterWheat[,c(1:3,5)], factor)
head(WinterWheat)
##   Plot Block Genotype Yield Year
## 1    2     1 COLOSSEO  6.73 1996
## 2  110     2 COLOSSEO  6.96 1996
## 3  181     3 COLOSSEO  5.35 1996
## 4    2     1 COLOSSEO  6.26 1997
## 5  110     2 COLOSSEO  7.01 1997
## 6  181     3 COLOSSEO  6.11 1997

In this dataset, each year is regarded as an ‘environment’ in itself and the environment effect should be considered as random: we select the years, but we cannot control the weather conditions. Accordingly, a good candidate model for data analyses is:

\[y_{ijk} = \mu + \gamma_{kj} + g_i + e_j + ge_{ij} + \varepsilon_{ijk}\]

where \(y\) is yield (or other trait) for the \(k\)^th block, \(i\)^th genotype and \(j\)^th environment, \(\mu\) is the intercept, \(\gamma\) is the effect of the \(k\)^th block in the \(j\)^th environment, \(g\) is the effect of the \(i\)^th genotype, \(e\) is the effect of the \(j\)^th environment, \(ge\) is the interaction effect of the \(i\)^th genotype and \(j\)^th environment, while \(\varepsilon\) is the residual random term, which is assumed as normal with a mean of 0 and a variance of \(\sigma^2\). The block effect, the environment effect and the ‘genotype x environment’ interaction are also random with means equal to 0 and variances respectively equal to \(\sigma^2_{\gamma}\), \(\sigma^2_{e}\) and \(\sigma^2_{ge}\).

Fitting in one step

Due to the presence of multiple random effects and a fixed effect (the genotype), this is a Linear Mixed Model (LMM) and we can fit it by using the lme() function in the ‘nlme’ package (Pinheiro and Bates, 2000). Crossed random effects requires the use of ‘pdMat’ constructs as shown by Pinheiro and Bates (2000; see pages 162 and 163) and by Galecki and Burzykowski (2013). In detail, random effects are specified as a list, including the year effect (Year = pdIdent(~ 1)), the blocks within years effect (Year = pdIdent(~ Block - 1)) and the genotypes within years effect (Year = pdIdent(~ Genotype - 1)):

model.mix <- lme(Yield ~ Genotype, 
                 random = list(Year = pdIdent(~ 1),
                               Year = pdIdent(~ Block - 1),
                               Year = pdIdent(~ Genotype - 1)),
                 data=WinterWheat)
VarCorr(model.mix)
##                  Variance              StdDev   
## Year =           pdIdent(1)                     
## (Intercept)      1.07314330            1.0359263
## Year =           pdIdent(Block - 1)             
## Block1           0.01641784            0.1281321
## Block2           0.01641784            0.1281321
## Block3           0.01641784            0.1281321
## Year =           pdIdent(Genotype - 1)          
## GenotypeCOLOSSEO 0.17034106            0.4127240
## GenotypeCRESO    0.17034106            0.4127240
## GenotypeDUILIO   0.17034106            0.4127240
## GenotypeGRAZIA   0.17034106            0.4127240
## GenotypeIRIDE    0.17034106            0.4127240
## GenotypeSANCARLO 0.17034106            0.4127240
## GenotypeSIMETO   0.17034106            0.4127240
## GenotypeSOLEX    0.17034106            0.4127240
## Residual         0.14880375            0.3857509

This model postulates that, apart from genotypic effects, the variability of yield across environments relates to environmental effects (that are common to all genotypes) and to the genotype by environment interaction effects, that are specific to each genotype, although, in this model, they are constrained to assume the same variance component for all genotypes, that is \(\sigma^2_{ge} = 0.170\) (block effects are uninteresting, because they occur within each environment). We can allow this variance component to assume a different value for each i^th genotype ( \(\sigma^2_{ge.i}\)), by replacing pdIdent() with pdDiag(), as shown in the box below. The same model can be coded in a simpler manner by removing the pdIdent() constructs, which makes it amenable for the next step (see below).

model.mix2 <- lme(Yield ~ Genotype, 
                 random = list(Year = pdIdent(~ 1),
                               Year = pdIdent(~ Block - 1),
                               Year = pdDiag(~ Genotype - 1)),
                 data=WinterWheat)
VarCorr(model.mix2)
##                  Variance             StdDev   
## Year =           pdIdent(1)                    
## (Intercept)      0.86590899           0.9305423
## Year =           pdIdent(Block - 1)            
## Block1           0.01641753           0.1281309
## Block2           0.01641753           0.1281309
## Block3           0.01641753           0.1281309
## Year =           pdDiag(Genotype - 1)          
## GenotypeCOLOSSEO 0.10427421           0.3229152
## GenotypeCRESO    0.09589078           0.3096624
## GenotypeDUILIO   0.47613558           0.6900258
## GenotypeGRAZIA   0.15286442           0.3909788
## GenotypeIRIDE    0.11860669           0.3443932
## GenotypeSANCARLO 0.02574748           0.1604602
## GenotypeSIMETO   0.42997759           0.6557268
## GenotypeSOLEX    0.06713453           0.2591033
## Residual         0.14880422           0.3857515
#
# Same model with simpler coding
model.mix3 <- lme(Yield ~ Genotype, 
                 random = list(Year = ~ 1,
                               Year = pdDiag(~ Genotype - 1),
                               Block = ~ 1),
                 data=WinterWheat)
logLik(model.mix3); logLik(model.mix2)
## 'log Lik.' -134.8419 (df=19)
## 'log Lik.' -134.8419 (df=19)

Let’s focus on \(\sigma^2_{ge.i}\); for each genotype, these values represent the amount of variability across environments that is additional to the ovarall environmental variability \(\sigma^2_{e}\). In other words, these values correspond to the Shukla’s stability variances and allow the classification of some genotypes as stable (e.g. Sancarlo, Solex and Creso) and others as unstable (e.g. Duilio and Simeto).

Extracting variance components with their standard errors from ‘lme’ objects can be rather complex. I therefore suggest using the ‘lmeInfo’ package, which allows retrival of the variance component estimates (via the extract_varcomp() function) and their variance-covariance matrix (via the varcomp_vcov() function). The standard errors can be retrieved as the square root of the diagonal elements of these latter variance-covariance matrix (see code box below).

It is possible to verify that the use of the extract_varcomp() and varcomp_vcov() functions returns an error when the ‘pdIdent()’ construct is used to specify random effects. Consequently, in the box below, I have used ‘model.mix3’ instead of ‘model.mix2’.

library(lmeInfo)
varcomp <- extract_varcomp(model.mix3, vector = TRUE)
vcovMat <- varcomp_vcov(model.mix3)
SEs <- sqrt(diag(vcovMat))
data.frame(varcomp, SEs)
##                                          varcomp        SEs
## Tau.Year.Year.var((Intercept))        0.86590902 0.51466815
## Tau.Year.1.Year.var(GenotypeCOLOSSEO) 0.10427419 0.10220109
## Tau.Year.1.Year.var(GenotypeCRESO)    0.09589090 0.09749719
## Tau.Year.1.Year.var(GenotypeDUILIO)   0.47613525 0.31536662
## Tau.Year.1.Year.var(GenotypeGRAZIA)   0.15286411 0.12970671
## Tau.Year.1.Year.var(GenotypeIRIDE)    0.11860659 0.11027783
## Tau.Year.1.Year.var(GenotypeSANCARLO) 0.02574755 0.05952790
## Tau.Year.1.Year.var(GenotypeSIMETO)   0.42997712 0.28877333
## Tau.Year.1.Year.var(GenotypeSOLEX)    0.06713394 0.08153260
## Tau.Block.Block.var((Intercept))      0.01641752 0.01349968
## sigma_sq                              0.14880428 0.02125775

Fitting in two steps

Sometimes, agronomists and plant breeders may prefer to adopt a two-steps fitting procedure: in the first step, the means across blocks are calculated for all genotypes in all environments. In the second step, these means are used to fit a stability variance model. This two-step process is less demanding in terms of computer resources and it is correct whenever the experiments are equireplicated, with no missing ‘genotype x environment’ combinations and when the variances within each experiment are homogeneous.

I would also like to give an example of this two-step analysis method. In the first step, we can use the group_by() and summarise() functions in the ‘dplyr’ package:

# First step
WinterWheatM <- WinterWheat |>
  group_by(Genotype, Year) |>
  summarise(Yield = mean(Yield))
head(WinterWheatM)
## # A tibble: 6 × 3
## # Groups:   Genotype [1]
##   Genotype Year  Yield
##   <fct>    <fct> <dbl>
## 1 COLOSSEO 1996   6.35
## 2 COLOSSEO 1997   6.46
## 3 COLOSSEO 1998   6.70
## 4 COLOSSEO 1999   6.98
## 5 COLOSSEO 2000   6.44
## 6 COLOSSEO 2001   7.07

Once we have retrieved the means for genotypes in all years, we can fit a stability variance model, by allowing different variances of model residuals within each year (heteroscedastic model). The ‘weights’ argument can be used, together with the varIdent() variance function, to allow for a different variance for each genotype. The code to retrieve the within-year variances is not obvious, unfortunately, but you can use the following snippet as a guidance.

# Second step
model.mixM <- lme(Yield ~ Genotype, 
                  random = ~ 1|Year, data = WinterWheatM,
                  weights = varIdent(form=~1|Genotype))

# Retrieving the variance components
vF <- model.mixM$modelStruct$varStruct
sdRatios <- c(1, coef(vF, unconstrained = F))
names(sdRatios)[1] <- "COLOSSEO"
scalePar <- model.mixM$sigma
sigma2i <- (scalePar * sdRatios)^2
sigma2i
##   COLOSSEO      CRESO     DUILIO     GRAZIA      IRIDE   SANCARLO     SIMETO 
## 0.15387857 0.14548985 0.52571780 0.20246664 0.16820264 0.07535112 0.47958756 
##      SOLEX 
## 0.11673900
# > sigma2i
#   COLOSSEO      CRESO     DUILIO     GRAZIA      IRIDE   SANCARLO     SIMETO      SOLEX 
# 0.15387857 0.14548985 0.52571780 0.20246664 0.16820264 0.07535112 0.47958756 0.11673900

In the previous code, the ‘sigma2i’ components do not represent the stability variances, as we still need to remove the within-experiment error (\(\sigma^2\)), which can only be estimated from the whole dataset, at the first step. Indeed, if we take the estimate of 0.1488 (see one of the previous code boxes), divide by three (the number of blocks) and subtract from ‘sigma2i’, we get:

sigma2i - model.mix2$sigma^2/3
##   COLOSSEO      CRESO     DUILIO     GRAZIA      IRIDE   SANCARLO     SIMETO 
## 0.10427717 0.09588844 0.47611639 0.15286523 0.11860124 0.02574972 0.42998615 
##      SOLEX 
## 0.06713759

which are the stability variances, very similar to those obtained with the one-step analyses of the whole dataset.

Retrieving the standard errors is slightly more complex. I suggest that you use the ‘lmeInfo’ package and the two functions extract_varcomp() and varcomp_vcov(), as shown earlier in this post.

varcomp <- extract_varcomp(model.mixM, vector = TRUE)
vcovMat <- varcomp_vcov(model.mixM)
SEs <- sqrt(diag(vcovMat))
data.frame(varcomp, SEs)
##                                  varcomp       SEs
## Tau.Year.Year.var((Intercept)) 0.8714082 0.5146640
## var_params1                    0.9723604 0.4617926
## var_params2                    1.8483630 0.8277835
## var_params3                    1.1470640 0.5315447
## var_params4                    1.0455079 0.4905674
## var_params5                    0.6997708 0.3674039
## var_params6                    1.7654071 0.7921500
## var_params7                    0.8710015 0.4235189
## sigma_sq                       0.1538786 0.1019570

Now, we need to focus on the output above, where the variance parameters are expressed as standard deviations relative to the first genotype in alphabetical order. For this first genotype, the standard deviation is equal to the square root of the variance component, which is \(0.1538786\) and is labeled as ‘sigma_sq’ in the output above. To obtain the standard deviations of each genotype across environments, we need to multiply each variance parameter by the square root of \(0.1538786\). Then, we square this product to obtain the ‘sigma2i’ values. The delta method can be used to approximate the standard errors, as implemented in the gnlht() function from the ‘statforbiology’ package.

funcList <- list(~(var_params1*sqrt(sigma_sq))^2,
                 ~(var_params2*sqrt(sigma_sq))^2,
                 ~(var_params3*sqrt(sigma_sq))^2,
                 ~(var_params4*sqrt(sigma_sq))^2,
                 ~(var_params5*sqrt(sigma_sq))^2,
                 ~(var_params6*sqrt(sigma_sq))^2,
                 ~(var_params7*sqrt(sigma_sq))^2)
statforbiology::gnlht(varcomp, func = funcList, vcov. = vcovMat)
##                               Form   Estimate         SE  Z-value    p-value
## 1 (var_params1 * sqrt(sigma_sq))^2 0.14548985 0.09723811 1.496223 0.13459568
## 2 (var_params2 * sqrt(sigma_sq))^2 0.52571780 0.31527624 1.667483 0.09541838
## 3 (var_params3 * sqrt(sigma_sq))^2 0.20246664 0.12951377 1.563283 0.11798613
## 4 (var_params4 * sqrt(sigma_sq))^2 0.16820264 0.11004704 1.528461 0.12639804
## 5 (var_params5 * sqrt(sigma_sq))^2 0.07535112 0.05910593 1.274849 0.20236280
## 6 (var_params6 * sqrt(sigma_sq))^2 0.47958756 0.28869169 1.661245 0.09666430
## 7 (var_params7 * sqrt(sigma_sq))^2 0.11673900 0.08122625 1.437208 0.15065898

If you do not want to bother with the above expressions to get standard errors, both the models in this post can be fitted by using the mmes() function in the ‘sommer’ package. The code is shown below:

library(sommer)
## Loading required package: Matrix
## Loading required package: crayon
## Loading required package: enhancer
# One step fitting
mix <- mmes(Yield ~ Genotype,
            random = ~ Year + vsm(dsm(Genotype), ism(Genotype:Year)) + Block:Year,
            data = WinterWheat, dateWarning = FALSE, verbose = FALSE)
summary(mix)$varcomp
##                                    VarComp  VarCompSE    Zratio Constraint
## Year:mu:mu                      0.86628918 0.51497510 1.6821962   Positive
## Genotype:Year:COLOSSEO:COLOSSEO 0.10451279 0.10241046 1.0205284   Positive
## Genotype:Year:CRESO:CRESO       0.09643250 0.09798682 0.9841374   Positive
## Genotype:Year:DUILIO:DUILIO     0.47576524 0.31509345 1.5099179   Positive
## Genotype:Year:GRAZIA:GRAZIA     0.15217832 0.12910744 1.1786952   Positive
## Genotype:Year:IRIDE:IRIDE       0.11851832 0.11023154 1.0751762   Positive
## Genotype:Year:SANCARLO:SANCARLO 0.02606658 0.05980810 0.4358369   Positive
## Genotype:Year:SIMETO:SIMETO     0.42950035 0.28838436 1.4893330   Positive
## Genotype:Year:SOLEX:SOLEX       0.06672651 0.08118598 0.8218969   Positive
## Block:Year:mu:mu                0.01641743 0.01349963 1.2161396   Positive
## units:mu:mu                     0.14880399 0.02125770 7.0000041   Positive
#
# Two steps fitting
mix <- mmes(Yield ~ Genotype, 
            random = ~ Year,
            rcov = ~ vsm(dsm(Genotype), ism(units)),
            data = WinterWheatM, dateWarning = FALSE, verbose = FALSE)
summary(mix)$varcomp
##                            VarComp  VarCompSE   Zratio Constraint
## Year:mu:mu              0.87108440 0.51427505 1.693810   Positive
## units:COLOSSEO:COLOSSEO 0.15379669 0.10185279 1.509990   Positive
## units:CRESO:CRESO       0.14574601 0.09746468 1.495373   Positive
## units:DUILIO:DUILIO     0.52602532 0.31566060 1.666427   Positive
## units:GRAZIA:GRAZIA     0.20203161 0.12912133 1.564665   Positive
## units:IRIDE:IRIDE       0.16832583 0.11019926 1.527468   Positive
## units:SANCARLO:SANCARLO 0.07563096 0.05936087 1.274088   Positive
## units:SIMETO:SIMETO     0.47968170 0.28885650 1.660623   Positive
## units:SOLEX:SOLEX       0.11633450 0.08085461 1.438811   Positive

Thanks for reading—and don’t forget to check out my new book below!

Andrea Onofri
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
Send comments to: andrea.onofri@unipg.it

References

Annichiarico, P., 2002. Genotype x Environment Interactions - Challenges and Opportunities for Plant Breeding and Cultivar Recommendations. Plant Production and protection paper, Food & Agriculture Organization of the United Nations (FAO), Roma.
Gałecki, A., Burzykowski, T., 2013. Linear mixed-effects models using R: a step-by-step approach. Springer, Berlin.
Mohammadi, R., Amri, A., 2008. Comparison of parametric and non-parametric methods for selecting stable and adapted durum wheat genotypes in varibale environments. Euphytica 159, 419–432.
Piepho, H.-P., 1999. Stability Analysis Using the SAS System. Agronomy Journal 91, 154–160.
Pinheiro, J.C., Bates, D.M., 2000. Mixed-Effects Models in S and S-Plus, Springer-Verlag Inc. ed. Springer-Verlag Inc., New York.
Shukla, G.K., 1972. Some statistical aspects of partitioning genotype-environmental components of variability. Heredity 29, 237–245.

Originally written on 6 June 2019; revised on December 2025

Using `lme()` to fit the Stability Variance mixed model to genotype experiments

Fitting in one step

Fitting in two steps

References

Tags

Recent posts

Testing for interactions in nonlinear regression

Fitting time-to-event models with an environmental covariate

Correcting for multiplicity in the 'emmeans' package

How do we combine errors, in biology? The delta method

Using `lme()` to fit the Environmental Variance mixed model to genotype experiments

Archives