Together with some colleagues from the plant breeding group, we have just published a new paper, where we presented a bunch of R functions to analyse the data from diallel experiments. The paper is titled ‘Linear models for diallel crosses: a review with R functions’ and it is published in the ‘Theoretical and Applied Genetics’ Journal. If you are interested, you can take a look here at this link.
Diallel experiments are based on a set of possible crosses between some homozygous (inbred) lines. For example, if we have the male lines A, B and C and the female lines A, B and C (same lines used, alternatively, as male and female), we would have the following selfed parents: AA, BB and CC and the following crosses: AB, AC, BC. In some instances, we might also have the reciprocals BA, CA and CB. Selfed parents and crosses are compared on a Randomised Complete Block Design, usually replicated across seasons and/or locations.
For these diallel experiments, six main diallel models are available in literature, to quantify genetic effects, such as general combining ability (GCA), specific combining ability (SCA), reciprocal (maternal) effects and heterosis. If you are an expert in plant breeding, you do not need any other explanation; if you are not an expert, well… you are like me: we only need to know that these effects are determined as linear combinations of means for crosses, means for selfed parents and reciprocals. However, as I recently discovered, fitting diallel models to experimental data from diallel experiments is a relevant task for plant breeders.
When I started dealing with diallel models, I was very surprised by the fact that they are often presented as separate entities, to be fitted by using specialised software; indeed, to the eyes of a biostatistician, it would appear that all diallel models are only different parameterisations of the same general linear model (Mohring et al., 2011). Therefore, it seemed to me very strange that we could not fit diallel models by simply using the
lm() function in R and related platform.
A deeper diving in this subject showed me that the main implementation problem was that certain effects, such as the GCA effect, require the definition of unconventional design matrices, which were not yet available in R. Indeed, the packages ‘asreml-R’ and ‘sommer’ permit, e.g., the overlay of design matrices (function
and() in ‘asreml’ and
overlay() in ‘sommer’), which is useful to code GCA effects, but none of the two packages played well with the
lm() function in R. Therefore, together with Niccolò and Luigi, we decided to enhance the
model.matrix() function in R, building a handful of new R functions, aimed at producing the correct design matrices for all types of diallel models. All these functions are available within the ‘lmDiallel’ package, which is available on gitHub; it can be installed by using the ‘install_github()’ function, as available in the ‘devtools’ package. Therefore, if necessary, install this package first. The code is as follows:
install.packages("devtools") # Only at first instance library(devtools) install_github("OnofriAndreaPG/lmDiallel")
The core functions for ‘lmDiallel’ are named after the corresponding genetic effects, i.e.:
GCA() (general combining ability),
tSCA() (total Specific Combining Ability),
RGCA() (reciprocal general combining ability),
RSCA() (reciprocal specific combining ability),
REC() (RECiprocal effects = RGCA + RSCA),
DD() (Dominance Deviation),
MDD() (Mean Dominance Deviation),
H.BAR() (Average Heterosis),
Hi() (Average hetorosis for one parent),
VEi() (Variety Effect),
SP() (effect of Selfed Parents) and
GCAC() (GCA for parents in their crosses). The usage of these functions is very simple. For example, let’s assume that we have the two variables ‘Par1’ and ‘Par2’ in a dataset, to represent the two parental lines (father and mother); the GCA effect is coded as:
while the SCA effect is coded as:
By using these R functions as building blocks, we can fit all diallel models inside the
lme() functions. For example, the following line of code fits a diallel model containing the GCA and SCA effects, to the data contained in the ‘df’ dataframe:
lm(yield ~ GCA(Par1, Par2) + SCA(Par1, Par2), data = df)
Similarly, the effect of reciprocals and random blocks can be introduced by the following code:
lme(yield ~ GCA(Par1, Par2) + SCA(Par1, Par2) + REC(Par1, Par2), random = ~1|Block, data = df)
The model building process outlined above is clearly rooted in the frame of general linear models, although we recognise that plant breeders usually refer to certain relevant parameterisations of diallel models by using the name of the authors. In this respect, it is very common to use the terms “HAYMAN1”, “GRIFFING1”, “GRIFFING2”, “HAYMAN2”, “GE2” and “GE3” to refer to the main six diallel models available in literature (see Hayman, 1954; Griffing, 1956; Gardner and Eberhart, 1966). Although these models can be built and fit by using the above method, we thought it might be useful to simplify the whole process. For this reason, we also built a wrapper function named
lm.diallel(), which can be used in the very same fashion as
lm(). The syntax is:
lm.diallel(formula, Block, Env, data, fct)
where ‘formula’ uses the regular R syntax to specify the response variable and the two variables for parentals (e.g., Yield ~ Par1 + Par2). The two arguments ‘Block’ and ‘Env’ are used to specify optional variables, coding for blocks and environments, respectively. The argument ‘data’ is a ‘dataframe’ where to look for explanatory variables. Finally, ‘fct’ is a string variable coding for the selected model, i.e. “HAYMAN1”, “GRIFFING1”, “GRIFFING2”, “HAYMAN2”, “GE2”, “GE3”, according to the existing literature.
We have also built the
predict() methods for ‘lm.diallel’ objects, in order to obey to some peculiar aspects of diallel models.
In our paper (‘Linear models for diallel crosses: a review with R functions’) we have reviewed diallel models and gave examples on how they can be fitted with our new package ‘lmDiallel’. We have also shown how the facilities we provide can be used to fit random effects diallel models with ‘jags’. We intend to provide a more lengthy documentation for our package in a coming series of posts; thus, if you are interested, please, stay tuned.
I believe that increasing the usability of existing packages that have gained a wide popularity may be an advantageous programming strategy, compared to the usual strategy of building brand new platforms. From the point of view of the developer, it is efficient, as it requires a minor programming effort. From the point of view of the users (professionals, technicians and students), it is handy to be put in the conditions of making statistical analyses, without the need of learning new softwares and/or languages and/or syntaxes. Due to its open-source nature, the R environment is often overwhelming for users, that are confused by the extremely wide availability of alternative methods to perform the same task. In this regard, a programming strategy aimed at supporting some existing reference platforms might help build a more comfortable environment for statistical analyses.
Thanks for reading and, please, stay tuned! If you have comments, please, drop me a line at the email address below. Best wishes,
Department of Agricultural, Food and Environmental Sciences
University of Perugia (Italy)
Borgo XX Giugno 74
I-06121 - PERUGIA email@example.com
- Covarrubias-Pazaran G (2016) Genome-assisted prediction of quantitative traits using the R package sommer. PLoS ONE 11:e0156744.
- Gardner CO, Eberhart SA (1966) Analysis and interpretation of the variety cross diallel and related populations. Biometrics 22:439–452.
- Gilmoure A, Gogel BJ, Cullis BR, Whelam SJ, Thompson R (2015) ASReml user guide release 4.1 structural specification. VSN International Ltd, Hemel Hempstead, HP1 1ES, UK
- Griffing B (1956) Concept of general and specific combining ability in relation to diallel crossing systems. Aust J Biol Sci 9:463–493
- Möhring J, Melchinger AE, Piepho HP (2011) REML-based diallel analysis. Crop Sci 51:470–478.